Practical-2
Aim: Perform the following Data Pre-processing (Feature Selection/Elimination) tasks using Python.
Theory:-
In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction.
Why is it important (Advantages):-
1. It enables the machine learning algorithm to train faster.
2. It reduces the complexity of a model and makes it easier to interpret.
3. It improves the accuracy of a model if the right subset is chosen.
4. It reduces overfitting.
5. It is very efficient and fast to compute.
Disadvantages of Feature Selection:-
1. A feature that is not useful by itself can be very useful when combined with others; feature selection may miss such interactions.
Various Data pre-processing techniques:-
DataSet: Diabetes dataset
Url to direct import data: https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv
The dataset is used to predict whether a patient has diabetes or not.
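The dataset can be loaded directly from the URL above with pandas. A minimal sketch follows; the CSV has no header row, so the column names below are assumptions based on the commonly used Pima Indians diabetes schema.

```python
import pandas as pd

# URL from the practical; the file has no header row
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv"

# Assumed column names (standard Pima Indians diabetes attributes)
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]

df = pd.read_csv(url, names=names)

X = df.drop(columns="class")  # input features
y = df["class"]               # target: 1 = diabetic, 0 = not

print(df.shape)  # (768, 9)
```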
Data reduction using variance threshold:-
It removes all features whose variance does not meet some threshold. By default, it removes features with zero variance, i.e. features that have the same value for all samples.
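The idea above can be sketched with scikit-learn's VarianceThreshold on a small toy matrix (the data here is made up purely for illustration):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix: columns 0 and 3 are constant, i.e. zero variance
X = np.array([
    [0, 2, 0, 3],
    [0, 1, 4, 3],
    [0, 1, 1, 3],
])

# Default threshold=0.0 removes only zero-variance features
selector = VarianceThreshold()
X_reduced = selector.fit_transform(X)

print(selector.get_support())  # [False  True  True False]
print(X_reduced.shape)         # (3, 2)
```

The same object can be applied to the diabetes dataset by calling fit_transform on its feature matrix; raising the threshold removes low-variance (not just constant) features.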
Univariate feature selection:-
Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection routines as objects that implement the transform method.
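A minimal sketch of univariate selection with SelectKBest and the chi-squared test, on a tiny hypothetical dataset (chi2 requires non-negative features, which is why this test suits count-like data such as the diabetes attributes):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical non-negative feature matrix and binary target
X = np.array([
    [1, 9, 0],
    [2, 8, 1],
    [8, 1, 0],
    [9, 2, 1],
])
y = np.array([0, 0, 1, 1])

# Keep the k=2 features with the highest chi-squared scores
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(selector.get_support())  # the first two features are retained
print(X_new.shape)             # (4, 2)
```

For the diabetes dataset, the same pattern applies: fit the selector on the feature matrix and target, then transform to keep only the top-scoring columns.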