A Few Important Feature Selection Techniques (Part 1)

SWATI SINHA Oct 05 2020 · 6 min read
What is Feature Selection?

Datasets often contain features that do not contribute to the output variable. Feature selection methods let us pick out the feature or features that contribute most to the result variable.

Importance of Feature Selection Techniques

1) Feature selection is the process of automatically or manually selecting the features that contribute most to the prediction variable or output you are interested in.

2) Irrelevant features in your data can decrease the accuracy of a model, because the model ends up learning from them.

3) Feature selection can increase the accuracy of many machine learning models.

4) With fewer features, the computation power, complexity, and training time also decrease, so it is a good step to take before training on any dataset.

A Few Important Feature Selection Techniques

Feature selection also shows which features play an important role in predicting the target.

For example, in a medical problem, we can see which parameters (features) need to be checked to diagnose a patient's condition, while the less important parameters can be ignored.

Here, I have taken the popular Breast Cancer dataset from the sklearn library.

There are many statistical methods for identifying the main features; I will show a few of them.

Which method to choose depends on the characteristics of the dataset.

Let's look at the dataset we are going to use.

This Breast Cancer dataset has 30 features and one target variable. It is a binary classification problem.

Here, I will implement the following methods:

1) Pearson correlation: finding the correlation coefficients and, from them, the importance of the features.

2) Removing all constant and quasi-constant features.

3) RFE method with Gradient Boosting.

4) Boruta feature selection (wrapper method).

In the next part (Part 2), I will implement the following methods:

5) LASSO (Least Absolute Shrinkage and Selection Operator).

6) Random Forest Classifier selection method (select from model).

7) PCA (Principal Component Analysis).

8) Chi-square feature selection method.

Now let's go through each method with the help of the dataset, which you can download from here.

Importing some libraries.
Loading the dataset and exploring its features.

This is a classification problem where 0 denotes malignant and 1 denotes benign. Let's count each class.

Using a count plot, we can see the target distribution for 'Malignant' and 'Benign' more clearly.
Using the isnull() function, we make sure that there are no null values in the dataset.
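
The original code for these steps is not reproduced in the text, so the following is only a minimal sketch of how the loading, class count, and null check might look. It assumes the copy of the dataset bundled with scikit-learn; the column name "Target" and the variable names are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer

# Load the bundled dataset as a pandas DataFrame (features plus target column).
data = load_breast_cancer(as_frame=True)
df = data.frame.rename(columns={"target": "Target"})  # "Target" is an assumed name

print(df.shape)                     # (rows, columns): 30 features + 1 target
print(df["Target"].value_counts())  # class counts: 0 = malignant, 1 = benign
print(df.isnull().sum().sum())      # total number of missing values
```

A count plot of the same column (for example with seaborn's `countplot`) would show the class balance visually.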
Importing some other required libraries.

Let's take our four classifiers as:

1) rfc = Random Forest, 2) gbc = Gradient Boosting, 3) dtr = Decision Tree, 4) lgr = Logistic Regression.

Making a function:
Here, a function called 'modelTrain' is created that can run any classifier, predict on the test set, and report the accuracy immediately.
A for loop is used to find the random state at which the accuracy is highest.
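
As a rough illustration of the function and loop described above, here is one way they could be written. The signature of `modelTrain`, the 20% test split, the number of seeds tried, and the forest size are all assumptions, not the author's exact code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)


def modelTrain(clf, X, y, random_state, test_size=0.2):
    """Split the data, fit the classifier, and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


# Try several split seeds and record the accuracy obtained with each one.
scores = {}
for seed in range(5):  # the article scans more seeds; 5 keeps the sketch fast
    rfc = RandomForestClassifier(n_estimators=50, random_state=0)
    scores[seed] = modelTrain(rfc, X, y, random_state=seed)

best_seed = max(scores, key=scores.get)
print(best_seed, round(scores[best_seed], 4))
```

Because the seed is chosen by looking at test accuracy, the best score found this way is an optimistic estimate; cross-validation gives a steadier number if one is needed.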

Output of the for loop

The first 13 random states are shown.
After going through all the values, it is clear where the accuracy is highest, so I will set the random state to 17 for a maximum accuracy of 97.37%.
Decision Tree Classifier applied.
Gradient Boosting ensemble method with the tuned random state.
Random Forest Classifier with the tuned random state.
Logistic Regression Classifier with the tuned random state.

Feature Selection


Applying Pearson correlation, finding the values of the coefficients and hence the importance of the features.

Here, 'df1' holds the correlation of each feature with the Target variable, and it is printed through 'df2'.
The Pearson correlation of all the features with respect to the 'Target' (output) variable.
On the basis of these correlations, the best features (those whose correlation value passes the -0.6 cutoff) were selected.
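
To make the selection rule concrete, here is a small dependency-free sketch of Pearson correlation followed by a threshold filter. The toy data is made up for illustration, and applying the cutoff to the absolute value of the correlation is an assumption. With pandas, `DataFrame.corr()` computes the same coefficient by default.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (made up): 0 = malignant, 1 = benign.
target = [0, 0, 0, 1, 1, 1]
features = {
    "size":  [9.0, 8.0, 7.5, 3.0, 2.5, 2.0],  # falls as the target rises
    "noise": [1.0, 5.0, 3.0, 5.0, 1.0, 3.0],  # unrelated to the target
}

# Correlation of every feature with the target.
corr = {name: pearson(col, target) for name, col in features.items()}

threshold = 0.6  # assumed cutoff, applied to the absolute correlation
selected = [name for name, r in corr.items() if abs(r) >= threshold]
print(corr)
print(selected)
```

Here "size" is strongly negatively correlated with the target and is kept, while "noise" has zero correlation and is dropped.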
Dataset with the selected features.
Here is the dataset with the important features selected by Pearson correlation.
Correlation coefficient heatmap.
These are the features selected from the Pearson correlation matrix.
Accuracy from the selected features with rfc (Random Forest Classifier) and gbc (Gradient Boosting Classifier).

As we can see, the accuracy is highest (99.12%) at random state 14, so we will train our model at random state 14.

After tuning the random state, I get a much better result from the selected features with the Random Forest Classifier.

Now let's see the Gradient Boosting Classifier.

Here, at random state 6, we get the best result, 97.37%.
The Gradient Boosting Classifier trained with random state 6.

Method 2.

Now we will remove the constant and quasi-constant features, if any exist, from the 30 features.

Quasi-constant features are features that are almost constant: they have the same value for a large share of the observations and so have little impact on the output.

Using a function called 'const_removal()', we removed 16 of the 30 features.
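
The body of 'const_removal()' is not shown in the text. One common way to implement this step is a variance threshold, sketched below without dependencies. The threshold of 0.01, the dictionary-of-columns input, and the toy data are assumptions. scikit-learn's `VarianceThreshold` applies the same idea to arrays.

```python
from statistics import pvariance


def const_removal(columns, threshold=0.01):
    """Keep only the columns whose population variance exceeds `threshold`."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }


# Toy columns (made up for illustration).
columns = {
    "constant":       [1.0, 1.0, 1.0, 1.0, 1.0],       # variance 0
    "quasi_constant": [0.10, 0.10, 0.10, 0.10, 0.11],  # variance near 0
    "informative":    [2.0, 5.0, 1.0, 7.0, 3.0],
}

kept = const_removal(columns)
print(list(kept))
```

Variance depends on the scale of each feature, so a fixed threshold treats a feature measured in small units more harshly than one measured in large units.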

Now we will also check whether any duplicate features exist and, if so, remove them.

We made a function called 'dupRemove()' to remove duplicate features and applied it, but it found none. So we now have 14 features, stored as xnow.
All the remaining features are displayed.
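
A duplicate check of this kind can be sketched as follows. This version of `dupRemove` works on a plain dictionary of columns and is an illustrative assumption, not the author's code.

```python
def dupRemove(columns):
    """Drop any column whose values exactly repeat an earlier column."""
    seen = set()
    kept = {}
    for name, values in columns.items():
        key = tuple(values)  # hashable fingerprint of the column's contents
        if key not in seen:
            seen.add(key)
            kept[name] = values
    return kept


# Toy columns (made up): "b" is an exact copy of "a".
columns = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [3, 2, 1]}
print(list(dupRemove(columns)))
```

When no two columns are identical, as in the article's case, the function returns every column unchanged.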


RFE method with Gradient Boosting.

Here, we took the 14 previously selected features from xnow to find the best and most important ones.

Only 8 features were found to be the best and most important, giving a maximum accuracy of 98.25% with the Random Forest Classifier.
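
A minimal sketch of this step with scikit-learn's `RFE` is shown below. Since the 14-feature `xnow` set is not reproduced in the text, the sketch starts from all 30 features; that choice, the small number of boosting rounds, and the variable names are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE

data = load_breast_cancer()
X, y = data.data, data.target

# Recursively drop the least important feature until 8 remain.
selector = RFE(
    estimator=GradientBoostingClassifier(n_estimators=20, random_state=0),
    n_features_to_select=8,
    step=1,  # remove one feature per round
)
selector.fit(X, y)

# support_ is a boolean mask over the input columns.
selected = data.feature_names[selector.support_]
print(list(selected))
```

The selected columns can then be passed to any classifier, such as the Random Forest used in the article, to measure accuracy on the reduced set.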

A function called 'model_selected()' was used to check those 8 features and display them.
From 'model_selected()', we displayed the PDF of importances for each feature.
Heatmap of the PDF of importances of each feature, for a better visual understanding.

Now, using rfc, verify the accuracy with random state 17.

So, finally, at random state 17 we get a maximum accuracy of 98.25%.


Boruta feature selection (wrapper method).

What is a wrapper method?

A wrapper method does not rely on a statistical test to select features; it trains a model on candidate feature subsets and compares their performance.

Wrapper methods basically follow three mechanisms to select the best features from the dataset:

1) Forward Selection: an iterative method in which we start with no features in the model. At each iteration we add the feature that best improves the model, until adding a new variable no longer improves performance.

For example, suppose a dataset has five independent features ('A', 'B', 'C', 'D', and 'E') and a 'target' column as the dependent (output) feature. First, we train the model with feature 'A' only and check the accuracy (say, 'Accuracy 1'). In the next iteration, we add the next feature (here, 'B'), train again, and get a new accuracy (say, 'Accuracy 2'). If 'Accuracy 2' is better than 'Accuracy 1', we keep this feature. In this way we add the features one by one, and whenever adding a feature does not improve the result or decreases the accuracy, we skip that feature.

2) Backward Elimination: we start with all the features and, at each iteration, remove the least significant feature if its removal improves the accuracy. We repeat this until removing features brings no further improvement.

Suppose we have the same dataset as before (five independent features 'A', 'B', 'C', 'D', and 'E', and a dependent 'target' column). Using Backward Elimination, we first train the model with all the features, then use a statistical test to find the features with the least impact and drop them.

3) Recursive Feature Elimination: a greedy optimization algorithm that aims to find the best-performing feature subset. It repeatedly builds models on the remaining features and sets aside the best- or worst-performing feature at each iteration. Once all the features have been exhausted, it ranks them in the order of their elimination and selects the best features from the dataset.
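
The first two mechanisms above can be sketched as generic greedy loops. In the sketch below, `score` stands for any function that trains a model on a feature subset and returns its accuracy; the `toy_score` function and its numbers are made up so the example runs without a dataset.

```python
def forward_selection(features, score):
    """Start empty; each round add the feature that raises the score most."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        trial = {f: score(selected + [f]) for f in remaining}
        f_best = max(trial, key=trial.get)
        if trial[f_best] <= best:  # no candidate improves the score: stop
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = trial[f_best]
    return selected, best


def backward_elimination(features, score):
    """Start with everything; each round drop the feature whose removal helps most."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        trial = {f: score([g for g in selected if g != f]) for f in selected}
        f_drop = max(trial, key=trial.get)
        if trial[f_drop] <= best:  # no removal improves the score: stop
            break
        selected.remove(f_drop)
        best = trial[f_drop]
    return selected, best


# Hypothetical accuracy contribution of each feature, in percentage points.
gain = {"A": 20, "B": 10, "C": 0, "D": -5, "E": 5}


def toy_score(subset):
    """Stand-in for 'train a model on `subset` and return its accuracy (%)'."""
    return 60 + sum(gain[f] for f in subset)


print(forward_selection("ABCDE", toy_score))
print(backward_elimination("ABCDE", toy_score))
```

With these made-up numbers, forward selection never adds the neutral feature 'C' or the harmful feature 'D', while backward elimination removes 'D' and then stops because dropping 'C' brings no improvement. Both loops call `score` many times, which is where the computational cost of wrapper methods comes from.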

Disadvantage: These wrapper techniques are practical only when the dataset is small, because they need a lot of computation power, cost, and time. For many practical problems they are therefore not feasible.

So we apply the wrapper approach using the Boruta package for feature selection.

What is Boruta, and how does it work?

Boruta is an all-relevant feature selection wrapper algorithm, capable of working with any classification method that outputs a variable importance measure (VIM); by default, Boruta uses Random Forest. The method performs a top-down search for relevant features by comparing the original attributes' importance with the importance achievable at random, estimated using their permuted copies, and progressively eliminating irrelevant features to stabilise that test.

First, it adds randomness to the dataset by creating shuffled copies of all the features. It then trains a model on the extended dataset and, at every iteration, checks whether each real feature has higher importance than the shuffled copies; on that basis it selects the best features.
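
The shuffled-copy idea can be illustrated with a single round written directly in scikit-learn and NumPy. This is a simplified sketch of the principle, not the Boruta package itself: Boruta repeats the comparison over many iterations and applies a statistical test before confirming or rejecting a feature. The variable names and forest size are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Shadow features: every column shuffled on its own, which breaks
# any relationship between that column and the target.
X_shadow = np.column_stack([rng.permutation(col) for col in X.T])

# Train one forest on the real and shadow features together.
X_all = np.hstack([X, X_shadow])
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_all, y)

n = X.shape[1]
real = forest.feature_importances_[:n]
shadow = forest.feature_importances_[n:]

# A real feature scores a "hit" when it beats the best shadow feature.
hits = real > shadow.max()
print(int(hits.sum()), "of", n, "features beat the best shadow feature")
```

Features that keep scoring hits across repeated rounds are the ones an all-relevant method would confirm as important.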

Boruta was installed with this command.
A function called 'Boruta_selaction()' was created to select features with Boruta.
After 10 iterations, 23 of the 30 features were confirmed as important.
The 23 selected features.
"df4" is the new dataset with the 23 selected features.
So we get the same result with the 23 main features from the Boruta method.

Continue reading Part 2 of this article here.
