Feature_Importance

#https://github.com/nisaml

NISAM LC Apr 01 2021 · 1 min read

Feature_Importance_for_Better_Preprocessing

What and Why?

The importance of each feature in a dataset can be estimated with the feature_importances_ attribute. Feature importance provides a score for each feature in a dataset; a higher score means the feature is more important or relevant to the output feature. It is normally a built-in attribute of tree-based classifiers, available once the model has been fitted.
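As a minimal sketch of what these scores look like, the snippet below fits an Extra Trees model on synthetic data and prints its importances. The data from make_classification and the parameter values (n_samples, n_features, random_state and so on) are assumptions chosen for illustration; they are not part of the datasets used in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data, assumed for illustration only:
# 200 rows and 6 features, 3 of which carry signal.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

# random_state is fixed so repeated runs give the same scores.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

importances = model.feature_importances_
print(importances)        # one non-negative score per feature
print(importances.sum())  # the scores are normalized, so they add up to 1
```

Because the scores are normalized, they are best read relative to each other within one fitted model rather than as absolute values.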

How?

Here we use the Extra Trees Classifier to determine the top 5 features in a dataset.

Problem-1

To build a classification methodology to determine whether a person will default on the credit card payment for the next month.

##importing required libraries
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
##reading data
credit = pd.read_excel("C:/TASK_LINKED_IN/FINANCE_DATA/default of credit card clients.xls")
credit.head()
##splitting data into dependent and independent features
X = credit.iloc[:,:-1]
Y = credit.iloc[:,-1]


##creating a model instance
model = ExtraTreesClassifier()
##finding important features
model.fit(X,Y)
print(model.feature_importances_)
##plotting top 5 important features
feat_importances = pd.Series(model.feature_importances_,index=X.columns)
feat_importances.nlargest(5).plot(kind='barh')
plt.show()

The plot shows the features most relevant to the output feature. Here BILL_AMT, AGE, LIMIT_BAL and PAY_0 are the most important features in determining the output. Variations in the bill amount and age make a considerable difference in the tendency to default on credit, e.g. younger customers show a greater tendency to default.

Problem-2

To build a classification methodology to determine whether a customer is placing a fraudulent insurance claim

##importing required libraries
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
#reading data
insurance = pd.read_csv('C:/TASK_LINKED_IN/FINANCE_DATA/insurence_data.csv')
insurance=insurance.drop('Unnamed: 0',axis=1)
insurance.head()
##splitting data into dependent and independent features
independent = insurance.drop('fraud_reported',axis=1)
dependent = insurance['fraud_reported']

##creating a model instance
imp=ExtraTreesClassifier()
##finding important features
imp.fit(independent,dependent)
print(imp.feature_importances_)
##visualizing top 5 features with high relevance
feat_with_imp = pd.Series(imp.feature_importances_,index=independent.columns)
feat_with_imp.nlargest(5).plot(kind='barh')
plt.show()

Here it plots the 5 features with the highest relevance to the output feature. Through these examples we can see that feature importance simply expresses the importance of each feature as a score.
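Both problems follow the same steps: fit the classifier, pair the scores with the column names, and keep the largest ones. A rough sketch of those steps as one reusable function could look like this. The name top_features, its parameters, and the small demo frame are hypothetical and only serve to illustrate the idea.

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier


def top_features(X, y, n=5, random_state=0):
    """Hypothetical helper: return the n highest-scoring features as a Series."""
    model = ExtraTreesClassifier(random_state=random_state)
    model.fit(X, y)
    # Pair each score with its column name, then keep the n largest.
    scores = pd.Series(model.feature_importances_, index=X.columns)
    return scores.nlargest(n)


# Tiny made-up frame for demonstration only; column "c" is constant,
# so it cannot be used for any split.
demo = pd.DataFrame({
    "a": [0, 1, 0, 1, 0, 1, 0, 1],
    "b": [5, 3, 6, 2, 7, 1, 8, 0],
    "c": [1, 1, 1, 1, 1, 1, 1, 1],
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})
top = top_features(demo.drop("target", axis=1), demo["target"], n=2)
print(top)
```

The returned Series can be plotted the same way as in the examples, with top.plot(kind='barh') followed by plt.show().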

 Thank You 😍
