### What is a Regression Model?

Regression models are used in machine learning for continuous targets. Unlike classification, where the output is restricted to discrete labels such as 0 or 1 (True or False), a regression model can output any real value. A regression model captures the relationship between two or more variables and computes one variable with respect to the others: we have some **independent** variables from which we predict the **dependent** (output) variable by modelling the relationship between them. For example, in a housing dataset we have many independent variables from which we estimate the output (the house price) by fitting a regression model to that relationship.

In this article, I am going to walk through an end-to-end **Regression Model** on the famous Boston Housing dataset.

You can load the dataset directly from sklearn with this function (note: `load_boston` was deprecated and then removed in scikit-learn 1.2, so it only works on older versions):

**`load_boston()`**
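On scikit-learn ≥ 1.2, where `load_boston` is no longer available, a common workaround is to parse the original CMU file, whose records are split across two physical rows. Here is a minimal sketch of the reshaping step on a synthetic table with the same layout (the URL in the comment is the real source; the numbers here are made up):

```python
import numpy as np
import pandas as pd

# The original source file (http://lib.stat.cmu.edu/datasets/boston) stores each
# record across two physical rows: 11 values on the first, then 3 on the second
# (the last of which is the target). Simulate that layout with 4 toy records.
rows = []
for i in range(4):
    rows.append(list(np.arange(11) + i))                       # first row: 11 feature values
    rows.append([100 + i, 200 + i, 24.0 + i] + [np.nan] * 8)   # second row: 2 features + target
raw = pd.DataFrame(rows)

# Stitch the halves back together: 13 features and 1 target per record.
data = np.hstack([raw.values[::2, :], raw.values[1::2, :2]])
target = raw.values[1::2, 2]
print(data.shape, target.shape)  # (4, 13) (4,)
```

With the real file, `pd.read_csv(url, sep=r"\s+", skiprows=22, header=None)` followed by the same slicing reproduces `boston.data` and `boston.target`.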

Importing the necessary libraries:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
```

Loading the dataset:

```
from sklearn.datasets import load_boston
boston = load_boston()
bos = pd.DataFrame(boston.data)
print("shape is:", bos.shape)
bos.head(2)
```

`bs = boston['data']`

The features are:

```
feature = boston['feature_names']
feature
```

Feature descriptions:

**EDA & FEATURE ENGINEERING**

```
feature_desc = {'CRIM': ['per capita crime rate by town'],
                'ZN': ['proportion of residential land zoned for lots over 25,000 sq.ft.'],
                'INDUS': ['proportion of non-retail business acres per town'],
                'CHAS': ['Charles River dummy variable'],
                'NOX': ['nitric oxides concentration (parts per 10 million)'],
                'RM': ['average number of rooms per dwelling'],
                'AGE': ['proportion of owner-occupied units built prior to 1940'],
                'DIS': ['weighted distances to five Boston employment centres'],
                'RAD': ['index of accessibility to radial highways'],
                'TAX': ['full-value property-tax rate per $10,000'],
                'PTRATIO': ['pupil-teacher ratio by town'],
                'B': ['1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town'],
                'LSTAT': ['% lower status of the population']}
feature_desc  # renamed from `dict` to avoid shadowing the built-in
```

```
des = pd.DataFrame(feature_desc, index=['Description'])
print('\nDescription of the features:')
des.T
```

With this metadata of the independent columns in place, we can build the working DataFrame for the further operations:

```
df = pd.DataFrame(bs,columns=feature)
df.head(2)
```

`target = boston['target']`

`y = pd.DataFrame(target, columns=['SalePrice'])`

**Pearson correlation across the whole dataset**

```
data = pd.concat([y, df], axis=1).copy()
corr = data.corr()
plt.figure(figsize=(18,12))
sns.heatmap(corr,annot=True,fmt='.2g',linewidths=0.5,cmap='viridis')
plt.show()
```

**Pearson correlation of all the columns with respect to the output ('SalePrice') column.**

```
imp = corr[['SalePrice']]
imp = imp.sort_values(by='SalePrice',ascending=False)
print("\nFeature Importance:")
plt.figure(figsize=(18,2))
sns.heatmap(imp.T,annot=True,fmt='.3g',linewidths=0.5,cbar=False,cmap='viridis')
plt.show()
```
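Instead of reading the heatmap by eye, the same ranking can be done programmatically. A minimal sketch on synthetic data (the 0.5 threshold and the column names are purely illustrative):

```python
import numpy as np
import pandas as pd

# Rank features by absolute Pearson correlation with the target and keep
# only the strong ones. 'RM' is built to correlate with 'SalePrice';
# 'NOISE' is independent of it.
rng = np.random.default_rng(0)
df = pd.DataFrame({"SalePrice": rng.normal(size=300)})
df["RM"] = df["SalePrice"] * 0.8 + rng.normal(scale=0.3, size=300)
df["NOISE"] = rng.normal(size=300)

corr = df.corr()["SalePrice"].drop("SalePrice")
strong = corr[corr.abs() > 0.5].index.tolist()
print(strong)  # ['RM']
```

The same one-liner applied to the real `corr` DataFrame above gives a shortlist of candidate features without any plotting.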

#### Making a scatter plot, box plot & distribution plot side by side for each feature, using a for loop to show all the plots together.

```
for i in range(len(data.columns)):
    a = data[data.columns[i]]
    b = data['SalePrice']
    f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 4))
    sns.scatterplot(x=a, y=b, ax=ax1, color='g')
    sns.boxplot(x=a, ax=ax2, color='m')
    sns.distplot(a, ax=ax3, color='r')
```

From the (marked) graphs we can see that some features have outliers and that 'LSTAT' also shows polynomial behaviour. So, in the 'col' list I have put those features together along with 'SalePrice'.

`col = ['SalePrice','RM','AGE','LSTAT']`

To remove some outliers with the Z-score method, I have written a function `Z_score()`.

```
def Z_score(data, left, right):
    # keep the indices whose z-score lies inside the band [-left, right]
    index = []
    mean = data.mean()
    std = data.std()
    for i in range(len(data)):
        z = (data[i] - mean) / std
        if -left <= z <= right:
            index.append(i)
    return index
```

```
clean_index = Z_score(y.SalePrice,2.5,2.4)
print("Data to be cleaned:{}".format(len(y) - len(clean_index)))
```

Data to be cleaned : 23
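To make the asymmetric z-score band concrete, here is a minimal sketch of the same filtering logic on a toy series (the numbers are synthetic; the `(-2.5, 2.4)` band mirrors the call above):

```python
import numpy as np
import pandas as pd

# Toy series: 20 well-behaved points around 20, plus two extreme values.
rng = np.random.default_rng(0)
s = pd.Series(np.concatenate([rng.normal(20, 2, 20), [80.0, -40.0]]))

mean, std = s.mean(), s.std()
z = (s - mean) / std
# Keep indices whose z-score lies inside the asymmetric band, like Z_score(data, 2.5, 2.4).
keep = s.index[(z >= -2.5) & (z <= 2.4)]
flagged = len(s) - len(keep)
print(flagged)  # 2: both extreme values fall outside the band
```

The two injected extremes land far outside the band, so exactly they are flagged while the 20 ordinary points survive.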

**Cleaning datasets.**

```
Df = data.iloc[clean_index].reset_index(drop=True).copy()
Df.shape
```

Now the data has been cleaned and the shape is (483, 14).

[Note: since all the cleaned data is now in 'Df', it is good practice to delete the extra data frames ('y', 'df', 'data') to free up memory.]

`del y,df,data`

**Separating X-data and Y-data.**

```
y = Df['SalePrice'].copy()
X = Df.drop(['SalePrice'], axis=1).copy()
```

**Feature selection with Lasso regularisation.**

```
from sklearn.linear_model import Lasso
lasso = Lasso(alpha=0.50,max_iter=2000)
lasso.fit(X,y)
```

```
coeff_values = pd.DataFrame({'coeff':lasso.coef_},index=X.columns).sort_values(by='coeff')
c = (abs(coeff_values.coeff) == 0)
col_non_imp = list(X.columns[c])
print("Number of non-important columns:", len(col_non_imp))
print('\n')
print('Non-important columns are :',col_non_imp)
```
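To see why Lasso works as a feature selector, here is a small self-contained sketch on synthetic data (the alpha value mirrors the one above): the L1 penalty drives the coefficients of uninformative features exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: y depends only on the first two of five features,
# so Lasso should zero out the coefficients of the remaining three.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5, max_iter=2000).fit(X, y)
dropped = [i for i, c in enumerate(lasso.coef_) if c == 0]
print(dropped)  # [2, 3, 4] — the features Lasso considers unimportant
```

The surviving coefficients are shrunk versions of the true 3.0 and −2.0; the zeros give us the drop list, exactly as with 'DIS' and 'RAD' above.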

**The non-important columns turn out to be 'DIS' & 'RAD'. Drop them.**

`X.drop(col_non_imp, axis=1, inplace=True)`

`y = pd.DataFrame(y)`

```
# Taking the log of every feature except 'CHAS'
feat = [col for col in X.columns if col != 'CHAS']
feat
```

**Replacing zero with one, otherwise the logarithm would raise an error; this is also why I left out 'CHAS' (a 0/1 dummy variable).**

```
for col in feat:
    X[col] = X[col].replace({0: 1})
```

```
for col in feat:
    X[col] = X[col].apply(lambda x: np.log(x))
```
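Replacing zeros with ones before taking logs is one option; `np.log1p` (which computes log(1 + x)) is a common alternative that is defined at zero without overwriting any values first. A quick comparison on a toy series:

```python
import numpy as np
import pandas as pd

vals = pd.Series([0.0, 0.5, 1.0, 10.0])

# Replace-zero-then-log, as done above: the zero entries become log(1) = 0.
a = vals.replace({0: 1}).apply(np.log)

# log1p alternative: log(1 + x) is defined at zero and monotonic everywhere,
# so no values need to be modified beforehand.
b = np.log1p(vals)
print(a.tolist())
print(b.round(3).tolist())
```

Both transforms compress the right tail; `log1p` has the advantage of not mapping 0 and 1 to the same value.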

From the distribution plots above we can see that the 'CHAS' & 'B' features have many outliers. So, for better performance, we can remove those features.

`X.drop(['CHAS', 'B'], axis=1, inplace=True)`

**Scaling the dataset to a standard normal distribution.**

```
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x = scaler.fit_transform(X).copy()
print(x.shape, y.shape)
x = pd.DataFrame(x,columns=X.columns)
```
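As a quick sanity check (on synthetic data), `StandardScaler` output should have column means of 0 and unit standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# After StandardScaler, every column has (population) mean 0 and std 1.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=5.0, size=(100, 3))
Xs = StandardScaler().fit_transform(X)
print(np.allclose(Xs.mean(axis=0), 0.0), np.allclose(Xs.std(axis=0), 1.0))
```

This is worth verifying after any scaling step, since downstream gradient-based methods behave badly on unevenly scaled features.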

`x.to_csv('X.csv',index=False)`

`y.to_csv('y.csv',index=False)`

`del x,y`

[Deleted 'x' and 'y' to free up memory.]

**Importing scaled CSV files.**

```
X = pd.read_csv('X.csv')
y = pd.read_csv('y.csv')
```

The **shape** of the **X** csv file is **(483, 9)** and of the **y** csv file is **(483, 1)**.

### Making Regression Model.

If a dataset shows some **curvature**, a linear model fitting a perfect straight line will not give the **best result**. In that case a **polynomial regression** of degree two, three, or more performs better, combined with a regularisation factor lambda that shrinks the coefficients of the generated higher-degree features. In other words, the **regularisation factor (lambda)** lets us control **overfitting or underfitting** of the model.

I have seen from the **EDA** that this dataset is somewhat curved in nature, so I will use polynomial regression of **degree 2**.

**Importing Polynomial-features.**

```
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2,interaction_only=False)
x = poly.fit_transform(X)
x = pd.DataFrame(x)
x.shape
```

The shape of 'x' is (483, 55).

So, after adding the degree-2 terms, the number of features has increased to 55.
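The count of 55 can be verified from the combinatorics: for n features at degree 2, `PolynomialFeatures` emits one bias column, n linear terms, n squares, and n·(n−1)/2 pairwise interactions:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# For n = 9 input features and degree 2:
# 1 (bias) + n (linear) + n (squares) + n*(n-1)/2 (interactions) columns.
n = 9
expected = 1 + n + n + n * (n - 1) // 2
out = PolynomialFeatures(degree=2).fit_transform(np.zeros((1, n)))
print(expected, out.shape[1])  # both 55
```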

The hypothesis equation takes the form:

`h(x) = θ₀ + θ₁x₁ + ... + θ₉x₉ + θ₁₀x₁² + θ₁₁x₁x₂ + ...` (including every square and pairwise interaction of the 9 features — 55 terms in total)

Splitting the dataset into train and test sets.

```
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
xtrain=np.array(X_train)
xtest=np.array(X_test)
ytrain=y_train
ytest=y_test
xtrain.shape,xtest.shape
```

shape of the xtrain is (386 , 55) and xtest is (97,55).

### Method-1.

### Gradient Descent

Here I have written out the gradient descent equations, under which the cost (MSE) on the training data gradually decreases with the number of iterations.

**Gradient descent graph....**

Here is my cost function without any regularisation factor:

`J(θ) = (1/2m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²`

However, the cost function with a regularisation factor would look like this:

`J(θ) = (1/2m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + (λ/2m) · Σⱼ θⱼ²`

Equation for gradient descent, with every θⱼ updated simultaneously on each iteration for a given learning rate α:

`θⱼ := θⱼ − (α/m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾`

I have used the first (unregularised) cost equation to see how the model behaves without a regularisation factor.

If you look at my code, I have also folded a momentum factor into the learning rate.
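Stripped of the input prompts and plotting, the update rule above can be sketched in vectorized form on synthetic data (a minimal illustration with no regularisation and no momentum; all names and values here are made up):

```python
import numpy as np

# Minimal batch gradient descent for linear regression, matching
# theta_j := theta_j - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]  # bias column + 2 features
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + rng.normal(scale=0.01, size=100)

theta = np.zeros(3)
alpha, m = 0.1, len(X)
for _ in range(2000):
    grad = X.T @ (X @ theta - y) / m  # vectorized gradient of the MSE/2 cost
    theta -= alpha * grad
print(theta.round(2))  # close to [1, 2, -3]
```

The `X.T @ (X @ theta - y) / m` expression computes all partial derivatives at once, which is both faster and easier to get right than per-sample loops.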

```
def GradDiscent(xtrain, ytrain, xtest, ytest):
    m = len(xtrain)
    n = xtrain.shape[1]
    p = int(input('Enter no. of iterations: '))
    alpha = float(input('Enter learning rate (alpha): '))
    mom = float(input('Enter momentum: '))
    yt = np.ravel(ytrain)  # flatten the targets to 1-D arrays
    yv = np.ravel(ytest)
    theta = pd.DataFrame(np.zeros([n, p + 1]))
    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 7))
    J = np.zeros(p + 1)   # cost per iteration, train data
    J1 = np.zeros(p + 1)  # cost per iteration, test data
    for i in range(1, p + 1):
        # momentum is folded into the learning rate alpha (decays it each step)
        alpha = alpha * mom
        # vectorized batch gradient step: theta := theta - (alpha/m) * X.T @ (X@theta - y)
        grad = np.dot(xtrain.T, np.dot(xtrain, theta[i - 1]) - yt) / m
        theta[i] = theta[i - 1] - alpha * grad
        J[i] = (1 / (2 * m)) * sum((yt - np.dot(xtrain, theta[i])) ** 2)           # train cost
        J1[i] = (1 / (2 * len(xtest))) * sum((yv - np.dot(xtest, theta[i])) ** 2)  # test cost
    Theta = pd.DataFrame(np.array(theta[p]), columns=['Theta'], index=x.columns)  # final thetas
    # plotting cost vs iteration
    ax1.plot(np.arange(2, p + 1), J[2:], color='r')
    ax2.plot(np.arange(2, p + 1), J1[2:], color='g')
    ax1.set_xlabel('No. of Iterations')
    ax1.set_ylabel('MSE, Cost for Train Data')
    ax1.set_title('Gradient Descent for Train Data Set')
    ax2.set_xlabel('No. of Iterations')
    ax2.set_ylabel('MSE, Cost for Test Data')
    ax2.set_title('Gradient Descent for Test Data Set')
    sns.despine(left=True)
    # final MSE and RMSE loss on the test data
    print('\nMSE from GD: ', J1[p])
    print('RMSE from GD: ', np.sqrt(J1[p]))
    print('\n')
    return Theta

def Prediction(theta, xtest, ytest):
    thet = theta['Theta']
    PtestGD = np.dot(xtest, thet)
    yv = np.ravel(ytest)
    limit = int(round(yv.max()))
    # prediction-vs-actual plot: points on the diagonal line are perfect predictions
    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 8))
    ax1.scatter(PtestGD, yv, color='g')
    ax1.plot(np.arange(limit), np.arange(limit), color='y')
    ax1.set_xlabel('Predicted Data Set')
    ax1.set_ylabel('Original Data Set')
    ax1.set_title('Comparison Graph')
    sns.distplot(PtestGD - yv, ax=ax2, color='red', bins=10)
    ax2.set_xlabel('[VALUE - PREDICTED VALUE]')
    ax2.set_title('Distribution Plot for Predicted & Test Data')
    ax2.grid()
    sns.despine(left=True)
```

`theta = GradDiscent(xtrain,ytrain,xtest,ytest)`

**Result...**

`Prediction(theta, xtest, ytest)`

__Conclusion__:

We can see that the result is not up to the mark.

The main reason is that I have not added a regularisation factor to the cost function, even though my regression equation is a polynomial of degree 2.

It's not the fault of the equation as such: the cost I applied here is the one for a plain linear (straight-line) model, but my hypothesis is a polynomial of degree 2.

So the model has an overfitting problem. To balance it, a regularisation factor is needed to shrink the higher-degree coefficients, but I have not used one.

In the next method I will use a regularisation factor to balance this bias-variance problem.

### Method-2.

### NORMAL EQUATION MODEL

With a single calculation we can also obtain the thetas (intercept & coefficients) using matrix multiplication together with the transpose and inverse of a matrix.

The equation looks like this (for a purely linear model, with no regularisation factor):

`θ = (XᵀX)⁻¹ Xᵀ y`

As my regression model is a polynomial of degree 2, I have to apply a regularisation factor, which shrinks the coefficients of the higher-degree features and so controls overfitting of the model.

The normal equation with a regularisation factor looks like this:

`θ = (XᵀX + λL)⁻¹ Xᵀ y`, where `L` is the identity matrix with the top-left (bias) entry set to 0.

In most cases these two equations are used to calculate the intercept and the feature coefficients directly, which I learnt from **Prof. Andrew Ng**.

So, I used the **second** equation with regularisation factor lambda.

The first advantage of this equation is that it works very well for up to roughly **10,000** features.

The second advantage is that it is very fast: in a single step it computes the intercept as well as the thetas, with no learning rate to tune.
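The regularised normal equation can be sketched compactly on synthetic data (using `np.linalg.solve` rather than an explicit inverse, which is numerically safer; all names and values here are illustrative):

```python
import numpy as np

# Regularised normal equation: theta = (X.T X + lam * L)^-1 X.T y,
# where L is the identity with the bias entry zeroed so that the
# intercept is not penalised.
rng = np.random.default_rng(1)
X = np.c_[np.ones(50), rng.normal(size=(50, 3))]  # bias column + 3 features
y = X @ np.array([5.0, 1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 1.0
L = np.eye(X.shape[1])
L[0, 0] = 0.0
theta = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta.round(2))  # close to [5, 1, -2, 0.5], slightly shrunk by lam
```

With `lam = 0` this reduces to ordinary least squares; increasing `lam` shrinks the non-bias coefficients toward zero.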

**The Normal Equation model is implemented in the function 'normalEq()'.**

```
def normalEq(xtrain, ytrain, xtest, ytest):
    n = xtrain.shape[1]
    lam = float(input('Enter Regularisation Factor (Lambda): '))
    ytr = np.ravel(ytrain)  # flatten the targets to 1-D arrays
    yte = np.ravel(ytest)
    # regularisation matrix: identity with the bias entry zeroed,
    # so the intercept is not penalised
    rg = np.eye(n)
    rg[0, 0] = 0.0
    # normal equation: directly calculate the thetas (intercept & coefficients)
    Theta = np.dot(np.dot(np.linalg.inv(np.dot(xtrain.T, xtrain) + rg * lam), xtrain.T), ytr)
    Ptest = np.dot(xtest, Theta)    # predictions for test data
    Ptrain = np.dot(xtrain, Theta)  # predictions for train data
    from sklearn.metrics import r2_score, mean_squared_error
    mse = mean_squared_error(yte, Ptest)   # both metrics take (y_true, y_pred)
    score = r2_score(yte, Ptest)
    mse1 = mean_squared_error(ytr, Ptrain)
    score1 = r2_score(ytr, Ptrain)
    print('\n')
    print('*' * 45)
    print('MSE, for test data : ', mse)
    print('RMSE, for test data : ', np.sqrt(mse))
    print('R2-score for test data: ', round(score * 100, 2), '%')
    print('*' * 45)
    print('\n')
    print('*' * 45)
    print('MSE, for train data : ', mse1)
    print('RMSE, for train data : ', np.sqrt(mse1))
    print('R2-score for train data: ', round(score1 * 100, 2), '%')
    print('*' * 45)
    # plotting the test-data result
    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
    ax1.scatter(Ptest, yte, color='g')
    ax1.plot([0, Ptest.max()], [0, yte.max()], color='y')
    ax1.set_xlabel('Predicted Data Set')
    ax1.set_ylabel('Original Data Set')
    ax1.set_title('Comparison Graph')
    ax1.grid()
    sns.distplot(Ptest - yte, ax=ax2, color='r', bins=10)
    ax2.set_title('Distribution Plot for Predicted & Test Data Set')
    ax2.grid()
    sns.despine(left=True)
    # plotting the train-data result
    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
    ax1.scatter(Ptrain, ytr, color='g')
    ax1.plot([0, Ptrain.max()], [0, ytr.max()], color='y')
    ax1.set_xlabel('Predicted Data Set')
    ax1.set_ylabel('Original Data Set')
    ax1.set_title('Comparison Graph')
    ax1.grid()
    sns.distplot(Ptrain - ytr, ax=ax2, color='r', bins=10)
    ax2.set_title('Distribution Plot for Predicted & Train Data Set')
    ax2.grid()
    sns.despine(left=True)
    return pd.DataFrame(Theta).T
```

`theta = normalEq(xtrain,ytrain,xtest,ytest)`

So, here we can see from the **Normal Equation model** that accuracy is much better now: for the test data the R²-score is **87.49%** and for the train data it is **86.25%**.

Let's see it more clearly in the visualisation graphs...

So, from these graphs we can see that the accuracy is really good.

**output**

```
print('Values of Intercept(col=0) and coefficients :-')
theta
```

**So, by adding a regularisation factor I got a very good result with only a single calculation.**

**I also get the values of the intercept & coefficients for each of the features.**

Thank you so much for reading this.....