20+ Most Asked Supervised Machine Learning Questions

#supervisedmachinelearning #interviewquestion

Atul kumbhar Dec 27 2021 · 7 min read

Interview questions:

1: Which is the best algorithm for the dataset?

Ans: From my point of view, the choice always depends on the nature of the dataset. No algorithm is the best and none is the worst. For a classification problem, for example, there are many candidates, such as logistic regression, decision trees, and random forests, so I prefer to experiment with several algorithms and compare the results. I also try to understand the dataset from the business point of view.

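2: Can we use PCA to reduce the dimensionality of highly non-linear data?

Ans: Standard PCA is a linear technique, so it does not capture highly non-linear structure well. For such data we can use kernel PCA, which applies a kernel (for example polynomial or RBF) before extracting the principal components.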

3: Have you ever used multiple dimensionality reduction techniques in any project? If yes, give the reason. If no, where can we use them?

Ans: Yes, I used them as an experiment in my ccdp project to check whether they made any difference to the prediction results.

4: In which cases will you use a naïve Bayes classifier and a decision tree, respectively?

Ans: If I understand the dataset from the business point of view and the features are independent of one another, each relating directly to the label column, I can use the naive Bayes algorithm. Otherwise I will go with a decision tree, which can capture interactions between features when it splits the dataset.

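5: What are the advantages and disadvantages of the naïve Bayes classifier? Explain.

Ans: Advantage: each feature column is treated as independent of the others and related directly to the label column, so the contribution of every column to the result can be computed separately. This makes the algorithm simple and fast.

Disadvantage: it always estimates probabilities from previously observed occurrences, so a feature value never seen with a class in training gets zero probability unless smoothing is applied. The independence assumption also rarely holds exactly in real data.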

6: For numerical data, which naïve Bayes classification equation will you use?

Ans: If the numerical data is actually categorical (discrete), I will go with Bayes' theorem directly:

P(A|B) = P(B|A) · P(A) / P(B)

but if the data is continuous, I will go with Gaussian naive Bayes, which models each feature within a class as a normal distribution.
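
As a rough illustration of the continuous case, here is a minimal sketch of Gaussian naive Bayes for a single feature. The class names, means, variances and priors are made-up toy values, and the helper names are my own, not taken from any library.

```python
import math

def gaussian_pdf(x, mean, var):
    """Likelihood of x under a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_nb_posteriors(x, class_stats, priors):
    """Return P(class | x) for one continuous feature.

    class_stats maps class -> (mean, variance); priors maps class -> P(class).
    """
    # Numerator of Bayes' theorem for each class: P(x | class) * P(class)
    scores = {c: priors[c] * gaussian_pdf(x, *class_stats[c]) for c in class_stats}
    total = sum(scores.values())  # P(x), the denominator of Bayes' theorem
    return {c: s / total for c, s in scores.items()}

# Hypothetical toy statistics: height in cm for two classes
class_stats = {"short": (160.0, 25.0), "tall": (180.0, 25.0)}
priors = {"short": 0.5, "tall": 0.5}
posteriors = gaussian_nb_posteriors(178.0, class_stats, priors)
```

With several features, the per-feature likelihoods are multiplied together, which is exactly where the independence assumption is used.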

7: What do you understand by gradient descent? How would you explain gradient descent to a kid?

Ans: Let's take linear regression as the example. The linear regression equation is y = mx + c, and the main objective is to find the values of m and c. Let's take the manual approach. The error r is built from the residuals, the differences between the actual and the predicted values, and our aim is to find the values of m and c for which this error is as small as possible, ideally close to zero. Imagine a point sitting high up on the error axis, like a ball at the top of a hill. My target is to bring it down to the bottom of the valley, where m and c are best. I cannot jump straight to the bottom, so the learning rate (eta) comes into play: it controls how big each step is. Applying m_new = m_old - eta * (dr/dm) and c_new = c_old - eta * (dr/dc) moves the point a little way downhill. After each step the error is smaller, and dr/dm and dr/dc are smaller too. I repeat this until the point reaches the bottom, where dr/dm = 0 and dr/dc = 0. The values of m and c at that point give the best-fitted line. This approach is called gradient descent.
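
The update rule can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the error is taken to be the mean squared residual, and the learning rate, step count and toy data are arbitrary choices.

```python
def gradient_descent(xs, ys, eta=0.05, steps=2000):
    """Fit y = m*x + c by gradient descent on the mean squared error."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residual of each point: predicted value minus actual value
        residuals = [(m * x + c) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. m and c
        grad_m = (2 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_c = (2 / n) * sum(residuals)
        # Step downhill: subtract the gradient scaled by the learning rate
        m -= eta * grad_m
        c -= eta * grad_c
    return m, c

# Toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, c = gradient_descent(xs, ys)
```

If eta is too large the steps overshoot and the error grows instead of shrinking, so in practice the learning rate is tuned.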

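8: Explain the learning mechanism of linear regression.

Ans: Same as the explanation above: linear regression learns m and c by repeatedly updating them with gradient descent until the error stops decreasing.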

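9: What is the error function in linear regression?

Ans: The error function is built from the residual r, the difference between the actual and the predicted values. Its gradients with respect to the parameters are delta(e of m) = dr/dm and delta(e of c) = dr/dc.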

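10: What is the loss or cost function of linear regression?

Ans: It measures the difference between the actual and the predicted values, i.e. the error term, typically as the mean of the squared differences (mean squared error). The smaller it is, the better the fit.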
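
For reference, a common choice of cost function for linear regression is the mean squared error. A minimal sketch, with made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values
mse = mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
```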

11: What is the meaning of bootstrap sampling? Explain it in your own words.

Ans: If I relate bootstrap sampling to decision trees, it is the sampling part of the bagging approach. Say I am given a dataset and want to build 5 decision trees in my bag. For each tree I draw a random sample of rows from the dataset with replacement, so each of the 5 trees is trained on its own bootstrap sample.

12: What is out-of-bag evaluation?

Ans: Because bootstrap samples are drawn with replacement, some rows are never picked for a given decision tree. These rows are that tree's out-of-bag samples. Out-of-bag evaluation uses them as a built-in validation set: each tree is scored on the rows it did not see during training, which helps detect overfitting or underfitting without a separate test split.
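
A small sketch may make the two ideas concrete. Bootstrap sampling draws row indices with replacement, and the rows that are never drawn for a given tree form its out-of-bag set. The function name and the dataset size of 10 rows are my own illustrative choices.

```python
import random

def bootstrap_sample(n_rows, rng):
    """Draw n_rows indices with replacement; return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows that were never drawn can be used to evaluate the tree trained on in_bag
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, out_of_bag = bootstrap_sample(10, rng)
```

In a bagging ensemble this is repeated once per tree, so every tree gets a different in-bag sample and a different out-of-bag set.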

13: What do you understand by hard and soft voting classifiers?

Ans: In an ensemble of classifiers, for example decision trees d1, d2 and so on, hard voting takes the class label predicted by each classifier and returns the label with the most votes. Soft voting averages the predicted class probabilities of all the classifiers and returns the class with the highest average probability.
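
The difference between the two voting schemes shows up when one classifier is very confident. The sketch below uses made-up probability outputs from three hypothetical classifiers: hard voting counts predicted labels, soft voting averages the probabilities.

```python
from collections import Counter

def hard_vote(predicted_labels):
    """Return the label predicted by the most classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average the class probabilities and return the class with the highest mean."""
    classes = probabilities[0].keys()
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(avg, key=avg.get)

# Hypothetical class probabilities from three classifiers
probs = [{"A": 0.9, "B": 0.1}, {"A": 0.4, "B": 0.6}, {"A": 0.45, "B": 0.55}]
labels = [max(p, key=p.get) for p in probs]  # each classifier's own top label

hard = hard_vote(labels)  # two of the three classifiers pick "B"
soft = soft_vote(probs)   # the confident first classifier pulls the average to "A"
```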

14: What do you understand by underfitting and overfitting of a model? Explain with an example.

Ans: Say I build a model with some algorithm, and my accuracy is 82.5% on the training data but only 50% on the test data. This large gap between training and testing accuracy indicates overfitting. Underfitting is the opposite case: the model is too simple, so accuracy is low on both the training and the test data.

15: Give me a scenario where I will be able to use a boosting classifier and regressor.

Ans: If the decision tree uses the ID3 or C4.5 algorithm, I can use a boosting classifier, because ID3 and C4.5 build classification trees. With the CART (classification and regression tree) algorithm, I can use both a boosting classifier and a boosting regressor.

16: What do you understand by a leaf node in a decision tree?

Ans: Put simply, leaf nodes are the terminal nodes of the tree. They hold the predicted label or value, which is the final result of the decision tree.

17: What are information gain and entropy in a decision tree?

Ans: Entropy is the degree of randomness or impurity of the labels in a dataset, similar in purpose to Gini impurity. Information gain is the reduction in entropy obtained by splitting on a column. The column with the highest information gain is chosen as the root node.
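
Both quantities are easy to compute by hand. The sketch below assumes Shannon entropy in bits and a toy label set of my own; the two candidate splits are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5
# A split that separates the classes completely
perfect_split = [["yes"] * 5, ["no"] * 5]
# A split whose groups are as mixed as the parent
useless_split = [["yes", "yes", "no", "no"], ["yes"] * 3 + ["no"] * 3]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

The tree builder would prefer the first split, since it removes all the uncertainty about the label.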

18: Give the disadvantages of using a decision tree.

19: How can you avoid overfitting in a decision tree?

Ans: Pruning is the method that helps with overfitting. In post-pruning, the tree is first grown fully, and it may overfit once the dataset has been split across many different branches. Post-pruning then removes the branches that add variance without reducing the error on validation data.

20: Explain the concept of Gini impurity.

Ans: Assume a dataset with 3 columns. For each column we pick a threshold value, split the data on it, and compute the Gini value of each resulting group and the weighted Gini value of the column as a whole. If column 1 has lower impurity than column 2 and column 3, column 1 is chosen as the root node, and the other columns are considered for the splits further down the tree.
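
As a small worked example, Gini impurity for a group of labels is 1 minus the sum of the squared class proportions, and a split is scored by the size-weighted impurity of its groups. The labels below are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def weighted_gini(children):
    """Size-weighted Gini impurity of the groups produced by a split."""
    n = sum(len(ch) for ch in children)
    return sum(len(ch) / n * gini(ch) for ch in children)

mixed = gini(["a", "a", "b", "b"])  # evenly mixed group
pure = gini(["a", "a", "a"])        # single-class group
# Hypothetical split on some threshold: each side is 3-to-1
split_score = weighted_gini([["a", "a", "a", "b"], ["b", "b", "b", "a"]])
```

The column (and threshold) with the lowest weighted Gini value is the one chosen for the split.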

21: Suppose I have given you a dataset with 100 columns. How will you control the growth of the decision tree?


Ans: By hyperparameter tuning. A decision tree has a parameter for maximum depth, which limits how many levels the tree can grow.

22: If you are using the AdaBoost algorithm and it gives an underfitted result, what hyperparameter tuning will you do?

Ans: Underfitting in AdaBoost means the ensemble still makes too many errors on the training data. Internally, each round handles errors like this: say 1 record out of 100 is misclassified. I compute the alpha weight of the weak learner, update the sample weights so that the misclassified record counts for more, and normalize the weights before the next round. To fix underfitting through hyperparameters, I would increase the number of estimators, raise the learning rate, or use a slightly stronger base estimator.
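
One round of the AdaBoost weight update (compute alpha, update the weights, normalize) can be sketched as follows. This assumes the common formulation alpha = ln((1 - error) / error) with weights multiplied by exp(alpha) for misclassified records; the function name and the toy labels are my own.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost weight update: returns (alpha, normalized new weights)."""
    # Weighted error rate of the current weak learner
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = math.log((1 - err) / err)
    # Misclassified records are scaled up by exp(alpha); correct ones are unchanged
    updated = [w * math.exp(alpha * (t != p))
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Five records with equal weights; the weak learner gets the last one wrong
weights = [0.2] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After normalization the single misclassified record carries half of the total weight, so the next weak learner is pushed to get it right.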

23: When using the XGBoost algorithm, what can the objective function be for regression and classification? Also name some of the objective functions available in XGBoost.

Ans: Whether the problem is regression or classification, XGBoost has several built-in objective functions, selected through its objective parameter. For a regression problem we can use reg:squarederror, reg:squaredlogerror or reg:pseudohubererror, with RMSE as a typical evaluation metric. For a classification problem we can use binary:logistic for two classes, or multi:softmax and multi:softprob for more than two.

24: Give examples of lazy learner and eager learner algorithms.

Ans: The KNN algorithm is a lazy learner: it does no real training and keeps the whole dataset in order to find the nearest neighbours at prediction time, which makes the stored model heavy. Logistic regression, linear regression, decision trees, random forests and similar algorithms are eager learners: they build the model during training. For logistic and linear regression, the size of the model depends on the number of features rather than the number of records, so it does not change whether the dataset has 3,000 or 30,000 records.
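
To see why KNN is called lazy, here is a minimal sketch of a KNN classifier in plain Python. The class name, the toy points and k = 3 are my own choices. Note that fit only stores the data, and all the work happens in predict_one.

```python
import math
from collections import Counter

class KNNClassifier:
    """Minimal k-nearest-neighbours classifier using Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learner: "training" just stores the whole dataset
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, point):
        # All the computation is deferred to prediction time
        dists = sorted((math.dist(point, x), label) for x, label in zip(self.X, self.y))
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical two-cluster toy data
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["a", "a", "a", "b", "b", "b"]
model = KNNClassifier(k=3).fit(X, y)
pred_near_a = model.predict_one((1.1, 1.0))
pred_near_b = model.predict_one((5.0, 5.1))
```

The stored model grows with every record added, which is the opposite of an eager learner such as linear regression, where only the coefficients are kept.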

24: What is the difference between Euclidean distance and Manhattan distance? Explain in simple words.

Ans: Both serve the same purpose: measuring the distance between an unknown data point and the records in the dataset, so that the unknown point can be assigned to the class of its nearest neighbours. The difference lies in the formula. Euclidean distance is the straight-line distance, while Manhattan distance is the sum of the absolute differences along each axis. In both cases the data must be in numerical format.

Euclidean distance: d = √[(x2 - x1)^2 + (y2 - y1)^2]

Manhattan distance: d = |x1 - x2| + |y1 - y2|
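
Both formulas extend to any number of dimensions. A minimal sketch, using two made-up points:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of the absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical points: 3 units apart on x, 4 units apart on y
d_euclid = euclidean((0, 0), (3, 4))     # 5.0
d_manhattan = manhattan((0, 0), (3, 4))  # 7
```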

25: Explain vectorization and Hamming distance.

Ans: Vectorization means representing each record as a numerical vector so that algorithms such as SVM can work with it. In SVM, once the hyperplane separating the dataset is drawn, the data points nearest to the hyperplane are the support vectors. Hamming distance counts the positions at which two records differ. It is used to find the nearest neighbours in the KNN algorithm when the dataset is in categorical format.
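
Hamming distance is simple enough to show directly. In this sketch the records and their categorical values are made up for illustration.

```python
def hamming(u, v):
    """Number of positions at which two equal-length records differ."""
    if len(u) != len(v):
        raise ValueError("records must have the same length")
    return sum(a != b for a, b in zip(u, v))

# Hypothetical categorical records: colour, size, shape
d = hamming(["red", "small", "round"], ["red", "large", "round"])  # differs only in size
```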

26: Give me a scenario where I will be able to use a boosting classifier and regressor.

Ans: Three commonly used boosting algorithms are:

1: XGBoost

2: Gradient Boosting

3: AdaBoost

For a regression problem, gradient boosting can be used. Each new tree is fitted to the errors of the current model, and the prediction is updated as prediction_new = prediction_old + learning_rate * (tree fitted to the error). For a classification problem, AdaBoost can be used. It updates the sample weights as weight_new = weight_old * exp(alpha * I(y != y^)), where I is the indicator function, equal to 1 when y != y^ and 0 otherwise.

27: Give me a situation where I will be able to use SVM instead of logistic regression.

Ans: Say the dataset has a shape that cannot be separated linearly, so we are not able to draw a best-fitted line. In that case SVM comes into the picture. SVM has a method called the kernel trick, which separates the training data by implicitly mapping it into a higher-dimensional space, for example from n to n+1 dimensions.

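28: What do you understand by the RBF kernel in SVM?

Ans: RBF (radial basis function) is one of the kernels that can be used with the kernel trick in an SVM model. It measures the similarity of two points from the distance between them, and helps separate training data that is not linearly separable by implicitly mapping it into a higher-dimensional space.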
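
For intuition, the RBF kernel value for two points is exp(-gamma * squared distance), so identical points score 1 and distant points score close to 0. A minimal sketch, where gamma = 0.5 and the points are arbitrary choices of mine:

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF similarity: exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

same = rbf_kernel([1.0, 2.0], [1.0, 2.0])  # identical points
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])  # one unit apart
far = rbf_kernel([0.0, 0.0], [3.0, 0.0])   # three units apart
```

A larger gamma makes the similarity fall off faster with distance, which gives a more flexible decision boundary but also makes overfitting more likely.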
