Interview Questions Set 1

#datascience #statistics

Akash Borgalli Dec 11 2021 · 3 min read
1. Where have you used hypothesis testing in your machine learning solution?

Ans: I haven’t used hypothesis testing yet, but I can explain what it is. We state a null hypothesis (H0) and an alternative hypothesis (H1), which is the opposite of H0. We then run an experiment, collect evidence, and conclude whether to reject H0 or fail to reject it. An example question could be: will India win the World Cup this year?

2. What kinds of statistical tests have you performed in your ML application?

Ans: I haven’t performed any statistical tests yet, but I can name a few:

Z-Test: Used to compare means when the data follow a normal distribution, the population standard deviation is known, and the sample size is large (typically 30 or more).

T-Test: Used when the sample size is small (typically less than 30) and you don’t know the population standard deviation. It works well when comparing the means of two groups.

F-Test: This is the test behind ANOVA. It is used when you want to compare the means of more than 2 groups, that is, when you compare 1 numerical feature against 1 categorical feature.

3. What do you understand by the p-value? And what is its use in ML?

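Ans: A p-value is a probability: the probability of seeing a result at least as extreme as the one observed, assuming the null hypothesis (H0) is true. A small p-value (commonly below 0.05) is taken as evidence against H0. In ML it is often used to check whether a feature’s relationship with the target is statistically significant, for example during feature selection. As a loose analogy for probability itself, think of a laptop touchpad and ask how likely it is that you touch one particular region of it.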
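To make the idea concrete, here is a minimal sketch of how a p-value can be computed for a one-sample Z-test using only the Python standard library. The function name `z_test_p_value` and all the numbers are made up for illustration; they are not from a real dataset.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, pop_mean, pop_std, n):
    """Return (z, two-sided p-value) for a one-sample Z-test."""
    # How many standard errors the sample mean lies from the H0 mean
    z = (sample_mean - pop_mean) / (pop_std / sqrt(n))
    # Probability of a z at least this extreme in either tail, assuming H0 is true
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical numbers: H0 says the population mean is 70 (std 10);
# a sample of 36 observations has mean 74.
z, p = z_test_p_value(sample_mean=74, pop_mean=70, pop_std=10, n=36)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With these assumed numbers the p-value falls below 0.05, so at that significance level we would reject H0.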

4. Which type of error is more severe, Type 1 or Type 2? Explain why, with an example.

Ans: It depends entirely on the business use case which one is more severe.

For a Type 1 error, let’s take H0: the person is innocent and H1: the person is guilty. If the person really is innocent but we reject H0 and accept the alternate hypothesis (H1) anyway, we have convicted an innocent person. Rejecting a true null hypothesis is a Type 1 error.

For a Type 2 error, let’s consider the null hypothesis H0: the market is going to crash, and H1: the market is not going to crash. Suppose that in reality the market is not going to crash, so H1 is true, but I don’t have enough evidence to reject H0. Due to this lack of evidence I keep H0 even though it is false. Failing to reject a false null hypothesis is a Type 2 error. Both error types can be read off the confusion matrix in Figure 1.

Figure 1
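As a small illustration of the two error types, the sketch below tallies them from a list of test outcomes. The helper `count_errors` and the five outcomes are hypothetical, chosen only to show which cell of the confusion matrix each error falls into.

```python
def count_errors(h0_is_true, h0_rejected):
    """Count Type 1 and Type 2 errors across a series of test outcomes."""
    pairs = list(zip(h0_is_true, h0_rejected))
    # Type 1: H0 was true, but we rejected it (false positive)
    type_1 = sum(1 for is_true, rejected in pairs if is_true and rejected)
    # Type 2: H0 was false, but we failed to reject it (false negative)
    type_2 = sum(1 for is_true, rejected in pairs if not is_true and not rejected)
    return type_1, type_2


# Hypothetical outcomes of five tests
h0_is_true = [True, True, False, False, True]
h0_rejected = [True, False, False, True, False]

type_1, type_2 = count_errors(h0_is_true, h0_rejected)
print(f"Type 1 errors: {type_1}, Type 2 errors: {type_2}")
```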

5. Where can we use the chi-square test, and have you used it anywhere in your application?

Ans: When we do hypothesis testing, we draw a conclusion about a population based on a sample of the dataset. We state the alternate hypothesis (H1) and the null hypothesis (H0), collect evidence from the sample, and then apply a suitable test, one of which is the chi-square test, to decide whether to reject the null hypothesis (H0) in favour of the alternate hypothesis. The chi-square test is generally performed on categorical data. It is used to evaluate whether there is a relationship between categorical variables.

6. Can we use chi-square with a numerical dataset? If yes, give an example. If no, give a reason.

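Ans: Yes, the chi-square test can be used on a numerical dataset, provided you first group the numerical values into separate categories and then work with the frequency of observations in each category. For example, a numerical Age column can be bucketed into child, adult, and elderly.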
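A minimal sketch of this idea, using only the standard library: a numerical age is binned into categories, a contingency table of frequencies is built, and the Pearson chi-square statistic is computed by hand. The bin edges, the `records` data, and the helper names are all assumptions made up for illustration. The sample is far too small for a trustworthy test; it only shows the mechanics.

```python
from collections import Counter


def bin_age(age):
    """Turn a numerical age into a category (bin edges are illustrative)."""
    if age < 18:
        return "child"
    if age < 60:
        return "adult"
    return "elderly"


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical (age, bought_product) records
records = [(12, "yes"), (15, "no"), (9, "no"), (16, "no"),
           (25, "yes"), (34, "yes"), (45, "no"), (52, "yes"),
           (61, "no"), (70, "yes"), (66, "no"), (80, "no")]

counts = Counter((bin_age(age), bought) for age, bought in records)
table = [[counts[(group, answer)] for answer in ("yes", "no")]
         for group in ("child", "adult", "elderly")]

stat, dof = chi_square_statistic(table)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

The statistic is then compared against the chi-square critical value for the given degrees of freedom (5.991 at the 0.05 level for 2 degrees of freedom) to decide whether to reject H0.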

7. What do you understand by ANOVA Testing?

Ans: The ANOVA test is useful when you want to compare the means of more than 2 groups at the same time. Let's say you have 1 numerical feature and 1 categorical feature that has more than two categories. For example, your numerical feature is Height and your categorical feature is Age group, with categories like Elderly, Adult, and Child. Since there are more than two categories, we should go for ANOVA testing.
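The Height versus Age group example can be sketched as a one-way ANOVA computed by hand. The heights below are invented for illustration, and `one_way_anova_f` is a hypothetical helper, not a library function.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group variance over within-group variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical heights in cm for each age group
heights = {
    "child": [120, 125, 130],
    "adult": [165, 170, 175],
    "elderly": [160, 165, 170],
}

f_stat = one_way_anova_f(list(heights.values()))
print(f"F = {f_stat:.2f}")
```

A large F value suggests that at least one group mean differs from the others. Turning F into a p-value needs the F distribution, which the standard library does not provide; in practice a function such as `scipy.stats.f_oneway` is commonly used for that.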

8. Give me a scenario where you can use the Z-test and the T-test.

Ans: The Z-test can be used when you are given the population standard deviation. It is based on the standard normal distribution, and the sample size is usually 30 or more. For example: testing whether the mean score of a sample of 30 students is equal to the population mean, when the population mean and standard deviation for the whole class of 80 are known.

Figure 2

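The T-test is used when you don’t have the population standard deviation and have to estimate it from the sample. It works on 1 numerical feature (one-sample test) or 2 (two-sample test). It’s usually used when your sample size is less than 30. For example: testing numerical features such as Height or Weight measured on a small group of people.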

Figure 3
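For the T-test case, a rough sketch of a one-sample t statistic using only the standard library is shown below. The heights and the hypothesized mean of 170 are invented for illustration, and `one_sample_t` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    # stdev() estimates the standard deviation from the sample (n-1 denominator),
    # which is why the t distribution is used instead of the normal distribution
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - hypothesized_mean) / standard_error
    return t, n - 1


# Hypothetical heights in cm of 8 people; H0: the population mean height is 170
heights = [168, 172, 171, 169, 175, 173, 170, 174]

t, dof = one_sample_t(heights, 170)
print(f"t = {t:.3f}, degrees of freedom = {dof}")
```

The t statistic is compared against the critical value of the t distribution for the given degrees of freedom (about 2.365 for a two-sided test at the 0.05 level with 7 degrees of freedom). Here |t| is smaller than that, so with this made-up sample we would fail to reject H0.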

9. What do you understand by inferential statistics?

Ans: Inferential statistics is where you perform experiments on a sample of the data and then infer that the population behaves the same way as the sample. For example: you run a survey on who will win the World Cup. You go to 30 people and ask each of them who they think will win the trophy. Based on their answers, you draw a conclusion about what the whole population of the country thinks.

10. When you are trying to calculate standard deviation or variance, why do you use N-1 in the denominator? (Hint: Bessel’s correction)

Ans: When we draw a random sample, the data points tend to sit closer to their own sample mean than to the true population mean, especially when the sample happens to contain points that are close to each other. Measuring the spread around the sample mean and dividing by n therefore underestimates the population variance. Dividing by n-1 instead of n fills this gap and gives an unbiased estimate of the population variance. This adjustment is known as Bessel’s correction.
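The effect can be checked with a small simulation. This is only a sketch under assumed settings: samples of size 5 drawn from a standard normal population (true variance 1), 20,000 repetitions, and a fixed random seed so the run is repeatable.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n, trials = 5, 20000
biased_sum = 0.0
unbiased_sum = 0.0

for _ in range(trials):
    # Draw a small sample from a population whose true variance is 1
    sample = [random.gauss(0, 1) for _ in range(n)]
    sample_mean = sum(sample) / n
    squared_deviations = sum((x - sample_mean) ** 2 for x in sample)
    biased_sum += squared_deviations / n          # divide by n
    unbiased_sum += squared_deviations / (n - 1)  # divide by n-1

avg_biased = biased_sum / trials
avg_unbiased = unbiased_sum / trials
print(f"average variance dividing by n:   {avg_biased:.3f}")
print(f"average variance dividing by n-1: {avg_unbiased:.3f}")
```

On average, dividing by n lands near (n-1)/n = 0.8 of the true variance, while dividing by n-1 lands near the true value of 1.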
