Statistics Question Answers

BIJOY ROY Oct 30 2021 · 3 min read

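1. What do you understand by P value, and what is its use in ML?

A:  The P value is the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one actually observed.

It is used in hypothesis testing: if the P value is below the chosen significance level we reject the null hypothesis, otherwise we fail to reject it. In ML it is used, for example, to check whether a feature's relationship with the target is statistically significant.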
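As a rough illustration of the definition above, here is a minimal sketch that computes a two-sided P value for a one-sample Z test using only Python's standard library. The function name and the numbers (sample mean 52, hypothesized mean 50, known sigma 5, n = 36) are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_p_value(sample_mean, mu0, sigma, n):
    """Two-sided P value for a one-sample Z test with known population sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability, under the null hypothesis, of a statistic at least as extreme as |z|.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical numbers, chosen only for illustration.
z, p = two_sided_z_p_value(sample_mean=52.0, mu0=50.0, sigma=5.0, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With a significance level of 0.05, a P value this small would lead us to reject the null hypothesis.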

2.  Which type of error is more severe, Type 1 or Type 2? Explain why, with an example.

A:  There is no single answer to which one is more severe; it depends entirely on the use case.

A Type 1 error is a false positive (rejecting a null hypothesis that is true) and a Type 2 error is a false negative (failing to reject a null hypothesis that is false).

If a person is accused of a crime and the null hypothesis is that the person is innocent, a Type 1 error is more severe, because it means punishing an innocent person.

If we test for COVID, a Type 2 error is more severe: reporting someone as COVID negative when that person actually has COVID can cost someone's life.

3.  Where can we use the chi-square test?

A:  The chi-square test can be used to check whether two categorical variables in our dataset are related.
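To make this concrete, here is a small sketch of the Pearson chi-square test of independence for a 2x2 table of counts, written with the standard library only. The table values are hypothetical, no continuity correction is applied, and the P value shortcut only works for 1 degree of freedom, so treat it as an illustration rather than a general implementation.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square statistic and P value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 is the square of a standard normal variable.
    p = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p

# Hypothetical counts: rows = two customer groups, columns = bought / did not buy.
table = [[30, 10], [20, 40]]
chi2, p = chi_square_2x2(table)
print(f"chi2 = {chi2:.2f}, p = {p:.6f}")
```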

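4.  Give me a scenario where you can use a Z test and a T test.

A:  A Z test can be used when the population standard deviation is known; a common rule of thumb also asks for a sample size of at least 30.

A T test can be used when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample is small (the usual rule of thumb is fewer than 30).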
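For the T test case, a minimal sketch of the one-sample t statistic might look like the following. The sample values and the hypothesized mean are invented for the example, and the critical value 2.262 is the standard two-sided 5% table value for 9 degrees of freedom (the standard library has no t distribution, so it is hard-coded here).

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for a one-sample t test."""
    n = len(sample)
    # stdev() estimates sigma from the sample using the n - 1 denominator.
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1

# Hypothetical small sample (n = 10), population sigma unknown.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.0, 5.7, 5.3, 5.4]
t, df = one_sample_t(sample, mu0=5.0)

T_CRIT = 2.262  # two-sided 5% critical value for df = 9, taken from a t table
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > T_CRIT}")
```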

5. What do you understand by inferential statistics?

A:  It is the branch of statistics that draws conclusions about a population from a sample of data, using tests such as the Z test and T test.

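6. When you calculate the standard deviation or variance, why do you use N-1 in the denominator?

A:  For a sample, dividing by N-1 instead of N gives an unbiased estimate of the population variance. Dividing by N tends to underestimate it, because the deviations are measured from the sample mean rather than the true population mean.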
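A quick simulation can make the N versus N-1 point visible. The sketch below draws many small samples from a population whose variance is known to be 4 and averages both estimators; the sample size, number of repetitions and seed are arbitrary choices for the demonstration.

```python
import random
from statistics import mean, pvariance, variance

random.seed(0)        # fixed seed so the run is repeatable
TRUE_SIGMA = 2.0      # assumed population standard deviation (variance = 4)
SAMPLE_SIZE = 5
N_SAMPLES = 10_000

biased, unbiased = [], []
for _ in range(N_SAMPLES):
    sample = [random.gauss(0.0, TRUE_SIGMA) for _ in range(SAMPLE_SIZE)]
    biased.append(pvariance(sample))    # divides by N
    unbiased.append(variance(sample))   # divides by N - 1

avg_biased = mean(biased)
avg_unbiased = mean(unbiased)
print(f"true variance      = {TRUE_SIGMA ** 2:.2f}")
print(f"average with N     = {avg_biased:.2f}")
print(f"average with N - 1 = {avg_unbiased:.2f}")
```

On average the N-1 version should land close to 4, while the N version should land close to 4 * (N-1)/N = 3.2.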

7.  What do you understand by right skewness? Give an example.

A:  Right skewness occurs when a few observations have very high values, so the distribution has a long tail toward the positive side. In this scenario the mean is typically greater than the median.

If we plot the wealth distribution of the world it would be right skewed.
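A tiny example with made-up income figures shows the effect: one very large value pulls the mean far above the median.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); the last value is the long right tail.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 500]

income_mean = mean(incomes)
income_median = median(incomes)
print(f"mean = {income_mean}, median = {income_median}")
```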

8.  What is the difference between the Normal distribution, the Standard Normal distribution and the Uniform distribution?

A:  A Normal distribution has no fixed mean or variance: the mean can be any real number and the variance any positive number.

The Standard Normal distribution is the Normal distribution with a mean of 0 and a standard deviation of 1.

A Uniform distribution is a distribution in which all outcomes are equally likely to occur.
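The link between a general Normal distribution and the Standard Normal one is the z-score, z = (x - mean) / standard deviation. The sketch below uses `statistics.NormalDist` with an assumed height distribution (mean 170, standard deviation 10) to show that both give the same probability; the fair die at the end is just a simple uniform example.

```python
from statistics import NormalDist

# Assumed example: heights ~ Normal(mean=170, sd=10).
heights = NormalDist(mu=170, sigma=10)
x = 185

# Standardize x so it can be looked up on the Standard Normal distribution.
z = (x - heights.mean) / heights.stdev

p_original = heights.cdf(x)        # P(height <= 185) on the original scale
p_standard = NormalDist().cdf(z)   # P(Z <= 1.5) on the standard scale
print(f"z = {z}, P = {p_original:.4f} vs {p_standard:.4f}")

# Uniform distribution: every face of a fair die has the same probability.
die = {face: 1 / 6 for face in range(1, 7)}
```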

9.  What do you understand by a symmetric dataset?

A:  A dataset whose distribution is almost identical on either side of its mean is a symmetric dataset.

10.  In your last project, were you using symmetric or asymmetric data? If it was asymmetric, what kind of EDA did you perform?

A:  The data was asymmetric. I applied transformations such as the log transformation and the Box-Cox transformation to make the asymmetric distribution roughly symmetric.
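Here is a minimal sketch of the log transformation on deliberately skewed toy data (powers of 2). The `skewness` helper is a plain moment-based formula written for this example; Box-Cox is not shown because it needs a library such as SciPy.

```python
from math import log
from statistics import mean

def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Toy right-skewed data: 1, 2, 4, ..., 1024.
raw = [2 ** k for k in range(11)]
logged = [log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before log: {raw_skew:.3f}")
print(f"skewness after log:  {log_skew:.3f}")
```

The raw values are strongly right skewed, while their logs are evenly spaced and therefore symmetric.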

11.  What do you understand by the 1st, 2nd and 3rd standard deviation from the mean?

A:  Standard deviation tells us how spread out the data is around its mean.

The 1st standard deviation from the mean covers the range mean +/- 1*standard deviation, the 2nd covers mean +/- 2*standard deviation and the 3rd covers mean +/- 3*standard deviation. For a normal distribution these ranges hold about 68%, 95% and 99.7% of the data.
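For a normal distribution, the share of data inside each of these ranges can be checked with the standard library. This is only a sketch for the normal case; other distributions will give different shares.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal distribution within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```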

12.  Explain the relationship between variance and bias.

A:  Machine learning models suffer from both variance and bias. A model with high variance and low bias tends to overfit the training data and perform poorly on unseen data.

On the other hand, a model with high bias and low variance underfits: it is not able to fit even the training data properly. Reducing one usually increases the other, so the goal is a balance between the two.

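13.  What do you understand about the Z value given in the Z table?

A:  The Z value tells you how many standard deviations a value lies from the mean. The Z table maps each Z value to the proportion of a standard normal distribution that falls below it, so it also tells you within how many standard deviations a given percentage of the data falls.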
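Both directions of a Z table lookup can be reproduced with `statistics.NormalDist`, as in this short sketch (the values 1.96 and 0.95 are just familiar examples).

```python
from statistics import NormalDist

std_normal = NormalDist()

# Z value -> proportion of data below it (what a Z table lists).
area_below = std_normal.cdf(1.96)

# Proportion -> Z value (reading the table in reverse).
z_for_95 = std_normal.inv_cdf(0.95)

print(f"area below z = 1.96: {area_below:.4f}")
print(f"z with 95% of data below it: {z_for_95:.3f}")
```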

14.  Can you please explain the critical region in your own way?

A:  The critical region is the set of test statistic values, beyond the cutoff determined by the significance level alpha, for which you reject the null hypothesis.
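For a two-sided Z test, the critical region can be sketched as follows. The helper name and the default alpha of 0.05 are assumptions made for the example.

```python
from statistics import NormalDist

def in_critical_region(z, alpha=0.05):
    """Return (reject, z_crit) for a two-sided Z test at significance level alpha."""
    # Alpha is split between the two tails, so each tail holds alpha / 2.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit, z_crit

reject, z_crit = in_critical_region(2.4)
print(f"critical value = {z_crit:.3f}, reject H0 for z = 2.4: {reject}")
```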

15.  What do you understand by Precision, Recall and F1 Score? Explain with an example.

A:  Precision: Out of all the results predicted as positive, how many are actually positive.

Recall: Out of all the actual positives, how many were predicted as positive.

F1 Score: The harmonic mean of precision and recall.
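As a worked example, the three metrics can be computed directly from confusion matrix counts. The counts below (40 true positives, 10 false positives, 20 false negatives) are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from confusion matrix counts."""
    precision = tp / (tp + fp)   # of everything predicted positive, how much is right
    recall = tp / (tp + fn)      # of everything actually positive, how much was found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"precision = {precision:.3f}, recall = {recall:.3f}, f1 = {f1:.3f}")
```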

16.  What are AUC and the ROC curve? Explain with uses.

A:  The ROC curve is drawn by plotting the true positive rate against the false positive rate (1 - specificity) at different classification thresholds. The curve helps us choose a classification threshold other than the default of 0.5. The AUC (area under the ROC curve) score tells us how well the classifier separates the classes: 1.0 is perfect and 0.5 is no better than random guessing.
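The sketch below shows one point on the ROC curve for a given threshold, and computes AUC as the probability that a randomly chosen positive gets a higher score than a randomly chosen negative (an equivalent way of defining it). The labels, scores and function names are made up for the example; in practice a library such as scikit-learn would normally be used.

```python
def roc_point(labels, scores, threshold):
    """(false positive rate, true positive rate) at a given threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print("threshold 0.5:", roc_point(labels, scores, 0.5))
print("threshold 0.3:", roc_point(labels, scores, 0.3))
print("AUC:", roc_auc(labels, scores))
```

Lowering the threshold from 0.5 to 0.3 catches more positives but also lets in more false positives, which is the trade-off the ROC curve visualizes.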

17.  How do you set the level of significance for your dataset?

A:  It is usually set by the domain expert, based on how costly a false positive is. Common choices are 0.05 and 0.01.

18.  What are the testing techniques that you use for model testing? Name some of them.

A:  For Regression: We use error metrics like MSE

R2 score and Adjusted R2 score, to understand how much of the variance is explained by the features.

For Classification: We use the confusion matrix

Precision, Recall, F1 Score

AUC-ROC score
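For the regression metrics, a minimal standard library sketch might look like this. The true and predicted values are invented, and the model is assumed to have a single feature when computing the adjusted R2.

```python
from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def r2_score(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r2(r2, n_samples, n_features):
    """R2 penalized for the number of features used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical targets and predictions from a one-feature model.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

r2 = r2_score(y_true, y_pred)
print(f"MSE = {mse(y_true, y_pred):.4f}")
print(f"R2 = {r2:.4f}")
print(f"Adjusted R2 = {adjusted_r2(r2, n_samples=4, n_features=1):.4f}")
```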

