This blog will help you quickly brush up on some of the most important Machine Learning interview questions that were recently asked in interviews. This will be a multipart series from which you can quickly revise all the questions in one go.
Q.1 Where have you used Hypothesis Testing in your Machine Learning solution?
Hypothesis testing is a statistical procedure in which we use sample data to decide whether there is enough evidence to reject an assumption (the null hypothesis) about our dataset.
I have used Hypothesis Testing in my solutions in various cases-
1. To check whether my dataset is normally distributed or not by conducting the Shapiro-Wilk test.
The null hypothesis of this test is that the data is normally distributed. If the p-value is >= 0.05, we do not have enough evidence against normality, so we can treat the dataset as normally distributed.
Hence we fail to reject the null hypothesis. If the p-value is < 0.05, we reject the null hypothesis in favour of the alternate hypothesis.
2. To check Multicollinearity between features-
Pearson correlation test to check whether the independent quantitative variables are correlated with each other.
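As a rough sketch of the multicollinearity check, the snippet below computes the Pearson coefficient for every pair of features and flags the strongly correlated ones. The feature names, the toy values and the 0.8 threshold are assumptions made for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    names = list(features)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(features[names[i]], features[names[j]])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return pairs

# Hypothetical toy data: "area_sqm" is just "area_sqft" in other units,
# so the two columns carry the same information.
features = {
    "area_sqft": [500, 750, 1000, 1250, 1500],
    "area_sqm": [46.5, 69.7, 92.9, 116.1, 139.4],
    "age_years": [10, 3, 25, 7, 15],
}
flagged = correlated_pairs(features)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Here only the two area columns are flagged, so one of them could be dropped before training.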
Q.2 What kind of statistical tests have you performed in your ML application?
I have performed various kinds of statistical tests in my projects, like the chi-square test and the z-test.
Basically I have used statistical tests as a feature selection technique.
Which tests we perform depends on the nature of the dataset.
1. Chi-square test to check the association between categorical features.
2. ANOVA test to check whether the means of two or more groups are different or not.
3. Pearson correlation coefficient to check multicollinearity between features.
4. Z-score (the same standardization that the z-test statistic is built on) to do standard scaling.
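Standard scaling is normally done with the z-score formula (value - mean) / standard deviation. A minimal sketch using only the standard library follows; the function name `standard_scale` and the sample values are made up for this example.

```python
import statistics

def standard_scale(values):
    """Rescale values to mean 0 and standard deviation 1 using z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (divides by N)
    return [(v - mean) / std for v in values]

scaled = standard_scale([10, 20, 30, 40, 50])
print([round(v, 3) for v in scaled])
```

After scaling, the feature has mean 0 and standard deviation 1, which keeps features measured in different units comparable.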
Q.3 What do you understand by P-value? And what is the use of it in ML?
The p-value is the probability of observing a result at least as extreme as the one we got, assuming the null hypothesis is true.
Equivalently, the p-value is the smallest significance level at which the null hypothesis would be rejected. We compare it with a chosen significance level (alpha, commonly 0.05): if the p-value is less than alpha, the observed statistic falls in the rejection region (the tails of the distribution) and we reject the null hypothesis in favour of the alternative hypothesis; otherwise we fail to reject the null hypothesis.
The p-value is not the same thing as the significance level: alpha is the threshold fixed before the test, while the p-value is computed from the data. It is used in various kinds of tests like the t-test, z-test and so on.
In ML it can be used for feature selection, and the alpha threshold is usually chosen with the help of a domain expert.
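To make the idea concrete, here is a small sketch that turns a z statistic into a two-sided p-value and compares it with a significance level. The helper names and the alpha of 0.05 are assumptions for illustration; a real project would usually rely on a statistics library for the test itself.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-sided p-value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Compare the p-value with the chosen significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_small = two_sided_p_value(2.5)  # statistic far out in the tail
p_large = two_sided_p_value(0.5)  # statistic close to the centre
print(round(p_small, 4), decide(p_small))
print(round(p_large, 4), decide(p_large))
```

A statistic far out in the tail gives a small p-value and leads to rejecting the null hypothesis, while one near the centre does not.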
Q.4 Which type of error is more severe, Type 1 or Type 2? And why? Explain with an example.
1. Type I error means rejecting the null hypothesis when it’s actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance or because of unrelated factors.
2. Type II error means not rejecting the null hypothesis when it’s actually false. This is not quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis.
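Example: You decide to get tested for COVID-19 based on mild symptoms. There are two errors that could potentially occur:
- Type I error (false positive): the test result says you have coronavirus, but you actually don’t.
- Type II error (false negative): the test result says you don’t have coronavirus, but you actually do.
Which error is more severe depends on the problem. In this example the Type II error is arguably the more severe one, because an infected person who is told they are healthy may go untreated and spread the infection, whereas a false positive usually costs only a retest or some unnecessary isolation.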
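In a classification setting the two error types are simply the false positives and false negatives of the confusion matrix. The sketch below counts them for a handful of hypothetical test results (the data and the function name are invented for this example).

```python
def error_counts(actual, predicted):
    """Count Type I (false positive) and Type II (false negative) errors.

    Both inputs are sequences of booleans; True means "has the disease".
    """
    type_1 = sum(1 for a, p in zip(actual, predicted) if not a and p)
    type_2 = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return type_1, type_2

# Hypothetical results for six tested people.
actual = [True, True, False, False, True, False]
predicted = [True, False, True, False, False, False]
type_1, type_2 = error_counts(actual, predicted)
print("Type I errors:", type_1)
print("Type II errors:", type_2)
```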
Q.5 Where can we use the chi-square test, and have you used this test anywhere in your application?
The chi-square test is a statistical test which we use to check whether there is a relationship (association) between our categorical features.
In ML we use this test as a feature selection technique.
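A minimal sketch of the chi-square statistic for a contingency table of two categorical variables is shown below. The counts are hypothetical, and 3.841 is the standard 5% critical value for 1 degree of freedom; in practice a statistics library would also give the p-value.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows = feature category (A, B), columns = target (yes, no).
observed = [[30, 10],
            [10, 30]]
chi2, dof = chi_square_statistic(observed)
is_associated = chi2 > 3.841  # 5% critical value for dof = 1
print(chi2, dof, is_associated)
```

A statistic well above the critical value suggests the feature and the target are associated, so the feature is worth keeping.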
Q.6 Can we use Chi-square with a numerical dataset? If yes, give an example. If no, give the reason.
Chi-square is mostly used on categorical features. It can be used on numerical features, but since the chi-square test is based on frequencies, the numerical data would first need to be split into categories such as bins (for example, turning age into the ranges 0-18, 19-40 and 41+).
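Binning a numerical column into categories could look like the sketch below. The ages, the cut points and the labels are assumptions chosen only for illustration.

```python
from bisect import bisect_left
from collections import Counter

def bin_values(values, edges, labels):
    """Map each numeric value to the label of the bin it falls into.

    `edges` are the sorted, inclusive upper bounds of every bin except
    the last one, so len(labels) must equal len(edges) + 1.
    """
    return [labels[bisect_left(edges, v)] for v in values]

ages = [5, 17, 18, 19, 33, 40, 41, 67]
# Bins: up to 18, 19 to 40, 41 and above.
age_groups = bin_values(ages, edges=[18, 40], labels=["0-18", "19-40", "41+"])
frequencies = Counter(age_groups)
print(frequencies)
```

The resulting frequencies per bin are what the chi-square test works on.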
Q.7 What do you understand by ANOVA Testing?
Analysis of Variance (ANOVA) is a statistical test which we perform to check whether the means of two or more groups differ.
The ANOVA test is basically used to compare the means of different samples/groups. It tells us whether at least one group differs from the others; to find out which group differs (or is more effective), we follow it with a post-hoc comparison.
The fundamental concept behind the ANOVA test is the “Linear Model”.
The Null Hypothesis states that all the group means are equal.
The Alternate Hypothesis states that at least one of the group means is different from the other group means.
H0: µ1 = µ2 = ... = µL (Null Hypothesis)
H1: µl ≠ µm for at least one pair (l, m) (Alternate Hypothesis)
Types of ANOVA-
One Way ANOVA- When we are comparing groups based on only one factor/variable.
Two Way ANOVA- When we are comparing groups based on two factors/variables.
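For one-way ANOVA, the test statistic is the ratio of the variation between group means to the variation within groups. Here is a minimal sketch of that F statistic using only the standard library; the three groups are made-up numbers, and the p-value lookup from the F distribution is left out.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA, with its two degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(round(f_stat, 3), df_between, df_within)
```

A large F value means the group means are spread out much more than the noise inside each group would explain, which is evidence against H0.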
Q.8 Give me a scenario where you can use Z test and T test.
a) We use the Z test when the population standard deviation is known and the sample size is greater than or equal to 30.
b) We use the T test when the population standard deviation is not known and the sample size is less than 30. Here we use the sample standard deviation.
Both tests are used for comparison of means.
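The two statistics differ only in which standard deviation goes into the denominator. A small sketch for the one-sample case follows; the function names and the numbers are hypothetical.

```python
import math
import statistics

def z_statistic(sample_mean, pop_mean, pop_std, n):
    """One-sample z statistic: the population standard deviation is known."""
    return (sample_mean - pop_mean) / (pop_std / math.sqrt(n))

def t_statistic(sample, pop_mean):
    """One-sample t statistic: uses the sample standard deviation instead."""
    n = len(sample)
    sample_std = statistics.stdev(sample)  # N-1 in the denominator
    return (statistics.fmean(sample) - pop_mean) / (sample_std / math.sqrt(n))

# Large sample with a known population standard deviation.
z = z_statistic(sample_mean=52, pop_mean=50, pop_std=10, n=100)
# Small sample with an unknown population standard deviation.
t = t_statistic([2, 4, 6, 8, 10], pop_mean=4)
print(z, round(t, 4))
```

The z value is then compared against the normal distribution and the t value against the t distribution with n - 1 degrees of freedom.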
Q.9 What do you understand by inferential Statistics?
Ans- Inferential statistics is the branch of statistics which we use to draw conclusions or make predictions about a population from a sample of it (i.e. our dataset), by using various tests like the t-test, z-test, chi-square test and so on.
Q.10 When you are trying to calculate the Standard Deviation or Variance of a sample, why do you use N-1 in the denominator?
Ans. We use N-1 in the denominator to avoid a biased estimate of the population variance.
The values in a sample tend to lie closer to their own sample mean than to the population mean, and the sample mean can be some distance away from the population mean.
So if we measure deviations from the sample mean and divide by N (or anything larger, such as N+1 or N+2), we underestimate the population variance on average, i.e. we get a biased estimate.
Hence we use N-1 in the denominator (Bessel's correction) to get an unbiased estimate.
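The effect can be checked on a tiny example. The sketch below takes a made-up population of five values, lists every possible sample of size 2 (drawn with replacement), and averages the two variance estimates over all of them.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [1, 2, 3, 4, 5]
true_variance = pvariance(population)

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# pvariance divides by N, variance divides by N-1.
avg_biased = fmean([pvariance(s) for s in samples])
avg_unbiased = fmean([variance(s) for s in samples])

print("population variance:", true_variance)
print("average estimate with N:", avg_biased)
print("average estimate with N-1:", avg_unbiased)
```

On average the N-1 version recovers the population variance, while the N version comes out too small.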