The Decision Tree is an important machine learning algorithm that can solve both classification and regression problems. It is also the base learner used in bagging and boosting techniques such as Random Forest and XGBoost, for both classification and regression.
The important questions that can be asked about Decision Trees are given below.
Decision Tree Classifier And Regressor
Interview Questions:
- Decision Tree
- Entropy, Information Gain, Gini Impurity
- Decision Tree Working For Categorical and Numerical Features
- What are the scenarios where Decision Tree works well
- Decision Tree Low Bias And High Variance- Overfitting
- Hyperparameter Techniques
- Library used for constructing decision tree
- Impact of Outliers on Decision Tree
- Impact of missing values on Decision Tree
- Does Decision Tree require Feature Scaling
The first thing is to understand how a decision tree works and how its splits are chosen using entropy, information gain, and Gini impurity. You can check the videos below for the same.
Entropy In Decision Tree
Information Gain Intuition
Gini Impurity
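The three split criteria listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: the function names and the toy "yes"/"no" labels are hypothetical, and information gain is taken as the parent impurity minus the size-weighted impurity of the children.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy node: 5 "yes" and 5 "no" labels, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(entropy(parent))                          # 1.0 for a 50/50 node
print(gini(parent))                             # 0.5 for a 50/50 node
print(information_gain(parent, [left, right]))  # about 0.278
```

A split is preferred when it gives a larger information gain (or, equivalently, a lower weighted impurity in the children).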
And finally, you need to understand how to visualize a Decision Tree.
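One possible way to visualize a tree, assuming scikit-learn is installed, is to print its rules as text with `export_text`. The iris dataset and the `max_depth=2` setting are choices made for this sketch only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short enough to read.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders the fitted tree as indented, if/else style rules.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

For a graphical view, `sklearn.tree.plot_tree(clf)` draws the same tree with matplotlib.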
1. What Are the Basic Assumptions?
There are no such assumptions. A Decision Tree is non-parametric and does not assume any particular distribution of the data.
2. Advantages
Advantages of Decision Tree
- Clear visualization: The algorithm is simple to understand, interpret, and visualize because it mirrors the way we make everyday decisions. The output of a Decision Tree can be easily interpreted by humans.
- Simple and easy to understand: A Decision Tree looks like a set of simple if-else statements, which are very easy to follow.
- A Decision Tree can be used for both classification and regression problems.
- A Decision Tree can handle both continuous and categorical variables.
- No feature scaling required: Feature scaling (standardization or normalization) is not required because a Decision Tree uses a rule-based approach instead of distance calculations.
- Handles non-linear relationships efficiently: Non-linearity in the data does not hurt the performance of a Decision Tree the way it does for curve-based algorithms. So, if the relationship between the independent variables and the target is highly non-linear, Decision Trees may outperform curve-based algorithms.
- A Decision Tree can handle missing values, although how (and whether) this happens automatically depends on the implementation.
- A Decision Tree is usually robust to outliers and can handle them automatically.
- Less training time: Training takes less time than Random Forest because only one tree is built instead of a forest of trees.
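The "if-else" view of a Decision Tree mentioned above can be made concrete with a small hand-written example. The tree below is hypothetical (the feature names, the threshold of 75, and the labels are invented for illustration); a trained tree would learn such rules from data.

```python
def predict_play_tennis(outlook, humidity, windy):
    """Hand-written, hypothetical decision tree expressed as nested if/else rules."""
    if outlook == "sunny":
        # Numerical feature: a simple threshold test on the raw value.
        return "no" if humidity > 75 else "yes"
    elif outlook == "overcast":
        # Categorical feature value that leads straight to a leaf.
        return "yes"
    else:
        # Remaining category ("rainy"): decide on a boolean feature.
        return "no" if windy else "yes"


print(predict_play_tennis("sunny", 80, False))    # no
print(predict_play_tennis("overcast", 90, True))  # yes
print(predict_play_tennis("rainy", 60, True))     # no
```

Each path from the first `if` to a `return` corresponds to one root-to-leaf path in the tree, which is why the predictions are easy to explain.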
3. Disadvantages
Disadvantages of Decision Tree
- Overfitting: This is the main problem of the Decision Tree. To fit the training data (even noisy data), it keeps generating new nodes until the tree becomes too complex to interpret and loses its ability to generalize. It performs very well on the training data but makes a lot of mistakes on unseen data.
- High variance: As mentioned in the previous point, a Decision Tree tends to overfit the data. Because the tree drives bias toward zero by fitting the training data closely, its predictions change a lot with the training sample (high variance), which leads to errors in the final estimates.
- Unstable: A small change in the data, such as adding a new data point, can lead to re-generation of the overall tree, with all nodes recalculated and recreated.
- Not suitable for large datasets: If the dataset is large, a single tree may grow complex and overfit. In this case, Random Forest is usually preferred over a single Decision Tree.
4. Is Feature Scaling Required?
No
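One way to see why scaling is not needed is to search for the best split on a feature before and after standardizing it. This is a minimal sketch, assuming a single numeric feature and a Gini-based split search; the helper `best_split` and the toy data are hypothetical and not taken from any library.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Return (threshold, left_indices) of the lowest weighted-Gini split on one feature."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, (lo + hi) / 2, sorted(order[:k]))
    return best[1], best[2]


ages = [22, 25, 31, 38, 45, 52]   # hypothetical feature values
bought = [0, 0, 0, 1, 1, 1]       # hypothetical class labels

# Standardize the feature (zero mean, unit variance).
mean = sum(ages) / len(ages)
std = (sum((a - mean) ** 2 for a in ages) / len(ages)) ** 0.5
ages_scaled = [(a - mean) / std for a in ages]

thr_raw, left_raw = best_split(ages, bought)
thr_scaled, left_scaled = best_split(ages_scaled, bought)
print(thr_raw, left_raw)        # threshold 34.5, rows [0, 1, 2] go left
print(left_scaled == left_raw)  # True: same rows go left after scaling
```

The threshold value changes with the scale, but the rows sent to each side do not, because the split search depends only on the ordering of the feature values.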
6. Impact of outliers?
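A Decision Tree is largely insensitive to outliers. Splits depend on the ordering of the feature values rather than their magnitude, and isolating an extreme value rarely gives much reduction in RSS, so outliers are seldom involved in a split. Hence, tree-based methods are generally robust to outliers.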
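The RSS argument can be checked on a toy regression split. This is a sketch under stated assumptions: it covers only an extreme value in a feature (not in the target), and the helper names and the house size and price numbers are hypothetical.

```python
def rss(values):
    """Residual sum of squares around the mean of the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Return (left_indices, rss_reduction) for the best single split on one feature."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    total = rss(ys)
    best_left, best_gain = None, -1.0
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # equal feature values cannot be separated
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        gain = total - (rss(left) + rss(right))
        if gain > best_gain:
            best_left, best_gain = sorted(order[:k]), gain
    return best_left, best_gain


# Hypothetical data: house size (feature) and price (target).
sizes = [50, 60, 70, 120, 130, 140]
prices = [100, 110, 105, 200, 210, 205]

# Same data, but the largest size is replaced by an extreme value.
sizes_outlier = [50, 60, 70, 120, 130, 14000]

print(best_regression_split(sizes, prices))
print(best_regression_split(sizes_outlier, prices))
```

Both calls select the same rows for the left child with the same RSS reduction, since the extreme value does not change the order of the samples.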
Types of Problems it can solve (Supervised)
- Classification
- Regression
Overfitting And Underfitting
How to avoid overfitting
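A common way to limit overfitting is to constrain or prune the tree. The sketch below assumes scikit-learn and compares an unconstrained tree with one restricted by `max_depth`, `min_samples_leaf`, and `ccp_alpha` (cost-complexity pruning). The synthetic dataset and the specific hyperparameter values are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data: flip_y assigns random labels to about 10% of samples.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Unconstrained tree: keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Constrained tree: depth and leaf-size limits plus cost-complexity pruning.
pruned_tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                                     ccp_alpha=0.001,
                                     random_state=0).fit(X_train, y_train)

for name, model in [("full", full_tree), ("pruned", pruned_tree)]:
    print(name,
          "depth:", model.get_depth(),
          "leaves:", model.get_n_leaves(),
          "train acc:", round(model.score(X_train, y_train), 3),
          "test acc:", round(model.score(X_test, y_test), 3))
```

A large gap between training and test accuracy for the full tree is the usual sign of overfitting. In practice these hyperparameters would be tuned with cross-validation, for example with `GridSearchCV`.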
Practical Implementation
- https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
- https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
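A minimal usage sketch of the two estimators linked above, assuming scikit-learn and NumPy are installed. The datasets (iris and a synthetic noisy sine curve) and the depth limits are choices made for this example only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification on the built-in iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
clf_accuracy = clf.score(X_test, y_test)  # mean accuracy on held-out data
print("classifier accuracy:", clf_accuracy)

# Regression on a synthetic noisy sine curve (illustrative data only).
rng = np.random.default_rng(0)
X_reg = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y_reg = np.sin(X_reg).ravel() + rng.normal(0, 0.1, size=200)

# Train on every other point and evaluate on the remaining ones.
reg = DecisionTreeRegressor(max_depth=4, random_state=0)
reg.fit(X_reg[::2], y_reg[::2])
reg_r2 = reg.score(X_reg[1::2], y_reg[1::2])  # R2 on held-out points
print("regressor R2:", reg_r2)
```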
Performance Metrics
Classification
- Confusion Matrix
- Precision, Recall, F1 score
Regression
- R2, Adjusted R2
- MSE, RMSE, MAE
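The metrics listed above can be computed by hand as a sanity check. This is a plain-Python sketch with hypothetical function names and toy values; it assumes a binary classification problem, and `n_features` is the number of predictors used in the adjusted R2 formula.

```python
from math import sqrt


def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred, n_features):
    """MSE, RMSE, MAE, R2 and adjusted R2 for a regression problem."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae,
            "r2": r2, "adjusted_r2": adj_r2}


# Hypothetical labels and predictions.
cls = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0], n_features=1)
print(cls)  # tp=2, fp=1, fn=1, tn=2; precision, recall and F1 are all 2/3
print(reg)  # mse=0.5, mae=0.5, r2 about 0.9, adjusted_r2 about 0.85
```

In practice, `sklearn.metrics` provides equivalents such as `confusion_matrix`, `precision_score`, `recall_score`, `f1_score`, `mean_squared_error`, `mean_absolute_error`, and `r2_score`.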