Overfitting is a concept in data science that occurs when a statistical model fits too exactly against its training data. When this happens, the algorithm cannot perform accurately on unseen data, defeating its purpose. Generalization to new data is ultimately what allows us to use machine learning algorithms every day to make predictions and classify data.
When machine learning algorithms are constructed, they leverage a sample dataset to train the model. However, when the model trains for too long on sample data or when the model is too complex, it can start to learn the “noise,” or irrelevant information, within the dataset. When the model memorizes the noise and fits too closely to the training set, the model becomes “overfitted,” and it is unable to generalize well to new data. If a model cannot generalize well to new data, then it will not be able to perform the classification or prediction tasks that it was intended for.
One of the major problems when training ML models is overfitting/underfitting.
When a model is overfitted, it has very high training accuracy but much lower test accuracy.
When we try to learn the function mapping inputs to outputs, the model can end up memorizing everything it has seen: it then works great on the training data but does poorly on the test data.
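A minimal sketch of this gap, assuming a scikit-learn setup with an illustrative synthetic dataset (the noise level and model choice here are assumptions, not from the original text): an unconstrained decision tree memorizes noisy training data perfectly but scores noticeably worse on held-out data.

```python
# Demonstrating overfitting: an unrestricted decision tree memorizes
# the training set, noise included, and generalizes poorly.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% flipped labels, i.e. irreducible noise.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No depth limit: the tree can grow until every training point is fit.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # perfect on seen data
test_acc = tree.score(X_test, y_test)     # noticeably lower on unseen data
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```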
We should expect the algorithm to have some variance, but ideally it should not change too much from one training dataset to the next, meaning that the algorithm is good at picking out the hidden patterns.
Methods to avoid Overfitting
Cross-validation: we keep one part of the training data as validation data and train on the remainder.
To keep the variance of the estimate low, a higher number of folds is preferred.
This is what cross-validation looks like: you set one part aside as the validation set and use the remaining parts as training data. With 10 folds, you divide the training data into 10 parts, use each part in turn as the validation set for a model you train, and average the results.
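The 10-fold procedure above can be sketched with scikit-learn's `cross_val_score`; the dataset and the logistic regression model are illustrative assumptions.

```python
# 10-fold cross-validation: each of the 10 parts serves as the
# validation set once, and the 10 scores are averaged.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(f"mean accuracy over 10 folds: {scores.mean():.2f}")
```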
Early stopping: it provides guidance on how many iterations can be run before the learner begins to overfit.
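One way to sketch early stopping is with gradient boosting, which supports it directly in scikit-learn; the parameter values and dataset here are illustrative assumptions, not a prescription.

```python
# Early stopping: training halts once the held-out validation score
# stops improving for 5 consecutive iterations, well before the
# 500-iteration budget is exhausted.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)

gb = GradientBoostingClassifier(n_estimators=500,
                                validation_fraction=0.2,
                                n_iter_no_change=5,
                                random_state=0)
gb.fit(X, y)
print(f"iterations actually run: {gb.n_estimators_}")
```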
Pruning: it is used extensively while building CART models.
It simply removes the nodes that add little predictive power for the problem in hand.
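A minimal sketch of node removal via cost-complexity pruning on a CART-style tree in scikit-learn; the `ccp_alpha` value and dataset are illustrative assumptions.

```python
# Cost-complexity pruning: a positive ccp_alpha removes subtrees whose
# contribution to accuracy does not justify their added complexity.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, flip_y=0.2, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

print(f"nodes before pruning: {full.tree_.node_count}, "
      f"after: {pruned.tree_.node_count}")
```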
Regularization: it introduces a cost term for bringing more features into the objective function. Hence, it pushes the coefficients of many variables toward zero, reducing the model's complexity.
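This coefficient-shrinking effect can be sketched with L1 regularization (Lasso), which drives the weights of uninformative features to exactly zero; the dataset and the `alpha` value are illustrative assumptions.

```python
# L1 regularization: the penalty adds a cost for every nonzero
# coefficient, zeroing out features that carry little signal.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# 50 features, only 5 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

plain = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print(f"nonzero coefficients without penalty: {np.sum(plain.coef_ != 0)}")
print(f"nonzero coefficients with L1 penalty: {np.sum(lasso.coef_ != 0)}")
```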