Model fitting in machine learning is a measure of how well a model makes predictions, or generalizes, on an unseen dataset. Model fitting usually goes wrong in one of two ways. In the first, the model is not able to learn the patterns in the training dataset. In the second, the model memorizes everything about the training dataset, including its noise. In both cases, the model will fail to make good predictions on unseen data. In this article, we will discuss these two cases in more detail and learn what a well-fitted model looks like. We will also use the sklearn fit function to fit a model.
Introduction to model fitting in machine learning
Fitting a model in machine learning simply means training the model on the training dataset. Once the model is trained, it can make predictions on unseen data. A well-fitted model is one that accurately approximates the underlying relationship and gives good predictions on an unseen dataset.
We apply hyperparameter tuning, which means changing the values of the model's hyperparameters, in order to get a well-trained model. There are usually two cases in which the model fails to make good predictions. One is when the model fails to learn from the training dataset. This type of model is known as an underfitted model. The other is when the model memorizes the training dataset, noise included. This is known as an overfitted model. The best model is one that learns the general pattern in the training data and is able to make predictions on unseen data.
An underfitted model misses the underlying pattern, while an overfitted model follows the noise in the training data, so both fail to make correct predictions on new data.
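The three cases can be illustrated with a small experiment. The sketch below is not from the original article: it assumes a synthetic noisy cosine dataset and fits polynomial regressions of degree 1, 4, and 15 (all values chosen only for illustration). Typically the degree 1 model has high error everywhere (underfitting), while the degree 15 model has a very low training error but a much higher test error (overfitting).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy cosine curve.
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.cos(1.5 * np.pi * X.ravel()) + rng.normal(scale=0.1, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Map each polynomial degree to its (train MSE, test MSE).
errors = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    errors[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Comparing the training and test errors side by side is the point of the exercise: a large gap between them suggests overfitting, while high error on both suggests underfitting.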
Why is model fitting important?
It is important to understand the fitting process in order to find the root cause of a model's poor accuracy. For example, if you know your model is underfitted, you can change the hyperparameter values accordingly and get a better-fitted model. Ideally, we want a model that sits between the underfitted and overfitted extremes.
The following are some of the reasons why model fitting is important in Machine learning.
- It improves accuracy: Predictions are worse when the model is overfitted or underfitted. Once we know which of the two is the root cause, we can correct it and get more accurate results.
- It gives a better understanding of the dataset: Suppose we are trying to find a well-fitted model for a dataset of house prices. While fitting, we can learn which features contribute most to the price.
- It helps to select the best model: There are many types of models, and fitting several of them on the same dataset lets us compare them and pick the one that fits best.
- It improves efficiency: Comparing how different models fit the dataset also helps us make more efficient use of the data we have.
Understanding underfitting
A machine learning model is said to be underfitted when it fails to find the relationship between the input variables and the output variable. In other words, the model is too simple, or too little trained, to capture the structure of the dataset. A characteristic of an underfitted model is that it gives inaccurate predictions on both the training dataset and unseen data. So, if a model performs poorly on both, it is most probably underfitted. In such cases, the model needs more capacity or more training to learn the dataset.
Another characteristic of underfitted models is high bias and low variance.
How to avoid underfitting the model?
The following are some of the ways to avoid underfitting.
- Increase the complexity of the model: This means, for example, adding layers or units to a neural network, training for more iterations, or allowing deeper decision trees, so that the model can capture more detail in the dataset.
- Add more informative data: Sometimes the model underfits because the data does not contain enough signal. More training examples alone rarely fix underfitting, but adding relevant features can.
- Reduce regularization: Regularization techniques such as L1 and L2 penalize model complexity, so a penalty that is too strong can itself cause underfitting. Lowering the regularization strength gives the model more freedom to fit the data.
- Cross-validation: Methods such as k-fold cross-validation do not fix underfitting by themselves, but they give a reliable estimate of performance and make it easy to detect.
- Hyperparameter tuning: This is often the most direct method. We adjust the hyperparameters to find the best-performing model. There are built-in strategies for this, such as grid search and random search.
- Remove noise from the data: Noisy or irrelevant data can obscure the relationship between inputs and outputs and lead to an underfitted model.
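Hyperparameter tuning with cross-validation, as mentioned in the list above, can be sketched with sklearn's GridSearchCV. This is a minimal example, not the article's own code: the iris dataset, the KNN model, and the candidate values for n_neighbors are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values chosen for illustration only. A very large n_neighbors
# averages over most of the training data and tends to underfit.
param_grid = {"n_neighbors": [1, 5, 15, 50, 100]}

# 5-fold cross-validation is run for every candidate value.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, search.cv_results_ holds the mean score for every candidate, which makes it possible to see where the model starts to underfit.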
Understanding overfitting
A machine learning model is said to be overfitted when it learns everything about the training dataset, including noise, but fails to generalize to unseen data. So, when a model performs well on the training dataset but fails to make accurate predictions on unseen data, the model is overfitted.
Overfitting is most often discussed for supervised machine learning models, which learn directly from a labeled training dataset. In contrast to underfitted models, overfitted models have low bias and high variance.
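The gap between training and test performance can be seen in a short experiment. The sketch below assumes a synthetic classification dataset with some randomly flipped labels and a decision tree with no depth limit, both chosen for illustration. Such a tree can memorize the training set, so its training accuracy is perfect while its test accuracy is lower.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y), an assumption for illustration.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No max_depth: the tree keeps splitting until every training point is fit.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}")
```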
How to avoid overfitting the model?
The following are some of the ways to avoid overfitting the model.
- Use more data: With too little data, the model can memorize every example and become overfitted, so increasing the size of the dataset helps to avoid overfitting.
- Use regularization techniques: Regularization techniques such as L1 and L2 penalize overly complex models and help to avoid overfitting.
- Use early stopping: Early stopping halts training once performance on a held-out validation set stops improving, so that the model does not go on to memorize the training data.
- Use cross-validation: Cross-validation estimates performance on data the model has not seen, which helps to detect overfitting and to choose settings that avoid it.
- Use ensembles: Ensemble methods such as bagging or boosting can reduce the risk of overfitting.
- Reduce the number of features: Feature selection or dimensionality reduction leaves the model with fewer inputs on which it can overfit.
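The effect of L2 regularization from the list above can be sketched as follows. This example is an illustration under assumptions, not the article's method: it uses a synthetic noisy cosine dataset, a degree 15 polynomial, and Ridge with alpha=1.0. Ridge adds an L2 penalty to the least-squares objective, which shrinks the coefficients and makes the fitted curve less extreme.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): a noisy cosine curve.
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.cos(1.5 * np.pi * X.ravel()) + rng.normal(scale=0.1, size=30)


def fit_poly(regressor):
    """Fit a degree 15 polynomial model with the given final regressor."""
    model = make_pipeline(
        PolynomialFeatures(degree=15, include_bias=False),
        StandardScaler(),
        regressor,
    )
    return model.fit(X, y)


plain = fit_poly(LinearRegression())  # no regularization
ridge = fit_poly(Ridge(alpha=1.0))    # L2 penalty on the coefficients

# The last pipeline step is the fitted regressor; compare coefficient sizes.
plain_norm = np.linalg.norm(plain.steps[-1][1].coef_)
ridge_norm = np.linalg.norm(ridge.steps[-1][1].coef_)
print(f"coefficient norm without penalty: {plain_norm:.2f}")
print(f"coefficient norm with L2 penalty: {ridge_norm:.2f}")
```

The alpha value controls the strength of the penalty. A value that is too large pushes the model toward underfitting, so it is usually chosen by cross-validation.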
What is a well-fitted model?
A well-fitted model in machine learning is a model that makes accurate predictions on both the training and testing datasets because it has learned the relationship between the input variables and the output variable. The following are some of the characteristics of a well-fitted model.
- It has a low bias
- It has a low variance
- Good performance on the training dataset
- Good performance on the testing dataset
- Consistent in performance
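These characteristics can be checked with k-fold cross-validation. The sketch below assumes the iris dataset and a KNN classifier with 5 neighbors, chosen for illustration. It uses sklearn's cross_validate to collect training and test accuracy for every fold: similar, high scores on both suggest a good fit, and a small spread across folds suggests consistent performance.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# return_train_score=True also records accuracy on each training fold.
results = cross_validate(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=5, return_train_score=True
)
train_scores = results["train_score"]
test_scores = results["test_score"]

print("mean train accuracy:", round(train_scores.mean(), 3))
print("mean test accuracy: ", round(test_scores.mean(), 3))
print("test accuracy spread (std):", round(test_scores.std(), 3))
```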
Fitting a machine learning model in sklearn
In the sklearn module, the fit function is used to fit, or train, the model. Remember that calling fit is not the only thing we need to do in order to train a model. First, we have to understand the dataset, then apply feature engineering to prepare it, and then, based on the dataset, select suitable models and train them.
In this section, we will take a built-in dataset from sklearn and fit a model on it.
Let us first import the dataset and the required modules.
```python
# importing the required modules
from sklearn.datasets import load_iris
# importing the KNN model
from sklearn.neighbors import KNeighborsClassifier

# loading the dataset
data = load_iris()
```
Now, we will create a KNN model and use the fit function in sklearn to train it.
```python
# data.data holds the feature matrix, data.target holds the class labels
x = data.data
y = data.target

# create the KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)

# train the model using the fit function
knn.fit(x, y)
```
As you can see, for a supervised model the fit function takes two arguments. The first is the input dataset (the features) and the second is the corresponding output (the target labels).
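The example above trains on the whole dataset, so it cannot tell us how the model behaves on unseen data. A minimal sketch of a hold-out evaluation follows. The 70/30 split and the random_state value are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data; stratify keeps the class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)            # train only on the training split

predictions = knn.predict(X_test)    # predict labels for unseen samples
train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print(f"train accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}")
```

If both accuracies are high and close to each other, the model is neither clearly underfitted nor clearly overfitted on this dataset.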
In this short article, we learned about model fitting in machine learning. We discussed the types of fitting and how to get a well-fitted model. We also learned how to implement model fitting using the fit function in the sklearn module.