Do you want an in-depth understanding of gradient boosting, and to learn how to tune the hyperparameters of a gradient boosting model using Python? Well, here we go!

Gradient boosting is a boosting algorithm that combines many weak learners into a strong predictive model. Like the AdaBoost algorithm, it uses decision trees as weak learners. It trains the weak models sequentially, and every model tries to overcome the weaknesses of the previous one. In this article, we will learn how the gradient boosting algorithm works and cover several ways of tuning its hyperparameters.

We assume that you have already read the previous articles in the machine learning and boosting algorithms sections. Also, make sure that you have a good understanding of the AdaBoost algorithm and decision trees.

## What is a Gradient boosting algorithm using Python?

Gradient boosting is also known as gradient tree boosting or GBM (gradient boosting machine), and stochastic gradient boosting is the variant that fits each tree on a random subsample of the data. It creates a sequence of weak models (usually decision trees) and combines them into a final strong learner. Each new model aims to improve on the one before it by reducing its errors. Gradient boosting's key principle is that, at each iteration, it fits a new predictor to the residual errors of the preceding predictor rather than fitting a predictor directly to the data.

### How does the gradient boosting algorithm work on regression?

The gradient boosting algorithm works in a simple but clever way. It predicts the output values by building a sequence of decision trees. The very first decision tree contains a single leaf. In the case of a regression dataset, this leaf contains the average of the output values. For example, let us assume that we have a small dataset about the age and salary of individuals, and we want to apply the gradient boosting algorithm to predict the salary.

So, the very first weak learner of the gradient boosting algorithm is a decision tree with a single leaf. In the case of a regression dataset, this leaf is simply the average of the output values.

This will be the very first prediction of the Gradient boosting in the first iteration. Let us also visualize the prediction on a graph using Python.

```
# importing the required modules
import matplotlib.pyplot as plt
# actual values
actual = [3000, 2000, 3300, 1800, 3600, 2400]
# first prediction: the mean of the actual values (about 2683) for every sample
first_iteration = [2683, 2683, 2683, 2683, 2683, 2683]
# x positions (sample indices)
x = range(len(actual))
# plotting the actual values
plt.plot(x, actual, label='actual', c='m')
# plotting the predicted values
plt.plot(x, first_iteration, label='predicted')
plt.legend()
plt.show()
```

Output:

As you can see, the first weak learner just provides the average value as the prediction. In a gradient boosting algorithm, the next weak learner is not trained on the original output values. Instead, it is trained on the residuals (errors) of the previous weak learner in order to improve the predictions.

Let us understand the training of the next weak learner step by step. The very first step is to calculate the error (residual) of each of the first predictions.
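As a minimal sketch of this step, the errors can be computed in plain Python. It reuses the `actual` values from the plotting code above, truncates the mean to 2683 as the article does, and takes the error to be the prediction minus the actual value. The variable names are chosen here only for illustration.

```python
# salary values taken from the plotting example above
actual = [3000, 2000, 3300, 1800, 3600, 2400]

# the first weak learner predicts the mean for every sample
mean_prediction = sum(actual) / len(actual)
first_prediction = int(mean_prediction)  # 2683, truncated as in the article

# error of the first prediction for every sample (prediction - actual)
errors = [first_prediction - value for value in actual]

print(round(mean_prediction, 2))  # 2683.33
print(errors)                     # [-317, 683, -617, 883, -917, 283]
```

A negative error means the first prediction was too low for that sample, and a positive error means it was too high.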

Next, another decision tree is built on these residuals. Learn how to build decision trees from the article Decision trees using Python.

One of the most important parameters of boosting algorithms is the learning rate, which controls the size of the step taken towards the optimum solution in each iteration. In this case, we will use 0.4 as the learning rate. The algorithm takes the previous prediction (2683) and combines it with the learning rate and the error to come up with a new prediction, using the following formula.

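new prediction = previous prediction - (learning rate) × (error)

Here the error is the previous prediction minus the actual value. For example, the first actual value is 3000, so its error is 2683 - 3000 = -317, and the prediction of the second weak learner for this value will be:

2683 - (0.4) × (-317) = 2809.8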

As you can see, the prediction of the second weak learner (2809.8) is closer to the actual value (3000) than the prediction of the first weak learner (2683). Let us calculate the predicted values for each of the input values.
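The same update can be applied to every sample with a short loop. This is only a sketch of the simplified arithmetic used in this walkthrough: it plugs each sample's own error into the formula, whereas a real gradient boosting model would use the output of a tree fitted to those errors.

```python
# values from the walkthrough above
actual = [3000, 2000, 3300, 1800, 3600, 2400]
first_prediction = 2683
learning_rate = 0.4

second_prediction = []
for value in actual:
    # error of the first prediction for this sample
    error = first_prediction - value
    # previous prediction - (learning rate) * (error)
    second_prediction.append(round(first_prediction - learning_rate * error, 1))

print(second_prediction)
# [2809.8, 2409.8, 2929.8, 2329.8, 3049.8, 2569.8]
```

Every value moves 40% of the way from the first prediction towards the actual value, which is exactly what a learning rate of 0.4 means here.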

Let us also visualize the predictions of the second weak learner along with the first one to see how the model boosts the learning in the second step.

```
# second step: previous prediction - 0.4 * error, rounded to whole numbers
second_iteration = [2810, 2410, 2930, 2330, 3050, 2570]
# actual values
plt.plot([i for i in range(len(actual))], actual, label='actual' , c='m')
# predicted values
plt.plot([i for i in range(len(actual))], first_iteration, label='weak learner-1', linestyle='dashed')
# predicted values of weak learner 2
plt.plot([i for i in range(len(actual))], second_iteration, label='weak learner-2', linestyle='dashed')
plt.legend()
plt.show()
```

Output:

As you can see, the predictions of the second weak learner are closer to the actual values than those of the first one. In a similar way, the gradient boosting algorithm builds the specified number of weak learners, where each model tries to reduce the error of the previous one.
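To see the whole loop in one place, here is a small from-scratch sketch that fits a shallow regression tree to the residuals in every iteration. It is an illustration under assumptions, not scikit-learn's internal implementation: the `age` values are hypothetical placeholders (the original table is not reproduced here), the trees are limited to `max_depth=1`, and the residual is taken as actual minus prediction, which is the more common sign convention and turns the minus in the formula above into a plus.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# salaries from the example above; the ages are hypothetical placeholders
age = np.array([[35], [23], [38], [21], [45], [28]])
salary = np.array([3000, 2000, 3300, 1800, 3600, 2400], dtype=float)

learning_rate = 0.4
n_trees = 5

# step 1: the first prediction is the mean of the target
prediction = np.full_like(salary, salary.mean())
mse_history = [np.mean((salary - prediction) ** 2)]

for _ in range(n_trees):
    # step 2: residuals of the current ensemble (actual - prediction)
    residuals = salary - prediction
    # step 3: fit a small tree to the residuals
    tree = DecisionTreeRegressor(max_depth=1, random_state=0)
    tree.fit(age, residuals)
    # step 4: move the prediction a fraction of the way towards the target
    prediction = prediction + learning_rate * tree.predict(age)
    mse_history.append(np.mean((salary - prediction) ** 2))

for step, mse in enumerate(mse_history):
    print(f"after {step} trees: training MSE = {mse:.1f}")
```

The training error shrinks with every added tree, which is the behaviour the two plots above show for the first two steps.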

### How does a Gradient boosting algorithm work on a classification dataset?

Gradient boosting works on classification in much the same way as on regression. It creates a first weak learner (a decision tree with one leaf) and then calculates the residuals. The second model is fitted to these residuals and, scaled by the learning rate, tries to decrease them.

The only difference is the value assigned to the first decision tree. In the case of the regression dataset, the very first prediction of the weak learner was the average of the output values, but in the case of the classification dataset, the first prediction of the weak learner is the log(odds). The rest of the steps are similar to the regression one.
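As a rough illustration of the log(odds) starting point, the sketch below works through a binary case with made-up labels. The labels are hypothetical, and the three-class iris problem used later is handled by scikit-learn with a multi-class generalisation of the same idea.

```python
import math

# hypothetical binary labels: 4 positives and 2 negatives
labels = [1, 1, 1, 1, 0, 0]

positives = sum(labels)
negatives = len(labels) - positives

# initial prediction of the first leaf: log(odds) of the positive class
log_odds = math.log(positives / negatives)

# converting the log(odds) to a probability with the logistic function
probability = 1 / (1 + math.exp(-log_odds))

# residuals are the observed labels minus the predicted probability
residuals = [label - probability for label in labels]

print(round(log_odds, 4))     # 0.6931
print(round(probability, 4))  # 0.6667
```

The next tree would then be fitted to these residuals, just as in the regression case.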

## Gradient boosting algorithm using Python for classification

Now, we will use the gradient boosting algorithm on a classification dataset and will learn how we can implement it using Python. For simplicity, we will be using the iris dataset as sample data. Before going to the implementation part, ensure you have installed the following Python modules.

- scikit-learn (imported as sklearn)
- matplotlib
- pandas
- NumPy
- plotly
- seaborn

Let us first load the dataset and explore it a little bit using the pandas module.

```
# importing the required modules
from sklearn import datasets
import pandas as pd
import numpy as np
# loading the iris dataset
dataset = datasets.load_iris()
# converting the data to a DataFrame
data = pd.DataFrame(data=np.c_[dataset['data'], dataset['target']],
                    columns=dataset['feature_names'] + ['target'])
# printing the first few rows
data.head()
```

Output:

As you can see, there are four input attributes and one target class. The target class contains three different types of flowers.

Let us now plot the box plot using pandas to see the distribution of input data. By the way, check this article to learn how you can use pandas for data visualization.

```
# plotting the box plot
data.drop('target', axis=1).plot.box(figsize=(10, 6))
```

Output:

Let us now move on to splitting the dataset.

### Splitting the dataset

Let us first divide the dataset into inputs and output variables.

```
# splitting the dataset into input and output
Input = data.drop('target', axis=1)
Output =data['target']
```

The next step is to split the dataset into the testing and training parts. We will also set the random state to 1 so that the split is reproducible.

```
# importing the module
from sklearn.model_selection import train_test_split
# splitting into testing and training parts
X_train, X_test, y_train, y_test = train_test_split(Input, Output, test_size=0.30, random_state=1)
```

As you can see, we have assigned 30% of the data to the testing part.
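A quick way to confirm the split is to look at the shapes of the resulting parts. The sketch below repeats the same split directly on the arrays returned by `load_iris(return_X_y=True)` so that it runs on its own; the names `X` and `y` are used here only for this check.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# loading the iris features and labels
X, y = datasets.load_iris(return_X_y=True)

# same split settings as above: 30% for testing, random_state=1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)

print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```

Out of the 150 iris samples, 105 are used for training and 45 are held back for testing.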

### Gradient boosting classifier with 2 iterations

We have seen how the number of iterations affects the overall predictions. Let us first create a model with 2 iterations.

```
# importing gradient boosting classifier
from sklearn.ensemble import GradientBoostingClassifier
# specifying 2 iterations (trees)
GB_clf = GradientBoostingClassifier(n_estimators=2)
# training the model
GB_clf.fit(X_train,y_train)
```

Once the training is complete, we can use the testing dataset to make predictions.

```
# predicting
GB_predict = GB_clf.predict(X_test)
```

Now we will use various evaluation techniques to know how well the model has predicted. Let us first visualize the confusion matrix.

```
# importing modules
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# confusion matrix plotting
cm = confusion_matrix(y_test,GB_predict, labels=GB_clf.classes_)
# labelling
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=GB_clf.classes_)
disp.plot()
plt.show()
```

Output:

Let us also find the accuracy of the model.

```
# importing the module
from sklearn.metrics import accuracy_score
# printing
print("The accuracy is: ", accuracy_score(y_test, GB_predict))
```

Output:

This shows that the model was able to classify 88% of the testing data correctly.

### Gradient boosting with 100 iterations

Now, let us increase the iterations to 100 and train the model again.

```
# importing gradient boosting classifier
from sklearn.ensemble import GradientBoostingClassifier
# specifying 100 iterations (trees)
GB_clf = GradientBoostingClassifier(n_estimators=100)
# training the model
GB_clf.fit(X_train,y_train)
# predicting
GB_predict=GB_clf.predict(X_test)
```

Now, let us also evaluate the model using the confusion matrix and accuracy score.

```
# importing modules
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# confusion matrix plotting
cm = confusion_matrix(y_test,GB_predict, labels=GB_clf.classes_)
# labelling
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=GB_clf.classes_)
disp.plot()
plt.show()
```

Output:

As you can see, this time there are fewer misclassified items. Let us also calculate the accuracy score of the model.

```
# importing the module
from sklearn.metrics import accuracy_score
# printing
print("The accuracy is: ", accuracy_score(y_test, GB_predict))
```

Output:

As you can see, we got better accuracy this time. In a similar way, we can search for the best value of each of the other parameters. Later in this article, we will learn the hyperparameter tuning of the gradient boosting algorithm to find the optimum model.
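Instead of retraining a model for every iteration count, the effect of the number of iterations can also be inspected on a single fitted model. The sketch below uses `staged_predict`, which yields the predictions after each boosting stage. The `random_state` on the model is an addition made here for repeatable output, and the data is reloaded so the snippet runs on its own.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)

# random_state on the model is an addition here, for repeatable output
model = GradientBoostingClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)

# staged_predict yields the test predictions after 1, 2, ..., 100 stages
stage_accuracy = [accuracy_score(y_test, stage_pred)
                  for stage_pred in model.staged_predict(X_test)]

# accuracy after a few selected numbers of boosting stages
for n_stages in (1, 2, 10, 50, 100):
    print(n_stages, round(stage_accuracy[n_stages - 1], 3))
```

The last entry matches the accuracy of the full 100-iteration model, while the earlier entries show how the score develops as stages are added.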

## Gradient boosting algorithm using Python on regression data

Now, we will use the gradient boosting algorithm to solve a regression problem. In this section, we will be using a dataset about house prices. Let us first import the dataset and explore it a little bit.

```
# importing dataset
data = pd.read_csv('house.csv')
# heading of the dataset
data.head()
```

Output:

The next step is to remove the rows with null values, because the gradient boosting model used below cannot handle null values.

```
# removing the null values
data.dropna(axis=0, inplace = True)
```
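Before dropping rows, it can help to check how many null values each column actually contains. The small DataFrame below is a hypothetical stand-in for the house-price data (its column names and values are invented for this sketch), used only to show the check and the effect of `dropna`.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the house-price data
sample = pd.DataFrame({
    'area': [120.0, 85.0, np.nan, 150.0],
    'rooms': [3, 2, 4, np.nan],
    'price': [250000, 180000, 320000, 400000],
})

# number of null values in each column
print(sample.isna().sum())

# dropping every row that contains at least one null value
cleaned = sample.dropna(axis=0)
print(len(sample), '->', len(cleaned))  # 4 -> 2
```

If a large share of the rows disappears, it is worth knowing that before training rather than after.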

Now the dataset is ready and we can split the data to train the model.

### Splitting dataset

Let us now divide the dataset into input values and output values.

```
# input and output variables
Input = data.drop('price', axis=1)
Output = data.price
```

The next step is to split the dataset into testing and training parts.

```
# importing the module
from sklearn.model_selection import train_test_split
# splitting into testing and training parts
X_train, X_test, y_train, y_test = train_test_split(Input, Output, test_size=0.25)
```

Once we are done with splitting the dataset, we can move to the training part.

### Gradient boosting regressor with 2 iterations

Let us now import the Gradient boosting regressor and initialize the model with 2 iterations.

```
# importing the regressor
from sklearn.ensemble import GradientBoostingRegressor
# initializing the model with 2 iterations
GB_rgsr = GradientBoostingRegressor(n_estimators=2)
# training the model
GB_rgsr.fit(X_train,y_train)
```

Once the training is complete, we can then use the testing dataset to make predictions.

```
# making the predictions
GB_predict=GB_rgsr.predict(X_test)
```

Let us now visualize the actual and the predicted values of the model.

```
# figure size
plt.figure(figsize=(12, 8))
# actual values
plt.plot(range(len(y_test)), y_test, label="actual values")
# predicted values
plt.plot(range(len(y_test)), GB_predict, c='m', label="predicted values")
plt.legend()
plt.show()
```

Output:

As you can see, the predictions are poor because we have used only two iterations. The R-squared score of the model is:

```
# importing the R-squared score
from sklearn.metrics import r2_score
# calculating the R-squared score
print('R-squared score is:', r2_score(y_test, GB_predict))
```

Output:

As you can see, the R-squared score is pretty low because we have used only two iterations.
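For readers who want to see what `r2_score` measures, here is a short sketch that computes the R-squared score by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. It uses the toy salary values from the earlier walkthrough rather than the house-price data, and the predictions are the ones obtained from the update formula with a learning rate of 0.4.

```python
import numpy as np
from sklearn.metrics import r2_score

# toy values reused from the salary example earlier in the article
y_true = np.array([3000, 2000, 3300, 1800, 3600, 2400], dtype=float)
y_pred = np.array([2809.8, 2409.8, 2929.8, 2329.8, 3049.8, 2569.8])

# residual sum of squares: what the model still gets wrong
ss_res = np.sum((y_true - y_pred) ** 2)
# total sum of squares: what always predicting the mean would get wrong
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot

print(round(r2_manual, 4), round(r2_score(y_true, y_pred), 4))
```

A score of 0 means the model is no better than always predicting the mean, and a score of 1 means the predictions are perfect.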

### Gradient boosting regressor with 20 iterations

Now, we will train the model using 20 iterations and see how the algorithm will perform. Let us initialize the model with 20 iterations and make predictions using the testing dataset.

```
# initializing the model with 20 iterations
GB_rgsr = GradientBoostingRegressor(n_estimators=20)
# training the model
GB_rgsr.fit(X_train,y_train)
# making the predictions
GB_predict=GB_rgsr.predict(X_test)
```

We can now move to the visualization part and visualize the actual and the predicted values of the model.

```
# figure size
plt.figure(figsize=(12, 8))
# actual values
plt.plot(range(len(y_test)), y_test, label="actual values")
# predicted values
plt.plot(range(len(y_test)), GB_predict, c='m', label="predicted values")
plt.legend()
plt.show()
```

Output:

As you can see, this time the predictions are much closer to the actual values than last time.

Let us also calculate the R-squared score of the model.

```
# calculating the R-squared score
print('R-squared score is:', r2_score(y_test, GB_predict))
```

Output:

As you can see, the R-squared score is also better than last time.

## Hyperparameter tuning of Gradient boosting algorithm using Python

As we know, there are various important parameters in the gradient boosting algorithm that help to get an optimum result. In this section, we will go through some of these parameters and use a couple of methods to find the optimum values for them.

We will use the iris data and Gradient boosting classifier. Let us again import the iris data and split it into input and output values.

```
# loading the iris dataset
dataset = datasets.load_iris()
# converting the data to a DataFrame
data = pd.DataFrame(data=np.c_[dataset['data'], dataset['target']],
                    columns=dataset['feature_names'] + ['target'])
# splitting the dataset into input and output
Input = data.drop('target', axis=1)
Output = data['target']
```

Now that we have imported the dataset, it is time to create some functions to find the optimum values for the parameters.

### Finding the optimum number of trees in Gradient boosting

It is important to find the optimum number of trees in gradient boosting to get an optimum result. We have seen before that changing the number of trees (weak learners) affects the overall predictions of the model. The hyperparameter tuning of the gradient boosting algorithm is very similar to the hyperparameter tuning of the AdaBoost algorithm.

First we will import all the necessary modules that are required for the hyperparameter tuning of the Gradient boosting algorithm using Python.

```
# importing all modules for Gradient boosting algorithm using Python
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from matplotlib import pyplot
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from numpy import arange
```

The next step is to initialize the model with multiple iterations. Here, iterations simply mean the number of trees or weak learners, because the model creates one weak learner in each iteration. We will create a user-defined function that will return multiple models with different numbers of iterations.

```
# function to create models
def build_models():
    # dict of models
    GB_models = dict()
    # numbers of trees to try
    n_trees = [10, 50, 100, 500, 1000]
    # using a for loop to iterate through the tree counts
    for i in n_trees:
        # building a model with the specified number of trees
        GB_models[str(i)] = GradientBoostingClassifier(n_estimators=i)
    # returning the models
    return GB_models
```

As you can see, the above function returns a dictionary of models with different iteration values ranging from 10 to 1000.

Next, we will create a function for the evaluation of the models that we have created above. We will use the same function for the evaluation of models while finding other parameters as well.

```
# function for the validation of a model
def evaluate_model(model, Input, Output):
    # defining the method of validation
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3)
    # validating the model based on the accuracy score
    accuracy = cross_val_score(model, Input, Output, scoring='accuracy', cv=cv, n_jobs=-1)
    # returning the accuracy scores
    return accuracy
```

As you can see, the above function evaluates the model based on the accuracy score, and it uses the cross-validation method.
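To make the validation scheme more concrete, the sketch below counts how many train/test rounds `RepeatedStratifiedKFold(n_splits=10, n_repeats=3)` produces on the iris data and inspects the first test fold. The `random_state` is an addition made here for repeatability, and the data is reloaded as plain arrays so that the snippet runs on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import RepeatedStratifiedKFold

X, y = datasets.load_iris(return_X_y=True)

# same validation scheme as above; random_state is added for repeatability
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# total number of train/test rounds: 10 folds x 3 repeats
print(cv.get_n_splits(X, y))  # 30

# sizes of the first training and testing parts
train_idx, test_idx = next(cv.split(X, y))
print(len(train_idx), len(test_idx))  # 135 15

# number of samples of each class in the first test fold
print(np.bincount(y[test_idx]))  # [5 5 5]
```

So every call to the evaluation function returns 30 accuracy scores, each measured on a test fold in which the three flower classes are equally represented.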

Now, it is time to call these functions and print out the optimum number of iterations.

```
# calling the build_models function
models = build_models()
# creating lists for the results and the model names
results, names = list(), list()
# using a for loop to iterate through the models
for name, model in models.items():
    # calling the validation function
    scores = evaluate_model(model, Input, Output)
    # appending the accuracy scores to results
    results.append(scores)
    names.append(name)
    # printing the mean accuracy for this number of iterations
    print('---->Iterations (%s)---Accuracy( %.5f)' % (name, mean(scores)))
```

Output:

As you can see, the accuracy increases up to 50 iterations and then starts to decrease. So, we can say that the optimum number of iterations of the gradient boosting model on our dataset is 50.

Let us also visualize the mean accuracy of the model with box plots.

```
# plotting a box plot of the accuracy scores
pyplot.boxplot(results, labels=names, showmeans=True)
# showing the plot
pyplot.show()
```

Output:

As you can see, the mean accuracy of the model with 50 iterations is slightly higher than the others.
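Rather than reading the best setting off the printed lines or the plot, it can also be picked programmatically from lists shaped like `names` and `results`. The scores in the sketch below are placeholders invented only to make it runnable; they are not results from the article.

```python
from numpy import mean

# placeholder scores, invented only to make the sketch runnable
names = ['10', '50', '100']
results = [[0.93, 0.95, 0.94], [0.96, 0.95, 0.97], [0.94, 0.96, 0.95]]

# mean accuracy for every setting
mean_scores = {name: mean(scores) for name, scores in zip(names, results)}

# the setting whose mean accuracy is highest
best_name = max(mean_scores, key=mean_scores.get)
print(best_name, round(mean_scores[best_name], 4))  # 50 0.96
```

The same few lines work for the other parameters tuned below, since they all fill the same two lists.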

### Finding the optimum depth of trees in the Gradient boosting algorithm using Python

The gradient boosting algorithm uses decision trees as weak learners, and it is important to find the optimum depth of these weak learners. Let us create a function that builds models with various tree depths.

```
# function to build the models
def build_models():
    # creating a dict of models
    GB_models = dict()
    # trying tree depths from 1 to 11
    for i in range(1, 12):
        # adding a model with the specified depth
        GB_models[str(i)] = GradientBoostingClassifier(max_depth=i)
    # returning the models
    return GB_models
```

We will use the evaluation function that we have created in the above section. So let us now call the building model and evaluation functions to get the optimum depth of decision trees.

```
# calling the function
models = build_models()
# creating lists
results, names = list(), list()
# iterating through the models
for name, model in models.items():
    # calling the evaluation function
    accuracy = evaluate_model(model, Input, Output)
    # appending the results
    results.append(accuracy)
    names.append(name)
    # printing the mean accuracy for this tree depth
    print('---->Decision tree depth (%s)---Accuracy( %.5f)' % (name, mean(accuracy)))
```

Output:

As you can see, we get the best accuracy score when the depth of the decision trees is 4.

Let us now also plot the same information using a box plot.

```
# plotting a box plot of the accuracy scores
pyplot.boxplot(results, labels=names, showmeans=True)
# showing the plot
pyplot.show()
```

Output:

As you can see, the accuracy score is highest when the depth of the decision trees is 4.

### Finding an optimum Learning rate in Gradient boosting algorithm using Python

The learning rate determines the step size in each of the iterations. It is one of the important parameters and has a high impact on the results of the model. Let us now create a function that will build multiple models with different learning rates.

```
# function to build the models
def build_models():
    # creating a dict of models
    GB_models = dict()
    # different learning rates
    for i in [0.0001, 0.001, 0.01, 0.1, 1.0]:
        # key value
        k = '%.4f' % i
        # adding a model with the specified learning rate
        GB_models[k] = GradientBoostingClassifier(learning_rate=i)
    # returning the models
    return GB_models
```

Now, we will call this function to create multiple models, and then we will call the evaluation function to evaluate the performance of each of the models with different learning rates.

```
# calling the function
models = build_models()
# creating the lists
results, names = list(), list()
# for loop to iterate through the models
for name, model in models.items():
    # calling the evaluation function
    accuracy = evaluate_model(model, Input, Output)
    # storing the accuracy
    results.append(accuracy)
    names.append(name)
    # printing the mean accuracy for this learning rate
    print('---->Learning Rate(%s)---Accuracy( %.5f)' % (name, mean(accuracy)))
```

As you can see, we get the highest accuracy score when the learning rate is 0.1. Let us also show the accuracy scores on a box plot.

```
# plotting a box plot of the accuracy scores
pyplot.boxplot(results, labels=names, showmeans=True)
# showing the plot
pyplot.show()
```

Output:
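To build some intuition for why the learning rate interacts with the number of iterations, here is a toy sketch based on the simplified update from the regression walkthrough. It assumes every step removes a `learning_rate` fraction of the remaining error, which a real tree only approximates, and counts how many steps are needed before an error of -317 shrinks below 1.

```python
# how quickly a single error shrinks for different learning rates
initial_error = -317  # error of the first sample in the salary example

steps_needed = {}
for learning_rate in [0.01, 0.1, 0.4, 1.0]:
    error = initial_error
    steps = 0
    # each step removes a learning_rate fraction of the remaining error
    while abs(error) > 1 and steps < 10_000:
        error = error - learning_rate * error
        steps += 1
    steps_needed[learning_rate] = steps

for learning_rate, steps in steps_needed.items():
    print(learning_rate, steps)
```

Smaller learning rates need many more iterations to reach the same error, which is why the learning rate and the number of trees are usually tuned together.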

### Finding an optimum number of features in the Gradient boosting algorithm using Python

The number of features (attributes) that each decision tree considers when looking for the best split can also be varied.

Similar to modifying the sample size, changing the number of features introduces additional randomness into each tree, which may enhance performance. Let us create a function that will return multiple models with a different number of input features. In our case, we can use at most 4 features as our dataset has 4 input features (you can adjust the range depending on your dataset).

```
# function to build the models
def build_models():
    # creating a dict of models
    GB_models = dict()
    # exploring feature counts from 1 to 4
    for i in range(1, 5):
        # adding a model with the specified number of features
        GB_models[str(i)] = GradientBoostingClassifier(max_features=i)
    # returning the models
    return GB_models
```

Now, we will call this function and the evaluation function to get the optimum number of features.

```
# calling the function
models = build_models()
# creating the lists
results, names = list(), list()
# for loop to iterate through the models
for name, model in models.items():
    # calling the evaluation function
    accuracy = evaluate_model(model, Input, Output)
    # storing the accuracy
    results.append(accuracy)
    names.append(name)
    # printing the mean accuracy for this number of features
    print('---->Features(%s)---Accuracy( %.5f)' % (name, mean(accuracy)))
```

Output:

As you can see, the optimum number of features is 3, as we get the highest accuracy score for it. The same box plot code used in the previous sections can be used to visualize these results.
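Besides an integer, `max_features` also accepts the strings `'sqrt'` and `'log2'`. The sketch below is only an illustration of that option: it fits one model with `'sqrt'` and reads back the integer that scikit-learn derived for the four iris features. The `random_state` is an addition made here for repeatability.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier

X, y = datasets.load_iris(return_X_y=True)

# 'sqrt' asks for the square root of the number of features at each split
model = GradientBoostingClassifier(max_features='sqrt', random_state=1)
model.fit(X, y)

# the integer that scikit-learn derived from 'sqrt' for 4 input features
print(model.max_features_)  # 2
```

This is handy on wider datasets, where listing every possible feature count in a loop becomes impractical.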

### Finding the optimum number of samples in the Gradient boosting algorithm using Python

You can also change how many samples are used to fit each tree. This means that a randomly chosen portion of the training dataset is used to fit each tree. Using fewer samples introduces more variance for each tree, although it can improve the overall performance of the model. Let us now create a function that returns multiple models with different sample values.

```
# function to build the models
def build_models():
    # dict of models
    GB_models = dict()
    # exploring sample fractions from 0.1 to 1.0
    for i in arange(0.1, 1.1, 0.1):
        # key value
        k = '%.1f' % i
        # adding a model with the specified sample fraction
        GB_models[k] = GradientBoostingClassifier(subsample=i)
    # returning the models
    return GB_models
```

Let us now call the above function and the evaluation function.

```
# calling the function
models = build_models()
# creating the lists
results, names = list(), list()
# for loop to iterate through the models
for name, model in models.items():
    # calling the evaluation function
    accuracy = evaluate_model(model, Input, Output)
    # storing the accuracy
    results.append(accuracy)
    names.append(name)
    # printing the mean accuracy for this sample fraction
    print('---->Samples(%s)---Accuracy( %.5f)' % (name, mean(accuracy)))
```

Output:

As you can see, we get different accuracy scores for each of the sample sizes but the optimum score is when the sample size is 0.9. Let us also visualize the same information using the box plot.

```
# plotting a box plot of the accuracy scores
pyplot.boxplot(results, labels=names, showmeans=True)
# showing the plot
pyplot.show()
```

Output:
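When `subsample` is below 1.0, the rows left out of each stage can be used as a small built-in validation set. The sketch below is an illustration of that side effect rather than part of the tuning above: it fits one model with `subsample=0.9` and looks at the `oob_improvement_` attribute. The values `n_estimators=50` and `random_state=1` are choices made here for a quick, repeatable run.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier

X, y = datasets.load_iris(return_X_y=True)

# each boosting stage is fitted on a random 90% of the training rows
model = GradientBoostingClassifier(n_estimators=50, subsample=0.9,
                                   random_state=1)
model.fit(X, y)

# improvement in loss on the left-out (out-of-bag) rows at every stage;
# this attribute only exists when subsample < 1.0
print(model.oob_improvement_.shape)  # (50,)
```

Stages where the improvement hovers around zero or turns negative suggest that further trees add little on unseen rows.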

### GridSearchCV for Gradient boosting algorithm using Python

GridSearchCV is a hyperparameter tuning method in which a set of candidate values is given for each parameter. It evaluates every combination of these values with cross-validation and returns the combination that gives the best score.

Now, we will use the GridSearchCV to find the optimum values of parameters. The first step is to initialize the model and the different values for the parameters.

```
# defining the model
model = GradientBoostingClassifier()
# creating a dict for the grid
grid = dict()
# values for the number of iterations
grid['n_estimators'] = [10, 50, 100, 500]
# values for the learning rate
grid['learning_rate'] = [0.0001, 0.001, 0.01, 0.1, 1.0]
# values for the sample fraction
grid['subsample'] = [0.5, 0.7, 1.0]
# values for the depth of the trees
grid['max_depth'] = [3, 4, 5]
```

As you can see, we have defined the values for various parameters. Let us now apply the GridSearchCV method to find the optimum values for the above parameters.

```
# defining the cv
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3)
# applying the gridsearchcv method
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy')
# storing the values
grid_result = grid_search.fit(Input, Output)
# printing the best parameters of Gradient boosting algorithm using Python
print("Accuracy score: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```

Output:

As you can see, we get an accuracy score of 96% with the above parameter values.
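It is worth knowing how much work such a search involves before running it. The sketch below uses scikit-learn's `ParameterGrid` to count the combinations in the grid defined above and multiplies that by the 30 cross-validation rounds (10 folds, 3 repeats). The variable names are chosen here only for illustration.

```python
from sklearn.model_selection import ParameterGrid

# same grid as above
grid = {
    'n_estimators': [10, 50, 100, 500],
    'learning_rate': [0.0001, 0.001, 0.01, 0.1, 1.0],
    'subsample': [0.5, 0.7, 1.0],
    'max_depth': [3, 4, 5],
}

# number of parameter combinations in the grid
n_combinations = len(ParameterGrid(grid))
# every combination is trained once per cross-validation round
n_fits = n_combinations * 10 * 3  # 10 folds x 3 repeats

print(n_combinations, n_fits)  # 180 5400
```

With 5400 model fits, the search can take a while even on a small dataset, so it is usually sensible to start with a coarse grid and refine it around the best values.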

NOTE: You can get access to the source code and the dataset used in this article from my GitHub account.

## Summary

A gradient boosting algorithm is a type of boosting algorithm that can be used for both classification and regression problems. Similar to many other boosting algorithms, the gradient boosting algorithm builds a sequence of models where each model tries to overcome the errors of the previous model. In this article, we discussed the gradient boosting algorithm using Python in detail by covering how it works on regression and classification datasets. Moreover, we discussed and implemented in Python the hyperparameter tuning of the gradient boosting model.

## Frequently Asked Questions

### Is the Gradient boosting algorithm similar to the AdaBoost algorithm?

They are related but not the same. Both build weak learners one after another, but AdaBoost re-weights the training samples so that later learners focus on the examples that earlier ones got wrong, while gradient boosting fits each new learner to the residuals (the negative gradient of a loss function) of the models built so far.

### Is gradient boosting better than AdaBoost?

There is no simple yes or no answer. Which algorithm works better mainly depends on the type of dataset and your ultimate goal.

### How to apply Gradient boosting to my dataset?

Depending on your problem (classification or regression), initialize the matching model and then use the training dataset to train it, as we did above.

### Where do we use the Gradient boosting algorithm using Python?

We can use the Gradient boosting algorithm using Python on classification and regression problems.

### Is gradient boosting a good option for boosting?

Yes, it is a really smart way of boosting.

Thank you:)