Understanding parameters of LightGBM to apply hyperparameter tuning

In this short article, we will understand the parameters of LightGBM so that we can easily apply hyperparameter tuning. LightGBM is a supervised boosting algorithm that was developed by Microsoft and made publicly available in 2017. It is an open-source module that can be used as a boosting model, and it is very powerful, fast, and accurate compared to many other boosting algorithms. In this article, we will go through some of the features of LightGBM that make it fast and powerful, and we will also discuss the hyperparameter tuning of LightGBM.

LightGBM often outperforms XGBoost, gradient boosting, and AdaBoost on many test datasets. Also, make sure you have a strong understanding of decision trees and random forests, as LightGBM uses decision trees as weak learners.

Understanding the parameters of LightGBM

LightGBM is short for Light Gradient Boosting Machine. It is a supervised boosting algorithm that works in a similar way to the XGBoost algorithm, but with some advanced features that make it more powerful and fast.

LightGBM uses various parameters, and in many cases the default values already give good results. However, to apply hyperparameter tuning effectively, we first need to understand what those parameters do.


Gradient boosting methods

In the LightGBM algorithm, we have the option to use different gradient boosting methods which include GBDT, DART, and GOSS. These are known as boosting parameters of the LightGBM. So, if you are struggling to get accurate results, you can switch to any of these boosting methods.
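As a sketch, the boosting method is selected with the `boosting` parameter (alias `boosting_type`); the objective and learning rate below are placeholder values, not recommendations:

```python
# Base parameters shared by all three configurations.
# "objective" and "learning_rate" are placeholder choices for illustration.
base_params = {
    "objective": "binary",
    "learning_rate": 0.1,
}

# Switching between boosting methods only requires changing one key:
gbdt_params = {**base_params, "boosting": "gbdt"}  # the default method
dart_params = {**base_params, "boosting": "dart"}  # dropout-based method
goss_params = {**base_params, "boosting": "goss"}  # gradient-based sampling
```

Any of these dictionaries can then be passed to LightGBM's training functions.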

GBDT (Gradient boosted decision trees) in LightGBM

The GBDT boosting method is the traditional gradient boosting method, and many well-known boosting algorithms, for example XGBoost, use it.

Nowadays, GBDT is widely used because of its efficiency and accuracy. It is an ensemble model of decision trees, which means it is based on three important principles:

  • Combining weak learners, which are mostly decision trees
  • Gradient-based optimization
  • Boosting techniques

In other words, in the GBDT method we build many decision trees, known as weak learners. These trees combine to reduce the residuals and give us accurate results.

Despite its accuracy, one drawback of this method is that finding the best split points of the decision trees takes a lot of time and memory compared to the other methods.

DART gradient boosting method

The DART (Dropouts meet Multiple Additive Regression Trees) gradient boosting method uses dropout to improve model regularization and deal with some other less-obvious problems.

Namely, GBDT suffers from over-specialization, which means trees added at later iterations tend to impact the prediction of only a few instances and make a negligible contribution toward the remaining instances. Adding dropout makes it more difficult for the trees at later iterations to specialize on those few samples and hence improves the performance.

This method often provides better accuracy than the other methods, at the cost of slower training.
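The parameters `drop_rate` and `skip_drop` control DART's dropout behaviour; the values below are the library defaults, shown as a sketch:

```python
# Hedged DART configuration; the objective is a placeholder choice.
dart_params = {
    "objective": "regression",
    "boosting": "dart",
    "drop_rate": 0.1,   # fraction of previous trees dropped at each iteration
    "skip_drop": 0.5,   # probability of skipping the dropout procedure entirely
}
```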

LGBM GOSS – Gradient-based one-side sampling

Gradient-based One-Side Sampling (GOSS) is the dataset sampling method used by LightGBM. GOSS gives higher weight to data points with larger gradients when calculating the information gain, so under-trained instances contribute more.

On large datasets, plain GBDT is reliable but not fast enough. GOSS therefore proposes a gradient-based sampling technique to avoid exploring the entire search space. For each data instance, a small gradient indicates that the instance is already well trained, while a large gradient indicates that it still needs attention. So each data instance falls on one of two sides, large or small gradient: GOSS keeps all the data with large gradients and performs one-side (random) sampling on the data with small gradients.

The advantage of this method is that it converges faster than the other two, but it can overfit on small datasets.
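The parameters `top_rate` and `other_rate` control the sampling: the fraction of large-gradient instances kept, and the fraction of small-gradient instances sampled. A minimal sketch with the library's default values:

```python
# Hedged GOSS configuration; the objective is a placeholder choice.
goss_params = {
    "objective": "binary",
    "boosting": "goss",
    "top_rate": 0.2,     # keep the top 20% of instances by gradient magnitude
    "other_rate": 0.1,   # randomly sample 10% of the remaining instances
}
```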


Regularization parameters of LightGBM

Regularization refers to techniques that calibrate machine learning models by minimizing an adjusted loss function, which helps prevent overfitting or underfitting.

If your model is facing the problem of overfitting, you can change the following parameter values:


lambda_l1 and lambda_l2

The lambda_l1 and lambda_l2 parameters control the strength of L1 and L2 regularization and, along with min_gain_to_split, help combat overfitting.
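A small sketch of how these might be set; the values are hedged starting points, not recommendations:

```python
# Illustrative regularization settings to try when the model overfits.
reg_params = {
    "lambda_l1": 0.1,           # L1 regularization term on leaf weights
    "lambda_l2": 0.1,           # L2 regularization term on leaf weights
    "min_gain_to_split": 0.01,  # minimum gain required to perform a split
}
```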


num_leaves

This parameter is used to control the complexity of the model. You can use it to set the maximum number of leaves each weak learner will have.


bagging_fraction

With this parameter, we can specify the percentage of rows used per tree-building iteration, which means the model will select a random subset of rows to fit each weak learner. Good practice is to start with a small value and then adjust it later.


feature_fraction

The feature_fraction parameter specifies the percentage of features to sample when training each tree, so it also takes a value between 0 and 1. For example, if you set it to 0.7, then LightGBM will select 70% of the features before training each tree. This can speed up training and can also help to deal with overfitting.
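As a sketch, row and feature sampling are usually set together; the values below are hedged starting points. Note that bagging_freq must also be set for bagging_fraction to take effect:

```python
# Illustrative sampling settings.
sampling_params = {
    "bagging_fraction": 0.8,  # use 80% of the rows per iteration
    "bagging_freq": 5,        # resample rows every 5 iterations
                              # (required for bagging_fraction to take effect)
    "feature_fraction": 0.7,  # sample 70% of the features per tree
}
```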


max_depth

The max_depth parameter controls the maximum depth of each trained tree. You need to be careful here, because a large value of max_depth can cause the model to overfit.
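A commonly cited rule of thumb (hedged, not a guarantee) is to keep num_leaves below 2^max_depth, since a fully grown binary tree of that depth has at most that many leaves; otherwise the depth constraint has no practical effect:

```python
# Illustrative pairing of max_depth and num_leaves.
max_depth = 6
num_leaves = 2 ** max_depth - 1  # 63 leaves, just under the 2**6 ceiling

depth_params = {
    "max_depth": max_depth,
    "num_leaves": num_leaves,
}
```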


max_bin

If you define max_bin as 255, each feature can have at most 255 unique binned values. A small max_bin speeds up training, while a large value can improve accuracy.

In other words, binning is a technique for representing data in a histogram. LightGBM uses a histogram-based algorithm to find the optimal split point while building a weak learner, so each continuous numeric feature (e.g. the number of views for a video) is split into discrete bins.
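To see what binning means, here is a small, purely illustrative sketch (not LightGBM's actual implementation) that discretizes a continuous feature into at most max_bin buckets with NumPy:

```python
import numpy as np

# A synthetic continuous feature, e.g. view counts for videos.
rng = np.random.RandomState(0)
feature = rng.exponential(size=1000)

# Quantile-based bin edges, mimicking histogram binning conceptually.
max_bin = 255
bin_edges = np.quantile(feature, np.linspace(0.0, 1.0, max_bin + 1))

# Each value is replaced by its bin index (0 .. max_bin - 1).
binned = np.digitize(feature, bin_edges[1:-1])
```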

Training parameters of LightGBM

Sometimes while training a LightGBM model we face problems such as long training times, categorical features, or unbalanced data. In such cases, the following parameters can come to the rescue.


num_iterations

The num_iterations parameter specifies the number of boosting iterations (trees) to build. The more trees we build, the more accurate the model can be, but it will take more time and may overfit as well.

So, it is a good idea to start with a small number of iterations and then increase it. It is highly recommended to pair a small learning rate with a large number of iterations.
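A hedged illustration of that trade-off: a small learning rate usually needs a larger number of iterations, so the two are typically tuned together:

```python
# Two illustrative configurations with a comparable "step budget"
# (iterations * learning rate); the numbers are arbitrary examples.
fast_params = {"num_iterations": 100, "learning_rate": 0.1}
slow_params = {"num_iterations": 1000, "learning_rate": 0.01}

fast_budget = fast_params["num_iterations"] * fast_params["learning_rate"]
slow_budget = slow_params["num_iterations"] * slow_params["learning_rate"]
```

The slower configuration takes ten times as many, ten-times-smaller steps, which often generalizes better at the cost of training time.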


early_stopping_round

Early stopping lets you specify an arbitrarily large number of training iterations and stop training once the model's performance stops improving on a hold-out validation dataset. In other words, this parameter stops training if the validation metric has not improved in the last early_stopping_round rounds. It should be defined together with num_iterations. If you set it too large, you increase the chance of overfitting (but your model can be better).


categorical_feature

LightGBM allows us to specify categorical features directly and handles them internally in a smart way. We use the categorical_feature parameter to specify them, and the categorical features must be encoded as non-negative integers. So when using LightGBM, we don't need to one-hot encode categorical values, because this feature handles them automatically.
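A small sketch, assuming a hypothetical city column: converting it to the pandas category dtype yields the non-negative integer codes LightGBM expects:

```python
import pandas as pd

# Hypothetical frame with one categorical and one numeric column.
df = pd.DataFrame({
    "city": ["london", "paris", "london", "tokyo"],
    "views": [10, 25, 7, 40],
})

# The pandas "category" dtype maps each label to a non-negative integer code.
df["city"] = df["city"].astype("category")
codes = df["city"].cat.codes

# Sketch of how the column would then be declared to LightGBM:
# train_set = lgb.Dataset(df, label=..., categorical_feature=["city"])
```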


is_unbalance

Sometimes we face the challenge of dealing with unbalanced datasets. Setting is_unbalance to true in LightGBM tries to automatically balance the weight of the dominated label in binary classification.
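A sketch of the two usual options for a skewed binary task (the class counts below are assumed for illustration); LightGBM expects you to use one or the other, not both:

```python
# Option 1: let LightGBM balance the label weights automatically.
unbalance_params = {"objective": "binary", "is_unbalance": True}

# Option 2: set the positive-class weight explicitly via scale_pos_weight,
# commonly chosen as (number of negatives) / (number of positives).
n_negative, n_positive = 900, 100  # assumed class counts
weighted_params = {
    "objective": "binary",
    "scale_pos_weight": n_negative / n_positive,
}
```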



Conclusion

In this short article, we learned about the parameters of LightGBM. We can use these parameters to tune LightGBM models.
