If things don’t go your way in predictive modeling, use XGBoost. The XGBoost algorithm has become the ultimate weapon of many data scientists. It’s a highly sophisticated algorithm, powerful enough to deal with all sorts of irregularities in data. Like other boosting methods, it builds decision trees sequentially, but it parallelizes the construction of each tree, which is a big part of its speed. This article is best suited to people who are new to XGBoost. We’ll learn the art of tuning XGBoost parameters and hyperparameters, and we’ll practice the algorithm on a training data set in Python. Along the way, you will get a clear picture of the XGBClassifier parameters and how to tune them.
XGBoost is a popular gradient boosting algorithm known for its high performance and efficiency in machine learning tasks. Its extensive set of parameters will feel familiar to anyone who has worked with Gradient Boosting Machine (GBM). If you haven’t, working through a guide to parameter tuning for GBM in Python first is recommended: it builds intuition for boosting and makes the XGBoost-specific parameters easier to understand.
I’ve always admired the boosting capabilities that the XGBoost algorithm infuses into a predictive model. When I explored more about its performance and the science behind its high accuracy, I discovered many advantages, including the flexibility and power of its parameters:
I hope you now understand the sheer power of the XGBoost algorithm. Note that these are the points that I could muster. Do you know a few more? Feel free to drop a comment below, and I will update the list.
The XGBoost authors divide the overall parameters into 3 categories: general parameters, booster parameters, and learning task parameters.
General parameters define the overall functionality of XGBoost, such as which booster to use and how many parallel threads to run.
There are 2 more parameters that are set automatically by XGBoost, and you need not worry about them. Let’s move on to Booster parameters.
Though there are 2 types of boosters, I’ll consider only the tree booster here because it almost always outperforms the linear booster, and thus the latter is rarely used.
| Parameter | Description | Typical Values |
|---|---|---|
| eta | Analogous to the learning rate in GBM. | 0.01-0.2 |
| min_child_weight | Defines the minimum sum of weights of observations required in a child. | Tuned using CV |
| max_depth | The maximum depth of a tree. Used to control over-fitting. | 3-10 |
| max_leaf_nodes | The maximum number of terminal nodes or leaves in a tree. | |
| gamma | Specifies the minimum loss reduction required to make a split. | Tuned depending on loss function |
| max_delta_step | Allows each tree’s weight estimation to be constrained. | Usually not needed, explore if necessary |
| subsample | Denotes the fraction of observations to be random samples for each tree. | 0.5-1 |
| colsample_bytree | Denotes the fraction of columns to be random samples for each tree. | 0.5-1 |
| colsample_bylevel | Denotes the subsample ratio of columns for each split in each level. | Usually not used |
| lambda | L2 regularization term on weights (analogous to Ridge regression). | Explore for reducing overfitting |
| alpha | L1 regularization term on weight (analogous to Lasso regression). | Used for high dimensionality |
| scale_pos_weight | Used in cases of high class imbalance, as it helps in faster convergence. | > 0 |
Learning task parameters define the optimization objective and the metric to be calculated at each step.
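To make these three groups concrete, here is a minimal sketch of how they come together in a params dictionary for the native API. The values are illustrative only, and the tiny synthetic data set exists just so the snippet runs on its own:
import numpy as np
import xgboost as xgb

#Tiny synthetic binary-classification data, only so this sketch is runnable
rng = np.random.RandomState(27)
X = rng.rand(200, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    #General parameters
    'booster': 'gbtree',             #tree-based booster
    'nthread': 4,                    #number of parallel threads
    #Booster parameters
    'eta': 0.1,                      #learning rate
    'max_depth': 5,
    'min_child_weight': 1,
    'gamma': 0,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'lambda': 1,                     #L2 regularization
    'alpha': 0,                      #L1 regularization
    #Learning task parameters
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'seed': 27
}

bst = xgb.train(params, dtrain, num_boost_round=50)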
If you’ve been using Scikit-Learn till now, these parameter names might not look familiar. The good news is that the xgboost module in Python has a sklearn wrapper called XGBClassifier, which uses the sklearn-style naming convention. The parameter names that change are: eta → learning_rate, lambda → reg_lambda, and alpha → reg_alpha.
You must be wondering why we have defined everything except something similar to the “n_estimators” parameter in GBM. It does exist as a parameter in XGBClassifier; in the native xgboost implementation, however, the same setting is passed as “num_boost_round” to xgb.train (or xgb.cv).
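Here is a short sketch of that equivalence, using illustrative values and a synthetic data set in place of real training data:
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier

#Synthetic data, just to keep the sketch self-contained
rng = np.random.RandomState(27)
X = rng.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

#sklearn wrapper: the number of trees is the n_estimators constructor argument
clf = XGBClassifier(learning_rate=0.1, n_estimators=100, objective='binary:logistic')
clf.fit(X, y)

#Native API: the same setting is passed as num_boost_round to xgb.train
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'eta': 0.1, 'objective': 'binary:logistic'}, dtrain, num_boost_round=100)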
We will take the data set from the Data Hackathon 3.x AV hackathon, as in the GBM article. The details of the problem can be found on the following page. You can download the data set from here. I have performed the following steps:
- city variable dropped because of too many categories.
- DOB converted to Age; DOB dropped.
- EMI_Loan_Submitted_Missing created, which is 1 if EMI_Loan_Submitted was missing, else 0; original variable EMI_Loan_Submitted dropped.
- EmployerName dropped because of too many categories.
- Existing_EMI imputed with 0 (median) since only 111 values were missing.
- Interest_Rate_Missing created, which is 1 if Interest_Rate was missing, else 0; original variable Interest_Rate dropped.
- Lead_Creation_Date dropped because it made little intuitive impact on the outcome.
- Loan_Amount_Applied and Loan_Tenure_Applied imputed with median values.
- Loan_Amount_Submitted_Missing created, which is 1 if Loan_Amount_Submitted was missing, else 0; original variable Loan_Amount_Submitted dropped.
- Loan_Tenure_Submitted_Missing created, which is 1 if Loan_Tenure_Submitted was missing, else 0; original variable Loan_Tenure_Submitted dropped.
- LoggedIn and Salary_Account dropped.
- Processing_Fee_Missing created, which is 1 if Processing_Fee was missing, else 0; original variable Processing_Fee dropped.

For those who have the original data from the competition, you can check out these steps from the data_preparation iPython notebook in the repository.
Let’s start by importing the required libraries and loading the data.
#Import libraries:
import pandas as pd
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import metrics #Additional sklearn functions
from sklearn.model_selection import GridSearchCV
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12, 4
train = pd.read_csv('Train_Modified.csv', encoding='ISO-8859-1')
target = 'Disbursed'
IDcol = 'ID'
print("There will be no output for this particular block of code")
Note: I have imported 2 forms of XGBoost: xgb, the native xgboost library (we will use its cv function), and XGBClassifier, the sklearn wrapper for the XGBoost model.
Before proceeding further, let’s define a function that will help us create XGBoost models and perform cross-validation. The best part is that you can take this function as it is and use it later for your own models.
def modelfit(alg, dtrain, predictors, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):
    if useTrainCV:
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(dtrain[predictors].values, label=dtrain[target].values)
        cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], nfold=cv_folds,
            metrics='auc', early_stopping_rounds=early_stopping_rounds, verbose_eval=False)
        alg.set_params(n_estimators=cvresult.shape[0])

    #Fit the algorithm on the data
    alg.fit(dtrain[predictors], dtrain['Disbursed'])

    #Predict training set:
    dtrain_predictions = alg.predict(dtrain[predictors])
    dtrain_predprob = alg.predict_proba(dtrain[predictors])[:,1]

    #Print model report:
    print("\nModel Report")
    print("Accuracy : %.4g" % metrics.accuracy_score(dtrain['Disbursed'].values, dtrain_predictions))
    print("AUC Score (Train): %f" % metrics.roc_auc_score(dtrain['Disbursed'], dtrain_predprob))

    #Plot feature importances (retrieved from the underlying booster)
    feat_imp = pd.Series(alg.get_booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance Score')
This code is slightly different from what I used for GBM. The focus of this article is on the concepts rather than the code, so please feel free to drop a note in the comments if you find any part of it hard to follow. Note that older versions of xgboost’s sklearn wrapper don’t expose a “feature_importances_” attribute; instead, feature importances are retrieved through the underlying booster’s get_fscore function, which does the same job.
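As a side note, recent xgboost versions also expose a feature_importances_ attribute on the sklearn wrapper and ship a plot_importance helper. A minimal sketch of both routes on a synthetic data set (exactly which attributes are available depends on your xgboost version):
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
import matplotlib.pylab as plt

#Synthetic data so the sketch runs on its own
rng = np.random.RandomState(27)
X = rng.rand(200, 5)
y = (X[:, 0] + X[:, 2] > 1).astype(int)

clf = XGBClassifier(learning_rate=0.1, n_estimators=50)
clf.fit(X, y)

#Normalized importances from the sklearn wrapper ...
print(clf.feature_importances_)

#... and the raw split counts from the underlying booster, as used in modelfit above
print(clf.get_booster().get_fscore())

#Built-in plotting helper
xgb.plot_importance(clf.get_booster())
plt.show()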
We will use an approach similar to that of GBM here. The various steps to be performed are:
Let us look at a more detailed step-by-step approach.
In order to decide on boosting parameters, we need to set some initial values of other parameters. Let’s take the following values:
Please note that all the above are just initial estimates and will be tuned later. Let’s take the default learning rate of 0.1 here and check the optimum number of trees using the cv function of xgboost. The function defined above will do it for us.
#Choose all predictors except target & IDcols
predictors = [x for x in train.columns if x not in [target, IDcol]]
xgb1 = XGBClassifier(
learning_rate =0.1,
n_estimators=1000,
max_depth=5,
min_child_weight=1,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective= 'binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27)
modelfit(xgb1, train, predictors)
As you can see, we get 140 as the optimal number of estimators for a 0.1 learning rate. Note that this many trees might be too heavy to tune further, depending on your system’s power. In that case, you can increase the learning rate and re-run the command to get a smaller number of estimators.
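For example, a quick sketch of that re-run, reusing the xgb1, train, and predictors objects defined above (the 0.2 learning rate is just an illustration):
#Assumes xgb1, train and predictors from the code above
#A larger learning rate typically needs fewer boosting rounds
xgb1.set_params(learning_rate=0.2, n_estimators=1000)
modelfit(xgb1, train, predictors)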
Note: You will see the test AUC as “AUC Score (Test)” in the outputs here. But this would not appear if you try to run the command on your system as the data is not made public. It’s provided here just for reference. The part of the code which generates this output has been removed here.
We tune max_depth and min_child_weight first, as they will have the highest impact on the model outcome. To start with, let’s set wider ranges, and then we will perform another iteration over smaller ranges.
Important Note: I’ll be doing some heavy-duty grid searches in this section, which can take 15-30 mins or even more time to run, depending on your system. You can vary the number of values you are testing based on what your system can handle.
param_test1 = {
'max_depth':range(3,10,2),
'min_child_weight':range(1,6,2)
}
gsearch1 = GridSearchCV(estimator = XGBClassifier( learning_rate =0.1, n_estimators=140, max_depth=5,
min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1, seed=27),
param_grid = param_test1, scoring='roc_auc', n_jobs=4, cv=5)
gsearch1.fit(train[predictors],train[target])
gsearch1.cv_results_, gsearch1.best_params_, gsearch1.best_score_

Here, we have run 12 combinations with wider intervals between values. The ideal values are 5 for max_depth and 5 for min_child_weight. Let’s go one step deeper and look for optimum values. We’ll search for values 1 above and below the optimum values because we took an interval of two.
param_test2 = {
'max_depth':[4,5,6],
'min_child_weight':[4,5,6]
}
gsearch2 = GridSearchCV(estimator = XGBClassifier( learning_rate=0.1, n_estimators=140, max_depth=5,
min_child_weight=2, gamma=0, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1,seed=27),
param_grid = param_test2, scoring='roc_auc', n_jobs=4, cv=5)
gsearch2.fit(train[predictors],train[target])
gsearch2.cv_results_, gsearch2.best_params_, gsearch2.best_score_
Here, we get the optimum values of 4 for max_depth and 6 for min_child_weight. We can also see the CV score increasing slightly. Note that as model performance increases, it becomes exponentially more difficult to achieve even marginal gains. You would have noticed that we got 6 as the optimum value for min_child_weight, but we haven’t tried values above 6. We can do that as follows:
param_test2b = {
'min_child_weight':[6,8,10,12]
}
gsearch2b = GridSearchCV(estimator = XGBClassifier(learning_rate=0.1, n_estimators=140, max_depth=4,
min_child_weight=2, gamma=0, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1,seed=27),
param_grid = param_test2b, scoring='roc_auc', n_jobs=4, cv=5)
gsearch2b.fit(train[predictors],train[target])
gsearch2b.cv_results_, gsearch2b.best_params_, gsearch2b.best_score_

We see 6 as the optimal value.
Now let’s tune the gamma value using the parameters already tuned above. Gamma can take various values, but I’ll check for 5 values here. You can go into more precise values.
param_test3 = {
'gamma':[i/10.0 for i in range(0,5)]
}
gsearch3 = GridSearchCV(estimator = XGBClassifier( learning_rate =0.1, n_estimators=140, max_depth=4,
min_child_weight=6, gamma=0, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1,seed=27),
param_grid = param_test3, scoring='roc_auc', n_jobs=4, cv=5)
gsearch3.fit(train[predictors],train[target])
gsearch3.cv_results_, gsearch3.best_params_, gsearch3.best_score_

This shows that our original value of gamma, i.e., 0 is the optimum one. Before proceeding, a good idea would be to re-calibrate the number of boosting rounds for the updated parameters.
xgb2 = XGBClassifier(
learning_rate =0.1,
n_estimators=1000,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective= 'binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27)
modelfit(xgb2, train, predictors)
Here, we can see the improvement in score. The parameters after this round of tuning are max_depth=4, min_child_weight=6, and gamma=0, with the number of estimators re-calibrated to 177 by the cv function (we use this value in the grid searches below).
The next step would be to try different subsample and colsample_bytree values. Let’s do this in 2 stages as well and take values 0.6,0.7,0.8,0.9 for both to start with.
param_test4 = {
'subsample':[i/10.0 for i in range(6,10)],
'colsample_bytree':[i/10.0 for i in range(6,10)]
}
gsearch4 = GridSearchCV(estimator = XGBClassifier( learning_rate =0.1, n_estimators=177, max_depth=4,
min_child_weight=6, gamma=0, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1,seed=27),
param_grid = param_test4, scoring='roc_auc', n_jobs=4, cv=5)
gsearch4.fit(train[predictors],train[target])
gsearch4.cv_results_, gsearch4.best_params_, gsearch4.best_score_

Here, we found 0.8 as the optimum value for both subsample and colsample_bytree. Now we should try values in 0.05 intervals around these.
param_test5 = {
'subsample':[i/100.0 for i in range(75,90,5)],
'colsample_bytree':[i/100.0 for i in range(75,90,5)]
}
gsearch5 = GridSearchCV(estimator = XGBClassifier( learning_rate =0.1, n_estimators=177, max_depth=4,
min_child_weight=6, gamma=0, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1,seed=27),
param_grid = param_test5, scoring='roc_auc', n_jobs=4, cv=5)
gsearch5.fit(train[predictors],train[target])
gsearch5.best_params_, gsearch5.best_score_

Again, we get the same values as before. Thus, the optimum values are subsample = 0.8 and colsample_bytree = 0.8.
The next step is to apply regularization to reduce overfitting. Many people don’t use these parameters much, since gamma already provides a substantial way of controlling model complexity, but they are always worth trying. I’ll tune the ‘reg_alpha’ value here and leave it up to you to try different values of ‘reg_lambda’.
param_test6 = {
'reg_alpha':[1e-5, 1e-2, 0.1, 1, 100]
}
gsearch6 = GridSearchCV(estimator = XGBClassifier( learning_rate =0.1, n_estimators=177, max_depth=4,
min_child_weight=6, gamma=0.1, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1,seed=27),
param_grid = param_test6, scoring='roc_auc', n_jobs=4, cv=5)
gsearch6.fit(train[predictors],train[target])
gsearch6.cv_results_, gsearch6.best_params_, gsearch6.best_score_

We can see that the CV score is less than in the previous case. But the values tried are very widespread. We should try values closer to the optimum here (0.01) to see if we get something better.
param_test7 = {
'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05]
}
gsearch7 = GridSearchCV(estimator = XGBClassifier( learning_rate =0.1, n_estimators=177, max_depth=4,
min_child_weight=6, gamma=0.1, subsample=0.8, colsample_bytree=0.8,
objective= 'binary:logistic', nthread=4, scale_pos_weight=1,seed=27),
param_grid = param_test7, scoring='roc_auc', n_jobs=4, cv=5)
gsearch7.fit(train[predictors],train[target])
gsearch7.cv_results_, gsearch7.best_params_, gsearch7.best_score_

You can see that we got a better CV. Now we can apply this regularization in the model and look at the impact:
xgb3 = XGBClassifier(
learning_rate =0.1,
n_estimators=1000,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
reg_alpha=0.005,
objective= 'binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27)
modelfit(xgb3, train, predictors)
Again we can see a slight improvement in the score.
Lastly, we should lower the learning rate and add more trees. Let’s use the cv function of XGBoost (through the modelfit helper defined above) to do the job again.
xgb4 = XGBClassifier(
learning_rate =0.01,
n_estimators=5000,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
reg_alpha=0.005,
objective= 'binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27)
modelfit(xgb4, train, predictors)
Below is a complete, self-contained block of code where you can try different parameters and test the results.
import pandas as pd
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.model_selection import GridSearchCV
train = pd.read_csv('train_modified_sample.csv')
print(train.head())
print(train['Disbursed'].value_counts())
target='Disbursed'
IDcol = 'ID'
predictors = [x for x in train.columns if x not in [target, IDcol]]
param_test = {
'reg_alpha':[1e-5, 1e-2, 0.1, 100]
}
gsearch = GridSearchCV(estimator =
XGBClassifier(learning_rate =0.1,
n_estimators=10,
max_depth=5,
min_child_weight=2,
gamma=0.1,
subsample=0.85,
colsample_bytree=0.8,
objective= 'binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27),
param_grid = param_test,
scoring='roc_auc',
n_jobs=4,
cv=2,
verbose=10)
gsearch.fit(train[predictors],train[target])
print('Best Grid Search Parameters :',gsearch.best_params_)
print('Best Grid Search Score : ',gsearch.best_score_)
Now we can see a significant boost in performance, and the effect of parameter tuning is clearer.
As we come to an end, I would like to share 2 key thoughts:
You can also download the iPython notebook with all these model codes from my GitHub account. For codes in R, you can refer to this article.
This tutorial walked through developing an XGBoost machine learning model end-to-end. We started by discussing why XGBoost has superior performance over GBM, followed by a detailed discussion of the various parameters involved. We also defined a generic function that you can reuse for making models. Finally, we discussed the general approach towards tackling a problem with XGBoost and worked out the AV Data Hackathon 3.x problem through that approach.
Hope you like the article! XGBoost is a powerful tool in machine learning, and the XGBoost classifier can substantially improve predictions. To get the most out of it, it’s important to understand its gradient boosting method and the parameters that control how it handles the data.
If you’re looking to take your machine learning skills to the next level, consider enrolling in our GenAI Pinnacle Plus Program.
A. The choice of XGBoost parameters depends on the specific task. Commonly adjusted parameters include learning rate (eta), maximum tree depth (max_depth), and minimum child weight (min_child_weight).
A. The ‘n_estimators’ parameter in XGBoost determines the number of boosting rounds or trees to build. It directly impacts the model’s complexity and should be tuned for optimal performance.
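For instance, a common way to pick n_estimators is to let cross-validation with early stopping choose it, along the lines of the modelfit function defined earlier. A minimal sketch on synthetic data:
import numpy as np
import xgboost as xgb

#Synthetic data, just to make the sketch runnable
rng = np.random.RandomState(27)
X = rng.rand(500, 8)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {'eta': 0.1, 'max_depth': 4, 'objective': 'binary:logistic', 'eval_metric': 'auc'}

#Boosting stops once the CV metric has not improved for 50 rounds,
#giving a data-driven choice for n_estimators
cv_results = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
                    early_stopping_rounds=50, seed=27)
print('Suggested number of trees:', cv_results.shape[0])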
A. In XGBoost, a hyperparameter is a preset setting that isn’t learned from the data but must be configured before training. Examples include the learning rate, tree depth, and regularization parameters.
A. XGBoost provides L1 and L2 regularization terms using the ‘alpha’ and ‘lambda’ parameters, respectively. These parameters prevent overfitting by adding penalty terms to the objective function during training.
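In the sklearn wrapper these are exposed as reg_alpha and reg_lambda; a minimal sketch with illustrative values:
import numpy as np
from xgboost.sklearn import XGBClassifier

#Synthetic data, just so the sketch runs
rng = np.random.RandomState(27)
X = rng.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

clf = XGBClassifier(
    learning_rate=0.1,
    n_estimators=100,
    max_depth=4,
    reg_alpha=0.005,   #L1 penalty (alpha)
    reg_lambda=1.0,    #L2 penalty (lambda)
    objective='binary:logistic')
clf.fit(X, y)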