Tune ML Models in No Time with Optuna

Thetechwriters 30 Nov, 2021

6 min read

This article was published as a part of the Data Science Blogathon

A comprehensive guide for finding the best hyper-parameter for your model efficiently.

Tuning hyperparameter is more efficient with Bayesian optimized algorithms compared to Brute-force algorithms.

You will see how to find the best hyperparameters for XGboost Regressor in this article.

Introduction

Optimizing ML models is a very challenging job. There are many tools in the market that makes our fine-tuning easier, In order to optimize our model, you need to make sure that our data is meaningful and our ML model fits on it perfectly. In this article, we will see how to fine-tune an Xgboost regressor in the easiest way possible.

In machine learning, a hyperparameter is a parameter whose value is used to control the learning process. ie-LR, regularization, gamma, and so on. Hyperparameters’ values can’t be estimated from the data. These are parameters that control the learning of the model on our data.

You can control the learning parameters in-order to make the model perfectly fit and learn from our data and give skilful predictions. Hyperparameter optimization of ML models gets complicated as our model gets complex. there are some popular ways to find optimized hyperparameters.

Grid Search: Grid Search trains the model on all possible hyperparameter combination and then compare using cross-validation.
Randomized Search: Randomized Search trains a model on a random hyperparameter combination and then compares using cross-validation.
Bayesian approach: it uses the Bayesian technique to model the search space and to reach an
optimized parameter. There are many handy tools designed for fast
hyperparameter optimization for complex deep learning and ML models like
HyperOpt, Optuna, SMAC, Spearmint, etc.

Optuna

Optuna is the SOTA algorithm for fine-tuning ML and deep learning models. It depends on the Bayesian fine-tuning technique. It prunes unpromising trials which don’t further improve our score and try only that combination that improves our score overall.

Salient Features of Optuna:

Efficiently search large spaces and prune unpromising trials for faster results.
Parallelize hyperparameter searches over multiple threads or processes without modifying code.
Automated search for optimal hyperparameters using Python conditionals, loops, and syntax.

Tuning Process using Optuna

Article Checkpoints:

Train Xgboost on the boston_housing dataset.
Explore hyperparameter tuning in detail.
Analyze the visualization for better and faster tuning.

Load the Data and Libraries

Importing necessary modules and loading the data into a data frame.

import sklearn
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
boston = load_boston()
df = pd.DataFrame(boston.data , columns = boston.feature_names)
df['target'] = boston.target
X = df.iloc[:,df.columns != 'target']
y = df.target

load_boston() returns the data in matrix form and after framing it into pandas data frame we separated X (features), y (target).

from sklearn.preprocessing import StandardScaler
se = StandardScaler()
X = se.fit_transform(X)

Scaling the dataset into one scale, helps the model to converge efficiently and split the dataset into train and test.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, y, test_size = 0.2, random_state = 12)

Before Tuning First Test the Base Model

Building a Vanilla (default) model before gives a scale for comparison,
and we can compare how much our model has improved after parameter
fine-tuning.

import xgboost as xgb
from sklearn.model_selection import cross_val_score
xg_reg = xgb.XGBRegressor()
scores = cross_val_score(xg_reg, X_train,y_train , scoring = 'neg_root_mean_squared_error',n_jobs = -1,cv = 10)
print(np.mean(scores), np.std(scores))
print(scores)

Before Tuning First Test the Base Model using Optuna — Source: Local

for some reason cross_val_score has “neg_root_mean_squared_error”, we can change this into positive by multiplying by -1.

cross_val_score Evaluate a score by cross-validation.
KFold creates a custom k cross-validation.
here scoring is “root_mean_squared_error” lower is better.
cv = 10 folds , n_jobs= -1 ( using all available CPUs).

Define Function that Takes Hyper-Parameters and Return Scores

We often calculate rmse in the regressor model and AUC scores for the classifier model.

def return_score(param):
  model = xgb.XGBRegressor(**param)  
  rmse = -np.mean(model_selection.cross_val_score(model,X_train[:1000],y_train[:10000], cv = 4, n_jobs =-1,scoring='neg_root_mean_squared_error'))
  return rmse

return_score() takes parameters as keyword argument and returns the cross_validated value for -(neg_root_mean_squared_error), in order to save time, we are taking only 1000 samples.

Import Optuna

This is the step where actual Optuna comes into the Picture.

import optuna
from optuna import Trial, visualization
from optuna.samplers import TPESampler

Samplers class provides us with a class to define our hyper-parameter space.
Visualization is used to plot the tuning results.
A trial is a method of determining the effectiveness of an objective function.

Creating Hyper-Parameters Space

Time to create our main objective function, we put all the possible
hyper-parameter ranges and the function takes a trial object and should
return a score value.

We have various ways to define the ranges of possible hyper-parameter values:-

trial.suggest_uniform(‘reg_lambda’,0,2) = uniformly distributed numbers between 0,2
trial.suggest_int(‘ParamName’, 2 , 5) = [2,3,4,5]
trial.suggest_loguniform(‘learning_rate’,0.05,0.5) = log distribution between [0.05,.5]

def objective(trial):
  param = {
                "n_estimators" : trial.suggest_int('n_estimators', 0, 500),
                'max_depth':trial.suggest_int('max_depth', 3, 5),
                'reg_alpha':trial.suggest_uniform('reg_alpha',0,6),
                'reg_lambda':trial.suggest_uniform('reg_lambda',0,2),
                'min_child_weight':trial.suggest_int('min_child_weight',0,5),
                'gamma':trial.suggest_uniform('gamma', 0, 4),
                'learning_rate':trial.suggest_loguniform('learning_rate',0.05,0.5),
                'colsample_bytree':trial.suggest_uniform('colsample_bytree',0.4,0.9),
                'subsample':trial.suggest_uniform('subsample',0.4,0.9),
                'nthread' : -1
            }
  return(return_score(param)) # this will return the rmse score

For A detailed explanation on various parameters refer to this link.

Start Tuning

In order to start our searching of hyperparameters, we need to first create an optuna– study object, which holds all the information of hyper-parameters tried under that study.

The study creates an object which holds all the parameters, history about that particular optimization.

study1 = optuna.create_study(direction='minimize',sampler=TPESampler())
study1.optimize(objective, n_trials= 550,show_progress_bar = True)

direction = ‘minimize’, since we want to minimize your rmse.
show_progress_bar = True, gives a nice-looking progress bar.
n_trails = 1050, is the same as the epoch.
sampler = TPESample(), Bayesian Sampling Technique

Let’s Run the study !!

After 1050 trials we got our best hyper-parameters as shown below

Seeing the hyper-parameter safter1050 trials using optuna — Source: Local

In the beginning, we had rmse 2.98 and after 500 trials it became 2.54.

How to get tuned hyper-parameters?

The study object holds all the information of the particular search.

Calling study.best_params returns the optimized parameters in the form of a dictionary.

study1.best_params

Visualization of Tuning

Visualizing search history gives us a better idea of the effect of
hyper-parameters on our model. we can also further narrow down the
search by using narrower parameters space.

optuna.visualization.plot_slice(study1)

Visualization of Tuning with optuna — Source: Local

These are the values of various hyper-parameters and their impact on our objective value (rmse in this case)

from the above graph, the minimum rmse found when max_depth was 3, learning_rate was .054, n_estimators = 340, and so on…

We can further deep dive and try to tune using narrower range of hyper-parameters values by taking help of visualizations.

Compare the Tuned Model and Base Model

Let’s see how much our model has improved.

params = {}
print(f"without tuning {return_score(params)}")
print(f"with tuning {return_score(study1.best_params)}")

For the default model, we are not specifying any hyperparameters, the model will be using the default hyper-parameters, for the Optimized model we are passing study1.best_params which returns the optimized h-parameters in a form of a dictionary.

Comparing the Tuned Model and Base Model — Source: Local

Perfect !! as you can see after fine-tuning we have got a lower rmse by 30 units. It does not end, we can fine-tune further-more by narrowing the hyperparameters ranges.

Conclusion

In this article we have seen how to optimize ML models, we saw how to find optimized parameters, but this is not the end. you can improve it further by narrowing down the learning parameter ranges.

In the next article, we will learn fine-Tuning of Xgboost Classifier using HyperOpt

Download full Code here.

Thanks for reading the article, please share if you liked this article.

Article From: Abhishek Jaiswal, Reach out to me on LinkedIn.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.