Plotting Decision Surface for Classification Machine Learning Algorithms

guest_blog 09 Nov, 2020 • 5 min read

Overview

Decision Surface

A decision surface can be a powerful tool to visualize and understand how a model arrives at its predictions. It is a diagnostic tool for identifying the strengths and weaknesses of a model. It also provides a “quick & dirty” way to identify regions where the model underfits or overfits the data.

This article describes how you can write your own function to plot a decision surface for any classification algorithm using Python. You will also learn how to create a simple synthetic dataset using built-in libraries and test your function. You can customize the function to suit your needs.

Let’s dive right into it.

Process

Step 1:

Import libraries for working with arrays (numpy) and visualization (matplotlib's pyplot).

Decision surface : Import Libraries
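
A minimal sketch of the imports, assuming the conventional np and plt aliases (the article names the numpy module and pyplot, but the aliases are an assumption):

```python
import numpy as np               # arrays: meshgrid, hstack, reshape
import matplotlib.pyplot as plt  # plotting: scatter, contourf
```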

Step 2:

Let’s create a function that takes in the feature matrix, the target vector, and a classification model as input and creates a decision surface.

Decision surface : Create a function

Step 3:

Then, draw a scatter plot of the original training data.

Decision surface : Scatter Plot
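
As a rough illustration of this step, here is a sketch that plots a tiny hand-made dataset. The data values, the colormap, and the axis labels are assumptions made for the example, not taken from the original code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical toy data: 6 points, 2 features, 2 classes.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [1.0, 1.1], [1.2, 0.9], [0.9, 1.3]])
y = np.array([0, 0, 0, 1, 1, 1])

fig, ax = plt.subplots()
# One marker per sample, colored by its class label.
points = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", edgecolor="k")
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
```

Calling plt.show() afterwards displays the figure in an interactive session.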

Now, determine the minimum and maximum of each feature.

Decision surface : Min and max feature

The meshgrid function of the numpy module can then be used to generate a grid of values (500 per feature in the example shown) spanning the minimum and maximum of each feature.

Decision surface : Meshgrid
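
The two steps above could be sketched as follows. The use of np.linspace to produce the 500 evenly spaced values is an assumption (the text only names meshgrid), and the small X array is hypothetical.

```python
import numpy as np

# Hypothetical feature matrix: 3 samples, 2 features.
X = np.array([[0.0, 5.0], [2.0, 7.0], [1.0, 6.0]])

# Minimum and maximum of each feature (column-wise).
x1_min, x2_min = X.min(axis=0)
x1_max, x2_max = X.max(axis=0)

# 500 evenly spaced values per feature, then the full 2-D grid.
n_points = 500
x1_range = np.linspace(x1_min, x1_max, n_points)
x2_range = np.linspace(x2_min, x2_max, n_points)
xx, yy = np.meshgrid(x1_range, x2_range)  # each has shape (500, 500)
```

Here xx holds the first-feature coordinate of every grid point and yy holds the second-feature coordinate.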

Step 4:

The grid for each feature will be a 500 x 500 array. These arrays need to be flattened into one-dimensional vectors and then reshaped into column vectors (250,000 x 1). The flatten and reshape methods of numpy arrays can be used for this purpose.

Decision surface : y_pred reshape

Decision surface : Shape

The feature vectors should be stacked to represent the feature matrix similar to the original feature matrix using the hstack function of the numpy module.

Decision surface : hstack
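
A short sketch of the flatten, reshape, and stack sequence, assuming a 500 x 500 grid built with np.linspace over made-up feature ranges (the variable names r1, r2, and grid are illustrative):

```python
import numpy as np

# Hypothetical feature ranges, 500 values each.
xx, yy = np.meshgrid(np.linspace(0.0, 1.0, 500),
                     np.linspace(10.0, 20.0, 500))

# Flatten each 500 x 500 grid into a 1-D vector of 250,000 values.
r1 = xx.flatten()
r2 = yy.flatten()

# Reshape each vector into a column vector of shape (250000, 1).
r1 = r1.reshape((len(r1), 1))
r2 = r2.reshape((len(r2), 1))

# Stack the columns side by side: one row per grid point, one column per feature.
grid = np.hstack((r1, r2))
```

The reshape step matters because hstack on two 1-D vectors would simply concatenate them end to end instead of forming a two-column matrix.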

Step 4 (continued):

Let’s fit the model on the original dataset and make predictions using the grid of features we created.

Decision surface : Model fit

The prediction vector has to be reshaped into the same shape as the feature grid to be able to draw a contour plot.

Decision surface : y_pred reshape
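
To make the fit, predict, and reshape sequence concrete, here is a sketch on a tiny, clearly separable dataset. The data, the choice of DecisionTreeClassifier, and the coarse 50 x 50 grid are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two well-separated classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Grid of feature values covering the data (50 x 50 for brevity).
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 50),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 50))
grid = np.hstack((xx.reshape(-1, 1), yy.reshape(-1, 1)))

# Fit on the original data, then predict a class for every grid point.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
y_pred = clf.predict(grid)      # 1-D vector, one label per grid point

# Back to the grid's 2-D shape so it can be drawn as a contour plot.
zz = y_pred.reshape(xx.shape)
```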

Step 5:

Now, we draw a contour plot using pyplot’s built-in contourf function.

That’s it for creating the function to draw a decision surface for any classification algorithm. It is ready to be tested on a synthetic dataset. Note that it is always a good idea to test custom functions on a hypothetical dataset first.
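
Putting the steps together, the function might look roughly like the sketch below. This is a reconstruction from the prose, not the original code: the name plot_decision_surface, the n_points and ax parameters, the colormap, and the alpha value are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_decision_surface(X, y, clf, n_points=500, ax=None):
    """Fit clf on two-feature data X, y and draw its decision surface."""
    if ax is None:
        _, ax = plt.subplots()

    # Scatter plot of the original training data, colored by class.
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", edgecolor="k")

    # Grid of feature values between each feature's minimum and maximum.
    x1_min, x2_min = X.min(axis=0)
    x1_max, x2_max = X.max(axis=0)
    xx, yy = np.meshgrid(np.linspace(x1_min, x1_max, n_points),
                         np.linspace(x2_min, x2_max, n_points))

    # Flatten, reshape to column vectors, and stack into a feature matrix.
    r1 = xx.flatten().reshape(-1, 1)
    r2 = yy.flatten().reshape(-1, 1)
    grid = np.hstack((r1, r2))

    # Fit on the original data and predict on the grid.
    clf.fit(X, y)
    zz = clf.predict(grid).reshape(xx.shape)

    # Semi-transparent filled contours so the data points stay visible.
    ax.contourf(xx, yy, zz, alpha=0.4, cmap="viridis")
    ax.set_title(type(clf).__name__)
    return ax
```

Returning the axes instead of calling plt.show() inside the function is a design choice made here so that several surfaces can be placed side by side in one figure.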

The make_blobs function of the sklearn library is commonly used for this purpose. It allows us to specify the number of samples, the number of features, the number of classes (cluster centers), and several other parameters. In our example, we specify 1000 samples, 2 features, and 3 classes for the sake of simplicity.
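
A minimal sketch of that call with the stated sizes. The random_state value is an assumption added only to make the output reproducible.

```python
from sklearn.datasets import make_blobs

# 1000 samples, 2 features, 3 classes (one blob per class).
X, y = make_blobs(n_samples=1000, n_features=2, centers=3, random_state=42)
```

X is the feature matrix of shape (1000, 2) and y is the target vector with labels 0, 1, and 2.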

Step 6:

Import a few different classification algorithms from sklearn for testing the above-written function.

Step 7:

Create a dictionary of classification machine learning models.
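
One way Steps 6 and 7 could look, using the six algorithms named in this article's figure captions. The exact sklearn classes (for example SVC for "Support Vector") and the default hyperparameters are assumptions.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Display name -> untrained, out-of-the-box model.
models = {
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Support Vector": SVC(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
}
```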

Step 8:

Plot a decision surface for the models in the dictionary using the above-written function.
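
The loop might be sketched as below. To keep the example self-contained, it inlines the grid steps rather than calling the plotting function, uses only three models, and works on a smaller dataset and a coarser grid; all of those choices are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=300, n_features=2, centers=3, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
}

# Shared grid of feature values (100 x 100 for speed).
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
grid = np.hstack((xx.reshape(-1, 1), yy.reshape(-1, 1)))

# One subplot per model.
fig, axes = plt.subplots(1, len(models), figsize=(5 * len(models), 4))
surfaces = {}
for ax, (name, clf) in zip(axes, models.items()):
    clf.fit(X, y)
    zz = clf.predict(grid).reshape(xx.shape)
    surfaces[name] = zz
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", edgecolor="k")
    ax.contourf(xx, yy, zz, alpha=0.4, cmap="viridis")
    ax.set_title(name)
```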

The following are the decision surfaces for the different out-of-the-box classification algorithms.

Decision surfaces : Decision Tree, Logistic Regression, Random Forest, Gradient Boosting, Support Vector, K-Nearest Neighbor

This function can be used with any model that has the .fit() and .predict() methods found in most sklearn algorithms.
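
To illustrate what "any model with .fit() and .predict()" means, here is a hypothetical hand-written classifier that is not part of sklearn. The class name CentroidClassifier and its nearest-centroid logic are inventions for this example only.

```python
import numpy as np


class CentroidClassifier:
    """Hypothetical model: assigns each point to the nearest class centroid."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Mean feature vector of each class.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every centroid, shape (n_samples, n_classes).
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
model = CentroidClassifier().fit(X_train, y_train)
labels = model.predict(np.array([[0.0, 0.5], [5.0, 5.5]]))
```

Because it exposes the same two methods, an object like this could be passed to a decision surface function in place of an sklearn estimator.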

Summary

  1. How to create a function for plotting a decision surface for classification machine learning algorithms.
  2. How to create a hypothetical dataset.
  3. How to test the decision surface plotting function on the hypothetical dataset and derive insights into the decision-making process of the machine learning model.

Further Improvements

  1. Plot a decision surface for a more complex algorithm like neural networks and iterate through several pairs of features.
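
A possible starting point for this improvement is sketched below: it loops over every pair of features with itertools.combinations and fits a small neural network on each pair. The three-feature dataset, the MLPClassifier settings, and the 50 x 50 grid are assumptions, and the plotting call is left out for brevity.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

# Hypothetical dataset with 3 features, which gives 3 feature pairs.
X, y = make_blobs(n_samples=200, n_features=3, centers=3, random_state=0)

surfaces = {}
for i, j in combinations(range(X.shape[1]), 2):
    X_pair = X[:, [i, j]]  # keep only the two selected features
    xx, yy = np.meshgrid(
        np.linspace(X_pair[:, 0].min(), X_pair[:, 0].max(), 50),
        np.linspace(X_pair[:, 1].min(), X_pair[:, 1].max(), 50),
    )
    grid = np.hstack((xx.reshape(-1, 1), yy.reshape(-1, 1)))

    # A small neural network, refit from scratch for each feature pair.
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    clf.fit(X_pair, y)
    surfaces[(i, j)] = clf.predict(grid).reshape(xx.shape)
```

Each entry of surfaces can then be drawn with contourf, one subplot per feature pair.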

About the Author

I am currently a Mechanical Engineer aspiring to be a data scientist. I have a B.E. in Mechanical Engineering from GCT, Coimbatore, India, and an M.S. in Mechanical Engineering from Texas A&M University, College Station, TX, USA. I have been working as a Mechanical Engineer for HI-Q Environmental Products Company for the past 17 years.
