Global Model Interpretability Techniques for Black Box Models

Aman Gupta, 12 Oct 2020

This article was published as a part of the Data Science Blogathon.

Introduction

There is no mathematical definition of model interpretability.

‘Interpretability is the degree to which a human can consistently predict the model’s result’

An interpretable model that makes sense is far more trustworthy than an opaque one, for two reasons. First, business users do not make million-dollar decisions just because a computer said so. Second, data scientists need interpretable models to ensure that no errors were made in data collection or modeling, errors that would otherwise let the model perform well in evaluation but fail miserably in production.

How much interpretability matters depends on who uses the model. Accuracy may matter more than interpretability when the model powers a data product that communicates with another system or through an interface, which eliminates the need for an explanation. However, when humans are the users of the model, interpretability takes a front seat.

Interpretability is especially important in fields where the margin of error is low. Finance is empirically a social science, and there is no logical reason for tomorrow to resemble any day in the past, so it is essential that users understand the model. For example, in a probability of default (PD) model, simply classifying a customer as ‘good’ or ‘bad’ is not sufficient. The loan-approving authorities need a definite scorecard to justify the basis for this classification, and the PD model should make sense with regard to the variables used.

Interpretability can be classified as ‘Global’ and ‘Local’.

 

Global Interpretability:

This level of interpretability is about understanding how the model makes decisions, based on a holistic view of its features and each of the learned components such as weights, other parameters, and structures. Global model interpretability helps to understand the distribution of your target outcome based on the features. For a PD model, it helps in understanding the basis for the classification of ‘good’ or ‘bad’.

 

Local Interpretability:

This level of interpretability is about understanding a single prediction of a model. Say, if you want to know why a particular customer is classified as ‘bad’, the local interpretability of the model is imperative.

Machine learning algorithms often improve prediction accuracy over traditional statistical models, but they are frequently treated as black-box models. In this article, I will discuss some of the global techniques that help us interpret these black-box models.

 

Implementation & Explanation

I have used the ‘Default of Credit Card Clients’ dataset from the UCI Machine Learning Repository for this explanation. The goal is to classify whether a customer will default next month (Yes = 1, No = 0).

After pre-processing the data, I split it into train and test sets with a test size of 30% and standardized the features using StandardScaler() from sklearn.preprocessing. Three black-box models were used to classify the clients: Random Forest, XGBoost, and Support Vector Classifier. A sketch of this setup is shown below, followed by the evaluation results I obtained:
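The exact pre-processing steps are not reproduced here; the snippet below only illustrates the split, scaling, and model-fitting workflow. The file path, column names, and hyperparameters are assumptions for illustration, not the original code.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load the UCI "default of credit card clients" data; the file path and the
# ID/target column names are assumptions, so adjust them to your local copy.
df = pd.read_excel("default of credit card clients.xls", header=1)
X = df.drop(columns=["ID", "default payment next month"])
y = df["default payment next month"]

# 70/30 train-test split, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Standardize the features (fit the scaler on the training set only)
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns)

# Fit the three black-box classifiers and check test accuracy
models = {
    "Random Forest": RandomForestClassifier(n_estimators=500, random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
    "SVC": SVC(probability=True, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))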

[Image: evaluation results for the three models]

All three models give us more than 80 percent accuracy. Let us now try the Global Interpretation methods for determining feature importance, feature effects, and feature interaction.

 

Feature Importance: Permutation Importance

Permutation feature importance measures the increase in the model’s prediction error after we permute a feature’s values, which breaks the relationship between the feature and the true outcome. Because permutation feature importance relies on measurements of model error, we use the test data here.
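A minimal sketch of this step, using scikit-learn’s permutation_importance on the test set; it reuses the models, X_test, and y_test objects from the setup sketch above and is only an illustration of the idea:

from sklearn.inspection import permutation_importance

# Permutation importance on the held-out test set, with accuracy as the
# scoring metric; each feature is shuffled n_repeats times.
result = permutation_importance(
    models["Random Forest"], X_test, y_test,
    scoring="accuracy", n_repeats=10, random_state=42,
)

# Print "mean drop in accuracy ± standard deviation", most important first
for i in result.importances_mean.argsort()[::-1]:
    print(f"{X_test.columns[i]:<20} "
          f"{result.importances_mean[i]: .4f} ± {result.importances_std[i]:.4f}")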

[Images: permutation importance output for the models, listing each feature’s importance as mean ± variation]

The values towards the top are the most important features, and those towards the bottom matter least. The first number in each row shows how much model performance decreased when that feature was randomly shuffled (in this case, using accuracy as the performance metric). The number after the ± measures how much performance varied from one reshuffling to the next.

Negative values indicate that the predictions on the shuffled (noisy) data happened to be more accurate than on the real data. This occurs when a feature does not actually matter (its importance should be close to 0), but random chance caused the predictions on the shuffled data to be slightly more accurate.

 

Feature Effects: Partial Dependence Plots (PDPs)

While feature importance shows what variables most affect predictions, partial dependence plots show how a feature affects predictions. Like permutation importance, partial dependence plots are calculated after a model has been fit. The model is fit on real data that has not been artificially manipulated in any way.
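A minimal sketch using scikit-learn’s PartialDependenceDisplay, reusing the fitted models from the setup sketch; LIMIT_BAL and AGE are two numeric columns of the UCI data chosen purely as examples (the plots are on the standardized scale):

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Partial dependence of the predicted default probability on two example
# features, for Random Forest (top) and XGBoost (bottom).
features = ["LIMIT_BAL", "AGE"]
fig, axes = plt.subplots(2, 1, figsize=(8, 8))
PartialDependenceDisplay.from_estimator(
    models["Random Forest"], X_test, features, ax=axes[0]
)
PartialDependenceDisplay.from_estimator(
    models["XGBoost"], X_test, features, ax=axes[1]
)
plt.tight_layout()
plt.show()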

The resulting plots reveal noticeably different behavior for Random Forest versus XGBoost. The main drawback of PDPs is that they ignore correlations among features. Accumulated Local Effects (ALE), discussed next, is an alternative technique that accounts for these correlations.

 

Feature Effects: Accumulated Local Effects (ALE)

Accumulated local effects describe how features influence the prediction of a machine learning model on average. ALE plots are a faster and unbiased alternative to partial dependence plots (PDPs). PDPs suffer from a stringent assumption: the features have to be uncorrelated. In real-world scenarios, features are often correlated, whether because some are directly computed from others or because the observed phenomena produce correlated distributions. ALE plots, first proposed by Apley and Zhu (2016), alleviate this issue by working with the conditional distribution of each feature, accumulating local prediction differences within small intervals, instead of averaging predictions over the feature’s marginal distribution. This makes them more reliable when handling (even strongly) correlated variables.

In the Python environment, there is no mature and stable library for ALE. The only one I have found is alepython, which is still very much in development and does not yet handle categorical features.
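Given the lack of a stable library, a rough first-order ALE for a single numeric feature can be computed by hand. The sketch below follows the Apley and Zhu construction (quantile bins, accumulated within-bin prediction differences, centering) and reuses the objects from the setup sketch; LIMIT_BAL is again only an example feature.

import numpy as np

def first_order_ale(predict_proba, X, feature, n_bins=20):
    """Rough first-order ALE of `feature` for a binary classifier.

    predict_proba: callable returning class probabilities, e.g. model.predict_proba
    X:             pandas DataFrame on which the effect is evaluated
    feature:       name of a numeric column in X
    """
    # Bin edges at empirical quantiles of the feature
    edges = np.unique(np.quantile(X[feature], np.linspace(0, 1, n_bins + 1)))
    # Assign every row to one of the len(edges) - 1 bins
    bin_idx = np.digitize(X[feature], edges[1:-1], right=True)

    local_effects, counts = [], []
    for k in range(len(edges) - 1):
        rows = X[bin_idx == k]
        if len(rows) == 0:
            local_effects.append(0.0)
            counts.append(0)
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[feature] = edges[k]
        upper[feature] = edges[k + 1]
        # Average change in P(default) when the feature moves across the bin,
        # holding every other feature at its observed (conditional) values
        diff = predict_proba(upper)[:, 1] - predict_proba(lower)[:, 1]
        local_effects.append(diff.mean())
        counts.append(len(rows))

    # Accumulate the local effects and center them so the mean effect is zero
    ale = np.cumsum(local_effects)
    ale = ale - np.average(ale, weights=counts)
    return edges[1:], ale

# Example: ALE of the (standardized) credit limit for the Random Forest model
grid, ale_values = first_order_ale(
    models["Random Forest"].predict_proba, X_test, "LIMIT_BAL"
)

Plotting ale_values against grid gives the ALE curve for the feature.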

 

Feature Interaction: Friedman’s H-statistic

When features interact with each other in a prediction model, the prediction cannot be expressed as the sum of the individual feature effects, because the effect of one feature depends on the value of the other. Friedman’s H-statistic quantifies this interaction: the two-way version tells us whether, and to what extent, two features in the model interact with each other (a related total version measures how strongly one feature interacts with all the others). Friedman and Popescu also propose a test statistic to evaluate whether the H-statistic differs significantly from zero; the null hypothesis is the absence of interaction.
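The H-statistic is not built into scikit-learn, so the sketch below is a rough, subsampled implementation of the two-way H² from its definition (centered partial dependence functions evaluated at the observed data points), reusing the objects from the setup sketch; the feature pair is only an example.

import numpy as np

def h_statistic(predict_proba, X, feat_j, feat_k, n_samples=200, random_state=42):
    """Rough two-way Friedman H^2 statistic for the pair (feat_j, feat_k).

    H^2 = sum_i [PD_jk(i) - PD_j(i) - PD_k(i)]^2 / sum_i PD_jk(i)^2,
    where each partial dependence function is centered to mean zero and
    estimated on a random subsample for speed.
    """
    rng = np.random.RandomState(random_state)
    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    sample = X.iloc[idx].reset_index(drop=True)

    def centered_pd(features):
        # Partial dependence evaluated at each sampled point's own values of
        # `features`, averaging predictions over the rest of the subsample.
        values = np.empty(len(sample))
        for i in range(len(sample)):
            grid = sample.copy()
            for f in features:
                grid[f] = sample.loc[i, f]
            values[i] = predict_proba(grid)[:, 1].mean()
        return values - values.mean()

    pd_j = centered_pd([feat_j])
    pd_k = centered_pd([feat_k])
    pd_jk = centered_pd([feat_j, feat_k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

# Example: interaction strength between credit limit and age for Random Forest
h2 = h_statistic(models["Random Forest"].predict_proba, X_test, "LIMIT_BAL", "AGE")
print(f"H^2 for LIMIT_BAL x AGE: {h2:.3f}")

Values near 0 indicate no interaction between the two features; values near 1 indicate that most of their joint effect on the prediction comes from the interaction.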

 

Conclusion

A trend in machine learning is the automation of model training. That includes automated engineering and selection of features, automated hyperparameter optimization, comparison of different models, and ensembling or stacking of the models. Model Interpretability will aid in this process and will eventually be automated itself.

This article is dedicated to Christoph Molnar. The motivation was his comprehensive work ‘Interpretable Machine Learning: A Guide for Making Black Box Models Explainable’.

