IBM Open Sources Comprehensive Python Toolkit for Detecting & Fighting Bias (30 Metrics, 9 Algorithms)

Pranav Dar 19 Sep, 2018 • 2 min read

Overview

  • IBM has open-sourced a toolkit to deal with bias in datasets and algorithms
  • The toolkit contains 30 fairness metrics and 9 state-of-the-art algorithms
  • The Python code, detailed documentation, and installation instructions are available on GitHub

 

Introduction

Bias is a serious issue in machine learning models. Quite often we skim through the data in our eagerness to build the model, and then scratch our heads when the model doesn’t translate well to real-world situations. It’s a pervasive issue, and one that experts have been trying to mitigate for years.


With the seriousness of this challenge in mind, IBM has released a toolkit that contains a set of “fairness metrics” for datasets and models, explanations for these metrics, and algorithms that can deal with any bias that is unearthed. And the best part? It’s open source (and in Python)! The code, documentation, and installation instructions are all available on GitHub.
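To make that concrete, here is a minimal sketch of checking a dataset for bias with the toolkit, based on its documented Python API at the time of release (class and method names may differ across versions, and the toy data is purely illustrative):

# A minimal sketch of computing a fairness metric with AI Fairness 360.
# Assumes the package is installed (pip install aif360, per the repo's
# instructions); the tiny DataFrame here is purely illustrative.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy data: 'sex' is the protected attribute (1 = privileged group),
# 'label' is the outcome (1 = favorable, e.g. loan approved).
df = pd.DataFrame({
    'sex':   [1, 1, 1, 1, 0, 0, 0, 0],
    'score': [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2],
    'label': [1, 1, 1, 0, 1, 0, 0, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=['label'],
    protected_attribute_names=['sex'],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{'sex': 0}],
    privileged_groups=[{'sex': 1}],
)

# Disparate impact: ratio of favorable-outcome rates (1.0 is perfectly fair).
# Mean difference: difference of those rates (0.0 is perfectly fair).
print('Disparate impact:', metric.disparate_impact())
print('Mean difference: ', metric.mean_difference())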

The toolkit, officially labelled the ‘AI Fairness 360 Open Source Toolkit’, contains over 30 fairness metrics and 9 state-of-the-art bias mitigation algorithms, listed below (a usage sketch follows the list):

  • Optimized Preprocessing
  • Disparate Impact Remover
  • Equalized Odds Postprocessing
  • Reweighing
  • Reject Option Classification
  • Prejudice Remover Regularizer
  • Calibrated Equalized Odds Postprocessing
  • Learning Fair Representations
  • Adversarial Debiasing
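
To see one of these in action, here is a hedged continuation of the sketch above using Reweighing, following the pattern of the toolkit’s introductory examples (again, exact names may vary by version):

# Reweighing, one of the nine algorithms listed, is a preprocessing
# technique: it adjusts per-instance weights so that favorable outcomes
# are balanced across groups before a model is trained.
from aif360.algorithms.preprocessing import Reweighing

rw = Reweighing(
    unprivileged_groups=[{'sex': 0}],
    privileged_groups=[{'sex': 1}],
)
dataset_transf = rw.fit_transform(dataset)

# After reweighing, the weighted mean difference should be (near) zero.
metric_transf = BinaryLabelDatasetMetric(
    dataset_transf,
    unprivileged_groups=[{'sex': 0}],
    privileged_groups=[{'sex': 1}],
)
print('Mean difference after reweighing:', metric_transf.mean_difference())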

The project’s official site hosts multiple tutorials across different industry functions to give you a taste of how to use the toolkit. These include credit scoring, medical expenditure, and gender bias in facial recognition. What are you waiting for? Get started already!
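For a flavour of the credit scoring tutorial, here is a sketch along the lines of the repo’s example on the German Credit dataset (note that the raw data file must be downloaded separately into the package’s data folder, per the repo’s instructions, and details may have changed since):

# Sketch of the credit-scoring tutorial's opening steps: load the German
# Credit dataset with age as the protected attribute and measure bias.
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric

dataset = GermanDataset(
    protected_attribute_names=['age'],        # treat age as the protected attribute
    privileged_classes=[lambda x: x >= 25],   # applicants aged 25+ are 'privileged'
    features_to_drop=['personal_status', 'sex'],
)

metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{'age': 0}],
    privileged_groups=[{'age': 1}],
)
print('Mean difference on German Credit:', metric.mean_difference())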

 

Our take on this

We need to remember that data isn’t just numbers on a spreadsheet; it is linked to human beings. Bias is an omnipresent issue. I cannot stress enough how important dealing with it is, especially when we’re running algorithms that will directly impact lives.

Can you imagine running a credit risk or loan default model and turning away the very people who most desperately need the money? They were perfectly eligible, but bias in the data, and subsequently in the model, shut them out. Unacceptable, right? Let’s keep that in mind the next time we work on a project, and reach for this toolkit if other known methods aren’t working.

 

Pranav Dar 19 Sep 2018

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
