Understand Weight of Evidence and Information Value!

Kruthika Kulkarni 20 Jul, 2021 • 6 min read

This article was published as a part of the Data Science Blogathon

Agenda

We have all built a logistic regression model at some point in our lives. Even if we have never built one, we have definitely learned this predictive modelling technique in theory. Two simple, undervalued concepts used in the preprocessing step of building a logistic regression model are the weight of evidence and the information value. I would like to bring them back into the limelight through this article.

This article is structured in the following way:

  1. Introduction to logistic regression
  2. Importance of feature selection
  3. Need for a good encoder for categorical features
  4. Weight of Evidence (WoE)
  5. Information Value (IV)

Let’s get started!

1. Introduction to Logistic Regression

First things first: we all know logistic regression solves classification problems. In particular, we consider binary classification problems here.

Logistic regression models take as input both categorical and numerical data and output the probability of the occurrence of the event.

Example problem statements that can be solved using this method are:

  1. Given the customer data, what is the probability that the customer will buy a new product introduced by a company?
  2. Given the required data, what is the probability that a bank customer will default on a loan?
  3. Given the weather data for the last one month, what is the probability that it will rain tomorrow?

All the above statements have exactly two outcomes (buy & not buy, default & not default, rain & not rain), hence a binary logistic regression model can be built. Logistic regression is a parametric method. What does this mean? A parametric method has two steps.

1. First, we assume a functional form or shape. In the case of logistic regression, we assume that the probability of the event for an observation x = (x1, x2, ..., xn) is

P(Y = 1 | x) = 1 / (1 + e^-(b0 + b1x1 + b2x2 + ... + bnxn))

which is equivalent to assuming that the log odds, ln(p / (1 - p)), is a linear function of the features.

2. Second, we estimate the weights/coefficients bi such that the predicted probability of an event for an observation x is close to 1 if the actual value of the target is 1, and close to 0 if the actual value of the target is 0.
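To make the functional form concrete, here is a minimal Python sketch (with made-up coefficient values, purely for illustration) that computes this probability for a single observation:

```python
import numpy as np

def predict_probability(x, b0, b):
    """Assumed logistic regression form: a sigmoid applied to a linear combination.

    x  : 1-D array of feature values for one observation
    b0 : intercept
    b  : 1-D array of coefficients (one per feature)
    """
    linear_part = b0 + np.dot(b, x)            # b0 + b1*x1 + ... + bn*xn
    return 1.0 / (1.0 + np.exp(-linear_part))  # squashed into (0, 1)

# Hypothetical coefficients and one observation with two features
print(predict_probability(x=np.array([2.0, -1.0]), b0=0.5, b=np.array([0.8, 1.2])))
```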

With this basic understanding, let us understand why we need feature selection.

2. Importance of Feature Selection

In this digital era, we are equipped with a humongous amount of data. However, not all the features available to us are useful for every prediction task. We have all heard the saying “Garbage in, garbage out!”. Hence, choosing the right features for our model is of utmost importance. Features are selected based on their predictive strength.

For instance, let us say we want to predict the probability that a person will buy a new Chicken recipe at our restaurant. If we have a feature – “Food preference” with values {Vegetarian, Non-Vegetarian, Eggetarian}, we are almost certain that this feature will clearly separate people who have a higher probability of buying this new dish from those who will never buy it. Hence this feature has high predictive power.

We can quantify the predictive power of a feature using the concept of information value that will be described here.

3. Need for a Good Encoder for Categorical Features

Logistic regression is a parametric method that requires us to calculate a linear equation. This requires that all features are numerical. However, we might have categorical features in our datasets that are either nominal or ordinal. There are many methods of encoding, like one-hot encoding or simply assigning a number to each class of a categorical feature. Each of these methods has its own merits and demerits; however, I will not be discussing them in detail here.
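For reference, a minimal pandas sketch (with a made-up “food_preference” column) of what one-hot encoding looks like:

```python
import pandas as pd

# Made-up categorical feature, purely for illustration
df = pd.DataFrame({"food_preference": ["Vegetarian", "Non-Vegetarian",
                                       "Eggetarian", "Vegetarian"]})

# One-hot encoding: one binary column per category
# (drop_first=True avoids a redundant, perfectly correlated column)
one_hot = pd.get_dummies(df["food_preference"], prefix="food", drop_first=True)
print(one_hot)
```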

In the case of logistic regression, we can use the concept of WoE (Weight of Evidence) to encode the categorical features.

4. Weight of Evidence

After all the background provided, we have finally arrived at the topic of the day!

The formula to calculate the weight of evidence for any feature is given by

WoE (of a category/bin) = ln( % of events in the category / % of non-events in the category )

where the % of events in a category is the number of events in that category divided by the total number of events, and the % of non-events is defined analogously.

Before I explain the intuition behind this formula, let us look at a dummy example:

[Table: a dummy example for a feature X, showing for each category the number of events and non-events, the percentage of events, the percentage of non-events, and the resulting WoE value.]

The weight of evidence tells us the predictive power of each category/bin of an independent feature with respect to the dependent (target) variable. If any category/bin of a feature has a large proportion of events compared to its proportion of non-events, we get a high WoE value, which in turn tells us that this class of the feature separates events from non-events well.

For example, consider category C of feature X in the above example: the proportion of events (0.16) is much smaller than the proportion of non-events (0.37), giving a negative WoE of ln(0.16/0.37) ≈ -0.84. This implies that if the value of feature X is C, it is more likely that the target value will be 0 (non-event). The WoE value tells us how confident we are that this category of the feature will help us predict the probability of an event correctly.
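To make the arithmetic concrete, here is the WoE of category C computed in plain Python directly from the two proportions quoted above:

```python
import math

pct_events_C = 0.16      # share of all events that fall in category C
pct_non_events_C = 0.37  # share of all non-events that fall in category C

woe_C = math.log(pct_events_C / pct_non_events_C)
print(round(woe_C, 3))   # ~ -0.84: category C is dominated by non-events
```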

Now that we know that WoE measures the predictive power of every bin/category of a feature, what are the other benefits of WoE?

1. The WoE values of the various categories of a categorical variable can be used to encode that feature and convert it into a numerical feature, since a logistic regression model requires all its features to be numerical (see the sketch after this list).

On careful examination of the WoE formula and the logistic regression equation to be solved, we see that the WoE of a feature has a linear relationship with the log odds. This ensures that the requirement that the features have a linear relation with the log odds is satisfied.

2. For the same reason as above, if a continuous feature does not have a linear relationship with the log odds, the feature can be binned into groups, and a new feature created by replacing each bin with its WoE value can be used instead of the original feature. Hence WoE is a good variable transformation method for logistic regression.

3. On arranging the bins of a numerical feature in ascending order, if the WoE values change roughly linearly across the bins, we know that the feature has the desired linear relation with the target's log odds. However, if the feature's WoE is non-linear, we should either discard the feature or consider some other variable transformation to ensure linearity. Hence WoE gives us a tool to check for a linear relationship with the dependent feature.

4. WoE is better than one-hot encoding, as one-hot encoding needs you to create h-1 new features to accommodate one categorical feature with h categories. This implies that the model will have to estimate h-1 coefficients (bi) instead of 1. With the WoE variable transformation, however, the model only needs to estimate a single coefficient for the feature in consideration.
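Putting these points together, here is a minimal pandas sketch of WoE encoding. The DataFrame, column names, and the woe_encode helper are all hypothetical, and a small constant is added so that a category with zero events or zero non-events does not cause a division-by-zero or log-of-zero error:

```python
import numpy as np
import pandas as pd

def woe_encode(df, feature, target, eps=1e-6):
    """Replace each category of `feature` with its Weight of Evidence.

    WoE = ln(% of events in the category / % of non-events in the category),
    where `target` is binary (1 = event, 0 = non-event).
    """
    events = df.groupby(feature)[target].sum()
    non_events = df.groupby(feature)[target].count() - events

    pct_events = (events + eps) / events.sum()
    pct_non_events = (non_events + eps) / non_events.sum()

    woe = np.log(pct_events / pct_non_events)
    return df[feature].map(woe), woe  # encoded column, per-category WoE table

# Hypothetical data: will the customer buy the new chicken dish?
data = pd.DataFrame({
    "food_preference": ["Vegetarian", "Non-Vegetarian", "Eggetarian",
                        "Non-Vegetarian", "Vegetarian", "Non-Vegetarian"],
    "bought_new_dish": [0, 1, 1, 1, 0, 1],
})
encoded, woe_table = woe_encode(data, "food_preference", "bought_new_dish")
print(woe_table)
```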

5. Information Value

As discussed above, the WoE value tells us the predictive power of each bin of a feature. However, for feature selection we also want a single value that summarizes the predictive power of the entire feature. This is exactly what the information value (IV) provides.

The equation for IV is

IV = Σ over all categories/bins of [ (% of events in the bin − % of non-events in the bin) × WoE of the bin ]

Note that the term (% of events − % of non-events) always has the same sign as the corresponding WoE, hence ensuring that every term in the sum, and therefore the IV itself, is a non-negative number.
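Continuing the hypothetical sketch from the previous section, the IV can be computed from the same per-category percentages:

```python
import numpy as np
import pandas as pd

def information_value(df, feature, target, eps=1e-6):
    """IV = sum over categories of (% of events - % of non-events) * WoE."""
    events = df.groupby(feature)[target].sum()
    non_events = df.groupby(feature)[target].count() - events

    pct_events = (events + eps) / events.sum()
    pct_non_events = (non_events + eps) / non_events.sum()

    woe = np.log(pct_events / pct_non_events)
    return float(((pct_events - pct_non_events) * woe).sum())

# Same hypothetical data as in the WoE sketch above
data = pd.DataFrame({
    "food_preference": ["Vegetarian", "Non-Vegetarian", "Eggetarian",
                        "Non-Vegetarian", "Vegetarian", "Non-Vegetarian"],
    "bought_new_dish": [0, 1, 1, 1, 0, 1],
})
print(round(information_value(data, "food_preference", "bought_new_dish"), 3))
```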

How do we interpret the IV value?

The table below gives a commonly used rule of thumb to help select the best features for your model:

Information Value | Predictive power
< 0.02            | Useless
0.02 to 0.1       | Weak predictor
0.1 to 0.3        | Medium predictor
0.3 to 0.5        | Strong predictor
> 0.5             | Suspicious

As seen from the above example, feature X has an information value of 0.399, which makes it a strong predictor, and hence it will be used in the model.
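As a small illustration of applying this rule of thumb in code (with hypothetical feature names and IV values), one might do something like:

```python
def iv_strength(iv):
    """Map an information value to the rule-of-thumb label from the table above."""
    if iv < 0.02:
        return "useless"
    elif iv < 0.1:
        return "weak predictor"
    elif iv < 0.3:
        return "medium predictor"
    elif iv < 0.5:
        return "strong predictor"
    return "suspicious"

# Hypothetical IVs for a few candidate features
candidate_ivs = {"food_preference": 0.399, "customer_id": 0.004, "age_band": 0.15}
print({name: iv_strength(iv) for name, iv in candidate_ivs.items()})

# Keep only features that are at least weak predictors and not suspicious
selected = [name for name, iv in candidate_ivs.items() if 0.02 <= iv <= 0.5]
print("Selected features:", selected)
```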

6. Conclusion

As seen from the above example, the calculation of WoE and IV is beneficial and helps us in multiple ways, as listed below:

1. WoE helps check the linear relationship of a feature with the log odds of the dependent variable to be used in the model.

2. WoE is a good variable transformation method for both continuous and categorical features.

3. WoE is better than one-hot encoding, as this method of variable transformation does not increase the complexity of the model.

4. IV is a good measure of the predictive power of a feature, and it also helps point out suspicious features.

Though WoE and IV are highly useful, always ensure that they are used only with logistic regression. Unlike other feature selection methods available, the features selected using IV might not be the best feature set for building a non-linear model.

Hope this article has helped you gain intuition into the workings of WoE and IV.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
