How to Use Chi Square to Fuel A/B Test?

Aakash93 Makwana 15 Jun, 2023 • 9 min read

Introduction

You may have heard that “Customer is King“. This is because the customer decides which product will stay and which will not. Whether marketers treat a customer as a ‘King‘ or not, he is always a ‘King’. He has the money the marketers want. He is not going to give that away for free. So, how do we know which product is accepted more by customers? The answer is to do an experiment known as A/B tests.

In this article, I will talk about A/B Test and we will also see a method that works well for comparing click rates across different advertisements within a campaign. I will also explain how to calculate the same in Excel and in Python and talk about how to interpret the results.

This article was published as a part of the Data Science Blogathon.

Introduction
What is A/B Testing?
Limitations of Traditional A/B Test
What is Chi-square (or Chi-Sq)?
Actual Calculations Using Excel
Code to Implement Chi-Sq in Python
How to Count of Instances or Volume Plays an Important Role in Determining Significance?
Applications of Chi Square
Advantages of Using Chi-square Test
Conclusion
Frequently Asked Questions

What is A/B Testing?

A/B testing, also known as split testing, compares two versions of a variable (for example, a advertisement A vs advertisement B), showing each version to equal numbers of users at random to determine which performs better against a business goal.

A/B testing can be useful when you want to test metrics of two variables against each other. However, traditional A/B testing will only take you so far. Keep reading to learn more about the limitations of A/B testing and discover an alternative approach.

Let us understand the same from a Data Scientist’s perspective. For a Data Scientist, experiments are important part of day to day work. But how do we validate those experiments?

Let’s say you observe the below metrics for an email advertisement campaign.

Now, you want to understand whether the click rates of the advertisement are different.

So, you can calculate the click rate for each advertisement (like below)

Inference: You see that Advertisement B has better click rates. But is it significant enough??

Once, the Data Scientists create a solution to the problem, they usually deploy the solution and perform an A/B test to evaluate whether the solution is performing well practically.

Limitations of Traditional A/B Test

Traditionally, most Data Scientists use t-test (or z-test) to determine the significance of A/B test. It is generally used to compare the means of 2 groups to understand of there is any statistical difference. t-test assumes that the distribution is Gaussian (or normal). However, it may produce non-reliable metrics if the distribution is not normal.

Since, we cannot always have a normal distribution with the data, we need to use a better and more generic approach.

In this article, I will talk about a particular A/B Testing method that works well for comparing click rates across different advertisements within a campaign. I will also explain how to calculate the same in Excel and in Python and talk about how to interpret the results.

What is Chi-square (or Chi-Sq)?

Chi-sq test is used to determine associations between 2 or more categorical variables. The Chi-square formula is used in data that consist of variables distributed across various categories and helps us to know whether that distribution is different from what one would expect by chance.

Understanding with example:

We will use the same example we saw in the introduction. Since, we are doing a statistical test, lets understand a few terminologies.

Null Hypothesis : H0 : The two categorical variables have no relationship (independent)

Alternate Hypothesis : H1 : There is a relationship (dependent) between two categorical variables

We will define a significant factor to determine whether the relation between the variables is of considerable significance. Generally, a significant factor or alpha value of 0.05 is chosen. This alpha value denotes the probability of erroneously rejecting H0 when it is true.

A lower alpha value is chosen in cases when we expect more precision. If the p-value for the test comes out to be strictly greater than the alpha value, then we will accept our H0.

Actual Calculations Using Excel

Step 1: Create a contingency table for observed values

Create a Total column and row. Also, create a Click rate row. It will represent the Click/Total for each advertisement.

Step 2: Calculate Expected values

Create a contingency table for expected values.

Expected Frequency = (Row Total x Column Total)/Grand Total

Value for the 1st cell = (51*100)/200 = 25.5

You will get the below table for the expected values.

Notice how the total values are same, but other cell values have changed.

Step 3: Calculate Chi-sq statistic

Use the tables created in above 2 steps and calculate the chi-sq statistic using the below formula.

Here,
O => Observed values or the actual values that we saw in Step 1.

E=> Expected values that we computed in Step 2.

χ2 => Chi-Sq statistic

For each cell of the table,

Numerator = squared value of (Observed value – Expected value)
Denominator = Expected value
Chi-Sq statistic = Numerator/Denominator

For e.g. Value for the 1st cell = (25 – 25.5)^2/25.5 = 0.01

Similarly, calculate values across all cells and add them up. See the highlighted cell in Yellow.

We get an overall Chi-sq value of 0.03

Step 4: Calculate DOF (Degrees of freedom)

Definition: In statistics, Degrees of freedom are the number of independent variables that can be estimated in a statistical analysis and tell you how many items can be randomly selected before constraints must be put in place.

E.g. there are 100 people who were recommended Advertisement A through email. A person can either Click or not Click the advertisement (N=2). So, if there are 25 people who clicked on that Advertisement, we can calculate that (100-25=) 75 people did not click the advertisement. So, we need only 1 variable to ascertain that information. Hence, the DOF will be 1. It can be calculated using (N-1).

Importance of DOF:

Degrees of freedom are important for finding critical cutoff values for a statistical test.

Degree of Freedom for a Chi-sq Test = (rows − 1) * (columns − 1)

DOF = (2 − 1)*(2 − 1) = 1×1 = 1

Step 5a: Use Critical value table to accept/reject Null hypothesis.

Now, we need to find the critical value of the chi-square distribution.

We can obtain this from the chi-square distribution table.

Now, let us look at the table and find the value corresponding to 1 degrees of freedom and a 0.05 significance factor.

The tabular or critical value of chi-square here is 3.841

Since, our calculated chi-sq value <= Critical chi-sq value, we fail to reject null hypothesis.

In simpler terms, there is no significance in clicks between the two advertisements.

Step 5b: Calculate and interpret p-value to accept/reject Null hypothesis.

Assuming that we already have the observed values from step 1, use the below steps for Excel:

Calculate Expected value using Step 2 in Excel.
Now, we can directly use the CHISQ.TEST function in Excel
Formula = CHISQ.TEST(Observed Range, Expected Range). Here, we will use CHISQ.TEST(C4:D5,H4:I5) since the Observed data is in cells C4:D5 and the Expected data is in cells H4:I5. (See image Below)

Upon running this test, you will observe a p-value of 0.871

We can use the below grid to interpret the p-value.

Since the p-value>0.05, we will fail to Reject the Null Hypothesis.

It means that the clicks are independent of the advertisement.

Step 5c: Calculate and interpret statistical significance to accept/reject Null hypothesis.

Statistical Significance = 1-p-value = 1 – 0.871 = 0.129.

It means that we are only 12.9% confident that these click rates will be observed for the two advertisements. Since, this significance is less than 0.95, we fail to reject null hypothesis.

Code to Implement Chi-Sq in Python

# python
from scipy.stats import chi2_contingency

# defining the table
# [a_click, a_noclick], [b_click, b_noclick]
data = [[25,75], [26,74]]

stat, p, dof, expected = chi2_contingency(data,correction=False)

# interpret p-value
alpha = 0.05
print("p value is ",(p))

if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (H0 holds true)')

Make sure that you pass the correction=False argument in the function.

Output:

p value is 0.8711230866931309

Independent (H0 holds true)

How to Count of Instances or Volume Plays an Important Role in Determining Significance?

Let us increase the volume of the data as below:

Now let us rerun the same using Python snippet to see how the p-value changed.

# python
from scipy.stats import chi2_contingency

# defining the table
# [a_click, a_noclick], [b_click, b_noclick]
data = [[2500,7500], [2600,7400]]

stat, p, dof, expected = chi2_contingency(data,correction=False)

# interpret p-value
alpha = 0.05
print("p value is ",(p))

if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (H0 holds true)')#import csv

Output:

p value is 0.10473464597187702

Independent (H0 holds true)

You see – how the p-value reduced from ~0.87 to ~0.1. If the traffic increases further and the click rate remains same, the p-value will further decrease.

Applications of Chi Square

Given below are a few most common applications of the chi-square formula

Biologists use it to determine if there is a significant association between the two variables, such as the association between two species in a community.
Genetic analysts use it to interpret the numbers in various phenotypic classes.
It is used in various statistical procedures to help to decide if to hold onto or reject the hypothesis.
It is used in the medical literature to compare the incidence of the same characteristics in two or more groups.
It is used in A/B testing to compare multiple groups.

Advantages of Using Chi-square Test

Easy to understand
Does not require the distribution to be normal
Can be used to test multiple metrics across multiple groups. For instance, if we were tracking 3 metrics : Click_on_button, Click_on_anywhere_else, No_Click for 3 advertisements. We could have a contingency table of 3 X 3 and DOF (Degree Of Freedom) = (3-1) * (3-1) = 4

Conclusion

A/B testing, also known as split testing, is a method of testing that compares two versions of a variable (for example a advertisement A vs advertisement B), showing each version to equal numbers of users at random to determine which performs better against a business goal.
Traditionally, t-test (or z-test) is used to determine the significance of A/B test. t-test assumes that the distribution is Gaussian (or normal). Since, we cannot always have a normal distribution with the data, we need to use a better and more generic approach.
Chi-sq test is more reliable for such cases.
It is easy to calculate and interpret Chi-sq in Excel and in Python.
The significance value of a Chi-sq statistic increases with increase in traffic.

Feel free to connect with me on LinkedIn if you want to discuss this with me.

Frequently Asked Questions

Q1. Why is it called chi-square?

A. The term “chi-square” is used because the test statistic follows a chi-square distribution. The distribution was first introduced by the statistician Karl Pearson, who named it after the Greek letter “χ” (chi), which resembles the shape of the distribution curve.

Q2. What is a chi-square test used for?

A. A chi-square test is used to analyze categorical data and determine if there is a significant association between two variables. It helps to assess whether the observed frequencies in different categories deviate significantly from the expected frequencies, indicating a relationship or independence between the variables.

Q3. What is the difference between t-test and chi-square?

A. The main difference between a t-test and a chi-square test lies in the types of data they analyze. A t-test is used to compare means between two groups, typically for continuous numerical data. In contrast, a chi-square test is used for categorical data to examine the association or independence between variables.

Q4. What is chi-square test and its types?

A. A chi-square test is a statistical method used to analyze categorical data. It has different types based on the research question and nature of data, including the chi-square goodness-of-fit test (assessing whether observed data fits an expected distribution) and the chi-square test of independence (evaluating the association between two categorical variables).

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Aakash93 Makwana 15 Jun 2023

Beginner Excel Probability Python Statistics

How to Use Chi Square to Fuel A/B Test?

Introduction

Table of contents

What is A/B Testing?

Limitations of Traditional A/B Test

What is Chi-square (or Chi-Sq)?

Actual Calculations Using Excel

Step 1: Create a contingency table for observed values

Step 2: Calculate Expected values

Step 3: Calculate Chi-sq statistic

Step 4: Calculate DOF (Degrees of freedom)

Importance of DOF:

Step 5a: Use Critical value table to accept/reject Null hypothesis.

Step 5b: Calculate and interpret p-value to accept/reject Null hypothesis.

Step 5c: Calculate and interpret statistical significance to accept/reject Null hypothesis.

Code to Implement Chi-Sq in Python

How to Count of Instances or Volume Plays an Important Role in Determining Significance?

Applications of Chi Square

Advantages of Using Chi-square Test

Conclusion

Frequently Asked Questions

Frequently Asked Questions

Responses From Readers

Write for us