You may have heard that “**Customer is King**“. This is because the customer decides which product will stay and which will not. Whether marketers treat a customer as a ‘**King**‘ or not, he is always a ‘King’. He has the money the marketers want. He is not going to give that away for free. So, how do we know which product is accepted more by customers? The answer is to do an experiment known as A/B tests.

In this article, I will talk about A/B Test and we will also see a method that works well for comparing click rates across different advertisements within a campaign. I will also explain how to calculate the same in Excel and in Python and talk about how to interpret the results.

This article was published as a part of the Data Science Blogathon.

- Introduction
- What is A/B Testing?
- Limitations of Traditional A/B Test
- What is Chi-square (or Chi-Sq)?
- Actual Calculations Using Excel
- Code to Implement Chi-Sq in Python
- How to Count of Instances or Volume Plays an Important Role in Determining Significance?
- Applications of Chi Square
- Advantages of Using Chi-square Test
- Conclusion
- Frequently Asked Questions

A/B testing, also known as split testing, compares two versions of a variable (for example, a advertisement A vs advertisement B), showing each version to equal numbers of users at random to determine which performs better against a business goal.

A/B testing can be useful when you want to test metrics of two variables against each other. However, traditional A/B testing will only take you so far. Keep reading to learn more about the limitations of A/B testing and discover an alternative approach.

Let us understand the same from a Data Scientist’s perspective. For a Data Scientist, experiments are important part of day to day work. But how do we validate those experiments?

Let’s say you observe the below metrics for an email advertisement campaign.

Now, you want to understand whether the click rates of the advertisement are different.

So, you can calculate the click rate for each advertisement (like below)

Inference: You see that Advertisement B has better click rates. **But is it significant enough??**

Once, the Data Scientists create a solution to the problem, they usually deploy the solution and perform an A/B test to evaluate whether the solution is performing well practically.

Traditionally, most Data Scientists use t-test (or z-test) to determine the significance of A/B test. It is generally used to compare the means of 2 groups to understand of there is any statistical difference. t-test assumes that the distribution is Gaussian (or normal). However, it may produce non-reliable metrics if the distribution is not normal.

Since, we cannot always have a normal distribution with the data, we need to use a better and more generic approach.

In this article, I will talk about a particular A/B Testing method that works well for comparing click rates across different advertisements within a campaign. I will also explain how to calculate the same in Excel and in Python and talk about how to interpret the results.

Chi-sq test is used to determine associations between 2 or more categorical variables. The Chi-square formula is used in data that consist of variables distributed across various categories and helps us to know whether that distribution is different from what one would expect by chance.

Understanding with example:

We will use the same example we saw in the introduction. Since, we are doing a statistical test, lets understand a few terminologies.

**Null Hypothesis : H0 **: The two categorical variables have no relationship (independent)

**Alternate Hypothesis : H1 **: There is a relationship (dependent) between two categorical variables

We will define a significant factor to determine whether the relation between the variables is of considerable significance. Generally, a significant factor or **alpha value of 0.05 **is chosen. This alpha value denotes the probability of erroneously rejecting H0 when it is true.

A lower alpha value is chosen in cases when we expect more precision. If the p-value for the test comes out to be strictly greater than the alpha value, then we will accept our H0.

Create a Total column and row. Also, create a Click rate row. It will represent the Click/Total for each advertisement.

Create a contingency table for expected values.

Expected Frequency = (Row Total x Column Total)/Grand Total

Value for the 1st cell = (51*100)/200 = 25.5

You will get the below table for the expected values.

Notice how the total values are same, but other cell values have changed.

Use the tables created in above 2 steps and calculate the chi-sq statistic using the below formula.

Here,**O** => **Observed values **or the actual values that we saw in Step 1.

**E**=> **Expected values **that we computed in Step 2.

**χ2 ** => **Chi-Sq statistic**

For each cell of the table,

- Numerator = squared value of (Observed value – Expected value)
- Denominator = Expected value
- Chi-Sq statistic = Numerator/Denominator

For e.g. Value for the 1st cell = (25 – 25.5)^2/25.5 = 0.01

Similarly, calculate values across all cells and add them up. See the highlighted cell in Yellow.

We get an overall Chi-sq value of 0.03

**Definition:** In statistics, Degrees of freedom are **the number of independent variables that can be estimated in a statistical analysis and tell you how many items can be randomly selected before constraints must be put in place**.

**E.g. **there are 100 people who were recommended Advertisement A through email. A person can either Click or not Click the advertisement **(N=2)**. So, if there are 25 people who clicked on that Advertisement, we can calculate that (100-25=) 75 people did not click the advertisement. So, we need only 1 variable to ascertain that information. Hence, the DOF will be 1. It can be calculated using **(N-1)**.

Degrees of freedom are important **for finding critical cutoff values for a statistical test.**

Degree of Freedom for a Chi-sq Test = (rows − 1) * (columns − 1)

DOF = (2 − 1)*(2 − 1) = 1×1 = 1

Now, we need to find the critical value of the chi-square distribution.

We can obtain this from the chi-square distribution table.

Now, let us look at the table and find the value corresponding to 1 degrees of freedom and a 0.05 significance factor.

The tabular or critical value of chi-square here is 3.841

Since, our **calculated chi-sq value <= Critical chi-sq value, we fail to reject null hypothesis.**

In simpler terms, there is no significance in clicks between the two advertisements.

Assuming that we already have the observed values from step 1, use the below steps for Excel:

- Calculate Expected value using Step 2 in Excel.
- Now, we can directly use the CHISQ.TEST function in Excel
- Formula = CHISQ.TEST(Observed Range, Expected Range). Here, we will use CHISQ.TEST(C4:D5,H4:I5) since the Observed data is in cells C4:D5 and the Expected data is in cells H4:I5. (See image Below)

Upon running this test, you will observe a **p-value of 0.871**

We can use the below grid to interpret the p-value.

Since the p-value>0.05, we will fail to Reject the Null Hypothesis.

It means that the clicks are independent of the advertisement.

Statistical Significance = 1-p-value = 1 – 0.871 = 0.129.

It means that we are only 12.9% confident that these click rates will be observed for the two advertisements. Since, this significance is less than 0.95, we fail to reject null hypothesis.

```
# python
from scipy.stats import chi2_contingency
# defining the table
# [a_click, a_noclick], [b_click, b_noclick]
data = [[25,75], [26,74]]
stat, p, dof, expected = chi2_contingency(data,correction=False)
# interpret p-value
alpha = 0.05
print("p value is ",(p))
if p <= alpha:
print('Dependent (reject H0)')
else:
print('Independent (H0 holds true)')
```

Make sure that you pass the correction=False argument in the function.

Output:

p value is 0.8711230866931309

Independent (H0 holds true)

Let us increase the volume of the data as below:

Now let us rerun the same using Python snippet to see how the p-value changed.

```
# python
from scipy.stats import chi2_contingency
# defining the table
# [a_click, a_noclick], [b_click, b_noclick]
data = [[2500,7500], [2600,7400]]
stat, p, dof, expected = chi2_contingency(data,correction=False)
# interpret p-value
alpha = 0.05
print("p value is ",(p))
if p <= alpha:
print('Dependent (reject H0)')
else:
print('Independent (H0 holds true)')#import csv
```

Output:

p value is 0.10473464597187702

Independent (H0 holds true)

You see – how the p-value reduced from ~0.87 to ~0.1. If the traffic increases further and the click rate remains same, the p-value will further decrease.

Given below are a few most common applications of the chi-square formula

- Biologists use it to determine if there is a significant association between the two variables, such as the association between two species in a community.
- Genetic analysts use it to interpret the numbers in various phenotypic classes.
- It is used in various statistical procedures to help to decide if to hold onto or reject the hypothesis.
- It is used in the medical literature to compare the incidence of the same characteristics in two or more groups.
- It is used in A/B testing to compare multiple groups.

- Easy to understand
- Does not require the distribution to be normal
- Can be used to test multiple metrics across multiple groups. For instance, if we were tracking 3 metrics : Click_on_button, Click_on_anywhere_else, No_Click for 3 advertisements. We could have a contingency table of 3 X 3 and DOF (Degree Of Freedom) = (3-1) * (3-1) = 4

- A/B testing, also known as split testing, is a method of testing that compares two versions of a variable (for example a advertisement A vs advertisement B), showing each version to equal numbers of users at random to determine which performs better against a business goal.
- Traditionally, t-test (or z-test) is used to determine the significance of A/B test. t-test assumes that the distribution is Gaussian (or normal). Since, we cannot always have a normal distribution with the data, we need to use a better and more generic approach.
- Chi-sq test is more reliable for such cases.
- It is easy to calculate and interpret Chi-sq in Excel and in Python.
- The significance value of a Chi-sq statistic increases with increase in traffic.

Feel free to connect with me on LinkedIn if you want to discuss this with me.

A. The term “chi-square” is used because the test statistic follows a chi-square distribution. The distribution was first introduced by the statistician Karl Pearson, who named it after the Greek letter “χ” (chi), which resembles the shape of the distribution curve.

A. A chi-square test is used to analyze categorical data and determine if there is a significant association between two variables. It helps to assess whether the observed frequencies in different categories deviate significantly from the expected frequencies, indicating a relationship or independence between the variables.

A. The main difference between a t-test and a chi-square test lies in the types of data they analyze. A t-test is used to compare means between two groups, typically for continuous numerical data. In contrast, a chi-square test is used for categorical data to examine the association or independence between variables.

A. A chi-square test is a statistical method used to analyze categorical data. It has different types based on the research question and nature of data, including the chi-square goodness-of-fit test (assessing whether observed data fits an expected distribution) and the chi-square test of independence (evaluating the association between two categorical variables).

**The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.**