Statistics for Beginners: Power of “Power Analysis”
This article was published as a part of the Data Science Blogathon.
How much data is enough to claim statistical significance? In other words, what is the optimal sample size? Often, it is not feasible to repeat a statistical experiment until it has enough power. At the same time, the results from our machine learning model might not be statistically conclusive if the sample size is inadequate.
Power analysis answers this question. It estimates the sample size needed to detect an effect of a given size at the desired significance level and statistical power. Yes, that was a lot of new terms all of a sudden. Don’t worry, we will discuss them as we go along.
Let’s first discuss the statistical power in detail.
The statistical power of a hypothesis test is the probability of detecting an effect, given that a true effect exists. In other words, it is the probability of correctly rejecting a false null hypothesis. It reflects how much confidence one can place in a study's ability to find a real effect, and it is the complement of the type 2 error rate: power = 1 − β.
Note that a type 2 error is a false negative: we fail to reject a null hypothesis that is actually false.
At this point, it is important to understand what the null hypothesis is. It is the default assumption that a statistical test sets out to challenge, typically that there is no effect or no difference. For example, the null hypothesis in the two-sample KS test is that the two samples are drawn from the same distribution.
Source: Wiki with additions from the author
It is important to note from the above illustration that the higher the power of a test, the lower the β, i.e. the probability of a type 2 error.
Because low statistical power makes it likely that a real effect goes undetected, and so leads to unreliable conclusions, experiments are usually designed to meet a minimum threshold of power. Generally, this threshold is 80% or more. A power of 80% means there is an 80% chance of detecting an effect that exists (and, in turn, a 20% probability of a type 2 error).
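To make the "80% chance of detecting an effect" concrete, here is a minimal simulation sketch. It is not from the original article: it assumes a two-sided, two-sample z-test with a known standard deviation of 1 (a simplification of the t-test), and the function name `simulated_power`, the effect size of 0.5, and the 64 observations per group are illustrative choices. Power is estimated as the share of simulated experiments that reject the null hypothesis when the effect really exists.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(effect_size, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Estimate the power of a two-sided, two-sample z-test by simulation.

    Assumes both groups are normal with a known standard deviation of 1,
    so effect_size is the true difference in means (in sd units).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    std_err = (2 / n_per_group) ** 0.5  # standard error of the mean difference
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / std_err
        if abs(z) > z_crit:  # the null hypothesis is rejected
            rejections += 1
    return rejections / n_sims


power = simulated_power(effect_size=0.5, n_per_group=64)
beta = 1 - power  # estimated type 2 error rate
print(f"estimated power = {power:.2f}, estimated type 2 error = {beta:.2f}")
```

Under these assumptions the estimate should land close to the usual 80% threshold, since the theoretical power for this setup is roughly 0.8. The exact number depends on the random seed and the number of simulations.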
Now it is time to look at the bigger picture, i.e. power analysis, which depends on the four related variables listed below:
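1) Effect size: the magnitude of the effect in the population. The more prominent the effect, the easier it is to distinguish from random error
2) Sample size: a larger sample size helps detect smaller effects
3) Level of significance: α, the probability of a type 1 error (rejecting a true null hypothesis)
4) Statistical power: 1 − β, the probability of detecting an effect that exists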
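Effect size is the least familiar of the four, so a small sketch may help. The article does not name a specific measure; the example below assumes Cohen's d, a common standardized effect size for comparing two group means (difference in means divided by the pooled standard deviation). The helper `cohens_d` and the data are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / pooled_var ** 0.5


# Made-up measurements, for illustration only.
control = [1, 2, 3, 4, 5]
treated = [2, 3, 4, 5, 6]

d = cohens_d(control, treated)
print(f"Cohen's d = {d:.3f}")
```

As a rough convention, values of d around 0.2, 0.5, and 0.8 are often described as small, medium, and large effects.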
All four variables are linked with each other and changing one variable impacts the rest of the variables.
Power analysis is the process of estimating one of the 4 variables given values for the other 3. It is commonly used to estimate the minimum sample size needed to carry out an experiment.
As we increase the sample size, we are able to detect smaller effects as well, albeit at the cost of collecting more data (for example, by running the experiment more times). Even then, there comes a point where adding more data barely increases the power any further, because power cannot exceed 1.
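The diminishing return from extra data can be seen in a quick calculation. This is a sketch under stated assumptions, not the article's own code: it uses the normal approximation to the power of a two-sided, two-sample test with equal group sizes, and the function name `approx_power` and the effect size of 0.2 are illustrative.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    effect_size is the standardized mean difference; both groups are
    assumed to have n_per_group observations.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5  # expected test statistic
    # Probability that the statistic lands in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


sample_sizes = [25, 50, 100, 200, 400, 800, 1600]
powers = [approx_power(0.2, n) for n in sample_sizes]

for n, p in zip(sample_sizes, powers):
    print(f"n per group = {n:5d}  ->  power = {p:.3f}")
```

Each doubling of the sample size buys a large gain in power at first, but once power is close to 1 the curve flattens and further data adds very little.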
Note that the sample we are working with might fail to capture an effect even if it exists in the population. This is largely attributed to sampling error, where the sample is not representative of the population.
Power Analysis is also used to check and validate the results and findings from the experiment. For example, if we specify the effect size, sample size, and significance level, we can calculate the power of an experiment to check whether type 2 error probability is within an acceptable range.
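Such a check could look like the sketch below. It assumes the statsmodels library and its `TTestIndPower.power` method for an independent two-sample t-test; the article does not name a library at this point, and the effect size, sample size, and 20% threshold are illustrative values.

```python
from statsmodels.stats.power import TTestIndPower

# Illustrative inputs for an experiment that has already been run.
effect_size = 0.5   # standardized mean difference
n_per_group = 30    # observations in each group
alpha = 0.05        # significance level

analysis = TTestIndPower()
achieved_power = analysis.power(
    effect_size=effect_size,
    nobs1=n_per_group,
    alpha=alpha,
    ratio=1.0,                 # equal group sizes
    alternative="two-sided",
)
type_2_error = 1 - achieved_power

print(f"power = {achieved_power:.2f}, type 2 error = {type_2_error:.2f}")
if type_2_error > 0.20:
    print("The experiment is underpowered for this effect size.")
```

With these inputs the power comes out well below 80%, so a non-significant result from such an experiment would say little about whether the effect exists.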
As per the documentation, we can solve for any one of the 4 parameters of an independent two-sample t-test, given the other three.
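The original code for this step is not shown here, so the following is a sketch that assumes the documentation in question is that of statsmodels, whose `TTestIndPower.solve_power` method solves for whichever parameter is left as `None`. All input values are illustrative.

```python
import math

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 1) Solve for the sample size per group (nobs1 is left unspecified).
n_required = float(
    analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8, ratio=1.0)
)
print(f"required sample size per group: {math.ceil(n_required)}")

# 2) Solve for the power, given effect size, sample size, and alpha.
power = float(
    analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05, ratio=1.0)
)
print(f"power with 64 observations per group: {power:.3f}")

# 3) Solve for the smallest detectable effect size at 80% power.
min_effect = float(
    analysis.solve_power(nobs1=64, alpha=0.05, power=0.8, ratio=1.0)
)
print(f"minimum detectable effect size: {min_effect:.3f}")
```

Exactly one of `effect_size`, `nobs1`, `alpha`, and `power` should be omitted in each call; `ratio=1.0` means both groups have the same number of observations.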
We can also plot power curves to check how varying the effect size and the sample size changes the power of the experiment at a given significance level.
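One way to draw such curves, again assuming statsmodels (its `TTestIndPower.plot_power` method) together with NumPy and Matplotlib, is sketched below. The three effect sizes and the range of sample sizes are illustrative choices, not values from the article.

```python
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.stats.power import TTestIndPower

effect_sizes = np.array([0.2, 0.5, 0.8])  # one curve per effect size
sample_sizes = np.arange(5, 150)          # observations per group (x-axis)

fig, ax = plt.subplots(figsize=(8, 5))
TTestIndPower().plot_power(
    dep_var="nobs",            # put the sample size on the x-axis
    nobs=sample_sizes,
    effect_size=effect_sizes,
    alpha=0.05,
    ax=ax,
    title="Power of an independent two-sample t-test",
)
ax.axhline(0.8, linestyle="--", color="grey")  # common 80% power threshold
ax.set_ylabel("Power")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure. The larger the effect size, the sooner its curve crosses the 80% line, which matches the earlier point that small effects need larger samples.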