- The square of Pearson’s correlation coefficient is the same as the coefficient of determination (R²) in simple linear regression.
- Neither simple linear regression nor correlation answers questions of causality directly. This point is important, because I’ve met people who think that simple regression can magically allow an inference that X causes Y. That’s a preposterous belief.
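A quick numerical check of the first point, written as a minimal sketch in plain Python (standard library only). It fits an ordinary least-squares line to a small made-up dataset and compares Pearson's r squared with the regression's R² = 1 − SS_res / SS_tot. The data and the function names are illustrative assumptions, not taken from the article.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson's r: co-variance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_of_line_fit(x, y):
    """Fit y = intercept + slope * x by least squares and return R^2."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    ss_tot = sum((b - my) ** 2 for b in y)                                  # total variation
    return 1 - ss_res / ss_tot


# Hypothetical toy data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = pearson(x, y)
print(r ** 2, r_squared_of_line_fit(x, y))  # equal up to floating-point rounding
```

Note that a high R² here says nothing about whether x causes y; it only measures how well a straight line fits the points.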

**What’s the difference between correlation and simple linear regression?**

Now let’s think of a few differences between the two. Simple linear regression gives much more information about the relationship than Pearson correlation. Here are a few things that regression will give but the correlation coefficient will not.

- The slope in a linear regression gives the marginal change in the output/target variable when the independent variable increases by one unit. Correlation has no slope.
- The intercept in a linear regression gives the predicted value of the target variable when the input/independent variable is set to zero. Correlation does not have this information.
- Linear regression can give you a prediction of the target for a given value of the input variable. Correlation analysis does not predict anything.
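To make the three bullets concrete, here is a minimal sketch in plain Python on hypothetical data (hours studied vs. marks; the numbers and names are invented for illustration). It computes the slope, the intercept and a prediction, none of which the correlation coefficient alone provides. It also checks the standard identity slope = r · s_y / s_x, which shows that the slope combines the correlation with the spreads of both variables.

```python
import math
import statistics

# Hypothetical data (invented): hours studied (x) and marks scored (y)
hours = [2, 4, 5, 7, 8, 10]
marks = [50, 58, 61, 70, 74, 82]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx            # change in y per one-unit increase in x
    intercept = my - slope * mx  # predicted y when x = 0
    return intercept, slope


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


intercept, slope = fit_line(hours, marks)
r = pearson(hours, marks)

print("slope:", slope)
print("intercept:", intercept)
print("predicted marks for 6 hours:", intercept + slope * 6)
print("correlation:", r)

# The slope can be recovered from r, but only if you also know both spreads
assert math.isclose(slope, r * statistics.stdev(marks) / statistics.stdev(hours))
```

The correlation alone cannot give you the prediction for 6 hours; you need the fitted line.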

### Answer – 6: Pearson vs. Spearman

The simplest answer here is that Pearson captures how linearly dependent the two variables are, whereas Spearman captures the monotonic behavior of the relationship between the variables.

For instance, consider the following relationship:

*y = exp ( x )*

Here, taking x = 1, 2, …, 100, you will find the Pearson coefficient to be about 0.25 but the Spearman coefficient to be exactly 1. As a rule of thumb, you should begin with Spearman only when you have some initial hypothesis that the relation is non-linear. Otherwise, we generally try Pearson first and, if that is low, try Spearman. This way you know whether the variables are linearly related or just have a monotonic relationship.
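The *y = exp(x)* example can be reproduced with a minimal plain-Python sketch. One assumption to flag: it computes Spearman's rho as Pearson's r on the ranks, which is valid here because exp(x) on distinct x values produces no ties. For x = 1, …, 5 the Pearson value comes out around 0.89, and for x = 1, …, 100 around 0.25, while Spearman stays at 1 in both cases. So the Pearson figure depends heavily on the range of x.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values 1..n. Assumes no ties (true for exp(x) on distinct x)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks of the data
    return pearson(ranks(x), ranks(y))


for n in (5, 100):
    x = list(range(1, n + 1))
    y = [math.exp(v) for v in x]
    print(n, round(pearson(x, y), 4), round(spearman(x, y), 4))
```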

### Answer – 7: Correlation vs. co-variance

If you skipped the mathematical formula of correlation at the start of this article, now is the time to revisit it.

Correlation is simply the co-variance normalized by the standard deviations of both variables, that is, the co-variance divided by the product of the two standard deviations. This is done to ensure we get a number between -1 and +1. Co-variances are very difficult to compare, as they depend on the units of the two variables. It might turn out that a student's marks have a larger co-variance with the length of his toenail in millimeters than with his attendance rate.

This is just because of the difference in units of the second variable. Hence, we need to normalize the co-variance by some measure of spread to make sure we compare apples with apples. This normalized number is known as the correlation.
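A minimal sketch of the units argument in plain Python, with invented numbers (the marks and toenail lengths below are hypothetical). Recording the same toenail lengths in millimeters instead of centimeters multiplies the co-variance by 10, while the correlation, cov(X, Y) / (σ_X · σ_Y), does not change.

```python
import math


def covariance(x, y):
    """Sample co-variance (divides by n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    # Co-variance normalized by the product of the two standard deviations
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical measurements, invented for illustration
marks = [55, 62, 70, 48, 81]
toenail_cm = [1.1, 1.3, 1.2, 1.0, 1.4]
toenail_mm = [v * 10 for v in toenail_cm]  # same lengths, different unit

print(covariance(marks, toenail_cm), covariance(marks, toenail_mm))    # second is about 10x the first
print(correlation(marks, toenail_cm), correlation(marks, toenail_mm))  # same up to rounding
```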

## 8 thoughts on "7 most commonly asked questions on Correlation"

## tvmanikandan says: June 24, 2015 at 5:51 am

Tavish, I tried the following in R. I get Pearson correlation as 0.88 and not 0.25 as you explained in the post. Please clarify.

```
> x=c(1:5)
> y=exp(x)
> y
[1] 2.718282 7.389056 20.085537 54.598150 148.413159
> cor(x,y,method="pearson")
[1] 0.8862751
> cor(x,y,method="spearman")
[1] 1
```

## Abhi says: June 24, 2015 at 5:56 am

Very informative article! Thank you for writing this. If possible, could you also publish an article giving insights into various kinds of hypothesis testing techniques?

## Gaurav Kant Goel says: June 24, 2015 at 11:13 am

Read this one and "difference between correlation and causation" back to back. My concepts on correlation have been solidified. Thanks for writing such informative articles.

## kalyanischakravarthi says: June 24, 2015 at 11:54 am

Falling in love with "AV". You guys are rocking.

## Nimesh Jha says: June 26, 2015 at 5:18 am

Excellent article. Thanks for writing it. It helped a lot.

## Tavish Srivastava says: June 27, 2015 at 5:30 pm

Try with `x <- c(1:100)`; you should get 0.25.

## dunk says: July 30, 2015 at 5:34 pm

What about the sample size? Is there any rule of thumb regarding the sample size needed to measure correlation? I believe that different sample sizes can result in a different direction of correlation, e.g. N = 100 gives correlation > 0 but N = 200 gives correlation < 0.

## M.Silambu says: November 02, 2015 at 1:23 pm

How can correlation be applied in the retail business domain? Could you please list some of the x and y values for retail business, e.g. total sales vs. day of the week?