# Why do we need correlation

## Pearson Product Moment Correlation: Requirements


The Pearson product-moment correlation is a parametric method. This means that certain requirements must be met for the results to be correct and interpretable.

• Scale level. The correlation coefficient provides reliable results if the variables are at least interval scaled, or if they are dichotomous (since dichotomous data are by definition metrically scaled).
• Linearity. The relationship between the two variables must be linear. If the relationship is not linear, the Pearson product-moment correlation will underestimate the strength of the relationship.
• No outliers. Most parametric statistics are not very robust against outliers, i.e. values that lie far away from the bulk of the other values. A single outlier can turn an otherwise significant result into a non-significant one. It is therefore particularly important to check the data for outliers.
• Finite variance and covariance. If the variance of one or both variables is not finite, the product-moment correlation will not provide reliable results. The same is true for covariance.
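
The outlier requirement from the list above can be illustrated with a small sketch. The data are made up for illustration, and `pearson_r` is a hypothetical helper written here in plain Python rather than anything SPSS provides. It compares the coefficient for ten almost perfectly linear points with the coefficient after one extreme observation is added.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up, almost perfectly linear data (illustration only).
x = list(range(1, 11))
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 9.1, 9.9]
r_clean = pearson_r(x, y)

# The same data plus one extreme, hypothetical observation.
r_outlier = pearson_r(x + [11], y + [-40.0])

print(f"r without the outlier: {r_clean:.3f}")
print(f"r with one outlier:    {r_outlier:.3f}")
```

With these numbers the single added point is enough to flip the sign of the coefficient, which is why the data should be screened for outliers before the correlation is interpreted.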

SPSS also automatically tests whether the correlations differ significantly from zero. For this significance test to be interpretable, both variables must additionally follow a bivariate normal distribution.

### Finite variance and covariance

The formula for calculating r is based on the variance and covariance of both random variables. Finite (co)variance means that if we take a sample of, for example, N = 100, the sample variance would stabilize at a value similar to the one obtained with a larger N. If the variance were not finite, it would keep increasing as N grows.
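
As a rough illustration of this idea, the following sketch compares the sample variance of simulated standard normal data (finite variance) with that of standard Cauchy data (no finite variance) for growing N. The choice of distributions, the seed, and the sample sizes are assumptions made purely for this example.

```python
import math
import random

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

rng = random.Random(42)  # fixed seed so the run is reproducible
n_max = 100_000

# Standard normal draws: the variance is finite (equal to 1).
normal = [rng.gauss(0.0, 1.0) for _ in range(n_max)]
# Standard Cauchy draws via the inverse CDF: the variance is not finite.
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n_max)]

normal_vars = {}
cauchy_vars = {}
for n in (100, 1_000, 10_000, 100_000):
    normal_vars[n] = sample_variance(normal[:n])
    cauchy_vars[n] = sample_variance(cauchy[:n])
    print(f"N={n:>7}  normal: {normal_vars[n]:8.3f}  Cauchy: {cauchy_vars[n]:14.1f}")
```

The normal column should settle close to 1, whereas the Cauchy column does not settle and tends to grow as more extreme draws enter the sample.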

If both variables have a bivariate normal distribution (like the variables in the figure on the right), finite variance is automatically given. In this case, the sample correlation coefficient is also the maximum likelihood estimator of the population correlation coefficient. It is therefore asymptotically unbiased and efficient. In simple terms, this means that, for large samples, no other estimator gives a more accurate estimate of the correlation than the sample correlation coefficient. For samples that are not normally distributed, the correlation coefficient remains approximately unbiased but may no longer be efficient. The sample correlation coefficient therefore remains a consistent estimator of the population correlation coefficient as long as the variance and covariance are finite (which is guaranteed by the law of large numbers).
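
The consistency property can be sketched with a small simulation. The population correlation of 0.6, the seed, and the sample sizes are arbitrary assumptions for this example, and `pearson_r` is again a hypothetical plain-Python helper.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is reproducible
rho = 0.6               # assumed population correlation
n_max = 50_000

# Draw from a bivariate normal distribution with correlation rho.
x, y = [], []
for _ in range(n_max):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x.append(z1)
    y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)

# Sample correlation on growing prefixes of the same data.
estimates = {n: pearson_r(x[:n], y[:n]) for n in (50, 500, 5_000, 50_000)}
for n, r in estimates.items():
    print(f"N={n:>6}  r={r:.4f}")
```

The estimates for small N can be noticeably off, while the estimate for the largest N should lie close to the assumed value of 0.6.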

That is why one often reads in textbooks that bivariate normality of the variables is one of the requirements of the correlation coefficient. This is not the case. Normally distributed variables are important, however, if significance is to be checked with the t-test. Requirements similar to those of the t-test as a hypothesis test then apply.

If there is no finite variance, a non-parametric method should be used, such as Spearman’s Rho or Kendall’s Tau.

Finite variance and covariance is an important requirement, but it cannot be checked with SPSS. We will therefore assume finite variance and covariance for the rest of this tutorial (and don't worry: a data set that violates this assumption is very unlikely).

### Linearity

Correlation is a measure of linear dependence. If one variable cannot be written as a linear function of the other, a perfect correlation of -1 or +1 cannot be achieved. Transformations can be used to change the distributional properties of the variables, but they should be applied with caution. Overly aggressive use may improve the correlation, but at the expense of the actual applicability and interpretability of the findings. If there is a lack of linearity, non-parametric methods should be considered, such as Spearman’s Rho or Kendall’s Tau.
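
A minimal sketch of this point, using made-up data in which y grows exponentially with x. The relationship is perfectly monotone but not linear. Spearman's Rho is computed here as the Pearson correlation of the ranks, which is valid for data without ties. `pearson_r` and `ranks` are hypothetical helpers for this example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

# Perfectly monotone but non-linear relationship (illustration only).
x = list(range(1, 11))
y = [math.exp(v) for v in x]

r_pearson = pearson_r(x, y)
rho_spearman = pearson_r(ranks(x), ranks(y))

print(f"Pearson r:    {r_pearson:.3f}")
print(f"Spearman rho: {rho_spearman:.3f}")
```

Pearson's r stays clearly below 1 because the relationship is not linear, while the rank-based coefficient reaches 1 because the ordering of the two variables is identical.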

The easiest way to check linearity is visually, with a scatter plot, as we shall see later in this guide.

### Normal distribution

It is true that the two correlated variables do not need to follow a bivariate normal distribution for Pearson’s r to be calculated. However, if you also want to test its significance, further requirements must be met. These requirements correspond to those of the t-test, because a corresponding t-statistic is used to test significance.
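
The text does not spell out the test statistic. The sketch below uses the usual textbook form, t = r · sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom, and assumes that this is the statistic meant here. The function name and the example values r = 0.5 and n = 30 are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t-statistic and degrees of freedom for testing H0: rho = 0.

    Uses the textbook formula t = r * sqrt((n - 2) / (1 - r^2)).
    """
    if n < 3:
        raise ValueError("at least 3 observations are required")
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r ** 2))
    return t, df

# Hypothetical example: r = 0.5 observed in a sample of n = 30.
t_value, df = correlation_t_statistic(0.5, 30)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The resulting value is compared with a t-distribution with n − 2 degrees of freedom, which is why the assumptions of the t-test carry over to the significance test of r.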

Unfortunately, SPSS does not have a method for checking the bivariate normal distribution. We shall therefore have to resort to a simpler (though not always accurate) method.
