FAQ 6: What are random or spurious correlations?

In this series you will find answers to questions we get asked a lot when speaking about data, analytics, digital transformation, emissions reduction, industry 4.0, IoT and data-driven automation.

Katya Vladislavleva, posted on 30th March 2022
FAQ, Analytics, Data, Spurious correlations

What are random or spurious correlations?

Spurious correlations are correlations which are strong but observed purely due to chance, or randomness. The smaller the sample size of the two variables, the higher the chance of random correlations.

Tyler Vigen maintains a fantastic collection of spurious correlations.

Look at an example: over 10 years, mozzarella consumption is strongly correlated (95.9%) with the number of civil engineering doctorates awarded.

We use these examples in every class and often as safety shares: do not trust high correlations if you have a small sample size!

What sample size is safe? We say 20 records or more.

[Technical Note] This is a number of records such that the 95th percentile of the distribution of correlations between random pairs of vectors is below 50%.
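
The technical note can be checked with a small simulation. The sketch below is only an illustration, not necessarily how the figure was originally computed: it assumes Pearson correlation, independent standard-normal vectors, the absolute value of the correlation, and 5000 random pairs per sample size. The function names `pearson` and `chance_correlation_p95` are made up for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def chance_correlation_p95(n_records, n_trials=5000, seed=0):
    """Approximate 95th percentile of |r| over unrelated random vector pairs.

    Assumptions (not from the original post): Gaussian noise, absolute
    correlation, and a simple nearest-rank percentile estimate.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    abs_r = []
    for _ in range(n_trials):
        # Two vectors with no relationship at all: any correlation is chance.
        x = [rng.gauss(0, 1) for _ in range(n_records)]
        y = [rng.gauss(0, 1) for _ in range(n_records)]
        abs_r.append(abs(pearson(x, y)))
    abs_r.sort()
    return abs_r[int(0.95 * n_trials) - 1]


if __name__ == "__main__":
    # Print the chance-correlation level for a few sample sizes.
    for n in (5, 10, 20, 50):
        print(n, round(chance_correlation_p95(n), 2))
```

Under these assumptions, the printed value shrinks as the number of records grows, which is the effect described above: with only a handful of records, purely random vectors regularly show correlations that look strong.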

Follow us on LinkedIn to stay up to date with the next FAQ.