Simpson’s Paradox and the Ecological Fallacy in Data Science

Simpson’s Paradox

I’m currently studying for a Master’s Degree in Global Health at The University of Manchester, and I’m absolutely loving it. Right now, we’re studying epidemiology and study design, which also involves a great deal of statistical analysis.

Some data was presented to us from an ecological study (a type of scientific study that looks at large-scale, population-level data) called the WHO MONICA Project, which showed mean cholesterol vs mean height, grouped by population centre (e.g. China-Beijing or UK-Glasgow).

In this chart, you can see a positive correlation between height and cholesterol, with a coefficient of 0.36, suggesting that height may be a potential risk factor for higher cholesterol.

However, when the analysis was re-run using raw data (not averaged for each of the population centres), the correlation coefficient was -0.11.

So, when using mean measures of each population centre, it appears that height could be a risk factor for higher cholesterol, whilst the raw data actually shows the opposite is slightly more likely to be true!

This is known as an “ecological fallacy” – because it takes population level data and makes erroneous assumptions about individual effects.
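To see how this sign flip can arise, here is a minimal simulation sketch in plain Python. It does not use the MONICA data: every number in it (eight centres, the centre means, the slopes and the noise levels) is an assumption chosen purely for illustration, and `pearson` and `simulate` are hypothetical helper names. Centres with taller populations are given slightly higher mean cholesterol, while within each centre taller individuals are given slightly lower cholesterol.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(seed=0, n_centres=8, n_per_centre=2000):
    """Return (correlation of centre means, correlation of raw individual data)."""
    rng = random.Random(seed)
    heights, chols = [], []            # individual-level data, all centres pooled
    mean_heights, mean_chols = [], []  # one pair of means per centre
    for k in range(n_centres):
        # ASSUMED centre-level pattern: taller centres have higher mean cholesterol.
        centre_height = 165.0 + 10.0 * k / (n_centres - 1)
        centre_chol = 5.0 + 0.03 * (centre_height - 170.0)
        hs, cs = [], []
        for _ in range(n_per_centre):
            dev = rng.gauss(0.0, 8.0)  # individual's height relative to the centre mean
            hs.append(centre_height + dev)
            # ASSUMED individual-level pattern: taller people have lower cholesterol.
            cs.append(centre_chol - 0.02 * dev + rng.gauss(0.0, 1.0))
        heights += hs
        chols += cs
        mean_heights.append(fmean(hs))
        mean_chols.append(fmean(cs))
    return pearson(mean_heights, mean_chols), pearson(heights, chols)


r_means, r_raw = simulate()
print(f"correlation of centre means: {r_means:.2f}")
print(f"correlation of raw data:     {r_raw:.2f}")
```

Under these assumptions the centre means are positively correlated while the raw individual data are negatively correlated, because the variation between individuals inside a centre is much larger than the variation between centres.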

This is a great example of Simpson’s Paradox.

Simpson’s Paradox is when a trend appears in several different groups of data but disappears or reverses when the groups are combined.

Table 1 in Wang (2018) is a relatively easy example. (This is fictional test score data for two schools.)

(Also, please ignore for a moment the author’s possible bias in scoring male students higher – maybe this is a test about ability to grow facial hair.)










[Table 1 from Wang (2018): fictional test scores by gender for Alpha school (1) and Beta school (2)]





It’s clear if you look at the numbers that the Beta school has higher average scores for both groups (85 and 81 for male and female students respectively).

However, if you calculate the average score across all pupils in each school, Alpha school has an average score of 83.8 and Beta has just 81.8.

So whilst Beta school *looks* like the higher-performing school when broken down by gender, it is actually Alpha school that has the higher overall average.

In this case, it’s quite clear why: if you only look at the average scores by gender, it’s easy to assume that the proportion of male and female pupils for each school is roughly the same, when in fact 80 pupils at Alpha school are male (and 20 female), but only 20 are male at the Beta school, with 80 female.

Segmenting the data by gender hides the difference in gender mix between the schools. Breaking the data down this way may be appropriate in some cases, but it can lead to false assumptions being made.
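The arithmetic can be sketched in a few lines of Python: a school’s overall average is a weighted average of its subgroup averages, weighted by pupil numbers. The pupil counts and Beta’s scores are the ones quoted above. Alpha’s subgroup scores (84.6 and 80.6) are not stated in the text and are assumed here, chosen only so that they sit below Beta’s and reproduce the 83.8 overall figure. `pooled_mean` is a hypothetical helper name.

```python
def pooled_mean(groups):
    """Overall mean from (number_of_pupils, mean_score) pairs."""
    total_pupils = sum(n for n, _ in groups)
    return sum(n * score for n, score in groups) / total_pupils


# (number of pupils, mean score) per gender.
beta = {"male": (20, 85.0), "female": (80, 81.0)}   # figures quoted in the text
alpha = {"male": (80, 84.6), "female": (20, 80.6)}  # scores ASSUMED for illustration

alpha_overall = pooled_mean(list(alpha.values()))
beta_overall = pooled_mean(list(beta.values()))

for gender in ("male", "female"):
    print(f"{gender}: Alpha {alpha[gender][1]}, Beta {beta[gender][1]}")
print(f"overall: Alpha {alpha_overall:.1f}, Beta {beta_overall:.1f}")
```

Beta wins within each gender, but Alpha’s pupils are mostly in the higher-scoring group, so its weighted average comes out ahead.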

The same issue can be seen in Covid-19 Case Fatality Rate (CFR) data when comparing Italy and China. von Kügelgen et al. (2020) found that CFRs were lower in Italy for every age group, but higher overall (see table (a) in the paper).

The reason, when you see table (b), is clear. The CFRs for the 70-79 and 80+ groups are far higher than for all other age groups, and these age groups are significantly over-represented in Italy’s confirmed cases of Covid-19. This means that Italy’s overall CFR is higher than China’s only by dint of recording a “much higher proportion of confirmed cases in older patients compared to China.” China simply didn’t report as many Covid-19 cases in older individuals, and the fatality rate is far higher in older individuals. Italy has an older population (median age of 45.4, as opposed to China’s 38.4), but other factors such as testing strategies and social dynamics may also be playing a part.
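One common way to compare fatality rates on a like-for-like basis is to apply each country’s age-specific CFRs to the same age mix of cases (direct standardisation). The sketch below does this for two hypothetical countries, A and B, with two age bands. All of the case counts and CFRs are invented for illustration and are not taken from von Kügelgen et al. (2020), whose own analysis is a mediation analysis rather than this calculation. The function names are hypothetical too.

```python
def crude_cfr(cases, cfr):
    """Overall CFR: deaths across all age bands divided by total cases."""
    deaths = sum(cases[band] * cfr[band] for band in cases)
    return deaths / sum(cases.values())


def standardised_cfr(cfr, reference_cases):
    """CFR the country would have if its cases had the reference age mix."""
    total = sum(reference_cases.values())
    return sum(reference_cases[band] / total * cfr[band] for band in cfr)


# INVENTED numbers: country A has the lower CFR in both age bands,
# but a much larger share of its confirmed cases is in the older band.
cases_a = {"under 70": 6000, "70+": 4000}
cfr_a = {"under 70": 0.010, "70+": 0.15}
cases_b = {"under 70": 9000, "70+": 1000}
cfr_b = {"under 70": 0.015, "70+": 0.18}

# Use the two countries' combined cases as the common age mix.
reference = {band: cases_a[band] + cases_b[band] for band in cases_a}

crude_a, crude_b = crude_cfr(cases_a, cfr_a), crude_cfr(cases_b, cfr_b)
std_a, std_b = standardised_cfr(cfr_a, reference), standardised_cfr(cfr_b, reference)

print(f"crude CFR:        A {crude_a:.2%}, B {crude_b:.2%}")
print(f"standardised CFR: A {std_a:.2%}, B {std_b:.2%}")
```

With these made-up figures, A has the higher crude CFR but the lower standardised CFR, which is the same reversal described above: the overall gap comes from who the confirmed cases are, not from higher fatality within any age band.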

Another example of Simpson’s Paradox is the question of gender bias in graduate admissions to the University of California, Berkeley, where the reversal ran the other way. In 1973, the overall admission figures appeared to show that men were more likely to be admitted than women, and the difference was large enough that it was unlikely to be due to chance alone. However, the data for the individual departments showed a “small but statistically significant bias in favour of women” (Bickel et al., 1975). Bickel et al. concluded that women were applying to more competitive departments such as English, whilst men were applying to departments such as engineering and chemistry, which typically had higher rates of admission.

(Whether this still constitutes bias is the subject of a different debate.)

The crux of Simpson’s Paradox is: if you pool data without regard to the underlying causality, you could get the wrong results.


Wang, B. et al. (2018) “Simpson’s Paradox: Examples”, Shanghai Archives of Psychiatry, 30(2), p. 139. Available at: (Accessed: 21 October 2020).

von Kügelgen, J., Gresele, L. and Schölkopf, B. (2020) “Simpson’s paradox in Covid-19 case fatality rates: a mediation analysis of age-related causal effects.” Available at: (Accessed: 21 October 2020).

Bickel, P.J., Hammel, E.A. and O’Connell, J.W. (1975) “Sex Bias in Graduate Admissions: Data From Berkeley”, Science, 187(4175), pp. 398–404. doi:10.1126/science.187.4175.398. PMID 17835295.

WHO MONICA Project Principal Investigators (1988) “The World Health Organization MONICA Project (monitoring trends and determinants in cardiovascular disease): a major international collaboration”, Journal of Clinical Epidemiology, 41(2), pp. 105–114. doi:10.1016/0895-4356(88)90084-4.
