19 August 2012

Fooling Ourselves and Others with Stats - Fertility Rate Vs Home Prices...

Watch out for conclusions drawn from simple correlation. In the Fertility Rate example, by introducing another factor, Free Education and Healthcare, I reach the conclusion that Home Price relative to Income is actually not significant at the 95% confidence level. The fitted formula is:
Fertility Rate = 1.56 + 0.5 FreeEdc - 0.02 HomePrice
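
To see what the fitted formula implies, here is a minimal Python sketch that simply evaluates it. The function name and the home-price-to-income ratio of 10 are my own illustrative assumptions, not values from the data.

```python
def predicted_fertility(free_edc: int, home_price_ratio: float) -> float:
    """Evaluate the fitted line: 1.56 + 0.5*FreeEdc - 0.02*HomePrice."""
    return 1.56 + 0.5 * free_edc - 0.02 * home_price_ratio


# Hypothetical home-price-to-income ratio, chosen only for illustration.
ratio = 10.0
with_free = predicted_fertility(1, ratio)     # free education and healthcare
without_free = predicted_fertility(0, ratio)  # not free

print(round(with_free, 2), round(without_free, 2))
```

At the same home price ratio, the two cases differ by the FreeEdc coefficient of 0.5, while each extra unit of the ratio shifts the prediction by only 0.02.
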
Any introductory statistics course will tell us that correlation does not imply a cause-and-effect relationship. Nevertheless, such statistics are used to study and 'prove' causal relationships in many fields like sociology, economics, medicine etc. A good example is shown in the post Fertility Rate Vs Home Prices....

The author shows a chart of Fertility Rates against "Home Price to Income Ratio" and "Mortgage as Percentage of Income" for Singapore, Hong Kong and 4 Scandinavian countries, and draws the conclusion that:

"The amount people have to pay for their homes relative to their income is inversely correlated with Total Fertility Rate (TFR)."

It is indeed very convincing if you just look at the data. The correlation is very high: the Green Table in the chart below shows it to be -0.9645.

However, I was aware of other factors that could also be significant in the Scandinavian countries, namely free education and free healthcare for children. Education and healthcare costs are at least as important as housing costs in the raising of children. In fact, I would argue that a couple already has a house, and giving extra space to babies and toddlers costs next to nothing. The real costs and responsibilities of parenting lie in bringing the children up, that is, in their healthcare and education.

Because only 6 data points are presented, I can add only one more factor: a value of 0 or 1 denoting whether free education and healthcare for children exist. So Singapore and Hong Kong are 0 (not free) and the other 4 countries are 1 (free).

I ran a linear regression on the data using Excel (I don't have more sophisticated statistics software).
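
For readers without Excel, the same kind of regression (an intercept, a 0/1 factor and a numeric factor) can be sketched in plain Python. This is only an illustration under assumptions: the helper names `solve` and `ols` are mine, and the rows below are made-up placeholder values, NOT the six data points from the chart.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares with an intercept; returns (coefficients, t_stats)."""
    x = [[1.0] + list(r) for r in rows]
    n, k = len(x), len(x[0])
    xtx = [[sum(x[i][p] * x[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(x[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(b * v for b, v in zip(beta, x[i])) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t_stats = []
    for j in range(k):
        # j-th diagonal entry of (X'X)^-1, obtained by solving against a unit vector.
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_jj = solve(xtx, unit)[j]
        t_stats.append(beta[j] / math.sqrt(sigma2 * inv_jj))
    return beta, t_stats


# Placeholder rows: (FreeEdc as 0 or 1, home price to income ratio). Made up.
rows = [(0, 20.0), (0, 18.0), (1, 8.0), (1, 7.0), (1, 6.0), (1, 9.0)]
y = [1.2, 1.1, 1.9, 2.0, 1.9, 1.8]  # made-up fertility rates

beta, t_stats = ols(rows, y)
for name, b, t in zip(["Intercept", "FreeEdc", "Home"], beta, t_stats):
    print(name, round(b, 4), round(t, 2))
```

With 6 rows and 3 fitted coefficients, only 3 degrees of freedom remain, which is why no further factors can reasonably be added.
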

The full input and regression analysis output is shown in the chart below:


The points to note are:

1. With the addition of another factor, FreeEdc (shorthand for Free Education and Healthcare), the fit becomes even better. The multiple correlation coefficient rises to 0.9898, compared with a magnitude of 0.9645 (from -0.9645) for Home alone.

2. FreeEdc (coefficient 0.51) is a stronger factor than Home (coefficient -0.02, i.e. Home Price to Income Ratio). Its t-stat is 2.7, which exceeds the usual 1.96 threshold for a 95% confidence level.

3. In fact, the Home factor is relatively small at -0.02 and is not statistically significant: its t-stat of 1.4937 is well below the usual 1.96 threshold for a 95% confidence level, so it can be discarded!

4. The Constant Intercept is 1.5623 with a t-stat of 5.3, which is highly significant. It might seem to tell us that, left to nature, without encouraging or discouraging factors, people will reproduce at that rate. But what it actually tells us is that there are quite a number of factors we don't yet understand, and we need to dig into what lies behind the 1.5623.
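
The significance checks in the points above can be sketched as a simple comparison of each t-stat against a threshold. The t-stats are the ones quoted from the Excel output. The 1.96 threshold is the large-sample value used in this post; I have also included 3.182, the two-sided 95% t-table value for 3 degrees of freedom, on the assumption that 6 data points minus 3 fitted coefficients is the right count. Which threshold should apply is an assumption on my part, not something the Excel output settles.

```python
# t-stats as quoted from the regression output in this post.
t_stats = {"Intercept": 5.3, "FreeEdc": 2.7, "Home": 1.4937}

LARGE_SAMPLE_95 = 1.96   # normal approximation, the threshold used in the post
SMALL_SAMPLE_95 = 3.182  # t-table value for 3 degrees of freedom (assumed:
                         # 6 data points minus 3 fitted coefficients)


def significant(t: float, threshold: float) -> bool:
    """True if the t-stat exceeds the threshold in absolute value."""
    return abs(t) > threshold


for name, t in t_stats.items():
    print(name,
          "large-sample:", significant(t, LARGE_SAMPLE_95),
          "small-sample:", significant(t, SMALL_SAMPLE_95))
```

Under either threshold Home fails the test, which is the main point. Under the stricter small-sample threshold FreeEdc would fail as well, a further reminder of how little 6 data points can prove.
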

Conclusions:
When the causal factors of something are complex, as with Fertility Rate, simple one-factor correlation analysis will lead us astray. We need to look at the problem in greater depth and examine more factors first, before we reduce everything to just one factor. While simplicity is good, oversimplification can be wrong.

Please watch out when people show you simple statistics to persuade you. 

Lim Liat (C) 19 Aug 2012