Monday, October 31, 2011

The China Study II: Gender, mortality, and the mysterious factor X

WarpPLS and HealthCorrelator for Excel were used to do the analyses below. For other China Study analyses, many using WarpPLS as well as HealthCorrelator for Excel, click here. For the dataset used, visit the HealthCorrelator for Excel site and check under the sample datasets area. As always, I thank Dr. T. Colin Campbell and his collaborators for making the data publicly available for independent analyses.

In my previous post I mentioned some odd results that led me to additional analyses. Below is a screen snapshot summarizing one such analysis, of the ordered associations between mortality in the 35-69 and 70-79 age ranges and all of the other variables in the dataset. As I said before, this is a subset of the China Study II dataset, which does not include all of the variables for which data was collected. The associations shown below were generated by HealthCorrelator for Excel.


The top associations are positive, and in each case they are with mortality in the other age range (the “M006 …” and “M005 …” variables). This is to be expected if ecological fallacy is not a big problem in terms of conclusions drawn from this dataset. In other words, the same things cause mortality to go up in the two age ranges, uniformly across counties. This is reassuring from a quantitative analysis perspective.
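For readers who want to try this kind of ordered list outside of HealthCorrelator for Excel, here is a minimal Python sketch. It assumes plain Pearson correlations ranked by absolute strength, which may not match exactly what the software does. The variable names and all of the numbers are made up for illustration; they are not values from the China Study II dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

def ordered_associations(data, target):
    """Correlate `target` with every other variable, strongest first."""
    pairs = [
        (name, pearson(values, data[target]))
        for name, values in data.items()
        if name != target
    ]
    # Order by absolute strength, so strong negative associations rank high too.
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical data: three male rows followed by three female rows.
data = {
    "mortality_35_69": [12, 14, 13, 8, 9, 7],
    "mortality_70_79": [24, 28, 27, 17, 18, 15],
    "SexM1F2": [1, 1, 1, 2, 2, 2],
    "wheat_flour": [3, 5, 2, 4, 3, 5],
}

ranking = ordered_associations(data, "mortality_35_69")
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```

In this toy data the ordering mirrors the pattern discussed above: mortality in the other age range comes first with a positive sign, followed by the sex variable with a negative sign.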

The second highest association in both age ranges is with the variable “SexM1F2”. This is a “dummy” variable coded as 1 for male and 2 for female, which I added to the dataset myself – it did not exist in the original dataset. The association in both age ranges is negative, meaning that being female is associated with lower mortality. These associations reflect in part the effect of gender on mortality, more specifically the biological aspects of being female, since we have seen in previous analyses that being female is generally health-protective.

I was able to add a gender-related variable to the model because the data was originally provided for each county separately for males and females, as well as through “totals” that were calculated by aggregating data from both males and females. So I essentially de-aggregated the data by using data from males and females separately, in which case the totals were not used (otherwise I would have artificially reduced the variance in all variables, also possibly adding uniformity where it did not belong). Using data from males and females separately is the reverse of the aggregation process that can lead to ecological fallacy problems.
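The de-aggregation step can be sketched roughly as follows. This is only an illustration of the idea, not the procedure I actually followed in Excel; the record layout, the field names ("county", "group", "mortality"), and the numbers are all hypothetical.

```python
# Hypothetical county records: one row each for males, females, and the total.
county_records = [
    {"county": "A", "group": "male", "mortality": 12.0},
    {"county": "A", "group": "female", "mortality": 8.0},
    {"county": "A", "group": "total", "mortality": 10.0},
    {"county": "B", "group": "male", "mortality": 14.0},
    {"county": "B", "group": "female", "mortality": 9.0},
    {"county": "B", "group": "total", "mortality": 11.5},
]

# Dummy coding used in the post: 1 for male, 2 for female.
SEX_CODES = {"male": 1, "female": 2}

def deaggregate(records):
    """Keep the male and female rows, drop the totals, add the SexM1F2 dummy."""
    rows = []
    for record in records:
        code = SEX_CODES.get(record["group"])
        if code is None:
            # A "total" row sits between the male and female values for the
            # same county, so keeping it would shrink the variance.
            continue
        row = {key: value for key, value in record.items() if key != "group"}
        row["SexM1F2"] = code
        rows.append(row)
    return rows

rows = deaggregate(county_records)
for row in rows:
    print(row)
```

Each county then contributes two rows to the analysis, one per sex, and the sex variable can be correlated with anything else in the dataset.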

Anyway, the associations with the variable “SexM1F2” got me thinking about a possibility. What if females consumed significantly less wheat flour and more animal protein in this dataset? This could be one of the reasons behind these strong associations between being female and living longer. So I built a more complex WarpPLS model than the one in my previous post, and ran a linear multivariate analysis on it. The results are shown below.


What do these results suggest? They suggest no strong associations between gender and wheat flour or animal protein consumption. That is, when you look at county averages, men and women consumed about the same amounts of wheat flour and animal protein. Also, the results suggest that animal protein is protective and wheat flour is detrimental, in terms of longevity, regardless of gender. The associations of animal protein and wheat flour with mortality are essentially the same as the ones in my previous post. The beta coefficients are a bit lower, but some P values improved (i.e., decreased); the latter is most likely due to better resample set stability after including the gender-related variable.

Most importantly, there is a very strong protective effect associated with being female, and this effect is independent of what the participants ate.
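To see why an effect can be called independent of diet, it helps to look at the simplest possible case: a linear model with two standardized predictors. The sketch below uses the textbook formula for standardized betas in that case. It is not the algorithm WarpPLS uses, and the correlation values passed in are invented for illustration.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression coefficients for two predictors.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    denominator = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denominator
    beta_2 = (r_y2 - r_y1 * r_12) / denominator
    return beta_1, beta_2

# Hypothetical case: predictor 1 is sex, predictor 2 is a dietary variable,
# and the two are uncorrelated (r_12 = 0), as the results above suggest.
beta_sex, beta_diet = standardized_betas(r_y1=-0.6, r_y2=0.3, r_12=0.0)
print(beta_sex, beta_diet)

# If sex and diet were correlated instead, the betas would move away
# from the simple correlations.
beta_sex_c, beta_diet_c = standardized_betas(r_y1=-0.6, r_y2=0.3, r_12=0.5)
print(beta_sex_c, beta_diet_c)
```

When the two predictors are uncorrelated, each beta equals the corresponding simple correlation, so controlling for diet changes nothing about the apparent effect of sex. That is the sense in which the effect is independent of what the participants ate.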

Now, if you are a man, don’t rush to take hormones to become a woman with the goal of living longer just yet. This advice is not only due to the likely health problems related to becoming a transgender person; it is also due to a little problem with these associations. The problem is that the protective effect suggested by the coefficients of association between gender and mortality seems too strong to be due to men "being women with a few design flaws".

There is a mysterious factor X somewhere in there, and it is not gender per se. We need to find a better candidate.

One interesting thing to point out here is that the above model has good explanatory power with regard to mortality. I'd say unusually good explanatory power, given that people die for a variety of reasons, and here we have a model explaining a lot of that variation. The model explains 45 percent of the variance in mortality in the 35-69 age range, and 28 percent of the variance in the 70-79 age range.
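For those less familiar with the term, "variance explained" is the R-squared coefficient. Here is a minimal sketch of how it is computed, assuming the usual definition (one minus the ratio of residual to total sum of squares). The observed and predicted values are made up, and have nothing to do with the model above.

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total

# Hypothetical observed values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(observed, predicted)
print(f"Variance explained: {r2:.0%}")
```

An R-squared of 0.45 means that 45 percent of the variation in the outcome across the rows of the dataset is accounted for by the predictors in the model.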

In other words, the model above explains nearly half of the variance in mortality in the 35-69 age range. It could form the basis of a doctoral dissertation in nutrition or epidemiology, with important implications for public health policy in China. But first the factor X must be identified, and it must be somehow related to gender.

Next post coming up soon ...