Mathematics and Education

A slow blog

Greater Variability and the Right Tail


Note: This is an online version of an article in the Association for Women in Mathematics Newsletter, Vol. 40, no. 5, pp. 20–23, September 2010, which can be downloaded as a pdf. Part 2 of this article is here.

In June, the New York Times published two articles on women in science: “Daring to Discuss Women in Science” (June 8) and “Legislation Won’t Close Gender Gap in Sciences” (June 15). They were written by John Tierney, who has been criticized previously for flaws in reporting on women in science and on climate change.[1]

These articles appear to make two claims:

1. There is “new evidence supporting Dr. Summers’s controversial hypothesis about differences in the sexes’ aptitude for math and science.” This “new evidence,” a study of mathematics SAT scores from seventh-grade students,[2] is essentially an update of the 1983 Benbow–Stanley article, which reported that the gender ratio of 700-and-over scores was 13 to 1. The “new evidence” is that this ratio fell to 4 to 1 in 1991 but has not changed since.

2. The existence of gender bias is incompatible with the results of “careful studies that show that female scientists fare as well as, if not better than, their male counterparts in receiving academic promotions and research grants.”

I wrote “appear to make two claims” because parts of the articles seem to assume the truth of these claims. However, neither claim, Claim 2 in particular, is carefully discussed or supported. Instead, there is a lot of what might be called free association. A line-by-line discussion of each flaw would be quite lengthy, so here I will focus on a few main points. (I have given further details here.) These points are of two kinds:

• Connections—or lack thereof—between the findings and conclusions of the studies invoked and their interpretation.

• Criticism of two sources on which Tierney relies heavily. These are the Duke study of SAT scores mentioned above and Stephen Ceci and Wendy Williams’s book The Mathematics of Sex.

Before I begin discussion of Tierney’s apparent claims, here are a few notes on the context as I see it. Some bloggers have pointed out that the real audience for the New York Times articles is probably not women in science. Instead, the intent appears to be to discredit the gender equity workshops mandated by the America Competes Act.

My suspicion is that there may be a digital divide in audiences. Those who read on the Web can easily see the numerous comments at the Times that note mistakes and omissions in Tierney’s statements. Those who subscribe to the print edition of the Times may see only the articles and the four letters to the editor that were published with the June 15 article.

Part of the motivation for publishing articles such as Tierney’s may be—directly or indirectly—monetary. Like so many newspapers, the New York Times is concerned about financial survival. “Men are from Mars and women are from Venus” is a lot more exciting than “Men are from North Dakota and women are from South Dakota.” Sex differences are sexy. Gender similarities are a bore.

This phenomenon is illustrated by the recent success of The Female Brain, a Mars–Venus best-seller, which, according to a Nature review, “fails to meet even the most basic standards of scientific accuracy and balance” and is “riddled with scientific errors.”[3]

As another review of The Female Brain said: “Let’s face it: Books on gender differences sell. There appears to be no end to the public hunger for scientific evidence that confirms men and women to be of different species.”[4] This is not to say that we should give up the attempt to communicate a more complicated story, but rather to suggest that such attempts confront deep-seated beliefs that, in various forms, have prevailed for centuries.[5]

For a non-scientific audience, trying to combat these beliefs by noting mistakes in articles such as Tierney’s is like cutting one head from a hydra or trying to clean the Augean stables in the standard manner (as opposed to using the method of Hercules—rerouting two rivers).[6] For a scientific audience, it’s a different matter, so onward into the muck . . .

To support Claim 1, Tierney seems to attribute differences in test performance to “innate aptitude.” He writes of “a biological factor: the greater variability observed among men in intelligence test scores and various traits.”

This sentence contains at least two mistakes: “biological factor” and “the greater variability observed.” Moreover, the remainder of the article concerns SAT scores, so one might also wonder whether the SAT is being confused with an intelligence test.

Biological factor. Differences in test performance are not a “biological factor.” Although in the United States genetic inheritance is a popular explanation of differences in test performance, it is only one of three types of possible explanations:

  1. “innate aptitude” (as Tierney puts it) or “intrinsic aptitude” (as Summers put it).
  2. socio-cultural differences that are not considered “innate” although connected with biological sex. For example, in the past, girls were not allowed to attend mathematics classes; thus sex would have been a biological factor hindering their performance. Today, stereotype threat is such a factor.
  3. differences arising from interaction between genetic inheritance and environment as described, for example, in the National Research Council report From Neurons to Neighborhoods.

The Duke study was not designed to rule out any of these types of explanations. In contrast to Tierney’s discussion, the researchers who conducted this study were careful to note explicitly that:

“Our findings are not inconsistent with previous explanations focusing on either biological . . . or social or cultural . . . aspects, but are likely best explained via frameworks that examine multiple perspectives simultaneously.”

As I understand it, the Duke study, like the Benbow–Stanley articles of the 1980s, is essentially a by-product of the talent searches. Just as high school students take the SAT or ACT in order to apply to Harvard, or Berkeley, or Yale, students who are interested in attending programs for academically gifted youth such as the Duke Identification Program or the Johns Hopkins Center for Talented Youth take the SAT or ACT as part of the application process. Talent search applicants’ scores, like those of college applicants, may be the subject of scholarly analysis.[7]

You might wonder what the Duke findings actually were. I’ve put some of the study’s statistics in a table.

SAT-M Scores: Number, Ratio, Percent Female

                 1981–1985  1986–1990  1991–1995  1996–2000  2001–2005  2006–2010

700 and over
  males                 54        152        271        363        600        628
  females                4         20         70         88        169        164
  ratio (M/F)        13.50       7.60       3.87       4.13       3.55       3.83
  % female              7%        12%        21%        20%        22%        21%

(score band label missing)
  males                  0          4          4         12         28         79
  females                0          0          0          3          5         12
  ratio (M/F)                                            4        5.6        6.6
  % female                                             20%        15%        13%

Source: Wai et al., Table 1 and Appendix A
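The ratio and percent-female rows follow directly from the counts. As a quick check, here is a short Python sketch that recomputes them for the 700-and-over band; the function names are illustrative only and do not come from Wai et al.

```python
# Counts of SAT-M scores of 700 and over, copied from the table above.
periods = ["1981–1985", "1986–1990", "1991–1995",
           "1996–2000", "2001–2005", "2006–2010"]
males = [54, 152, 271, 363, 600, 628]
females = [4, 20, 70, 88, 169, 164]


def male_female_ratio(m, f):
    """Number of males per female; None when there are no females."""
    return m / f if f else None


def percent_female(m, f):
    """Females as a percentage of all high scorers; None if there are none."""
    total = m + f
    return 100 * f / total if total else None


# Every period in this band has at least one female, so no None values here.
for period, m, f in zip(periods, males, females):
    print(f"{period}: ratio {male_female_ratio(m, f):.2f}, "
          f"{percent_female(m, f):.0f}% female")
```

Because 363/88 is exactly 4.125, Python's round-half-to-even formatting prints 4.12 for 1996–2000 where the table shows 4.13; the other printed values match the table.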

Looking at these statistics suggests several conjectures. Over time, Duke may have gotten better at recruiting students who scored well and its programs may have become better known. In one way or another, some students may be better prepared for testing than in the 1980s. For example, many students now take the SAT in middle school to document giftedness. Demographics and culture may also play an important role. Recently, a substantial portion of the Putnam winners and U.S. IMO team members have been immigrants or children of immigrants from China, Korea, Russia, and other countries where mathematical performance is highly valued. The same may be true of the high-scoring Duke applicants.

Greater variability. Tierney, like Summers, appears to be referring to the Greater Male Variability Hypothesis, the hypothesis that for a given measure the distribution of males’ measurements will vary more than the corresponding distribution for females. This hypothesis dates back to the 1800s. In modern times, it is formulated in terms of the variance ratio (VR, the variance of males’ scores divided by the variance of females’ scores), and the question of interest is whether VR is greater than, equal to, or less than 1. The Greater Male Variability Hypothesis is not supported by empirical data. In discussing current findings for mathematics tests, Janet Hyde and Janet Mertz state in the Proceedings of the National Academy of Sciences, “data from several studies indicate that greater male variability with respect to mathematics is not ubiquitous. Rather, its presence correlates with several measures of gender inequality.”[8] Tierney mentions the PNAS article in connection with Claim 1, and there is even a link to it in the online version of Tierney’s article. However, Tierney neglects to mention that its findings contradict “observed greater variability,” even though Janet Hyde pointed this out in an email to him several days before his June 8 article was published, and even though the Times itself noted it in March.[9]
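To make the variance ratio concrete, here is a minimal Python sketch. The scores are hypothetical, not data from any study cited here, and the choice of the sample variance (denominator n − 1) is an assumption; published studies may use other estimators.

```python
from statistics import variance


def variance_ratio(male_scores, female_scores):
    """VR = variance of males' scores / variance of females' scores.

    VR > 1: males' scores vary more; VR < 1: females' scores vary more;
    VR = 1: equal variability.
    """
    return variance(male_scores) / variance(female_scores)


# Hypothetical scores, for illustration only.
male = [480, 520, 610, 450, 700, 530]
female = [500, 540, 580, 470, 620, 510]
print(f"VR = {variance_ratio(male, female):.2f}")
```

Note that VR compares only spread. Two groups can have the same VR and different means, or the same mean and different VRs, which is why the hypothesis is stated in terms of VR rather than average scores.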

Omission of Math Olympiad findings. The Duke article does not mention Hyde and Mertz’s PNAS article. Tierney mentions the article, but only part of its findings. He writes: “But some of the evidence for the disappearing gender gap involved standardized tests that aren’t sufficiently difficult to make fine distinctions among the brighter students.” This is correct. However, he didn’t mention the other evidence. Other tests discussed by Hyde and Mertz, namely the Math Olympiads and the Putnam, were sufficiently difficult to make fine distinctions among the brighter and very brightest students. This was a major part of their article.

Some Top-ranked IMO Teams: Percent Female

                      1989–1998   1999–2010
People’s Rep. China         5.6         2.8
USSR/Russian Fed.          21.7         2.8
USA                         1.7         5.6
Rep. of Korea               5.0         8.3
Bulgaria                    1.7        11.1
Vietnam                     5.0         1.4
Japan                       3.7           0

Source: Updated from Hyde & Mertz, p. 8805, courtesy of Janet Mertz.

Some readers may remember the announcement of the IMO study and the related articles in the AMS Notices and the New York Times.[10] One of the striking findings was the number of girls on some top-ranked International Math Olympiad teams. Bulgaria, East Germany/Germany, and the USSR/Russia have had 22, 19, and 15 different girls, respectively, on their teams over the decades since the first IMO was held in 1959. For example, Lisa Sauermann has been a recent star of the German team, ranking 12th, 3rd, and 4th in the world in 2008, 2009, and 2010, respectively. However, in the years prior to reunification in 1990, West Germany never had a girl on its team. The recent difference between Japan and the Republic of Korea in identifying IMO-caliber girls is similarly striking.[11] Such findings suggest that culture rather than genetics is an important explanation of gender differences in mathematics at this level.

One measure of culture is the World Economic Forum’s Gender Gap Index (GGI). Some 2007 GGI rankings are: Sweden, 1; Iceland, 4; Germany, 7; U.S., 31. The GGI is correlated with gender ratios of students scoring in the 95th percentile for one international test (the 2003 Programme for International Student Assessment, known as PISA). Hyde and Mertz found that the 2007 GGI is also correlated with the percentage of girls on a country’s IMO teams during the past two decades. They conclude that “gender inequality, not greater male variability, is the primary reason fewer females than males are identified as excelling in mathematics at the high and highest levels in most countries.” As they point out, gender inequality is complex and multi-faceted, and comes in many forms.

In the second part of this article I’ll discuss some forms of inequality that have been ignored, not just by Tierney, but also by Stephen Ceci and Wendy Williams in their book The Mathematics of Sex. In the meantime, if you are looking for something to read about the various forms of inequality, I suggest Claude Steele’s new book Whistling Vivaldi, which is about how stereotypes affect us.

[1] For the latter, see here and here.

[2] Jonathan Wai et al., “Sex Differences in the Right Tail of Cognitive Abilities: A 30 Year Examination,” Intelligence, 38 (2010): 412–423.

[3] Rebecca M. Young & Evan Balaban, “Psychoneuroindoctrinology,” Nature, 443 (2006): 12.

[4] Nicole Else-Quest, “Biological Determinism and the Never-ending Quest for Gender Differences,” Psychology of Women Quarterly, 31 (2007): 322–323.

[5] See, e.g., Londa Schiebinger, The Mind Has No Sex?, Harvard University Press, 1989.

[6] Don’t remember the 12 labors of Hercules? See Wikipedia.

[7] This Berkeley study is one example: David Leonard & Jiming Jiang, “Gender Bias and the College Predictions of the SATs: A Cry of Despair,” Research in Higher Education, 40 (1999): 375–407.

[8] Janet S. Hyde & Janet E. Mertz, “Gender, Culture, and Mathematics Performance,” Proceedings of the National Academy of Sciences, 106 (2009): 8801–8807. See also “Culture, Not Biology, Key Factor to Math Gender Gap, UW Researchers Say” in the January–February 2010 AWM Newsletter.

[9] See “Risk and Opportunity for Women in 21st Century.”

[10] Titu Andreescu et al., “Cross-Cultural Analysis of Students with Exceptional Talent in Mathematical Problem Solving,” Notices of the AMS, November 2008; Rimer, “Math Skills Suffer in U.S., Study Finds,” New York Times, March 8, 2008.

[11] Thanks to Janet Mertz for this update.


Written by CK

August 30, 2010 at 12:30 pm
