Note: This is an online version of an article in the Association for Women in Mathematics Newsletter, Vol. 40, no. 6, November 2010, which can be downloaded as a pdf here. Part 1 of this article appeared in the September issue of the AWM Newsletter. It can be accessed here, or downloaded as a pdf.
In June, the New York Times published two articles on women in science by John Tierney: “Daring to Discuss Women in Science” (June 8) and “Legislation Won’t Close Gender Gap in Sciences” (June 15). These articles don’t give complete arguments, but they might lead readers to infer what I’ll call Claim 1 and Claim 2.
Claim 1 concerns “new evidence supporting Dr. Summers’s controversial hypothesis about differences in the sexes’ aptitude for math and science.” I’ve discussed it in part 1 of this article. In brief, “support” should be “is consistent with.”
Claim 2 concerns bias. “Careful studies” show that “female scientists fare as well as, if not better than, their male counterparts in receiving academic promotions and research grants.” From this, readers are apparently intended to infer that there is no gender bias in science. Tierney then asks, “So why are women still such a minority in math-oriented sciences?” He answers: There are “biological differences in math aptitude” (cf. Claim 1). However, “differences in aptitude are not the primary cause of the gender gap in academic science,” but “different personal preferences and choices of men and women.” This opinion is attributed to Stephen Ceci and Wendy Williams, two psychologists who have written a book called The Mathematics of Sex: How Biology and Society Conspire to Limit Talented Women and Girls. I’ll abbreviate the latter as MOS.
In my view, there are serious flaws in MOS, Claim 2, and the associated article. Some of the article’s statements are ambiguous and some are wrong. Some sequences of true statements are misleading. I’ve noted such local mistakes, omissions, and ambiguities here.
In this article, I’ll discuss more global issues: fellowship and funding decisions, hiring statistics, and forms of inequality that receive little or no attention from Tierney and MOS. In my view, all of these should be considered in explaining the “different personal preferences and choices of men and women.”
Fellowships and funding. In discussing bias, both Tierney and MOS give a lot of attention to Wold and Wennerås’s study of the Swedish Medical Research Council post-doctoral fellowships. Agnes Wold (an immunologist) and Christine Wennerås (a microbiologist) analyzed 114 applications and their scores. Three factors were independent determinants of “scientific competence” scores: scientific productivity (number of articles, etc.), gender (men received higher scores than women with the same productivity), and nepotism—those affiliated with a review committee member received higher scores than others with the same productivity.
In 1997, the resulting article, “Nepotism and Sexism in Peer Review,” made headlines when it was published in Nature.
“But how representative was that one Swedish study of 114 applicants?” asks Tierney. He notes that a 2004 follow-up study did not find evidence of bias. MOS is skeptical too, telling readers that gender discrimination during grant applications is “a hypothesis in need of convergent empirical support.”
Neither mentions the “Wold Effect.” In 2000, a European Union report noted, “The study results were devastating for the research community and led to widespread reforms.” These reforms were not simply knee-jerk responses, but occurred after research councils conducted their own studies.
Both Tierney and MOS contrast Wold and Wennerås’s findings for fellowships with those of large-scale studies of grant proposals—and overlook some details. For example, both discuss the 2005 RAND study of NSF and other federal agencies. As I wrote in 2007,
The RAND study found no gender differences in NSF funding, when the analysis controlled for investigator characteristics such as experience or institution type. However, a cursory glance indicates that the study (large scale), the data (investigator proposing and funding history), and the variables other than gender (e.g., number of investigators, subagency or program, type of grant, funding requested) are quite different from those used by Wold and Wennerås. The report notes, “None of the agencies capture information about the proposals—e.g., topics, scores from peer review—but they do provide information that likely relates to credentials.”
Not only may the data and decade differ but, as some readers may already have noticed, so may the processes examined. Grant proposals differ from post-doc applications. For example, letters of reference are a fairly standard requirement for post-doc applications in the U.S. One of the large-scale studies mentioned by Tierney notes,
Gender differences in peer reviews of fellowship applications are somewhat more ambiguous [than for grant applications]. There is a small, but highly statistically significant difference in favor of men. Hence, the juxtaposition between the gender differences for research grants and fellowship applications supports our a priori hypothesis. (italics added)
The hypothesis is: Gender differences will be larger for fellowship applications than for grant applications because
the more concrete information that reviewers have about applicants, the less influence superfluous characteristics such as gender are likely to have. Grant applications are typically written by established researchers with established research track records and place a strong emphasis on research track record as an indication that the proposed research will be fruitful. In contrast, fellowship applications are typically written by early-career researchers.
This study and others suggest that evidence of bias is less likely to appear in evaluation processes that reduce irrelevant information and require the same kinds of information from each applicant. Fellowship applications, hiring, promotion, awards, and honors—processes that are often less structured—are likely to afford more bias. Changing these processes to eliminate irrelevant information, such as how often an applicant smiles, may reduce the effect of evaluators’ biases.
Tierney does not discuss such ideas. MOS concludes that “biases, to the extent they exist, are small”—but not too small to study. Moreover, “even a tiny degree of discrimination or unconscious barriers can be deleterious to women’s progress in the academy” because small biases can accumulate over time, resulting in large differences in outcomes.
Hiring statistics. MOS is based on a research article, the result of a three-year review of 400 studies from seven fields. It was published in what MOS calls “the premier review journal in psychology” in 2009.
When I read the article, I was surprised to see that the first paragraph gives statistics with no date and no source, and says, “Women are not being hired as assistant professors at the rate that they are getting PhD degrees.” That was (for many fields) the finding of a survey conducted in 2002 for the “top 50” departments, but the situation had changed by 2007. Why did an article called “Women’s Underrepresentation in Science” give statistics about women in science that were seven years old, without date or source?
In February of 2009, I sent a note that was forwarded to Ceci and Williams, saying that
the 2007 Nelson Diversity Survey shows the rate by field at which women earned PhDs in 1996–05 and the rate at which they were hired as assistant professors in 2007. The difference in rates is biggest for psychology where it is almost 20 percentage points. In some fields of engineering, the rate at which women are hired as assistant professors is greater than the rate at which they received PhDs. See page 14, Table 11 of the survey which can be downloaded here: http://chem.ou.edu/~djn/diversity/top50.html.
Ceci said the statistics would be incorporated in the book, and they were—sort of (see pp. 7, 41). Presumably this occurred late in production because MOS appeared in August. Thus, I may be indirectly responsible for some of the typos in the book.
In my note, I didn’t mention the top 50 departments. (It’s obvious in the source.) Neither does MOS. Instead, it gives various percentages, some with incorrect descriptors, some with incorrect references, and some with no references. 
Does it matter if the references and numbers are wrong? After all, the numbers are pretty small.
Here’s why you might care.
It’s hard to see statistical trends if you don’t get the statistics right. An important change (perhaps not yet a trend), found by the National Research Council’s Gender Differences at Critical Transitions in the Careers of Science, Engineering, and Mathematics Faculty as well as the Nelson 2007 Diversity Survey, is that for some fields, including mathematics, the proportion of women hired as assistant professors at “top departments” in a given field is now more or less the same as the rate at which women earn PhDs. This is not the case for psychology, “in part the result of many PhDs specialized in clinical/practice and never intending to be academics,” explains MOS.
Percentage Women: PhDs, Assistant Professors at Top 50 Departments
|Field|PhDs: 96–05|2002|2007|Yield 2007|
(Table rows by field not reproduced here.)
Source: Nelson, pp. 14–16. *2003 data. Yield is percentage assistant professors divided by percentage PhDs.
Like the Wold Effect for the Swedish Medical Fellowships, I suspect that the change in hiring statistics was not due to business as usual. Was it due to more emphasis on relevant information about applicants, family-friendly policies, accommodation for dual-career couples, or improvements in departmental climate? All of these have received increased attention in academe.
Stereotype threat, and other forms of inequality. Another reason to be concerned about sloppy statistics is more subtle—and touches on other aspects of MOS that I find problematic. MOS discusses statistics with respect to achieving parity. It will come as no surprise that Ceci and Williams don’t think that it will occur any time soon. (However, it’s already happened, on average, for mathematics departments in two-year colleges.) In her report, Nelson focuses on critical mass, which she puts at 15% to 30%. In her view (and mine), the overall percentage of faculty women in a given department is of interest—and small increases are important. According to Nelson’s 2007 findings, women are, on average, more than 15% of the faculty at the top 100 departments for sociology, psychology, political science, economics, life sciences, and astronomy. Math and computer science are close: 12.9% and 13.2%, respectively.
Why is critical mass important? Many AWM members may think this is obvious. Even if every one of your peers behaves like a colleague, in general, life seems better when you are not the only woman in your class, cohort, or department. Stereotype threat and implicit bias help to explain why.
Tierney characterizes these dismissively as “theories,” not mentioning that they have empirical support.
MOS does not discuss implicit bias directly, nor the gender schemas described by Virginia Valian in her book Why So Slow? It gives short shrift to stereotype threat, focusing mainly on test performance, little on how it may affect a sense of belonging, and not at all on how it may affect physical well-being.
Fortunately, the different aspects of stereotype threat together with supporting empirical work are described by the psychologist Claude Steele in his book Whistling Vivaldi. As he details how this research developed over the past twenty years, Steele also illuminates what I find lacking in MOS:
Psychologists focus on the internal, the psychological. If women underperform on a math test, our tendency is to look for a characteristic internal to women that might cause it—the observer’s perspective.
In contrast, from the perspective of a woman taking the test—or a woman in a math department—context may be an important part of the explanation. Steele and his colleagues have shown how the methods of psychology can be used to test such explanations empirically.
Concluding remarks. I am aware that many who read this article are likely to be busy people. I have tried to be brief without being opaque, relegating most details here. But I hope this article illustrates important gaps in Tierney’s and MOS’s accounts. To summarize, an ahistorical observer’s perspective does not suffice.
 The nepotism effect seems quite subtle. Committee members did not review applications from people with whom they were affiliated.
 AWM Newsletter, September–October, 2007, p. 3.
 In this study, differences were measured by odds ratios, the odds of being approved among female applicants divided by the odds of being approved among male applicants.
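The odds-ratio measure described in this note can be sketched in a few lines of Python. The counts below are hypothetical, chosen only to illustrate the arithmetic; they are not figures from the study.

```python
def odds_ratio(f_approved, f_rejected, m_approved, m_rejected):
    """Odds of approval among female applicants divided by the odds
    of approval among male applicants (counts, not percentages)."""
    female_odds = f_approved / f_rejected
    male_odds = m_approved / m_rejected
    return female_odds / male_odds

# Hypothetical: 20 of 100 women approved, 30 of 100 men approved.
# Female odds = 20/80 = 0.25; male odds = 30/70 ≈ 0.43.
print(odds_ratio(20, 80, 30, 70))  # ≈ 0.58: a ratio below 1 favors men
```

An odds ratio of 1 would indicate no gender difference in approval; the further below 1, the more the odds favor male applicants.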
 I remember thinking at the time that the difference in such rates is not the correct thing to examine, but for psychology it’s certainly eye-catching. The quotient is better (and an example of when division of fractions is useful).
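The footnote’s point about differences versus quotients can be illustrated with hypothetical percentages (these are invented for illustration, not Nelson’s figures): two fields can have very different gaps in percentage points while the quotient tells the clearer story.

```python
def hiring_yield(pct_women_hired, pct_women_phds):
    """The 'yield' as defined in the table: percentage women among new
    assistant professors divided by percentage women among PhDs."""
    return pct_women_hired / pct_women_phds

# Hypothetical field A: women earn 10% of PhDs, are 12% of hires.
# Difference is only 2 points, but yield is 1.2 -- hiring above the PhD rate.
print(hiring_yield(12, 10))

# Hypothetical field B: women earn 70% of PhDs, are 50% of hires.
# Difference is a striking 20 points; yield ≈ 0.71 -- hiring below the PhD rate.
print(hiring_yield(50, 70))
```

A yield of 1 means women are hired at the rate at which they earn PhDs, which is why the quotient, rather than the difference, is the better measure to examine.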
 For example, the numbers in the bar graph on p. 7 don’t correspond with its caption or its description in the text. More details and other examples to come.
 See pp. 41, 187.
 Opinions differ about whether 15% suffices. What’s important for this discussion is that critical mass is not necessarily parity.