Save the Phenomena!
Last Halloween, the psychologists Wendy Williams and Stephen Ceci wrote an op-ed in the New York Times claiming that “academic science isn’t sexist.” Among other things, they suggest that bias doesn’t occur in hiring, writing of “alleged” hiring bias. In a longer article, Ceci and three co-authors claim that “the evidence in support of biased hiring as a cause of under-representation is not well supported, and even points in the opposite direction.” The same evidence is interpreted as a “female hiring advantage” in Williams and Ceci’s “2:1 Faculty Preference for Women on STEM Tenure Track.”
I think the reason why this evidence “points in the opposite direction” is that Ceci et al. do not “save the phenomena” by accounting for crucial details of the findings they cite. Thus, these findings may not be consistent with the findings of Williams and Ceci’s experimental study. This raises concerns about the ecological validity of the experimental study, e.g., that it may not be realistic to assume that a strong female applicant will often be described as “creative” or “a powerhouse.”
Here are the details.
The first piece of evidence Ceci et al. (2014) give in support of their claim is Finding 3-10 of the National Research Council report Gender Differences at Critical Transitions in the Careers of Science, Engineering, and Mathematics Faculty. In the United States, there are thousands of institutions of higher education. This report focused primarily on Research 1 institutions, also known as “research-intensive institutions” (pp. 24–25).
A study of hiring between 2001 and 2003 at these 89 elite institutions was conducted for the NRC report. It found that:
The percentage of women who were interviewed for tenure-track or tenured positions was higher than the percentage of women who applied. (p. 7)
As featured in Science Magazine, this finding was for “89 universities.”
As featured by Ceci et al., this finding was for “research universities.” Williams and Ceci’s 2015 PNAS article again gives this finding for “research universities” (see Table S6).
Carnegie classifications have changed since the NRC study was conducted. Currently, about 300 universities are classified as “research universities.” These include:
- 108 research universities (very high research activity), e.g., Arizona State University.
- 99 research universities (high research activity), e.g., Auburn University, main campus.
- 90 doctoral/research universities, e.g., Adelphi University.
In contrast, the finding in the NRC report was for “Research 1 Institutions.”
Note that the rates at which women applied for jobs were lower than the rates at which they earned PhDs. (This was the NRC report’s Finding 3-3.) For example, 25% of PhDs in mathematics went to women between 1999 and 2003, but only 20% of the applicants for R1 positions in mathematics were women.
It is important to note that these higher rates of success do not imply favoritism, but may be explained by the possibility that only the strongest female candidates applied for Research I positions. This self-selection by female candidates would be consistent with the lower rates of application by women to these positions. (p. 54, emphasis and color added)
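The two percentages in the mathematics example can be turned into a rough relative application rate. The sketch below assumes, purely for illustration, that the applicants were drawn only from the recent PhD pool described by those percentages; the function name and this assumption are mine, not the NRC report's.

```python
from fractions import Fraction

def relative_application_rate(phd_share_women, applicant_share_women):
    """Per-capita application rate of women divided by that of men.

    Assumes (for illustration only) that every applicant comes from the
    PhD pool whose female share is phd_share_women.
    """
    odds_among_applicants = applicant_share_women / (1 - applicant_share_women)
    odds_among_phds = phd_share_women / (1 - phd_share_women)
    return odds_among_applicants / odds_among_phds

# Mathematics: 25% of PhDs and 20% of R1 applicants were women.
ratio = relative_application_rate(Fraction(25, 100), Fraction(20, 100))
print(ratio)  # 3/4
```

Under that assumption, a female PhD in mathematics was about three-quarters as likely as a male PhD to apply for an R1 position. The real applicant pools also included people who earned their degrees in other years, so the figure is only indicative.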
My interpretation of the selection bias hypothesis
I agree (see blog post). Here’s a story to illustrate how selection bias and gender bias might operate to produce these results. Colored text in my story corresponds to colored text in the excerpt above.
Once upon a time, there were ten recent PhDs in Sciencefield. Of these, 30% were female, as shown in the table.
The three Inferiors were identical in their abilities, publications, teaching, and research experience. So were the three Averages and the three Superiors.
Only the strongest female candidate, Susan Superior, applied for the tenure-track job in Sciencefield at Research 1 University. The three Inferiors didn’t apply. Neither did Jane and Jim Average. But Joe Average applied. He and the three Superiors made up an applicant pool that was 25% female (lower than the rate at which women earned PhDs in Sciencefield).
Due to gender bias, Susan Superior was rated below Steven and Simon Superior (even though her academic attributes were identical to theirs), but above Joe Average. All three Superiors were invited to interview, thus the interviewees were 33% female (higher than the rate at which women were present in the applicant pool).
However, along with Tom Terrific, Steven and Simon Superior had also applied to other places: Superduper Research 1, Amazing Research 1, and Prestigious University. Each got an offer from one of those universities. Consistent with NRC Findings 3-7 and 3-8, Susan had not applied to those universities because no women were on their search committees. As in the searches conducted in the University of California system (see Table 4 on p. 45 here), many search committees for R1 positions in Sciencefield had no women.
After the interviews at Research 1, Steven and Simon Superior withdrew from the search because they had accepted positions elsewhere. Research 1 offered the position to Susan and she accepted.
We hope that she will live happily ever after there but we aren’t so sure about it. Female faculty at Research 1 report that they are less likely to engage in conversation with their colleagues on a wide range of professional topics. “This distance may prevent women from accessing important information and may make them feel less included and more marginalized in their professional lives,” says the NRC report.
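The percentages in the story can be checked with a short Python sketch. The Inferiors' names are placeholders (the story does not name them), and placing one woman among the Inferiors is inferred from the 30% figure rather than read from the table.

```python
# (name, group, gender). "Inferior A/B/C" are placeholder names.
phds = [
    ("Inferior A", "inferior", "F"),
    ("Inferior B", "inferior", "M"),
    ("Inferior C", "inferior", "M"),
    ("Jane Average", "average", "F"),
    ("Jim Average", "average", "M"),
    ("Joe Average", "average", "M"),
    ("Susan Superior", "superior", "F"),
    ("Steven Superior", "superior", "M"),
    ("Simon Superior", "superior", "M"),
    ("Tom Terrific", "terrific", "M"),
]

def percent_female(people):
    """Percentage of women in a list of (name, group, gender) records."""
    women = sum(1 for _, _, gender in people if gender == "F")
    return round(100 * women / len(people))

# Who applied to Research 1 University, according to the story.
applied = {"Joe Average", "Susan Superior", "Steven Superior", "Simon Superior"}
applicants = [p for p in phds if p[0] in applied]

# All three Superiors were invited to interview.
interviewees = [p for p in applicants if p[1] == "superior"]

print(percent_female(phds), percent_female(applicants), percent_female(interviewees))
```

This prints 30, 25 and 33: women are a smaller share of applicants than of PhDs, yet a larger share of interviewees than of applicants, even though Susan was rated below her identical male peers.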
The short general version of this story is: The rate at which women apply for tenure-track R1 positions is lower than the rate at which women earn PhDs. Those female applicants tend to be at the top of the distribution of academic abilities. If the distributions of abilities are the same for recent female and male PhDs, or even if the male distribution is skewed right as compared with the female distribution, it is still possible that for those R1 positions for which women apply, applicant pools have higher percentages of more able women.
Note that this story is about findings from a study of hiring between 2001 and 2003. Since then, rates at which women apply for R1 positions may have changed.
Ceci et al.’s interpretation of the selection bias hypothesis
Ceci et al. (2014) quote the NRC selection bias hypothesis, which mentions women’s lower application rates, but go on to say:
In the blogosphere, it is frequently suggested that female applicants are of higher quality than males by virtue of having survived a biased-pipeline process that weeded out many more women. (p. 102)
At the end of this sentence is a footnote that gives three examples of this “frequent” suggestion, including my blog post. Williams and Ceci (2015) make a similar comment, substituting “many studies” for “blogosphere” and referring the reader to their 2014 article.
Surviving a “biased-pipeline process,” e.g., getting a PhD or becoming a professor in a STEM field, can indeed be viewed as a form of selection, but it is not the type of selection that I described in my blog post and illustrated above.
Although Ceci et al. use the term “applicants,” their discussion of selection does not include measurements for “applicants to R1 positions.” This is understandable because measurements of the applicants in the NRC study were not available. But this limitation is not mentioned. Instead, Ceci et al.’s discussion of selection bias considers only averages for various populations, concluding from these averages that “on the whole, there is no evidence for the superiority of either gender applying for tenure-track jobs” (p. 102).
To illustrate this interpretation, the story above might be changed to the following.
Once upon a time, there were nine recent PhDs in Sciencefield. Of these, 33% were female, as shown in the table.
On average, the men and women were equal in their abilities. The three Inferiors were identical in their abilities, publications, teaching, and research experience. So were the three Averages and the three Superiors.
All of the men applied for the tenure-track job in Sciencefield at Research University (not Research 1 University). For some unknown reason Jane Average did not apply, although the other women did. Thus, the applicant pool was 25% female (two women among eight applicants, lower than the rate at which women earned PhDs in Sciencefield) and, on average, the quality of the female applicants was equal to that of the men. The three Superiors were invited to interview, making an interview pool that was 33% female, higher than the rate at which women were present in the applicant pool.
Susan Superior got the first offer. We don’t know if that was due to affirmative action or whether Steven and Simon got offers elsewhere and withdrew from the search.
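A short Python sketch can check this story's numbers, including the claim that the female and male applicants were equal in quality on average. The 1/2/3 quality scores and the Inferiors' names are illustrative assumptions; one woman and two men per group is inferred from the story's percentages rather than read from the table.

```python
from statistics import mean

# Illustrative quality scores for the three groups (an assumption).
SCORE = {"inferior": 1, "average": 2, "superior": 3}

# (name, group, gender). "Inferior A/B/C" are placeholder names.
cohort = [
    ("Inferior A", "inferior", "F"),
    ("Inferior B", "inferior", "M"),
    ("Inferior C", "inferior", "M"),
    ("Jane Average", "average", "F"),
    ("Jim Average", "average", "M"),
    ("Joe Average", "average", "M"),
    ("Susan Superior", "superior", "F"),
    ("Steven Superior", "superior", "M"),
    ("Simon Superior", "superior", "M"),
]

# Everyone applies except Jane Average; the three Superiors are interviewed.
applicants = [p for p in cohort if p[0] != "Jane Average"]
interviewees = [p for p in applicants if p[1] == "superior"]

def share_female(group):
    """Fraction of the group that is female."""
    return sum(1 for _, _, gender in group if gender == "F") / len(group)

def mean_quality(group, gender):
    """Mean illustrative quality score for one gender within the group."""
    return mean(SCORE[tier] for _, tier, g in group if g == gender)

print(f"PhDs: {share_female(cohort):.0%} female")
print(f"Applicants: {share_female(applicants):.0%} female")
print(f"Interviewees: {share_female(interviewees):.0%} female")
print(mean_quality(applicants, "F"), mean_quality(applicants, "M"))
```

With two women among eight applicants, the applicant pool is 25% female, and the mean score is 2 for both the female and the male applicants. Unlike the first story, the higher share of women among interviewees here cannot be attributed to stronger female applicants.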
The short general version of this story is: On average, academic abilities are the same for female and male recent PhDs. On average, academic abilities are the same for female and male applicants for tenure-track positions at research universities. For some reason, women apply for these positions at lower rates than they get PhDs, but their percentages of interviews and first offers are higher than their percentages among applicants.
The second story illustrates what happens if we assume three things:
- the universities in the NRC study were not necessarily Research 1 universities.
- the percentage of female applicants was smaller than the percentage of female PhDs (consistent with NRC Finding 3-3).
- not all of the strongest female candidates applied (inconsistent with the hypothesis of selection bias as formulated in the NRC report).
Because they do not attempt to account for Finding 3-3 or the fact that the universities in the NRC study were Research 1 Institutions, Ceci et al. avoid the implausibility of my second story and suggest that the NRC findings concern research universities in general. That may be the case, but there seems to be no reason to draw this conclusion without more evidence.
In fact, there is reason not to draw this conclusion. Faculty demographics can be quite different at different types of institutions, as illustrated in the table below.
Table from Kessel & Nelson, Statistical trends in women’s participation in science: Commentary on Valla and Ceci (2011). Perspectives on Psychological Science, http://pps.sagepub.com/content/6/2.toc
In my opinion, any account of Finding 3-10 needs to save phenomena such as Finding 3-3, the statistical trends shown in the table above, and more current statistics such as the results of the 2013 survey of the American Mathematical Society.