Last year, in a fit of enthusiasm, I decided to teach an MSc seminar course around Cordelia Fine's Delusions of Gender: The Real Science of Sex Differences. In this talk, I will outline common misunderstandings which occur when people debate the topic of sex differences in cognition, and I'll make some suggestions for how the quality of discussion can be improved.

The topic of sex differences in cognition can be used to help understand how psychological research is made and consumed in general: Everybody has intuitions and direct experience of the topic, and celebrated research findings often rely on people's prior beliefs as much as they finesse them.

Talk slides as one PDF, Full talk transcript


Every year I run an MSc course called 'Current Issues in Cognitive Neuroscience', which I teach as a discussion class focussed around controversies in psychology and neuroscience. The idea is that we learn a lot about the discipline by looking at areas where people can't agree. I'm extremely grateful to all the students and staff who have contributed to the discussions in this class over the years.


Last year we looked at the controversy over sex differences, and we organised our discussions around Cordelia Fine's book. The plan was to take a class to discuss each chapter, read the references cited and critiques and reviews relevant to the chapter topic.

I love Fine's book. I think of it as a sort of Bad Science but for sex differences research. Part of my argument in this talk is that Fine's book, and reactions to it, can show us something important about how psychology is conducted and interpreted. The book has flaws, and some people hate it, and those things too are part of the story about the state of psychological research.


I'll let you in on a secret - Fine's book isn't actually about whether or not sex differences exist.

Here, in her introduction: "There are sex differences in the brain"

Instead, this is a book about how we seek evidence for sex differences, how we interpret and employ that evidence.

You can read it as a case study of how psychology is done in areas where we all have strong intuitions (that is, most areas of psychology).


Yet despite that clear pronouncement in her introduction, how have some people responded to the arguments in Delusions of Gender?

Larry Cahill called Fine and others "anti sex-difference investigators", suggesting that their mission is to disprove the reality of sex differences.


Here's the heading of the article in which he makes that accusation, which positions Fine and others as wishing to discredit evidence of sex differences because it might be a threat to equality.


Here's grumpy biologist Jerry Coyne reviewing Fine's follow-up book ('Testosterone Rex'). The headline positions Fine as arguing that male and female brains are absolutely identical (which does indeed follow if you reject the reality of sex differences, but is a straw man).


Here's another example, which also illustrates something important about how the media works.

This is a headline from the Guardian for Sarah Ditum's review of Fine's "Testosterone Rex". Now in that book Fine doesn't claim that men's and women's brains are not different, and Ditum knows that and doesn't repeat or endorse that claim in the text of her review.

Remember that journalists don't get to choose their own headlines - that's done by a sub-editor. And for this review, the sub-editor has reached back into the store of intuitions and clichés that make up an understanding of the world that is likely to resonate with readers and chosen this heading. The heading misrepresents both Fine and the reviewer.

Source: with original heading, current version


These are not isolated incidents. The topic has a magnetic power which seems to draw people into assuming they know what Fine is arguing, which makes them issue counters to arguments she doesn't make, and accuse her of asserting views she doesn't hold.

Fine's arguments are more sophisticated than this, and accept them or not, we can do better if we are going to discuss sex differences in the brain. The rest of this talk is some thoughts about how we can improve the standard of debate. My ambition is not to convince you of a particular view of sex differences research, but to sketch out some better things to argue about, and tools for arguing.


First off, we can quantify the size of the differences we are talking about. Talking about sex differences as if the only two options are that they exist or that they don't is a crude binary.

It is possible to quantify the size of differences. There are acknowledged statistical tools for quantifying the relationship between two populations. These are known as effect size measures.


Here's a screenshot from a great interactive demonstration by Kristoffer Magnusson of a standardised effect size measure called Cohen's d. Cohen's d is the difference between the averages of two distributions, given in units of the standard deviation of those populations. So, if our populations are men and women then the Cohen's d for any variable is the difference between the average of men on that variable and the average of women on that variable, divided by a measure of the variability of individuals.

The screenshot shows a Cohen's d of 0.4, which is a medium size effect - larger than many studied in psychology. Note the considerable overlap of the two populations.
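As a minimal sketch of the definition above, here is one common way to compute Cohen's d from two samples, using the pooled sample standard deviation as the "measure of the variability of individuals". The function name `cohens_d` and the scores are made up for illustration; they are not data from the talk, and other variants of the formula exist.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference between the two group means, in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical test scores, invented purely to show the calculation
group_a = [12, 14, 15, 15, 16, 18]
group_b = [11, 13, 14, 14, 15, 17]

print(round(cohens_d(group_a, group_b), 2))
```

In this toy example the groups differ by one point on average and each has a standard deviation of two points, so the groups sit half a standard deviation apart.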


So how big are sex differences, in effect size terms? I made a chart.

Let's take a sex difference which is so large it is undeniable and known to everyone - the difference in height between men and women. The Cohen's d for this difference is 1.72 - a gap of nearly two standard deviations between the average height of men and women. This means that, if you picked a man and a woman at random, there is a 90% probability that the man would be taller than the woman.

What about another sex difference - the difference in sexual orientation. If, for the sake of argument, we binarise preference in sexual partner then men are more likely to be attracted to women than to other men (and vice versa for women). If you take some statistics on observed frequency of sexual preferences, you can convert this difference into a Cohen's d effect size. It comes out at about 1.6. Again this is the kind of difference which, although not universal, is large enough to be foundational to our stereotype of men and women.

Men have larger brains than women, effect size 1.4, but cognitive differences are of a different order of magnitude. The largest is mental rotation, an effect size of 0.62, meaning that a man picked at random has a 65% chance of being better at rotating images in their mind's eye than a random woman.

Other cognitive abilities, although they are celebrated as sex differences, have tiny effect sizes: maths (Cohen's d of 0.05), vocabulary (Cohen's d of 0.11).

These are the abilities for which reliable sex differences are found, and the actual differences are far smaller than you would expect given the rhetoric around 'fundamental' differences between men and women.
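The "picked at random" figures above can be approximated directly from Cohen's d. A minimal sketch, assuming both groups are normally distributed with equal variance: the chance that a random member of the higher-scoring group beats a random member of the other group is Φ(d / √2), where Φ is the standard normal cumulative distribution. This is a standard textbook conversion, not necessarily the one used to make the chart, and the function name is my own.

```python
from math import sqrt
from statistics import NormalDist


def probability_of_superiority(d):
    """Chance that a random draw from the higher group exceeds a random draw from the lower.

    Assumes two normal distributions with equal variance, separated by d standard deviations.
    """
    return NormalDist().cdf(d / sqrt(2))


# Effect sizes quoted in the talk
effect_sizes = {
    "height": 1.72,
    "mental rotation": 0.62,
    "vocabulary": 0.11,
    "maths": 0.05,
}

for label, d in effect_sizes.items():
    print(f"{label}: d = {d}, probability = {probability_of_superiority(d):.2f}")
```

Under these assumptions height comes out at roughly 0.89 and mental rotation at roughly 0.67, in the same region as the figures quoted above, while maths and vocabulary are barely distinguishable from a coin flip.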

Source: mindhacks.com Sex differences in cognition are small


The difficulty finding reliable sex differences led Janet Hyde to propose the 'Gender Similarities Hypothesis', which holds that "males and females are similar on most, but not all, psychological variables".


Hyde, J. S. (2005). The gender similarities hypothesis. American Psychologist, 60(6), 581-592.

Hyde, J. S. (2014). Gender similarities and differences. Annual Review of Psychology, 65, 373-398.


Not everybody agrees with this position but the disjunction between the way we talk about sex differences and the paucity of the evidence is informative.

Here's a controversial paper, by Daphne Joel, which set out to look at neuroimaging data and ask if sex differences in the brain are categorical or not.

Consider, by analogy, the male and female genitals. When we speak of the female genitals we have in mind something completely distinct from the male genitals - a categorical distinction, not just one of a small difference or even a number of small differences. We also speak of the 'female brain' and the 'male brain', but when we do so are we right to imply the same kind of categorical distinction?

Joel says "no". There is not a "female brain" template which varies in a number of significant and consistent ways from a "male brain" template.

Responses to this paper have pointed out that it is possible, using machine learning, to successfully categorise the sex of a brain's owner from the neuroimaging data, but the strength of this paper is not to deny the reality of brain sex differences (we already know they are different sizes, for example), but to show that the differences are not categorical. If you have a female-typical sized amygdala it does not mean your cerebellum is also female-typical - it's a mosaic of difference which allows probabilistic classification of your sex based on your brain anatomy, but doesn't mean that every area in your brain conforms to the sex-consistent pattern.
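A toy calculation can show why successful classification and a mosaic of differences are compatible. This is a sketch under strong assumptions that are mine, not Joel's: k brain measures that are statistically independent, normally distributed with equal variance, and each differing between the sexes by the same Cohen's d. Real brain measures are correlated and vary in effect size, so the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution


def classifier_accuracy(d, k):
    """Accuracy of the best possible classifier that uses all k measures together.

    With k independent measures each separated by d, the groups are d * sqrt(k)
    standard deviations apart overall, and the optimal cut is halfway between them.
    """
    return PHI(d * sqrt(k) / 2)


def fully_consistent_share(d, k):
    """Share of individuals whose every measure falls on their own sex's side of the midpoint."""
    return PHI(d / 2) ** k


d, k = 0.5, 10  # hypothetical values chosen for illustration
print(f"classification accuracy: {classifier_accuracy(d, k):.3f}")
print(f"share with all {k} measures sex-typical: {fully_consistent_share(d, k):.4f}")
```

In this toy model a classifier combining ten modest differences is right much more often than chance, yet fewer than one person in a hundred is sex-typical on all ten measures at once.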

Source: mindhacks.com no male and female brain types


Next area for improving the debate: we can recognise the limitations of our intuitions.

Psychology can be hard to think about because aspects of our selves which feel essential - our feelings, interests, or abilities - are also consequences of our biology and social environment. It feels like my preference for wearing trousers is intimately tied up with me and my maleness, but I have to recognise that this is a fact contingent on my particular historical situation. Born a male Highlander or an ancient Roman I'd probably prefer to wear something else. Trouser preference is a trivial example, but things like my preferences in school subjects, games, sexual partners, hobbies are all contingent on my social environment. Maybe not entirely, but the question the psychology of sex differences must address is: to what extent?

Arguments that "Men just prefer X" or "Women will always want to do Y" may be based on strong intuitions, but they have no explanatory currency.


Key to this is the distinction between causes of sex differences and sex differences in outcomes. We can see, or at least measure, sex differences in outcomes. We see, for example, male superiority in mental rotation, or female superiority in vocabulary. But these are contingent facts - really we want to understand the causal paths that lead to these differences in outcome, and causes are both less amenable to direct measurement and likely to be far more complex.

Anyone who asserts that a difference in outcome is definitely due to a biological sex difference has to also be making a bunch of assumptions about the strength of biological factors relative to social factors in determining outcomes. We need to spend more time discussing evidence of cause, and justifying those assumptions, rather than arguing about outcomes.


When we discuss biological causes of sex difference we tend to simultaneously invoke the idea of innateness - some essential differences built into the biology of being male or female.

But innateness is an incoherent concept which brings along lots of rhetorical punch without a corresponding degree of causal clarity.


Here are two papers which I enjoyed on the topic of innateness.

Mameli & Bateson discuss different possible meanings of 'innateness': does innate mean a trait evolved, or also that it doesn't require any learning or environmental trigger? Does innateness mean that the trait can be linked to specific genes? Or that it develops in a wide range of possible environments? All these meanings are implied by different authors, but they are not all the same.

Griffiths, Machery & Linquist argue that although innateness is a commonly invoked concept, when you interrogate most individuals they do not have a clear, consistent or coherent understanding of it. In other words, it is one of those words like God or Capitalism which facilitate agreement and disagreement but cause trouble when you try and pin down the definition.


There's a lovely section in Fine's book, which I think illustrates something important regarding the above points.

It's a part where she reviews experiments on the development of a brain area called the spinal nucleus of the bulbocavernosus.

Fine motivates this little research tale by asking us to consider a model of how sex differences in behaviour come about. This simple story is that hormone differences lead to differences in the developing brains of boys and girls, which in turn lead to differences in the adult behaviour of men and women.

Source: mindhacks.com hormones, brain and behaviour, a not-so-simple story


So, in a 2009 comment Professor Simon Baron-Cohen appears to invoke this story to explain why - at that point - no woman had won the Fields Medal for mathematics. Boys, he explained, experience a prenatal surge in testosterone which affects pre-birth brain development (and hence their interest in and ability at mathematics). Hence fewer women at the highest levels of mathematics, hence, more or less, no female Fields Medal winner.

Diagrammatically this gives us this.

Footnote: RIP Maryam Mirzakhani


That's the speculation, and one which cleaves to many people's intuitions about biology and sex differences.

As a tonic, Fine invites us to consider the role of hormones in the development of the spinal nucleus of the bulbocavernosus (abbreviated SNB).

The SNB is a small cluster of neurons at the base of the brain which control the muscles around the penis. It is mostly studied in rats and is a model system for looking at the role of hormones in sex differences. After all, if you'd expect any bit of the brain to display marked sex differences, a part responsible for penis control would be it.

Do sex differences in hormones, the brain, and penis control play out according to the simple story? Well, male rats do indeed have an enlarged SNB (and presumably enhanced penis control) compared to female rats.


They also have elevated testosterone levels, but the testosterone doesn't directly affect development of the SNB.

One effect is directly on the body (the right branch out from the testosterone box on the diagram). Testosterone stops the nerves controlling the penis from disappearing during development (the nerves which will eventually be controlled by the SNB).

Excess testosterone is excreted by male rats in their urine, and this is where the developmental story gets interesting. I've added a line in the diagram to show where causation crosses between the brain/body of the rat pup and the brain/body of the pup's mother.


Rat mothers detect excess testosterone in the urine of male pups and lick their genital area more (you can show this experimentally by brushing female rat pups with testosterone, which provokes the same increase in licking).


This increased licking causes increased sensation in the genitals of the male pups, which encourages the development of the SNB. Note we crossed back from the mother's brain (detecting testosterone) and mother's behaviour (increased licking) to the pup's body (sensation of licking) and brain (enlarged SNB).

This isn't the only pathway by which the SNB of male rats comes to be larger than that of female rats, but it does make a significant contribution to the size of the difference in this area between male and female rats.

The genitals are highly important to evolutionary success, as well as very different between the sexes. It is no surprise that brain regions intimately connected to them are different between the sexes. I believe Fine tells this story to encourage our imaginations - if a difference this fundamental is supported by a developmental story so complex, involving multiple causal pathways, crossing individuals, brain, body and behaviour, even in the rat, how complex must the developmental pathways be that lead to observed sex differences in the behaviour of modern humans?


Fine says that many people rely on a "biology as fall back" assumption when thinking about sex differences: If you can't find obvious social causes, then just assume differences must be due to biology. A lot of Fine's book is devoted to establishing the plausibility of social causes as origins of sex differences, not to disproving the possibility of biological causes.

It seems to me that when you mix biological differences with complex social environments you can have two general schemes. One is that small initial differences, say in males' preference for mathematics or skills at visuospatial cognition, get magnified. In this scheme, differences add up so that a tiny average difference means that, say, all the very best people in the world at a certain thing are of one gender.

The other scheme is that small initial differences wash out. For domains without positive feedback loops, where other forces are at play, any initial differences between the sexes are swamped by other factors.

We all have intuitions about which behaviours and which abilities might fall under which category, but until you can develop a plausible justification for why those intuitions hold they are merely intuitions.


Debates over sex differences may be marked by the extent to which people make claims beyond the warrant of the evidence. Compounding this problem, as with most of psychology, the evidence base is not reliable. This is something that has reached widespread recognition as part of what has been called 'the replication crisis', but which I prefer to call the methods renaissance.

Delusions of Gender was published in 2010, and so written before awareness of the problems with the evidence base of psychology became so widespread.


Let's look at this through the lens of one particular topic discussed in the book, one I've recently done some work on: stereotype threat.

Stereotype threat is when your awareness of a negative stereotype about you negatively affects your performance. So, for example, experiments have been done in which people declare their sex before completing a maths test, with the result - reportedly - that girls perform worse than in tests in which their female identity isn't made salient beforehand.

Stereotype threat is invoked in the book as one possible explanation for lower female performance in domains which are stereotypically masculine (such as maths). The phenomenon is also an example of why it isn't straightforward to move from an observed difference in test scores to certainty about essential sex differences in ability.

Stereotype threat is a fashionable topic - with something like $28.7 million of funding grants from the US National Science Foundation issued to projects mentioning the phrase.


And all this for a phenomenon which was only first named in the mid-90s.

Here's my graph of journal publications with various social-psychological phrases in the title, by year. Look at "Stereotype threat", in light blue, leaping from nothing to over 100 articles a year.

See also: mindhacks.com neurotransmitter fashion


Chess is a stereotypically masculine activity, and - as you might expect - stereotype threat has been reported for female chess players when they play, or think they are playing, men.


Maass, A., D’Ettole, C., & Cadinu, M. (2008). Checkmate? The role of gender stereotypes in the ultimate intellectual sport. European Journal of Social Psychology, 38(2), 231-245.

Rothgerber, H., & Wolsiefer, K. (2014). A naturalistic study of stereotype threat in young female chess players. Group Processes & Intergroup Relations, 17(1), 79-90.


But is this a robust finding? The two studies on stereotype threat and chess have relatively small sample sizes.

More generally, doubts have been raised about the reliability of the stereotype threat phenomenon. Mixed results have been reported in larger trials, and some people - such as Flore & Wicherts (2015) - report that the literature is affected by publication bias. If only positive findings get published, stereotype threat may be far less reliable or general than it would appear from the literature.


To test this, I took 5.5 million games of tournament chess played between 2008 and 2015 (that's all of the games recognised by the international chess body FIDE).

To see if women underperform when playing men I took games between two men ('MM') as the baseline and looked at how the difference in their ratings (x-axis) affected game outcome. I then calculated the difference from expected outcome (y-axis) when a woman played another woman ('FF') or when a woman played a man ('FM' or 'MF', depending on which player was playing white).

This graph here shows the predictions if stereotype threat holds. The black line is the baseline, when men play men. The blue line is the difference from expected outcome when women play women. Stereotype threat shouldn't operate in this situation, so the blue and black lines are shown roughly overlapping. A prediction for when a woman plays a man is shown in red. The left half of the curve is when the man has a higher rating than the woman, so we can say the game is more challenging. Theory says that stereotype threat should operate in this circumstance to reduce women's performance.
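The baseline comparison just described can be sketched roughly as follows. This is an illustration of the idea rather than the analysis from the paper: the record layout, the field names, the 100-point rating bins and the convention that the first letter of a pairing is the player with white are all assumptions of mine, and the four games are invented.

```python
from collections import defaultdict


def mean_score_by_bin(games, pairing, bin_width=100):
    """Average score for white, grouped by binned rating difference, for one pairing."""
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum of white scores, number of games]
    for game in games:
        if game["pairing"] != pairing:
            continue
        difference = game["white_rating"] - game["black_rating"]
        rating_bin = difference // bin_width * bin_width  # floor to the start of the bin
        totals[rating_bin][0] += game["white_score"]
        totals[rating_bin][1] += 1
    return {rating_bin: total / count for rating_bin, (total, count) in totals.items()}


def deviation_from_baseline(games, pairing, baseline="MM", bin_width=100):
    """Observed minus baseline mean score for white, for bins present in both pairings."""
    expected = mean_score_by_bin(games, baseline, bin_width)
    observed = mean_score_by_bin(games, pairing, bin_width)
    return {b: observed[b] - expected[b] for b in observed if b in expected}


# Invented games: white_score is 1 for a win, 0.5 for a draw, 0 for a loss
games = [
    {"pairing": "MM", "white_rating": 2000, "black_rating": 1950, "white_score": 1.0},
    {"pairing": "MM", "white_rating": 2010, "black_rating": 1990, "white_score": 0.5},
    {"pairing": "FM", "white_rating": 2020, "black_rating": 1980, "white_score": 1.0},
    {"pairing": "FM", "white_rating": 2040, "black_rating": 2000, "white_score": 1.0},
]

print(deviation_from_baseline(games, "FM"))
```

Scores here are from white's point of view, so a positive deviation for 'FM' means the woman with white did better than men with the same rating advantage, while for 'MF' the sign would need flipping to describe the woman's performance.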

What do we actually see in the data?


Women outperform expectations when they play men - the red curve is above the baseline, not below (the blue, 'FF', curve overlaps the baseline as we expect).

There's more to this result, which you can read about in the paper, but the bottom line is that the stereotype threat phenomenon doesn't affect women's performance in stereotyped domains as consistently as you would predict from the literature.

I think Fine's argument survives these criticisms, since her main point is that factors other than innate ability can affect observed differences, and that women can be systematically disadvantaged by social arrangements. But the currently wobbly state of empirical results across the whole gamut of sex difference research should give everyone pause if they are basing key arguments on single results or simple phenomena.

Reference: Stafford, T. (in press). Female chess players outperform expectations when playing men. Psychological Science. https://psyarxiv.com/bpy3t/


Part of the reason the debate on sex differences generates so much more heat than light is that, I believe, people passionately state their beliefs without taking time to describe or define the positions they are arguing against. A risk of this is that you can waste time vehemently asserting something that nobody disagrees with, and caricature your rhetorical opponents according to the most simplistic, naive or poorly phrased version of what they think.


Fine, to her credit, adopts a good strategy for avoiding this. When she wants to argue that a point of view is wrong, she takes the time to identify people who subscribe to that point of view, and then quotes them extensively.

This is in marked contrast to Fine's own critics, who often criticise a point of view they ascribe to Fine without first showing that she does indeed hold this view (see slides 5-9 for examples).


Related to this, my final piece of advice is based on the observation I started with.


We all learn an instinct for compromise. If one person says one thing and another person says another our first suspicion is that the truth is somewhere in the middle. You can use this reasonable habit to derail any proper understanding in your audience.

Do this by orientating debate around two extreme poles, with the conclusion you want in the middle.

So, faced with debate between "There are LARGE sex differences in cognition" and "There are NO sex differences in cognition", everybody must suspect that the truth is somewhere in the middle. Specifically this means "There are SOME sex differences in cognition". Then, if you claim this you can make anyone disagreeing with you look like they support one of the ridiculous poles of the continuum, and make yourself look like the voice of reasonableness.

This is why Fine is so often mischaracterised. She has written a clear, well argued book, but the readers of this book often picture the rhetorical terrain in a way that makes disagreeing with the way sex difference research is typically interpreted seem like a vote for the unreasonable pole: "There are no sex differences."


This is a better understanding of Fine's book. The rhetorical poles are "Sex differences in cognition ARE systematically exaggerated" and "Sex differences in cognition are NOT systematically exaggerated".

Now, the reasonable middle ground is revealing rather than misleading: yes, there is some exaggeration of how large, biological and unavoidable sex differences are.

Fine's book is asking us to pause for a moment in endorsing our intuitions about stereotyped differences between men and women. Pause in our confident assumption that we know how large and inevitable those differences are. Some differences will be impossible to change, but other differences are contingent. They are due to vagaries of our history, society, upbringing and culture. These could be changed - some should be changed. The evidence isn't clear, and our intuitions aren't as reliable as they tell us they are.

Progress requires higher standards of evidence, and higher standards of debate.


Andrew Wilson was kind enough to invite me to give this talk at Leeds Beckett on 9th May 2017. Thanks to him and to the staff and students who were kind enough to listen to and discuss the talk.

In Sheffield I owe a lot to the students on PSY6316 who spent a semester thinking, discussing and writing about this topic. My thanks also to everyone who came to the seminar in Sheffield where I gave an updated version of this talk, on 24th October 2017.

Thanks to Matt Webb who shared his scripts, which I modified to make these webpages.

My academic webpages: tomstafford.staff.shef.ac.uk/

Twitter: @tomstafford