
Online Students Give Instructors Higher Marks If They Think Instructors Are Men


For Immediate Release


A new study shows that college students in online courses give better evaluations to instructors they think are men – even when the instructor is actually a woman.

“The ratings that students give instructors are really important, because they’re used to guide higher education decisions related to hiring, promotions and tenure,” says Lillian MacNell, lead author of a paper on the work and a Ph.D. student in sociology at NC State. “And if the results of these evaluations are inherently biased against women, we need to find ways to address that problem.”

To test whether students judge female instructors differently than male instructors, the researchers evaluated a group of 43 students in an online course. The students were divided into four discussion groups of 8 to 12 students each. A female instructor led two of the groups, while a male instructor led the other two.

However, the female instructor told one of her online discussion groups that she was male, while the male instructor told one of his online groups that he was female. Because of the format of the online groups, students never saw or heard their instructor.

At the end of the course, students were asked to rate the discussion group instructors on 12 different traits, covering characteristics related to their effectiveness and interpersonal skills.

“We found that the instructor whom students thought was male received higher ratings on all 12 traits, regardless of whether the instructor was actually male or female,” MacNell says. “There was no difference between the ratings of the actual male and female instructors.”

In other words, students who thought they were being taught by women gave lower evaluation scores than students who thought they were being taught by men. It didn’t matter who was actually teaching them.

The instructor that students thought was a man received markedly higher ratings on professionalism, fairness, respectfulness, giving praise, enthusiasm and promptness.

“The difference in the promptness rating is a good example for discussion,” MacNell says. “Classwork was graded and returned to students at the same time by both instructors. But the instructor students thought was male was given a 4.35 rating out of 5. The instructor students thought was female got a 3.55 rating.”

The researchers view this study as a pilot, and plan to do additional research using online courses as a “natural laboratory.”

“We’re hoping to expand this approach to additional courses, and different types of courses, to determine the size of this effect and whether it varies across disciplines,” MacNell says.

The paper, “What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching,” was published online Dec. 5 in the journal Innovative Higher Education. Co-authors are Dr. Adam Driscoll of the University of Wisconsin-La Crosse and Dr. Andrea Hunt of the University of North Alabama. Driscoll and Hunt received their doctoral degrees from NC State.


Note to Editors: The study abstract follows.

“What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching”

Authors: Lillian MacNell, North Carolina State University; Adam Driscoll, University of Wisconsin-La Crosse; and Andrea N. Hunt, University of North Alabama

Published: Dec. 5, Innovative Higher Education

DOI: 10.1007/s10755-014-9313-4

Abstract: Student ratings of teaching play a significant role in career outcomes for higher education instructors. Although instructor gender has been shown to play an important role in influencing student ratings, the extent and nature of that role remains contested. While difficult to separate gender from teaching practices in person, it is possible to disguise an instructor’s gender identity online. In our experiment, assistant instructors in an online class each operated under two different gender identities. Students rated the male identity significantly higher than the female identity, regardless of the instructor’s actual gender, demonstrating gender bias. Given the vital role that student ratings play in academic career trajectories, this finding warrants considerable attention.


  1. LOL at all the comments denying the existence of sexism

    Everyone is equal and there are no problems in society, stop being so divisive! hahaha

  2. Instead of saying “the results must be false because of small sample size”, we should say “how interesting, let’s try this with a larger sample size and see if the results still hold”.

    Those of you criticizing the study because the instructors weren’t blinded, consider the promptness rating: the homework was returned at the same time by both instructors.

  3. Yikes, if I had an N of less than 1000 for this kind of loosey-goosey study, I would be embarrassed to talk about it. Woah!

    1. What study would you run? Keep in mind that social science research (especially with human subjects) takes a long time to complete. They’re not experimenting with flies here…

      Also, how “loosey-goosey” was it when Dr. Neil Bartlett discovered that the so-called inert gases (particularly xenon) weren’t actually inert and could undergo reactions? I’m sure people thought he was crazy, but he truly revolutionized how chemists view the noble gases. The best research often starts with a crazy idea, and the idea often develops along the way into something that’s actually quite interesting.

  4. ‘Bill’ says: “A press release with a sample size of four? Is this what we now consider sociological research? I notice there are two female authors and one male author. From this I infer that NCSU is sexist against men. Sample size of 3.”
    Bill can’t read. There were 43 students divided into 4 sections. You could view the treatment groups as those where the instructor was not labeled by his or her actual name, and the control groups as those where they were. That’s about 10-11 subjects per section, or 20-22 subjects in each of treatment and control groups. That’s not 4 subjects, Bill.
    I won’t say that this study is perfect by any means, but pooh-poohing an article written by women without having the grace to actually read it is hardly scholarly behavior. It would appear you exhibit exactly the type of bias these authors are describing.

  5. This is a very interesting research topic. From what I could read, though, it did not mention how many of the students were 18-20 year old young women who of course would prefer an imaginary, potentially handsome male professor (if they were heterosexual) vs. 18-20 year old men. Does the rest of the study give this detail?

    I think it is worthy of much more consideration. How to overcome it is even more important to understand. It may not be possible.

  6. Faculty evaluations, in general, have not been tested for validity, credibility, or reliability. I wish we could know whether the study was focused on an instrument that actually measures what it claims to measure.

  7. What about comparing that data to a group with ‘Professor X,’ whose gender the students are never told? What kind of score does that teacher get? And is that type of result replicated in face-to-face classrooms vs. online? I think the work shows that more research needs to be done but doesn’t really prove anything – it seems more like a piece of a greater puzzle.

    1. Ashley, this is a great idea! If you have a large class and never reveal the gender of the professor (perhaps by using the name of an organization so that the students are less aware that they are being studied), then you could do the evaluations and ask at the very end about the perceived gender of the professor and look for patterns of scores where the perceived gender was indicative of a favorable score.

      1. I’m not sure that I agree. Students like to imagine that they know something about the personal life of the faculty member. If the gender of the faculty member were anonymous, it would be impossible for the student to feel a connection to the faculty member, because so many things are triggered by knowing a name and some details about another person. My guess would be that they would rate an anonymous instructor lower than a male or female one.

  8. I agree that this study has way too small a sample size to be useful on its own. However, it does raise the important issue of teaching assessment in universities. In my institution, student feedback is the only measure of teaching performance. Hiring and firing decisions are based on it. Strangely enough, there is a correlation between the students who get the best marks and those who give high marks on teaching feedback scales.

  9. Did the study really have only four instructor identities, with the same two people claiming to be male? If so, then it would be very easy for the two ‘male’ instructors to be genuinely better. Similar findings are replicated elsewhere, so I believe the results, but as a stand-alone study this lacks validity. It’s a good pilot study, perhaps, but it definitely needs follow-up.

  10. How many men and women were in each group?
    Since students had the instructor’s email address, they could easily have found out the instructor’s sex. Lots of questions!

  11. What was the gender ratio of the subjects? It would be interesting to know, for example, whether male students were more likely to give higher ratings to perceived male instructors, etc.

  12. Yes, keep up the good work. Exactly what many of us have suspected for a long time. I’d be interested to also know how age, ethnicity and disability affect the outcome of ‘evaluations’. I’d like to bet that a young, foreign, female academic has a harder time establishing her epistemic authority than her older or younger, indigenous, male counterpart – as long as he is not too old, of course.

  13. As someone who has spent much time in academia, I strongly urge you to investigate teaching styles. Even a feminist such as myself preferred male instructors, and found this alarming. Why? They were easier to deal with, fretted less about the small stuff, were usually better about organizing the course, and were consistent with their feedback. Some female instructors had those characteristics too, but I found them most often in males.

  14. And now I have enough evidence that you only publish positive feedback. Amazing – on the same page that you promote your study of biased student responses, you bias the responses!

  15. Perhaps one improvement to the method would be to introduce the fiction that a woman teaches the first half of the course and a man teaches the second half. That gets at the biases of the students more directly.

  16. Your conclusions are based on too small of a sample size. Do your homework and spend a year getting more data. Readers must feel that you got the result that you wanted, so you stopped. We try to teach students to wait and publish only what is statistically meaningful, and there are mathematical rules that define meaningful.

    1. I totally agree with you, James. I know they said it was a pilot study, but I just don’t think NC State should post about it until more research is done. That’s journalism for ya.

    2. You are almost correct. There aren’t “mathematical rules” per se that define what is meaningful, since mathematical rules are theorems and axioms. I believe you mean “statistics,” which actually does define what is statistically significant. And this isn’t a case of “to-may-to, to-mah-to” – the two fields are pretty distinct, even though there is considerable overlap.

      That being said, just because something is not statistically significant doesn’t mean that it’s not meaningful. There’s lots of research that’s published where the null hypothesis isn’t rejected. That doesn’t mean that this research is any less valuable than studies that did reject their null hypotheses. Also keep in mind that there are standard procedures when working with small sample sizes. In fact, people research this and teach methodology courses on how to do it. Further, a lot of research begins with small sample sizes because it’s super experimental. Once you get a general idea, only then can the study be ramped up.

      As for this study, they could have not written an immediate release, but remember that when an exciting result is found (e.g. the Higgs boson being discovered), researchers will want to share this with others in the academic community. I don’t think you can fault these guys for sharing.
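The “standard procedures when working with small sample sizes” mentioned above can be made concrete. The sketch below shows Welch’s t-test, a common choice for comparing two small independent groups that may have unequal variances; the 1-5 ratings are invented for illustration and are not the study’s actual data.

```python
import math
import statistics as st

def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = st.variance(a), st.variance(b)   # sample variances (n-1 denominator)
    se2 = va / na + vb / nb                   # squared standard error of the difference
    t = (st.mean(a) - st.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical 1-5 ratings from two groups of ~10 students each,
# roughly the group sizes described in the article (NOT the real data)
perceived_male = [5, 4, 5, 4, 4, 5, 3, 4, 5, 4]
perceived_female = [4, 3, 4, 3, 4, 3, 3, 4, 2, 4]

t, df = welch_t(perceived_male, perceived_female)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With groups this small, the t statistic is then compared against a t distribution with the approximated degrees of freedom rather than a normal distribution, which is precisely why small-n studies can still yield statistically interpretable results.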

  17. I am not surprised by the findings. Gender inequity is more subtle than it used to be, but it is alive and well. The young women I meet at a local college are very aware of it and openly speak about it.

  18. I want to thank the authors of this study for looking into one of the everyday challenges that female instructors face in the classroom. Gender bias is pervasive and manifests in multiple ways, such as students’ resistance to addressing female professors by their academic title, engaging with original teaching approaches, and simply accepting us, female profs, as figures of authority.

  19. Serious scientists at respectable institutions would be embarrassed if they published conclusions based on such a small sample size. Do your homework. Readers must feel that you got the result that you wanted, so you stopped. We try to teach students to wait and publish only what is statistically meaningful, and there are mathematical rules that define meaningful.

  20. I would like to know: how many of the students in the study were male, and how many were female? If there was not an equal number of female and male students in the study, then this study is flawed.

  21. A press release with a sample size of four? Is this what we now consider sociological research? I notice there are two female authors and one male author. From this I infer that NCSU is sexist against men. Sample size of 3.

  22. How on earth did this get published? The methodological errors in this are dreadful! Sample sizes of 8-12 students and 2 professors are woefully small, for starters (especially given how many other parameters there can be – time of day for assignments, covariates of other class interactions etc. There are likely more parameters than data points by this stage!). Then, why weren’t the professors blinded to the gender they were presenting to the students? How do we know they didn’t (even subconsciously) bias the results in their interactions? It’s poor science like this that lets the rest of us in sociology down, and makes us look like poor scientists when compared with our colleagues. Hiding behind a “pilot study” doesn’t count either – if you’re going to blare this out loud at least try to have some legitimacy! Someone’s advisor needs to start giving some advice…