    Review of 'Gender bias in student evaluation of teaching or a mirage?'

    This article highlights several key issues that budding and seasoned researchers should remember.
    Average rating:
        Rated 5 of 5.
    Level of importance:
        Rated 5 of 5.
    Level of validity:
        Rated 5 of 5.
    Level of completeness:
        Rated 5 of 5.
    Level of comprehensibility:
        Rated 5 of 5.
    Competing interests:

    Reviewed article

    Gender bias in student evaluation of teaching or a mirage?

    In a recent small sample study, Khazan et al. (2020) examined SET ratings received by one female teaching assistant (TA) who assisted with teaching two sections of the same online course, one section under her true gender and one section under false/opposite gender. Khazan et al. concluded that their study demonstrated gender bias against the female TA even though they found no statistical difference in SET ratings between the male vs. female TA (p = .73). To claim gender bias, Khazan et al. ignored their overall findings and focused on the distribution of six negative SET ratings and claimed, without reporting any statistical test results, that (a) female students gave more positive ratings to the male TA than the female TA, (b) the female TA received five times as many negative ratings as the male TA, and (c) female students gave most low scores to the female TA. We conducted the missing statistical tests and found no evidence supporting Khazan et al.’s claims. We also requested Khazan et al.’s data to formally examine them for outliers and to re-analyze the data with and without the outliers. Khazan et al. refused. We read off the data from their Figure 1 and filled in several values using a brute-force, exhaustive search constrained by the summary statistics reported by Khazan et al. Our re-analysis revealed six outliers and no evidence of gender bias. In fact, when the six outliers were removed, the female TA was rated higher than the male TA but non-significantly so.

      Review information

      This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

      statistical power, generalization, outliers, small samples, SET, gender bias, student evaluation of teaching, transparency

      Review text

      Among other things, this article highlights the need for researchers to try to avoid biases in building and disseminating research, to use methods appropriate for the research question(s), to keep their interpretations within the reach of their findings, and to encourage transparency in reporting. These are all important calls for budding and seasoned researchers alike.

      Khazan et al. (2020) conducted a study on gender bias in student evaluations of teaching (SETs) using students from one course (2 sections online), divided into 4 groups: female students who perceived they had a female teaching assistant or TA (n = 36), female students who perceived they had a male TA (n = 35), male students who perceived they had a male TA (n = 25), and male students who perceived they had a female TA (n = 19). They reported, on the one hand, that the trending results of this study support literature showing a bias against women in SET and, on the other hand, that these same non-significant results may have been due to a perspective shift or an absence of gender bias in their online student sample.

      Uttl and Violo operated under sound assumptions and used appropriate methods when they reconstructed Khazan et al.’s data, discovered the outliers, and re-interpreted the results. Uttl and Violo then found that the trend toward gender bias disappeared (and seemed on track to reverse) when the outliers were removed, refuting Khazan et al.’s generalizations. While some may find a problem with the fact that Uttl and Violo replicate the same methods they criticize, the other side of the coin is that it would be unproductive to criticize the methods without replicating them. This replication and extension with outlier analysis shows that, if Khazan et al. intended to generalize to the entire population, they should have used an improved method and a larger sample, because small samples and outliers are driving these results.
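
      The reviewed abstract describes the reconstruction step as a brute-force, exhaustive search constrained by the reported summary statistics. A minimal Python sketch of that general idea follows. It is not Uttl and Violo's actual procedure: the function name `candidates`, the 1-to-5 rating scale, and the target values (n = 6, mean = 4.00, SD = 1.26) are all made up for illustration and do not come from either paper.

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev


def candidates(n, target_mean, target_sd, scale=range(1, 6), decimals=2):
    """Return every multiset of n ratings on `scale` whose rounded mean
    and rounded sample SD both match the reported summary statistics."""
    matches = []
    # combinations_with_replacement enumerates each multiset exactly once,
    # as a tuple sorted in ascending order.
    for combo in combinations_with_replacement(scale, n):
        if (round(mean(combo), decimals) == target_mean
                and round(stdev(combo), decimals) == target_sd):
            matches.append(combo)
    return matches


# Hypothetical targets, chosen only to illustrate the search.
found = candidates(n=6, target_mean=4.0, target_sd=1.26)
print(found)
```

      For these made-up targets only one set of ratings is consistent with the rounded statistics. With real data, several candidate sets may match, which is why such a search is normally combined with other constraints, such as values read off a published figure.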

      Some may argue that Uttl and Violo’s criticisms of Khazan et al.’s generalizations are based on misunderstanding the meaning of ‘outliers’. If an outlier is considered purely an error in reporting, then it would be true that none of the student evaluations in Khazan et al.’s partial reports or in Uttl and Violo’s reconstructed data were errors. However, if an outlier is considered an extreme report beyond the typical data, one that nonetheless remains a genuine student report, that outlier is going to skew any distribution with a small sample size. Uttl and Violo make the unmistakable point that removing the outliers causes the (non-significant) gender bias trend to disappear. They do not claim that biases do not exist in general; rather, they show that a claim of gender bias in the population should not be based on a small handful of students in a sample who responded in an extreme way for unknown reasons.
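
      The sensitivity of a small sample to a few extreme reports can be illustrated with a short Python sketch. The ratings below are invented for illustration and are not Khazan et al.'s or Uttl and Violo's data; the cut-off of 2 on a 1-to-5 scale is likewise an assumed, simplistic stand-in for a formal outlier test, and the function names are hypothetical.

```python
from statistics import mean


def mean_difference(group_a, group_b):
    """Difference in mean rating (group_a minus group_b)."""
    return mean(group_a) - mean(group_b)


def drop_extremes(ratings, cutoff=2):
    """Keep only ratings above the assumed cut-off for 'extreme' low scores."""
    return [r for r in ratings if r > cutoff]


# Invented ratings on a 1-5 scale; three extreme low scores in one group
# and one in the other.
perceived_female = [5, 5, 4, 5, 4, 5, 4, 5, 5, 4, 1, 1, 2]
perceived_male = [5, 4, 5, 4, 4, 5, 4, 5, 4, 4, 5, 4, 2]

diff_all = mean_difference(perceived_female, perceived_male)
diff_trimmed = mean_difference(
    drop_extremes(perceived_female), drop_extremes(perceived_male)
)

print(f"difference with all ratings:     {diff_all:+.2f}")
print(f"difference without low extremes: {diff_trimmed:+.2f}")
```

      In this toy sample the sign of the difference flips once four of the 26 ratings are set aside. The point is not that such ratings should be discarded, but that a conclusion resting on them is fragile.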

      Among other flaws, Uttl and Violo argue that Khazan et al.’s use of photographs and biographies to produce the gender perception conditions is an inappropriate method for the desired manipulation. Photographs and biographies can be used to elicit emotions and attitudes, but in a gender study using manipulation, a researcher would want to take steps to ensure the groups are as similar as possible except for the difference in gender perception. Eliciting emotions and attitudes would more than likely introduce many more differences than the manipulation intended.

      In essence, Uttl and Violo’s work refutes Khazan et al.’s (2020) claim that their study supports literature reporting evidence of bias against women in SET, and shows that their method is not ideal for the research question. Failing to examine how outliers and biases change results in a small sample has implications for performance management decisions in the organizational contexts where SETs are used. Further, results like Khazan et al.’s should not be generalized, because taken out of context they can influence public perceptions and policy development. In other words, extreme reports should not be deleted or included without scrutiny, but should be analyzed carefully to see how they drive the main findings, and conclusions should remain in context. Overall, Uttl and Violo’s work is valid, explicit, and useful, and it highlights that researcher decisions, including the way we handle the research process and knowledge dissemination, can have impacts at multiple levels: individual, organizational, social, and political.

