Average rating: | Rated 5 of 5. |
Level of importance: | Rated 5 of 5. |
Level of validity: | Rated 5 of 5. |
Level of completeness: | Rated 5 of 5. |
Level of comprehensibility: | Rated 5 of 5. |
Competing interests: | None |
Among other things, this article highlights the need for researchers to try to avoid biases when conducting and disseminating research, to use methods appropriate for the research question(s), to keep their interpretations within the reach of their findings, and to encourage transparency in reporting -- all important calls for budding and seasoned researchers alike.
Khazan et al. (2020) conducted a study on gender bias in student evaluations of teaching (SETs) using students from one course (2 online sections), divided into 4 groups: female students who perceived they had a female teaching assistant or TA (n = 36), females who perceived they had a male TA (n = 35), males who perceived they had a male TA (n = 25), and males who perceived they had a female TA (n = 19). They reported, on the one hand, that the trending results of their study support literature showing a bias against women in SETs and, on the other hand, that these same nonsignificant results may have been due to a perspective shift or an absence of gender bias in their online student sample.
Uttl and Violo operated under sound assumptions and used appropriate methods when they reconstructed Khazan et al.’s data, discovered the outliers, and re-interpreted the results. Uttl and Violo then found that the trend toward gender bias disappeared (and seemed on track to reverse) once the outliers were removed, refuting Khazan et al.’s generalizations. While some may object that Uttl and Violo replicate the same methods they criticize, it would be unproductive to criticize those methods without replicating them. This replication and extension with outlier analysis shows that, if Khazan et al. intended to generalize to the entire population, they should have used an improved method and a larger sample, because small samples and outliers are driving these results.
Some may argue that Uttl and Violo’s criticisms of Khazan et al.’s generalizations rest on a misunderstanding of the meaning of ‘outliers’. If an outlier is considered purely an error in reporting, then it would be true that none of the student evaluations in Khazan et al.’s partial reports or in Uttl and Violo’s reconstructed data were errors. However, if an outlier is considered an extreme report beyond the typical data, one that is nonetheless still a student report to be considered, then that outlier is going to skew any distribution with a small sample size. Uttl and Violo make the unmistakable point that removing the outliers causes the (nonsignificant) gender bias trend to disappear -- not claiming that biases do not exist in general, but rather showing that a claim of gender bias in the population should not be based on a small handful of students in a sample who responded in an extreme way for unknown reasons.
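To make the arithmetic behind this point concrete, the following minimal sketch uses made-up ratings on a 1-5 scale. The numbers, the group names, and the simple cutoff rule are all assumptions chosen for illustration; they are not data from Khazan et al., and the cutoff is not the outlier criterion used by Uttl and Violo.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale (illustrative only; NOT data from
# Khazan et al. or from Uttl and Violo's reconstruction).
group_a = [5, 5, 4, 5, 4, 5, 5, 4, 5, 4]
# Same pattern of ratings, except two students gave extreme low ratings.
group_b = [5, 5, 4, 5, 4, 5, 5, 4, 1, 1]


def mean_difference(a, b):
    """Difference between the two group means (a minus b)."""
    return mean(a) - mean(b)


def drop_below(ratings, cutoff):
    """Keep only ratings at or above the cutoff (an arbitrary illustrative rule)."""
    return [r for r in ratings if r >= cutoff]


# Difference with every rating included.
diff_all = mean_difference(group_a, group_b)

# Difference after applying the same cutoff to both groups.
diff_trimmed = mean_difference(drop_below(group_a, 2), drop_below(group_b, 2))

print(f"all ratings:      {diff_all:.3f}")
print(f"extremes removed: {diff_trimmed:.3f}")
```

With ten ratings per group, two extreme responses are enough to create an apparent gap between the group means, and the gap vanishes (and slightly changes sign) once those two responses are set aside. This does not show which analysis is the right one; it only shows why extreme reports in a small sample need to be examined before any trend is interpreted.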
Among the other flaws identified by Uttl and Violo is Khazan et al.’s use of photographs and biographies to produce the gender perception conditions, which is an inappropriate method for the desired manipulation. Photographs and biographies can be used to elicit emotions and attitudes, but in a gender study using manipulation, a researcher would want to take steps to ensure the groups are as similar as possible except for the difference in gender perception. Eliciting emotions and attitudes would more than likely introduce many more differences than the manipulation intended.
In essence, Uttl and Violo’s work refutes Khazan et al.’s (2020) claim that their study supports literature reporting evidence of bias against women in SETs, and shows that their method is not ideal for the research question. Failing to examine how outliers and biases change the results in a small sample has implications for performance management decisions in the organizational contexts that rely on SETs. Further, results like Khazan et al.’s should not be generalized, because taken out of context they can influence public perceptions and policy developments. In other words, extreme reports should be neither deleted nor included without scrutiny; they should be analyzed carefully to see how they drive the main findings, and conclusions should remain in context. Overall, Uttl and Violo’s work is valid, explicit, and useful, and it highlights that researcher decisions, including the way we handle the research process and knowledge dissemination, can have impacts at multiple levels: individual, organizational, social, and political.