    Is Open Access

    Review of 'An Evaluation of Course Evaluations'

    An Evaluation of Course Evaluations (Crossref)
    A nice overview of the problems with student evaluations of teaching.
    Average rating:
        Rated 4.5 of 5.
    Level of importance:
        Rated 5 of 5.
    Level of validity:
        Rated 3 of 5.
    Level of completeness:
        Rated 4 of 5.
    Level of comprehensibility:
        Rated 5 of 5.
    Competing interests:

    Reviewed article

    Is Open Access

    An Evaluation of Course Evaluations

    Student ratings of teaching have been used, studied, and debated for almost a century. This article examines student ratings of teaching from a statistical perspective. The common practice of relying on averages of student teaching evaluation scores as the primary measure of teaching effectiveness for promotion and tenure decisions should be abandoned for substantive and statistical reasons: There is strong evidence that student responses to questions of “effectiveness” do not measure teaching effectiveness. Response rates and response variability matter. And comparing averages of categorical responses, even if the categories are represented by numbers, makes little sense. Student ratings of teaching are valuable when they ask the right questions, report response rates and score distributions, and are balanced by a variety of other sources and methods to evaluate teaching.
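The abstract's statistical point — that comparing averages of categorical responses makes little sense — can be illustrated with a minimal sketch. The class names and rating lists below are invented for illustration and do not come from the paper: two hypothetical classes produce the same mean rating despite telling very different stories, which is why the authors argue for reporting score distributions rather than averages.

```python
# Hypothetical example: two classes whose 1-5 ratings average to the same
# value despite very different distributions. The data are invented for
# illustration only.
from statistics import mean
from collections import Counter

class_a = [3, 3, 3, 3, 3, 3]   # every student rates the course a 3
class_b = [1, 1, 1, 5, 5, 5]   # students are sharply polarized

print(mean(class_a))           # 3.0
print(mean(class_b))           # 3.0 -- identical mean, very different experiences

# Reporting the full distribution preserves the information the mean discards.
print(Counter(class_a))        # Counter({3: 6})
print(Counter(class_b))        # Counter({1: 3, 5: 3})
```

Under this (hypothetical) data, a promotion committee looking only at averages would see the two classes as identical, while the distributions show one uniformly lukewarm reception and one deeply divided one.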

      Review information

      Review text

      The paper provides a nice overview of the problems with student evaluations of teaching. These include statistical issues and problems with the overall approach when gathering and interpreting teaching evaluations. The authors recommend alternative methods, such as peer observations of teaching and creation of teaching portfolios.

      The paper makes a convincing argument for the ineffectiveness of traditional teaching evaluations, citing a large body of relevant work on their limitations. The paper is also exceptionally well-written and interesting.

      While the argument against traditional teaching evaluations is rather convincing, a few aspects of the paper somewhat undermine it. The tone is often fairly casual, and much of the content is devoted to examples and anecdotes. Some of this is fine, but it seems to have come at the cost of too little attention to the empirical work on teaching evaluations. Much of this work is cited, and all of it appears highly relevant to the paper’s thesis, but very little of it is discussed in detail. There are also some strong statements that are not supported by sound arguments or data. For example, the authors spend considerable time explaining why teaching evaluations, as currently measured, do not quite measure teaching effectiveness. Then the statement is made that students simply cannot rate effectiveness, but it is not clear why this has to be true. Perhaps it is, but such a strong claim should be explained.

      There is also an imbalance in the discussion of the limitations of teaching evaluations. A large amount of space is given to the statistical problems, but these are the same limitations that apply whenever convenience sampling or measures of central tendency are used. Those limitations do still apply here and are worth discussing, but it would have been appropriate to devote more space to issues unique to teaching evaluations. Again, there seems to be ample work to discuss in that regard, yet relatively little of that discussion takes place.

      I also think the paper could do a better job of explaining why the alternatives to traditional teaching evaluations should be pursued. Most faculty are likely aware of the limitations of teaching evaluations but continue to use them for lack of a better alternative. The authors’ recap states that it is practical and valuable for faculty to observe each other’s classes and to create and review teaching portfolios, but a case for these claims is not effectively made. The “What is better?” section offers no evidence that teaching portfolios or classroom observations are better in any way than teaching evaluations, and they certainly do not seem practical. To be fair, there is a mention of classroom observation taking about four hours, though whether that is a practical amount of time is debatable. Practicality should also include ease of evaluation. The sample letter at the end of the paper presents a case that is fairly easy to assess, but it is a rather unrealistic one. How would faculty evaluate a typical colleague’s teaching and portfolio?

