
      Developing a video‐based method to compare and adjust examiner effects in fully nested OSCEs


          Abstract

          Background

          Although averaging across multiple examiners’ judgements reduces unwanted overall score variability in objective structured clinical examinations (OSCEs), designs involving several parallel circuits of the OSCE require that different examiner cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner‐cohort effects in distributed or national examinations that could compromise fairness or patient safety, but despite their importance, these effects are rarely investigated because fully nested assessment designs make them very difficult to study. We describe initial use of a new method to measure and adjust for examiner‐cohort effects on students’ scores.

          Methods

          We developed video‐based examiner score comparison and adjustment (VESCA): volunteer students were filmed ‘live’ on 10 out of 12 OSCE stations. Following the examination, examiners additionally scored station‐specific common‐comparator videos, producing partial crossing between examiner cohorts. Many‐facet Rasch modelling and linear mixed modelling were used to estimate and adjust for examiner‐cohort effects on students’ scores.
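
          To make the modelling step concrete, the sketch below shows one way an examiner‐cohort effect could be estimated and removed with a linear mixed model in Python (statsmodels). The file name, column names and the single‐grouping approximation of crossed random effects are illustrative assumptions, not the authors’ actual pipeline, which combined many‐facet Rasch modelling with linear mixed modelling.

# Hypothetical sketch of the linear-mixed-model adjustment described above.
# Assumed long-format data: one row per examiner judgement, with columns
#   score   - station score awarded
#   cohort  - examiner cohort / parallel circuit (fixed effect of interest)
#   student - student identifier (random intercept standing in for ability)
#   station - station identifier (variance component)
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("osce_scores.csv")  # hypothetical file name

# Fixed effect of examiner cohort; random intercept per student; station as a
# variance component. statsmodels fits variance components within a single
# grouping factor, so a fully crossed specification would need e.g. lme4 in R.
model = smf.mixedlm(
    "score ~ C(cohort)",
    data=df,
    groups=df["student"],
    vc_formula={"station": "0 + C(station)"},
)
fit = model.fit(reml=True)
print(fit.summary())

# Adjustment: subtract each cohort's estimated deviation from the reference
# cohort so that all students are scored against a common standard.
cohort_effects = {
    name[name.index("[T.") + 3:-1]: coef
    for name, coef in fit.fe_params.items()
    if name.startswith("C(cohort)[T.")
}
df["adjusted_score"] = df["score"] - df["cohort"].astype(str).map(cohort_effects).fillna(0.0)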

          Results

          After accounting for students’ ability, examiner cohorts differed substantially in their stringency or leniency (maximal global score difference of 0.47 out of 7.0 [Cohen's d = 0.96]; maximal total percentage score difference of 5.7% [Cohen's d = 1.06] for the same student ability judged by different examiner cohorts). Corresponding adjustment of students’ global and total percentage scores altered the theoretical classification of 6.0% of students on both measures (either pass to fail or fail to pass), whereas 8.6–9.5% of students’ scores were altered by at least 0.5 standard deviations of student ability.
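
          As an illustration of how classification figures like those above could be derived once adjusted scores are available, the continuation below aggregates the judgement‐level data from the earlier sketch to per‐student means and counts pass/fail flips; the cut score is an arbitrary illustrative value, not the examination’s actual standard.

# Hypothetical continuation of the earlier sketch: aggregate to per-student
# means and count how many classifications flip once cohort effects are removed.
cut = 4.0  # illustrative pass mark on the 7-point global scale (assumption)
totals = df.groupby("student")[["score", "adjusted_score"]].mean()
passed_raw = totals["score"] >= cut
passed_adj = totals["adjusted_score"] >= cut
flipped = (passed_raw != passed_adj).mean() * 100
print(f"{flipped:.1f}% of students would change classification after adjustment")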

          Conclusions

          Despite typical reliability, the examiner cohort that students encountered had a potentially important influence on their score, emphasising the need for adequate sampling and examiner training. Development and validation of VESCA may offer a means to measure and adjust for potential systematic differences in scoring patterns that could exist between locations in distributed or national OSCE examinations, thereby ensuring equivalence and fairness.

          Summary

          Finding that scores by different groups of examiners can differ by a whole standard deviation of student ability, the authors offer a video‐based method to address examiner‐cohort effects in OSCEs.


                Author and article information

                Contributors
                p.yeates@keele.ac.uk
                Journal
                Med Educ
                Med Educ
                10.1111/(ISSN)1365-2923
                MEDU
                Medical Education
                John Wiley and Sons Inc. (Hoboken)
                0308-0110
                1365-2923
                21 December 2018
                March 2019
                Volume: 53
                Issue: 3 (doiID: 10.1111/medu.2019.53.issue-3)
                Pages: 250-263
                Affiliations
                [1] Medical School Education Research Group (MERG), Keele University School of Medicine, Keele, UK
                [2] Department of Acute Medicine, Fairfield General Hospital, Pennine Acute Hospitals NHS Trust, Bury, UK
                [3] Royal Stoke Hospital, University Hospital of North Midlands NHS Trust, Stoke on Trent, UK
                [4] Institute for Primary Care and Health Sciences, Keele University, Keele, UK
                [5] School of Education, University of Leeds, Leeds, UK
                Author notes
                [*] Correspondence: Peter Yeates, School of Medicine, David Weatherall Building, Keele University, Keele, Staffordshire ST5 5BG, UK. Tel: 00 44 1782 733930; E‐mail: p.yeates@keele.ac.uk
                Author information
                https://orcid.org/0000-0001-6316-4051
                https://orcid.org/0000-0002-1161-5938
                Article
                MEDU13783
                10.1111/medu.13783
                6519246
                30575092
                b710baa6-82a7-4aae-9712-ad06f0fabdc0
                © 2018 The Authors. Medical Education published by Association for the Study of Medical Education and John Wiley & Sons Ltd.

                This is an open access article under the terms of the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC) License (http://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

                History: 22 June 2018; 14 August 2018; 07 November 2018
                Page count
                Figures: 4, Tables: 1, Pages: 14, Words: 8323
                Funding
                Funded by: National Institute for Health Research (NIHR) Clinician Scientist Award
                Funded by: NIHR
                Categories
                Assessment

                Education
                assessment, OSCEs, assessor variability, psychometrics
