Investigating possible ethnicity and sex bias in clinical examiners: an analysis of data from the MRCP(UK) PACES and nPACES examinations

Abstract

Background

Bias of clinical examiners against some types of candidate, based on characteristics such as sex or ethnicity, would represent a threat to the validity of an examination, since sex or ethnicity are ‘construct-irrelevant’ characteristics. In this paper we report a novel method for assessing sex and ethnic bias in over 2000 examiners who had taken part in the PACES and nPACES (new PACES) examinations of the MRCP(UK).

Method

PACES and nPACES are clinical skills examinations that have two examiners at each station who mark candidates independently. Differences between examiners cannot be due to differences in performance of a candidate because that is the same for the two examiners, and hence may result from bias or unreliability on the part of the examiners. By comparing each examiner against a ‘basket’ of all of their co-examiners, it is possible to identify examiners whose behaviour is anomalous. The method assessed hawkishness-doveishness, sex bias, ethnic bias and, as a control condition to assess the statistical method, ‘even-number bias’ (i.e. treating candidates with odd and even exam numbers differently). Significance levels were Bonferroni corrected because of the large number of examiners being considered.
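
The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the function names `z_statistic` and `flag_examiners`, and the use of a normal approximation for the p-value are all assumptions made for illustration. For each examiner it takes the difference between their mark and the co-examiner's mark for the same candidate, compares the mean difference between two candidate groups, and applies a Bonferroni-corrected threshold.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean, variance


def z_statistic(a, b):
    """Welch-style statistic for the gap between two mean mark differences.

    Returns None when the standard error is zero (no variation to test).
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return None if se == 0 else (mean(a) - mean(b)) / se


def flag_examiners(records, alpha=0.05):
    """records: iterable of (examiner_id, own_mark, co_examiner_mark, group),
    with group coded 0 or 1 (hypothetical layout, e.g. candidate sex).

    Returns {examiner_id: (z, p, flagged)} using a Bonferroni threshold.
    """
    diffs = defaultdict(lambda: ([], []))
    for examiner, own, co, group in records:
        # Candidate performance is shared by both examiners, so it cancels here.
        diffs[examiner][group].append(own - co)

    stats = {}
    for examiner, (a, b) in diffs.items():
        if min(len(a), len(b)) < 2:
            continue  # too few candidates in one group to estimate variance
        z = z_statistic(a, b)
        if z is not None:
            stats[examiner] = z
    if not stats:
        return {}

    threshold = alpha / len(stats)  # Bonferroni correction across examiners
    normal = NormalDist()
    results = {}
    for examiner, z in stats.items():
        # Normal approximation to the two-sided p-value (an assumption).
        p = 2 * (1 - normal.cdf(abs(z)))
        results[examiner] = (z, p, p < threshold)
    return results
```

Because each difference is taken within a candidate, any group difference in true performance drops out, which is the property the two-examiner design relies on.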

Results

The results of 26 diets of PACES and six diets of nPACES were examined statistically to assess the extent of hawkishness, as well as sex bias and ethnicity bias in individual examiners. The control (even-number) condition suggested that about 5% of examiners were significant at an (uncorrected) 5% level, and that the method therefore worked as expected. As in a previous study (BMC Medical Education, 2006, 6:42), some examiners were hawkish or doveish relative to their peers. No examiners showed significant sex bias, and only a single examiner showed evidence consistent with ethnic bias. A re-analysis of the data considering only one examiner per station, as would be the case for many clinical examinations, showed that analysis with a single examiner runs a serious risk of false positive identifications, probably due to differences in case-mix and content-specificity.
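
The logic of the control condition can be checked with a small simulation. This is a sketch under stated assumptions, not the study's analysis: mark differences are drawn from a standard normal distribution, candidates are split by odd versus even position as a stand-in for exam number, and the hypothetical function `null_false_positive_rate` uses a normal approximation for the p-value. With no real effect present, roughly 5% of simulated examiners should fall below an uncorrected 5% level.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def null_false_positive_rate(n_examiners=2000, n_candidates=40,
                             alpha=0.05, seed=0):
    """Fraction of simulated examiners flagged at an uncorrected level
    when the grouping variable has no real effect on marks."""
    rng = random.Random(seed)
    normal = NormalDist()
    hits = 0
    for _ in range(n_examiners):
        # Examiner-minus-co-examiner differences, unrelated to exam number.
        diffs = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
        odd, even = diffs[0::2], diffs[1::2]
        se = sqrt(variance(odd) / len(odd) + variance(even) / len(even))
        z = (mean(odd) - mean(even)) / se
        p = 2 * (1 - normal.cdf(abs(z)))
        hits += p < alpha
    return hits / n_examiners


rate = null_false_positive_rate()
print(f"Share of examiners significant at the uncorrected level: {rate:.3f}")
```

A rate far from the nominal level in such a check would point to a problem with the test itself rather than with the examiners.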

Conclusions

In examinations where there are two independent examiners at a station, our method can assess the extent of bias against candidates with particular characteristics. The method would be far less sensitive in examinations with only a single examiner per station, as examiner variance would be confounded with candidate performance variance. The method, however, works well when there is more than one examiner at a station and, in the case of the current MRCP(UK) clinical examination, nPACES, found possible sex bias in no examiners and possible ethnic bias in only one.

Most cited references 11

Ethnicity and academic performance in UK trained doctors and medical students: systematic review and meta-analysis

Katherine Woolf, Henry Potts, Ian McManus (2011)

Objective To determine whether the ethnicity of UK trained doctors and medical students is related to their academic performance. Design Systematic review and meta-analysis. Data sources Online databases PubMed, Scopus, and ERIC; Google and Google Scholar; personal knowledge; backwards and forwards citations; specific searches of medical education journals and medical education conference abstracts. Study selection The included quantitative reports measured the performance of medical students or UK trained doctors from different ethnic groups in undergraduate or postgraduate assessments. Exclusions were non-UK assessments, only non-UK trained candidates, only self reported assessment data, only dropouts or another non-academic variable, obvious sampling bias, or insufficient details of ethnicity or outcomes. Results 23 reports comparing the academic performance of medical students and doctors from different ethnic groups were included. Meta-analyses of effects from 22 reports (n=23 742) indicated candidates of “non-white” ethnicity underperformed compared with white candidates (Cohen’s d=−0.42, 95% confidence interval −0.50 to −0.34; P<0.001). Effects in the same direction and of similar magnitude were found in meta-analyses of undergraduate assessments only, postgraduate assessments only, machine marked written assessments only, practical clinical assessments only, assessments with pass/fail outcomes only, assessments with continuous outcomes only, and in a meta-analysis of white v Asian candidates only. Heterogeneity was present in all meta-analyses. Conclusion Ethnic differences in academic performance are widespread across different medical schools, different types of exam, and in undergraduates and postgraduates. They have persisted for many years and cannot be dismissed as atypical or local problems. We need to recognise this as an issue that probably affects all of UK medical and higher education. 
More detailed information to track the problem as well as further research into its causes is required. Such actions are necessary to ensure a fair and just method of training and of assessing current and future doctors.

A systematic review of the reliability of objective structured clinical examination scores.

Matthew Prewett, Erin M Brannick, H. Korkmaz (2011)

The objective structured clinical examination (OSCE) is comprised of a series of simulations used to assess the skill of medical practitioners in the diagnosis and treatment of patients. It is often used in high-stakes examinations and therefore it is important to assess its reliability and validity. The published literature was searched (PsycINFO, PubMed) for OSCE reliability estimates (coefficient alpha and generalisability coefficients) computed either across stations or across items within stations. Coders independently recorded information about each study. A meta-analysis of the available literature was computed and sources of systematic variance in estimates were examined. A total of 188 alpha values from 39 studies were coded. The overall (summary) alpha across stations was 0.66 (95% confidence interval [CI] 0.62-0.70); the overall alpha within stations across items was 0.78 (95% CI 0.73-0.82). Better than average reliability was associated with a greater number of stations and a higher number of examiners per station. Interpersonal skills were evaluated less reliably across stations and more reliably within stations compared with clinical skills. Overall scores on the OSCE are often not very reliable. It is more difficult to reliably assess communication skills than clinical skills when considering both as general traits that should apply across situations. It is generally helpful to use two examiners and large numbers of stations, but some OSCEs appear more reliable than others for reasons that are not yet fully understood. © Blackwell Publishing Ltd 2011.

Assessment of examiner leniency and stringency ('hawk-dove effect') in the MRCP(UK) clinical examination (PACES) using multi-facet Rasch modelling

IC McManus, M. M. Thompson, J Mollon … (2006)

Background A potential problem of clinical examinations is known as the hawk-dove problem, some examiners being more stringent and requiring a higher performance than other examiners who are more lenient. Although the problem has been known qualitatively for at least a century, we know of no previous statistical estimation of the size of the effect in a large-scale, high-stakes examination. Here we use FACETS to carry out a multi-facet Rasch modelling of the paired judgements made by examiners in the clinical examination (PACES) of MRCP(UK), where identical candidates were assessed in identical situations, allowing calculation of examiner stringency. Methods Data were analysed from the first nine diets of PACES, which were taken between June 2001 and March 2004 by 10,145 candidates. Each candidate was assessed by two examiners on each of seven separate tasks, with the candidates assessed by a total of 1,259 examiners, resulting in a total of 142,030 marks. Examiner demographics were described in terms of age, sex, ethnicity, and total number of candidates examined. Results FACETS suggested that about 87% of main effect variance was due to candidate differences, 1% due to station differences, and 12% due to differences between examiners in leniency-stringency. Multiple regression suggested that greater examiner stringency was associated with greater examiner experience and being from an ethnic minority. Male and female examiners showed no overall difference in stringency. Examination scores were adjusted for examiner stringency and it was shown that for the present pass mark, the outcome for 95.9% of candidates would be unchanged using adjusted marks, whereas 2.6% of candidates would have passed, even though they had failed on the basis of raw marks, and 1.5% of candidates would have failed, despite passing on the basis of raw marks. Conclusion Examiners do differ in their leniency or stringency, and the effect can be estimated using Rasch modelling.
The reasons for differences are not clear, but there are some demographic correlates, and the effects appear to be reliable across time. Account can be taken of differences, either by adjusting marks or, perhaps more effectively and more justifiably, by pairing high and low stringency examiners, so that raw marks can be used in the determination of pass and fail.

Author and article information

Journal

Journal ID (nlm-ta): BMC Med Educ

Journal ID (iso-abbrev): BMC Med Educ

Title: BMC Medical Education

Publisher: BioMed Central

ISSN (Electronic): 1472-6920

Publication date Collection: 2013

Publication date (Electronic): 30 July 2013

Volume: 13

Page: 103

Affiliations

[1 ]Academic Centre for Medical Education, Division of Medical Education, University College London, Gower Street, London WC1E 6BT, UK

[2 ]Research Department of Clinical, Educational and Health Psychology, Division of Psychology and Language Sciences, University College London, Gower Street, London WC1E 6BT, UK

[3 ]Examinations Department, MRCP(UK) Central Office, 11 St. Andrews Place, Regent’s Park, London NW1 4LE, UK

[4 ]College of Medicine and Veterinary Medicine, The University of Edinburgh, The Queen’s Medical Research Institute, 47 Little France Crescent, Edinburgh EH16 4TJ, UK

Article

Publisher ID: 1472-6920-13-103

DOI: 10.1186/1472-6920-13-103

PMC ID: 3737060

PubMed ID: 23899223

SO-VID: 3c16722d-9ef3-498e-8545-a150b4bf4866

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 16 January 2013

Date accepted : 24 July 2013
