
      The Case for Using the Repeatability Coefficient When Calculating Test–Retest Reliability



          The use of standardised tools is an essential component of evidence-based practice. Reliance on standardised tools places demands on clinicians to understand their properties, strengths, and weaknesses in order to interpret results and make clinical decisions. This paper makes a case for clinicians to prefer measurement error (ME) indices, namely the Coefficient of Repeatability (CR) or the Smallest Real Difference (SRD), over relative reliability coefficients such as Pearson's r and the Intraclass Correlation Coefficient (ICC) when selecting tools to measure change and when inferring that an observed change is true. The authors present statistical methods that are part of the current approach to evaluating the test–retest reliability of assessment tools and outcome measurements. Selected examples from a previous test–retest study illustrate how knowing the ME of an assessment tool adds to clinical decision making. The CR is computed in the same units as the assessment tool and sets the boundary of the minimal detectable true change that the tool can measure.
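
A minimal sketch of how a CR could be computed from paired test–retest scores may help make the last point concrete. It assumes the common formulation CR = 1.96 × √2 × SEM, where the standard error of measurement (SEM) is the square root of the within-subject variance; the abstract itself does not state a formula, so this choice is an assumption. The function names and the scores are hypothetical and purely illustrative.

```python
import math


def repeatability_coefficient(test, retest):
    """Return CR = 1.96 * sqrt(2) * SEM for paired test-retest scores.

    Assumption: SEM is the square root of the within-subject variance,
    which for two sessions equals the mean squared difference divided by 2.
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long score lists, at least 2 subjects")
    diffs = [b - a for a, b in zip(test, retest)]
    within_var = sum(d * d for d in diffs) / (2 * len(diffs))
    sem = math.sqrt(within_var)
    return 1.96 * math.sqrt(2) * sem


def exceeds_measurement_error(observed_change, cr):
    """True if an individual's change is larger than the CR boundary."""
    return abs(observed_change) > cr


# Hypothetical scores for four clients, in the tool's own units.
test_scores = [10, 12, 14, 16]
retest_scores = [11, 11, 15, 15]

cr = repeatability_coefficient(test_scores, retest_scores)
print(round(cr, 2))                        # 1.96
print(exceeds_measurement_error(3, cr))    # True
print(exceeds_measurement_error(1, cr))    # False
```

Because the result is expressed in the units of the tool, a clinician can compare an individual's observed change directly against it, which is not possible with a unitless coefficient such as r or the ICC.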


          Most cited references 12


          Intraclass correlations: uses in assessing rater reliability.

          Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. Relevant to the choice of the coefficient are the appropriate statistical model for the reliability and the application to be made of the reliability results. Confidence intervals for each of the forms are reviewed.

            Measures of reliability in sports medicine and science.

             W. Hopkins (2000)
            Reliability refers to the reproducibility of values of a test, assay or other measurement in repeated trials on the same individuals. Better reliability implies better precision of single measurements and better tracking of changes in measurements in research or practical settings. The main measures of reliability are within-subject random variation, systematic change in the mean, and retest correlation. A simple, adaptable form of within-subject variation is the typical (standard) error of measurement: the standard deviation of an individual's repeated measurements. For many measurements in sports medicine and science, the typical error is best expressed as a coefficient of variation (percentage of the mean). A biased, more limited form of within-subject variation is the limits of agreement: the 95% likely range of change of an individual's measurements between 2 trials. Systematic changes in the mean of a measure between consecutive trials represent such effects as learning, motivation or fatigue; these changes need to be eliminated from estimates of within-subject variation. Retest correlation is difficult to interpret, mainly because its value is sensitive to the heterogeneity of the sample of participants. Uses of reliability include decision-making when monitoring individuals, comparison of tests or equipment, estimation of sample size in experiments and estimation of the magnitude of individual differences in the response to a treatment. Reasonable precision for estimates of reliability requires approximately 50 study participants and at least 3 trials. Studies aimed at assessing variation in reliability between tests or equipment require complex designs and analyses that researchers seldom perform correctly. A wider understanding of reliability and adoption of the typical error as the standard measure of reliability would improve the assessment of tests and equipment in our disciplines.

              Measuring agreement in method comparison studies.

              Agreement between two methods of clinical measurement can be quantified using the differences between observations made using the two methods on the same subjects. The 95% limits of agreement, estimated by mean difference ± 1.96 standard deviations of the differences, provide an interval within which 95% of differences between measurements by the two methods are expected to lie. We describe how graphical methods can be used to investigate the assumptions of the method and we also give confidence intervals. We extend the basic approach to data where there is a relationship between difference and magnitude, both with a simple logarithmic transformation approach and a new, more general, regression approach. We discuss the importance of the repeatability of each method separately and compare an estimate of this to the limits of agreement. We extend the limits of agreement approach to data with repeated measurements, proposing new estimates for equal numbers of replicates by each method on each subject, for unequal numbers of replicates, and for replicated data collected in pairs, where the underlying value of the quantity being measured is changing. Finally, we describe a nonparametric approach to comparing methods.
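
The basic 95% limits of agreement described in this abstract (mean difference plus or minus 1.96 standard deviations of the differences) can be sketched as follows. This covers only the simplest case, one measurement per subject by each method, and none of the extensions mentioned (regression approach, replicates, nonparametric limits). The function name and the data are hypothetical, and using the sample standard deviation is an assumption.

```python
import statistics


def limits_of_agreement(method_a, method_b):
    """Return (lower, upper) 95% limits of agreement for paired measurements.

    Limits are mean(d) -/+ 1.96 * SD(d), where d holds the per-subject
    differences between the two methods. SD is the sample standard
    deviation (n - 1 denominator), which is an assumption of this sketch.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long lists, at least 2 subjects")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical paired measurements on four subjects.
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.0, 13.0, 16.0]

lower, upper = limits_of_agreement(method_a, method_b)
print(f"95% limits of agreement: {lower:.2f} to {upper:.2f}")
```

The interval is centred on the mean difference, so a non-zero centre indicates a systematic offset between the two methods rather than random disagreement alone.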

                Author and article information

                Role: Editor
                PLoS One
                Public Library of Science (San Francisco, USA)
                9 September 2013
                Volume 8, Issue 9
                [1] School of Occupational Therapy and Social Work, Centre for Research into Disability and Society, Curtin University, Perth, Western Australia, Australia
                [2] School of Occupational Therapy and Social Work, Curtin Health Innovation Research Institute, Curtin University, Perth, Western Australia, Australia
                [3] School of Occupational Therapy, La Trobe University, Melbourne, Vic., Australia
                [4] Rehabilitation Medicine, Department of Medicine and Health Sciences (IMH), Faculty of Health Sciences, Linköping University & Pain and Rehabilitation Centre, UHL, County Council, Linköping, Sweden
                [5] Department of Community Health and Epidemiology, Dalhousie University, Halifax, Nova Scotia, Canada
                RAND Corporation, United States of America
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: SV RP AEP. Performed the experiments: SV. Analyzed the data: SV RP PA. Contributed reagents/materials/analysis tools: SV AEP. Wrote the manuscript: SV RP TF AEP PA. Critically reviewed submission: TF RP.


                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                This project was funded by the first author's Doctoral scholarship provided by the Centre for Research into Disability and Society and the School of Occupational Therapy and Social Work, Curtin University, Perth, Australia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Research Article


