      Is Open Access

      Statistical Methods Used to Test for Agreement of Medical Instruments Measuring Continuous Variables in Method Comparison Studies: A Systematic Review




          Accurate values are a must in medicine. An important parameter in determining the quality of a medical instrument is agreement with a gold standard. Various statistical methods have been used to test for agreement. Some of these methods have been shown to be inappropriate. This can result in misleading conclusions about the validity of an instrument. The Bland-Altman method is the most popular method judging by the many citations of the article proposing this method. However, the number of citations does not necessarily mean that this method has been applied in agreement research. No previous study has been conducted to look into this. This is the first systematic review to identify statistical methods used to test for agreement of medical instruments. The proportion of various statistical methods found in this review will also reflect the proportion of medical instruments that have been validated using those particular methods in current clinical practice.


          Five electronic databases were searched between 2007 and 2009 to look for agreement studies. A total of 3,260 titles were initially identified. Only 412 titles were potentially related, and finally 210 fitted the inclusion criteria. The Bland-Altman method is the most popular method with 178 (85%) studies having used this method, followed by the correlation coefficient (27%) and means comparison (18%). Some of the inappropriate methods highlighted by Altman and Bland since the 1980s are still in use.


          This study finds that the Bland-Altman method is the most popular method used in agreement research. There are still inappropriate applications of statistical methods in some studies. It is important for a clinician or medical researcher to be aware of this issue because misleading conclusions from inappropriate analyses will jeopardize the quality of the evidence, which in turn will influence quality of care given to patients in the future.


          Most cited references 55


          Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed.

          Results of reliability and agreement studies are intended to provide information about the amount of error inherent in any diagnosis, score, or measurement. The level of reliability and agreement among users of scales, instruments, or classifications is widely unknown. Therefore, there is a need for rigorously conducted interrater and intrarater reliability and agreement studies. Information about sample selection, study design, and statistical analysis is often incomplete. Because of inadequate reporting, interpretation and synthesis of study results are often difficult. Widely accepted criteria, standards, or guidelines for reporting reliability and agreement in the health care and medical field are lacking. The objective was to develop guidelines for reporting reliability and agreement studies. Eight experts in reliability and agreement investigation developed guidelines for reporting. Fifteen issues that should be addressed when reliability and agreement are reported are proposed. The issues correspond to the headings usually used in publications. The proposed guidelines intend to improve the quality of reporting. Copyright © 2011 Elsevier Inc. All rights reserved.

            Validation and reproducibility of food frequency questionnaire for Korean genome epidemiologic study.

            The aim was to evaluate the validity and reliability of the food-frequency questionnaire (FFQ) developed for the Korean Genome Epidemiologic Study (KoGES). The FFQ was administered twice at a 1-year interval (first FFQ (FFQ1) at the beginning and second FFQ (FFQ2) at the end of the study), and diet records (DRs) were collected for 3 days during each of the four seasons from December 2002 to May 2004 for those who attended the health examination center. At the end of the study period, we collected the 12-day DRs of 124 participants. The nutrient intakes from the DRs were compared with both FFQ1 and FFQ2. The intakes of energy and some nutrients estimated from FFQ1 and FFQ2 were different from those assessed by the DRs. In particular, the consumption of carbohydrates was higher in FFQ1 and FFQ2 than in the DRs. The de-attenuated correlation coefficients, adjusted for age, sex and energy intake, between FFQ2 and the 12-day DRs in the Korean population ranged between 0.23 (vitamin A) and 0.64 (carbohydrate). The median for all nutrients was 0.39. The correlations were similar when we compared nutrient densities of both methods. Joint classification of calorie-adjusted nutrient intakes assessed by FFQ2 and 12-day DRs by quartile ranged from 25.8% (vitamin A) to 39.5% (carbohydrate, iron) for exact concordance. Except for vitamin A, the proportion of subjects classified into the distant quartile was less than 7% for all nutrients. The median correlations between the two FFQs administered 1 year apart were 0.45 for all nutrient intakes and 0.39 for nutrient densities. We conclude that the FFQ we have developed appears to be an acceptable tool for assessing nutrient intakes in this population. Further studies for calibration of the FFQ collected from the multiple centers participating in the KoGES are needed. This study was supported by the budget of the National Genome Research Institute, Korea National Institute of Health (2002-347-6111-221).

              A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement.

              The intraclass correlation coefficient (rI) has been advocated as a statistic for assessing agreement or consistency between two methods of measurement, in conjunction with a significance test of the difference between means obtained by the two methods. We show that neither technique is appropriate for assessing the interchangeability of measurement methods. We describe an alternative approach based on estimation of the mean and standard deviation of differences between measurements by the two methods.
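              The approach described above, estimating the mean and standard deviation of the paired differences, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own code: the function name `limits_of_agreement`, the sample readings, and the 1.96 multiplier (which assumes the differences are roughly normally distributed) are all assumptions introduced here.

```python
import statistics


def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (bias, lower, upper) for paired measurements.

    bias  : mean of the paired differences (method_a - method_b)
    lower : bias - z * SD of the differences
    upper : bias + z * SD of the differences

    z = 1.96 is an assumed multiplier that covers about 95% of
    differences if they are approximately normally distributed.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must measure the same subjects")
    if len(method_a) < 2:
        raise ValueError("at least two paired measurements are required")

    # Work on the differences, not on the raw values or their correlation.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    # Hypothetical paired readings from two instruments on four subjects.
    new_device = [10.0, 12.0, 14.0, 16.0]
    reference = [9.0, 12.0, 13.0, 17.0]
    bias, lower, upper = limits_of_agreement(new_device, reference)
    print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

              Whether the resulting interval is narrow enough for the two methods to be used interchangeably is a clinical judgment that has to be made separately; the calculation only describes how far apart paired readings tend to be.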

                Author and article information

                [1] Julius Centre University of Malaya, Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
                [2] Department of Applied Statistics, Faculty of Economics and Administration, University of Malaya, Kuala Lumpur, Malaysia
                University of East Piedmont, Italy
                Author notes

                Conceived and designed the experiments: RZ AB RI NAI. Performed the experiments: RZ AB RI NAI. Analyzed the data: RZ AB RI NAI. Contributed reagents/materials/analysis tools: RZ AB RI NAI. Wrote the paper: RZ AB RI NAI.

                Role: Editor
                PLoS One
                Public Library of Science (San Francisco, USA)
                25 May 2012
                Volume: 7
                Issue: 5
                Zaki et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Pages: 7
                Research Article
                Statistical Methods
                Clinical Research Design
                Statistical Methods
                Systematic Reviews
                Diagnostic Medicine
                Test Evaluation
                Drugs and Devices
                Medical Devices
                Clinical Epidemiology
                Epidemiological Methods
                Non-Clinical Medicine
                Evidence-Based Medicine
                Public Health
                Preventive Medicine


