Abstract
<p id="d2243222e123">
<b>Background:</b> Applied health science research commonly measures concepts via
multiple-item tools
(scales), such as self-reported questionnaires or observation checklists. Such scales
are usually validated either in detail in dedicated psychometric studies or only cursorily
in substantive studies. However, methodologists advise that, as validity is a property
of the inferences based on measurement in a context, psychometric analyses should
be performed in substantive studies as well. Until recently, performing comprehensive
psychometrics required expert knowledge of different, often proprietary, software.
The increasing availability of statistical techniques in the R environment now makes
it possible to integrate such analyses in applied research.
</p><p id="d2243222e127">
<b>Methods:</b> In this tutorial, I introduce a 6-step protocol which allows detailed
diagnosis of
core psychometric properties (e.g. structural validity, internal consistency) for
scales with binary and ordinal response options aiming to measure differences in degree
or quantity, the most common type of scale in applied research. The protocol includes investigations
of (1) item distributions and summary statistics, item properties via (2) non-parametric
and (3) parametric item response theory, (4) scale structure using factor analysis,
(5) reliability via classical test theory, and (6) calculation and description of
global scores. I illustrate the procedure on a measure of self-reported disability,
the 24-item Sickness Impact Profile Roland Scale (RM-SIP), administered in a survey
of 222 chronic pain sufferers. An R Markdown script is provided that generates reproducible
reports.
</p><p id="d2243222e131">
<b>Results:</b> In this sample, 15 of 24 RM-SIP items formed a unidimensional ordinal
scale with
good homogeneity (
<i>H</i> = 0.43) and reliability (
<i>α</i> = .86[.84–.89];
<i>ω</i> = .87[.85–.88]). The 15-item and full 24-item versions were highly correlated (
<i>r</i> = .96), and regression models predicting RM-SIP disability produced comparable
results.
</p><p id="d2243222e147">
<b>Conclusions:</b> The example analysis illustrates how psychometric properties may
be assessed, and avenues for measure improvement identified, in substantive
studies. Applied researchers can adapt
this script to perform and communicate these analyses as part of questionnaire validation
and substantive studies.
</p>