We present a fully articulated protocol for the Hamilton Rating Scale for Depression (HAM-D), including item scoring rules, rater training procedures, and a data management algorithm that increases the accuracy of scores before outcome analyses. The algorithm flags an interview as potentially inaccurate when two independent raters' scores differ by 5 or more points, or when the raters disagree on whether the interview meets the threshold for depression recurrence, a long-term treatment outcome with public health significance. Discrepancies are resolved by assigning two new raters, identifying the items with disagreement according to an algorithm, and reaching consensus on the most accurate scores for those items.
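The flagging rule described above can be sketched as follows. This is a minimal illustration, not the protocol's implementation. It assumes that each rater's interview is stored as a mapping from item name to item score and that the 5-point rule applies to total scores. `RECURRENCE_CUTOFF` is a hypothetical placeholder, because the abstract does not state the recurrence criterion; the real definition may involve more than a single total-score cutoff. The item-level rule in `disagreeing_items` (list every item the two raters scored differently) is also an assumption standing in for the unspecified item-selection algorithm. All item names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical placeholder: the abstract does not give the recurrence threshold.
RECURRENCE_CUTOFF = 20
# From the text: a between-rater difference of 5 or more points is a discrepancy.
SCORE_GAP = 5


@dataclass
class Rating:
    """One rater's scores for one interview (hypothetical item name -> score)."""
    items: dict[str, int]

    @property
    def total(self) -> int:
        return sum(self.items.values())


def meets_recurrence(total: int, cutoff: int = RECURRENCE_CUTOFF) -> bool:
    # Assumed rule: a total at or above the cutoff counts as meeting threshold.
    return total >= cutoff


def is_discrepant(a: Rating, b: Rating) -> bool:
    """Flag an interview if totals differ by >= SCORE_GAP points, or if the
    two raters disagree on whether the recurrence threshold is met."""
    score_gap = abs(a.total - b.total) >= SCORE_GAP
    status_differs = meets_recurrence(a.total) != meets_recurrence(b.total)
    return score_gap or status_differs


def disagreeing_items(a: Rating, b: Rating) -> list[str]:
    """Assumed item-level rule: every item scored differently by the two
    raters (items missing for one rater count as disagreements)."""
    all_items = a.items.keys() | b.items.keys()
    return sorted(k for k in all_items if a.items.get(k) != b.items.get(k))


if __name__ == "__main__":
    rater_1 = Rating({"depressed_mood": 3, "work_activities": 2, "hypersomnia": 2})
    rater_2 = Rating({"depressed_mood": 1, "work_activities": 0, "hypersomnia": 1})
    if is_discrepant(rater_1, rater_2):  # totals 7 vs 2, a 5-point gap
        print(disagreeing_items(rater_1, rater_2))
```

Flagged interviews would then go to the two new raters, who consider only the items returned by the item-level step. The two flagging conditions are checked independently. A small score gap can still trigger review when it straddles the recurrence threshold, because that is the case where a rater disagreement would change the outcome classification.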
These methods were applied in a clinical trial whose primary outcome was the Structured Interview Guide for the Hamilton Rating Scale for Depression, Seasonal Affective Disorder version (SIGH-SAD), which comprises the 21-item HAM-D and 8 items assessing atypical symptoms. A total of 177 adult patients with seasonal depression were enrolled and interviewed at 10 time points spanning treatment and the 2-year follow-up interval, yielding 1,589 completed interviews, of which 1,535 (96.6%) were archived.
Inter-rater reliability was high, with intraclass correlation coefficients (ICCs) ranging from .923 to .967. Only 86 (5.6%) of the 1,535 archived interviews met criteria for a between-rater discrepancy. The HAM-D items “Depressed Mood,” “Work and Activities,” “Middle Insomnia,” and “Hypochondriasis” and the atypical items “Fatigability” and “Hypersomnia” contributed most to discrepancies.