      PROMIS® Health Organization (PHO) 2020 Conference Toward Patient-Centered Care: PROMIS Implementations and Advances. Abstracts: Virtual, 25-27 October 2020

      Journal of Patient-Reported Outcomes

      Springer International Publishing

      6th PROMIS International Conference: Toward Patient-Centered Care: PROMIS Implementations and Advances (PHO 2020)

      25-27 October 2020

101-O. Developing PROMIS-based research and clinical profiles for patients with heart failure
Faraz S. Ahmad, Kathryn L. Jackson, Leilani Lacson, Susan E. Yount, Nan E. Rothrock, Michael A. Kallen, Karl Y. Bilimoria, Abel N. Kho, David Cella
All authors: Northwestern University Feinberg School of Medicine, Chicago, Illinois
Correspondence: Faraz Ahmad (faraz.ahmad@northwestern.edu)
Objectives
Heart failure (HF) is a common and morbid condition. We previously reported the development of the PROMIS®-Plus-HF Profile Measure, a complete assessment of health that combines generic and HF-specific items. To facilitate patient-centered research and care, we sought to develop research and clinical profiles of the PROMIS-Plus-HF measure with overall, physical, mental, and social summary scores.
Methods
Candidate items (n=31) for the Research and Clinical Profiles were selected from the 86 items in the PROMIS-Plus-HF Profile Measure based on psychometric properties and to ensure coverage of the range of symptoms experienced by HF patients. In a web-based survey, HF clinicians (n=43) rated item importance and clinical actionability. Informed by these results, the study team developed a 27-item Research Profile and a 10-item Clinical Profile. Overall, physical, mental, and social health summary scores on a scale of 0 to 100 were calculated. In a cross-sectional sample (n=600), we measured reliability: internal consistency with Cronbach’s alpha, and test-retest reliability in a sample of 100 participants. Known-groups validity was assessed using one-way ANOVA, modeling differences in HF score across New York Heart Association (NYHA) Class and Kansas City Cardiomyopathy Questionnaire (KCCQ) summary scores and subscales. Differences in change in Profile scores (responsiveness) by KCCQ change were evaluated using linear mixed regression.
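As an illustration of the internal-consistency statistic named in the Methods above, the following is a minimal sketch of Cronbach’s alpha, computed as k/(k-1) × (1 - sum of item variances / variance of total scores). It is not the authors’ code; the function name `cronbach_alpha` and the example responses are hypothetical and chosen only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: list of equal-length lists, one list of k item scores per respondent.
    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same k items")

    # Variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Variance of the summed score across respondents.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical responses: 5 respondents x 3 items (illustrative only).
    demo = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2], [3, 3, 4]]
    print(f"alpha = {cronbach_alpha(demo):.3f}")
```

Values closer to 1 indicate that the items vary together; the abstract reports only that all domain alphas exceeded 0.8.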
Results
In the 600-person cross-sectional sample, the summary scores for the 27-item PROMIS-Plus-HF Research Profile and 10-item Clinical Profile were normally distributed. Internal consistency for domain scores was excellent (all α >0.8). Test-retest intraclass correlation coefficients for domain and overall scores were ≥0.90. Both profiles demonstrated known-groups validity for overall score and physical sub-score based on NYHA Class, and for all scores based on KCCQ groups (p<0.05). In the 75-person longitudinal sample, the Research Profile demonstrated evidence of responsiveness (p<0.05) for the overall and domain scores. For the Clinical Profile, the point estimates for the overall, social, and mental health scores reflected responsiveness, and the change in physical score reached statistical significance (p<0.05).
Conclusion
The PROMIS-Plus-HF Research and Clinical Profiles demonstrated good overall psychometric characteristics and may be used to facilitate patient-centered research and clinical care.

102-P. Withdrawn

103-P. Is there a difference in outcomes between patients treated with different implants for hammertoe correction?
Amanda Holleran, Adolf S. Flemister, Benedict DiGiovanni, Irvin Oh, John Ketz, Gabriel Ramirez, Caroline Thirukumaran, Judith F. Baumhauer
All authors: University of Rochester Medical Center
Correspondence: Judith F. Baumhauer (Judy_Baumhauer@urmc.rochester.edu)
Objective
Hammertoe surgery is one of the most commonly performed musculoskeletal surgeries. Thirty years ago, a simple 10-cent wire was used to stabilize the repair. In the past 10-15 years, a multitude of implants have been suggested to replace the simple wire technique because of recurrence rates of the deformity. With new implants come significant cost increases.
This study examines physical function, pain, recurrence, and other complications in patients treated with 4 different surgical implants for hammertoe correction.
Methods
A retrospective review of prospectively collected Patient-Reported Outcomes Measurement Information System (PROMIS) physical function (PF) and pain interference (PI) data was performed in 248 patients who had a hammertoe correction between January 2015 and December 2019. Categorical (yes/no) data on recurrence and complications were obtained by chart review. Mann-Whitney U tests, chi-square tests, and mixed linear regression models were used to compare groups on demographics and to assess PF and PI differences at the final follow-up time point for each implant group (K-wire, Nextra implant, retrograde fusion screw, Trim-It pin), correcting for confounding demographic variables.
Results
Baseline demographics showed that implants were used in slightly older patients (2 years older on average). Other confounding variables included BMI (patients with higher BMI had lower PF), smoking history (past smokers had lower PF and higher PI), and insurance (patients with governmental products had lower PF and higher PI). Implants had a higher recurrence rate (OR 1.9); however, there was no increase in other complications. At final follow-up, when controlling for confounding variables, PF was better with Nextra and Trim-It pins than with K-wire. There was no difference in PI between the K-wire and implant groups.
Conclusions
There is variation in the surgical implants used for the commonly performed hammertoe procedure. The choice of implant should be based on patient-reported outcomes (function and pain improvement) as well as the risk of recurrence or complications. In this case, the cost differential between the K-wire and the few implants reviewed is nearly 1,000 dollars. Objective assessments of outcomes will aid in determining value, eliminating variation, and improving the alignment of provider and health care cost allocation.

104-P.
Is there a difference in outcomes between double or triple arthrodesis for foot deformity?
Amanda Holleran¹, Judith F. Baumhauer¹, Jeff Houck², Daniel Homeier¹, Adolf S. Flemister¹, Benedict DiGiovanni¹, Irvin Oh¹, John Ketz¹
¹University of Rochester Medical Center; ²George Fox University, Newberg, OR
Correspondence: Judith F. Baumhauer (Judy_Baumhauer@urmc.rochester.edu)
Objective
Triple arthrodesis (fusion of the talonavicular, subtalar and calcaneocuboid joints) has historically been considered the standard of treatment for arthritis of the hindfoot. The complications of this surgery include non-union, malunion, nerve injury, infection, and wound healing problems. Double arthrodesis (fusion of the talonavicular and subtalar joints) can produce a similar reduction in motion and correction of foot deformity, but may cause less patient morbidity because one fewer joint is incorporated into the fusion, and may cost less because of shorter operative time and fewer hardware needs. The purpose of this study is to evaluate patient-reported outcomes (PROMIS physical function [PF] and pain interference [PI]) and complication rates for surgically corrected foot deformity using a triple arthrodesis compared with a double arthrodesis.
Methods
A retrospective review of prospectively collected Patient-Reported Outcomes Measurement Information System (PROMIS) data was performed in 57 patients who had undergone either a double or a triple arthrodesis between January 2015 and December 2019. PF and PI scores were collected. Linear mixed models were used to assess differences over time and between groups (double versus triple) pre-operation and at 3 months, 6 months, 9 months and 12 months post-surgery. Medical records were reviewed for complications (yes/no).
Results
There were no statistical differences between groups in terms of age (p=0.65), BMI (p=0.32), pre-operative diagnosis (p=0.79), ASA rating (p=0.4), or complications (p=0.49).
The coefficient of variation at each time point per group varied from 11.9% to 21.8%. Both groups improved significantly in physical function (p<0.01) and pain interference (p<0.01), without a significant difference between groups at 9 or 12 months.
Conclusion
Double arthrodesis can allow for similar correction of foot deformities without the increased risk of wound complication and nonunion. Both groups demonstrated a significant improvement in their PROMIS PF and PI at 1 year, demonstrating that either a double or a triple arthrodesis is a feasible operation; however, a double arthrodesis may save time and health care costs.

105-O. Methodology for selecting and evaluating items from PROMIS® Item Banks to develop novel short-form questionnaires
Steven I. Blum¹, Larissa Stassek², Donald M. Bushnell², Sejin Lee³, James W. Shaw¹, Mona L. Martin²
¹Bristol Myers Squibb, Lawrenceville, NJ USA; ²Evidera | PPD, Bethesda, MD USA; ³University of North Carolina, Chapel Hill, NC USA
Correspondence: Steven I. Blum (steven.blum@bms.com)
Objectives
PROMIS® measures can be administered via computer adaptive testing or through use of existing short-forms and profile measures. Customized short-forms can be developed by selecting items from PROMIS item banks. We describe a systematic approach for selecting PROMIS items when creating custom short-forms and for generating evidence to support content validity within specific patient populations.
Methods
A modified Delphi process was used to initially evaluate items from the PROMIS and PROMIS-Cancer Physical Function item banks and to reduce the number of items to be further evaluated in qualitative interviews. The Delphi panel (n=10) included both measurement experts and patient representatives, who evaluated the items using predefined criteria and voted over three rounds to keep or drop each item. Retained items were subsequently evaluated in combined concept elicitation/cognitive interviews.
Interviews (n=150) were planned with patients diagnosed with one of five different cancers (i.e., lung, renal, hepatocellular, melanoma, and head and neck). Interviews were conducted at multiple sites in the United States and incorporated card sorting and rating exercises, which facilitated discussion on the relevance and importance of each item and any difficulty answering. The interviews were audio recorded, transcribed, and analyzed.
Results
The Delphi panel evaluated 169 PROMIS Physical Function items, voting to drop 93 and retain 76 items for further evaluation in the qualitative interviews. While recruitment is ongoing, preliminary results from the interviews have provided evidence for selecting PROMIS items most relevant to patients with each tumor type. The final deliverables will include a disease-specific short-form for each tumor type and an evidence dossier suitable for submission to regulatory authorities. This methodology can be applied to other measurement systems (e.g., EORTC, PRO-CTCAE) with item banks/libraries to select subsets of items relevant to a specific target population.
Conclusions
Qualitative patient interviews incorporating card sorting and rating exercises can be used to select a subset of items relevant to a specific population, while simultaneously generating additional evidence to support content validity of the novel short-form measures. A modified Delphi process helped to reduce the number of items that needed to be evaluated, thus making the interviews more manageable and efficient.

106-P. PROMIS® Paediatric Self Report Profile-25 distinguishes subgroups of children with two common paediatric knee injuries
Chaplin, J.E.¹,², Danielsson, A.³,⁴, Janarv, P-M.⁴,⁵,⁶, Askenberger, M.⁴,⁷
¹Dept. of Pediatrics, Institute of Clinical Sciences, Sahlgrenska Academy at Gothenburg University, Gothenburg, Sweden; ²Swedish Association of Local Authorities and Regions (SALAR); ³Dept.
of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy at Gothenburg University, Gothenburg, Sweden; ⁴Swedish Paediatric Orthopaedic Quality register (SPOQ); ⁵Capio Artro Clinic, Stockholm, Sweden; ⁶Dept. of Molecular Medicine and Surgery, Karolinska Institute, Stockholm, Sweden; ⁷Dept. of Women’s and Children’s Health, Karolinska Institute, Solna, Sweden
Correspondence: John Eric Chaplin (john.chaplin@gu.se)
Objective
To test the sensitivity of the generic PROMIS-25 and the child version of the illness-specific measure Knee Injury and Osteoarthritis Outcome Score (KOOS-Child) to two specific knee injuries which involve different symptoms and treatment regimens.
Methods
The qualitative control of treatment of severe knee injuries is followed by the Swedish Paediatric Orthopaedic Quality register (SPOQ). All patients were invited to complete paper versions of the two instruments at predefined follow-ups. Data collected in 2017–2019 are presented. Floor and ceiling effects were analysed. Agreement between domains was tested using bivariate correlation. Sensitivity to injury and treatment outcomes was investigated using a receiver operating characteristic (ROC) curve.
Results
Data from 272 paediatric patients (49% female; mean age 13 years at follow-up, range: 9–14) were gathered. Diagnoses: patellar dislocation 43%; anterior cruciate ligament (ACL) injury 22%; other diagnoses 33%; unknown 1%. The missing data rate was negligible: PROMIS 0.4%, KOOS 1.6%. Ceiling effects were found in all KOOS variables. The highest correlations between the domains of PROMIS (p) and the KOOS (k) were between ‘(p)mobility’ and ‘(k)sport’ (r=0.717), between ‘(p)pain interference’ and ‘(k)pain’ (r=0.634) and between ‘(p)anxiety’ and ‘(k)QoL’ (r=-0.509). Sensitivity to diagnosis (patellar dislocation and ACL) was more pronounced in PROMIS, where ‘(p)anxiety’ and ‘(p)mobility’ had the largest AUC (0.5525 and 0.5461, respectively) of all domains in both instruments.
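To illustrate how an area under the ROC curve (AUC) such as those reported in the Results above can be obtained, the following is a minimal sketch using the pairwise (Mann-Whitney) interpretation: the AUC equals the proportion of cross-group pairs in which the score from one diagnostic group exceeds the score from the other, with ties counted as one half. This is not the authors’ analysis code; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores_group_a, scores_group_b):
    """AUC as the probability that a score drawn from group A exceeds a
    score drawn from group B, counting ties as one half.

    Both arguments are non-empty lists of numeric domain scores.
    """
    if not scores_group_a or not scores_group_b:
        raise ValueError("both groups must contain at least one score")

    favourable = 0.0
    for a in scores_group_a:
        for b in scores_group_b:
            if a > b:
                favourable += 1.0
            elif a == b:
                favourable += 0.5
    return favourable / (len(scores_group_a) * len(scores_group_b))


if __name__ == "__main__":
    # Hypothetical domain scores for two diagnostic groups (illustrative only).
    group_a = [52.0, 48.5, 55.1, 60.3]
    group_b = [50.2, 47.9, 53.0, 49.4]
    print(f"AUC = {roc_auc(group_a, group_b):.3f}")
```

An AUC of 0.5 corresponds to chance-level separation of the two groups and 1.0 to perfect separation.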
Conclusions
The expected agreements between similar domains in the two instruments were found. Our results suggest that the PROMIS-25 was more sensitive to differences in anxiety and mobility between different injury locations than the KOOS-Child. Further analysis of the differences between these instruments will help to identify the measure of first choice in registry data collection.

107-P. PROMIS® Paediatric Self Report and Proxy Profile-25 compared to a quality-of-life instrument
Chaplin, J.E.¹,², Peterson, C.³, Danielsson, A.⁴,⁵
¹Dept. of Paediatrics, Institute of Clinical Sciences, Sahlgrenska Academy at Gothenburg University, Gothenburg, Sweden; ²Swedish Association of Local Authorities and Regions (SALAR); ³School of Health and Welfare, Jönköping University, Jönköping, Sweden; ⁴Dept. of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy at Gothenburg University, Gothenburg, Sweden; ⁵Swedish Paediatric Orthopaedic Quality Register (SPOQ)
Correspondence: John Eric Chaplin (john.chaplin@gu.se)
Objective
To compare the generic paediatric PROMIS profile-25 (p) with the multi-dimensional quality-of-life instrument DISABKIDS-31 (d).
Methods
Paper versions were administered to children aged 8-17 years and their parents, and to parents only for younger children, at the orthopaedic outpatient clinic of a university hospital. PROMIS-25 is a six-dimensional PROM profile; DISABKIDS-31 is a five-dimensional quality-of-life measure. The intraclass correlation coefficient was used to assess agreement between child and proxy. Multiple linear regression was used to predict DISABKIDS total QoL from the PROMIS profile for self-report and proxy-report. To visualize the relationships between the variables, multidimensional scaling with PROXSCAL was used.
Results
Data on 35 children (4-17 yrs, 60% girls) were collected, which included 17 child/proxy dyads (9-17 yrs), 1 child (10 yrs) and 17 parent-reports without child self-report (4-15 yrs).
The most frequent reason for visiting the clinic (77%) was leg or muscle injuries; other diagnoses included injury or deformity to the feet or back. Missing data were negligible: PROMIS 3%, DISABKIDS 4%. There was good to excellent agreement between child and parent DISABKIDS (r=0.673-0.903) and poor to excellent agreement for PROMIS (r=0.272-0.975), with (p)peer relationships (r=0.272) having the lowest agreement. In predicting DISABKIDS QoL scores, a significant regression equation was found for PROMIS self-report (F(7,9)=4.931, p=0.015), with an adjusted R² of 0.632, and for proxy-report (F(7,24)=14.608, p<0.001), with an adjusted R² of 0.54, with (p)physical function, (p)fatigue and (p)pain intensity being reliable predictors (p<0.001, p=0.040, and p=0.020, respectively). Multidimensional scaling revealed a good separation between variables for both instruments, with the possible exception of PROMIS anxiety and depression.
Conclusions
The two questionnaires demonstrated mixed inter-rater reliability, with PROMIS peer relations having the lowest ICC, indicating a possible greater sensitivity to differences in child/proxy reporting in PROMIS. A large part of the variation in DISABKIDS total-QoL can be explained by the PROMIS profile scores, indicating overlap between the instruments. The most reliable predictor of total-QoL was the physical functioning/mobility variable in PROMIS. Multidimensional scaling suggests that PROMIS-25 has a better separation between domains than DISABKIDS, with the possible exception of (p)anxiety and (p)depression. Further analysis of the differences between these instruments would benefit from a larger and more diverse population.

108-P. Patient-centered approach to response-level missing data
Chapman, R.
Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, Chicago, USA
Correspondence: Robert Chapman (Robert.Chapman@northwestern.edu)
Objective
To use mixture modelling as a patient-centered method to explain item-level missing data in PROMIS measures. These methods are used to evaluate the presence of item missingness across patient demographics, conditions and treatment, and patient-reported symptom severity and impact. These results inform novel Missing Not at Random models of missingness for PROMIS measures and help guide the clinician’s selection of PROMIS measures to minimize patient non-response.
Methods
Data used in analyses were obtained from HealthMeasures Dataverse, including data collected as a part of the development of the PROMIS Neuropathic and Nociceptive Pain Quality measures. PROMIS measures were scored using IRT item parameters and pattern-response scoring methods. Mixture modelling analyses were conducted with patient demographics, clinical information, number of items missing and PROMIS T scores. All analyses were conducted in R statistical computing software.
Results
Initial results with the PROMIS Neuropathic & Nociceptive Pain Dataset show that more item-level missing data is associated with better neuropathic pain scores (as indicated by lower PROMIS Neuropathic Pain Quality T scores) and with patients who have a condition associated with nociceptive pain (rheumatoid arthritis & fibromyalgia). Conversely, less missing item-level data is associated with worse pain (indicated by higher PROMIS Neuropathic Pain Quality T scores) and with patients who have conditions associated with neuropathic pain (diabetic neuropathy & cancer chemotherapy induced peripheral neuropathy).
Conclusion
These results provide important guidance for researchers or regulators who may be concerned about item-level missing data in PROMIS measures: missing data is less likely to occur with patients who have worse health-related quality of life (greater symptom severity and impacts). Missing data is also less likely when patients are answering items relevant to their condition or severity. Researchers seeking to model item-level missingness for data imputation methods should focus on missingness in patients with lower symptom severity and impact. These findings reinforce the importance of administering item content that is relevant to the patient and appropriate to the patient’s condition or severity. Clinical users can incorporate these findings into their practice by administering condition-relevant PROMIS short forms or by administering PROMIS Computer Adaptive Tests to minimize irrelevant items.

109-P. Measurement of minimal disease activity in psoriatic arthritis using PROMIS-Physical Function or the Health Assessment Questionnaire-Disability Index
Erin Chew¹,², Jamie Perin³, Thomas Grader-Beck¹, Ana-Maria Orbai¹
¹Johns Hopkins University School of Medicine Division of Rheumatology, Baltimore, MD; ²Johns Hopkins Hospital, Baltimore, MD; ³Johns Hopkins University School of Public Health, Department of International Health, Baltimore
Correspondence: Erin Chew, MD (echew6@jhmi.edu or erinychew@gmail.com)
Background
Minimal disease activity (MDA) is a treat-to-target (T2T) strategy objective in psoriatic arthritis (PsA). MDA criteria include physical function, traditionally assessed via the Health Assessment Questionnaire-Disability Index (HAQ-DI). It is of interest to assess the performance of more current physical function instruments such as the Patient-Reported Outcomes Measurement Information System-Physical Function Profile (PROMIS-PF).
Objectives
To assess the interchangeability of the HAQ-DI with the PROMIS-PF in the calculation of MDA in PsA.
Methods
Longitudinal data, including HAQ-DI and PROMIS-PF, were collected in a PsA cohort. MDA definitions were built by substituting the HAQ-DI criterion with the PROMIS-PF short form 4a (PROMIS-PF4a) or with the PROMIS-PF computer adaptive test (PROMIS-PF Bank). We assessed agreement and accuracy between HAQ-DI based and PROMIS-PF based MDA definitions at each visit and longitudinally, using the kappa statistic and ROC curve analysis.
Results
One hundred participants contributed 352 observations with up to five visits. Mean (SD) age was 52 (12) years, 60% were female, and 43% were in MDA at baseline. The kappa statistic for PROMIS-PF based MDA reflected almost perfect agreement with HAQ-DI MDA: kappa=0.94 (95% CI 0.90-0.97) for MDA PROMIS-PF Bank and kappa=0.90 (95% CI 0.80-0.95) for MDA PROMIS-PF4a. Higher longitudinal agreement was seen between MDA HAQ-DI and MDA PROMIS-PF Bank than between MDA HAQ-DI and MDA PROMIS-PF4a between consecutive visits: kappa ranged between 0.81-0.94 versus 0.72-0.84, respectively (Table 1). The area under the ROC curve for predicting MDA HAQ-DI was 0.97 for MDA PROMIS-PF Bank and 0.95 for MDA PROMIS-PF4a.
Conclusions
Excellent agreement was seen between HAQ-DI and PROMIS-based MDA definitions statically and longitudinally. The PROMIS-PF Bank and PROMIS-PF4a are accurate replacements for the HAQ-DI in calculating MDA state in PsA.

110-O. Interpretation of PROMIS Fatigue CAT scores in solid organ transplant recipients
Sumaya Dano¹, Ali Rezaeishahreza¹, Areej Ali¹, Nathaniel Edwards¹, Setareh Aghamohammadi¹, Nasab El-Dassouki¹, Jasleen Gill¹, Marta Novak², Susan J. Bartlett³†, Istvan Mucsi¹†
†Susan J.
Bartlett and Istvan Mucsi are co-senior authors
¹Multi-Organ Transplant Program and Division of Nephrology, University Health Network, Toronto, Canada; ²Centre for Mental Health, University Health Network, Toronto, ON, Canada; ³Center for Health Outcomes Research, McGill University, Montreal, Quebec, Canada
Correspondence: Istvan Mucsi (istvan.mucsi@utoronto.ca)
Objective
Relating PROMIS T-scores to functional impacts can help clinicians and patients to meaningfully interpret T-scores. Here we assess the relationship between T-scores and the last items and responses in solid organ transplant recipients (kidney [KTRs], kidney-pancreas [KPRs] and liver [LTRs]) using the PROMIS Fatigue Computer Adaptive Test (CAT).
Methods
A cross-sectional convenience sample of adult KTRs, KPRs, and LTRs completed the PROMIS Fatigue CAT on an electronic data capture system (DADOS, TECHNA Institute, UHN). The number of items answered and the unique last items administered from the PROMIS Fatigue item bank were tabulated. Final T-scores were ordered from low to high, and last questions and responses at different T-scores are reported.
Results
Of the 373 participants, the mean (SD) age was 53 (14), 235 (63%) were male, 199 (53%) were KTRs, 46 (12%) were KPRs and 128 (34%) were LTRs. T-scores were <50 (46%), 50-60 (35%), >60 (19%). A total of 18 unique last questions were completed in this study sample. Patients with T-scores ranging from 24-40 had last questions and responses that reflected no to very little fatigue. Last questions unique to this T-score range included questions about strenuous exercise and feeling “sluggish”. Responses to these questions suggested that patients were able to perform strenuous exercise and did not feel tired. Patients with T-scores 60 had last questions and responses reflecting moderate to severe fatigue.
Unique last questions administered to patients with T-scores 60 included questions about fatigue interfering with physical functioning and, for patients with T-scores >70, the ability to eat and carry on a conversation. Responses to questions in this T-score range suggested that fatigue limited the ability to perform even basic activities of daily living.
Conclusion
We reported a relationship between PROMIS Fatigue CAT T-scores and the last question and response administered. This relationship can help improve the interpretation of PROMIS Fatigue T-scores and help clinicians and patients understand how PROMIS Fatigue T-scores relate to limitations in daily life.

111-P. Reducing questionnaire burden when screening for depressive symptoms in patients with end-stage kidney disease
Sumaya Dano¹, Evan Tang¹, Faisal Jamil¹, Dean Christidis¹, Madeline Li³, Doris Howell⁴, John Devin Peipert⁵, Susan J. Bartlett⁶, Istvan Mucsi¹
¹Multi-Organ Transplant Program and Division of Nephrology, University Health Network, Toronto, Canada; ²Centre for Mental Health, University Health Network, Toronto, ON, Canada; ³Department of Supportive Care, Princess Margaret Hospital, Toronto, Ontario, Canada; ⁴Princess Margaret Cancer Center, Faculty of Nursing, University of Toronto, Toronto, Ontario, Canada; ⁵Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois; ⁶Center for Health Outcomes Research, McGill University, Montreal, Quebec, Canada
Correspondence: Sumaya Dano (sumaya.dano@mail.utoronto.ca)
Objective
Routine screening for depressive symptoms can be time-consuming and burdensome for patients. However, patients without depressive symptoms can be quickly screened out using ultra-brief screening tools, avoiding the need to complete more precise, but longer, questionnaires.
In this study we compare the questionnaire burden of completing the Patient Health Questionnaire (PHQ9) or PROMIS Depression Computer Adaptive Test (D-CAT) with that of various two-step screening combinations for depressive symptoms in patients with end-stage kidney disease (ESKD).
Methods
A cross-sectional convenience sample of adult kidney transplant recipients and patients on maintenance dialysis completed the Edmonton Symptom Assessment Survey-revised (ESASr), PROMIS D-CAT and PHQ9. A PHQ9 score ≥10 was used as the reference to identify moderate/severe depressive symptoms. ESASr depression (ESASr-D) and PHQ2 scores of ≥1 and ≥2 were evaluated for the pre-screening step. In the second step, a D-CAT T-score ≥55 was used to identify patients with potentially significant depressive symptoms. The total number of questions completed was calculated for the different scenarios.
Results
Mean (SD) age of the 164 participants was 52 (17); 68% were male and 62% Caucasian. Based on PHQ9, 16% (n=26) had depression. In the single-step screening scenarios, the sample would complete a total of 1476 PHQ9 or 1020 D-CAT items (9 or 6 items per participant on average, respectively). All the 2-step screening combinations would reduce the total number of items completed by the total sample by at least half. A 2-step method combining PHQ2 ≥2 and D-CAT (sensitivity: 65%, specificity: 94%) required a total of 510 items (PHQ2 and D-CAT together; 3.1 per participant on average). A 2-step method combining ESASr-D ≥1 and D-CAT (sensitivity: 58%, specificity: 94%) required a total of 435 items (ESASr-D and D-CAT together; 2.7 per participant on average).
Conclusion
Compared to administering either PHQ9 or PROMIS Depression CAT to all participants, a 2-step process including an ultra-brief pre-screening tool substantially reduced the number of questions completed by the total sample.

112-P.
Psychometrics of three Swedish pediatric item banks from the Patient-Reported Outcomes Measurement Information System (PROMIS)®
Frida Carlberg Rindestig¹, Marie Wiberg², Eva Henje Blom¹, Inga Dennhag¹
¹Child and Adolescent Psychiatry, Department of Clinical Science, Umeå University, Sweden; ²Department of Statistics, USBE, Umeå University
Correspondence: Inga Dennhag (inga.dennhag@umu.se)
Objective
The Patient-Reported Outcomes Measurement Information System (PROMIS®) aims to provide self-reported item banks for several dimensions of physical, mental and social health. Here we investigate the psychometric properties of the Swedish pediatric versions of the item banks for pain interference, fatigue and physical activity.
Methods
Participants aged 12-19 years (n = 681) were recruited in public school settings, at a child and adolescent psychiatric outpatient clinic, and at a youth health outpatient clinic. Confirmatory factor analyses (CFA) were performed to evaluate scale dimensionality and local dependence. Item Response Theory (IRT) analyses were then used to finalize the item banks and ensure that each item is valid and weighted as a standalone assessment.
Results
CFA results confirmed that pain interference, fatigue and physical activity are separate constructs. Items with low item fit and items with Differential Item Functioning (DIF) were removed, resulting in 14 pain interference items, 15 fatigue items, and 6 physical activity items.
Conclusions
Swedish item banks were developed using item response theory to assess pain interference, fatigue and physical activity in 12-19 year olds. These instruments offer precise, efficient and flexible assessment and allow researchers to select only the most useful items to study.

113-P.
Parental caregiver burden and recovery of adolescent Anorexia nervosa after multi-family therapy
Inga Dennhag, Eva Henje Blom, Karin Nilsson
All authors: Child and Adolescent Psychiatry, Department of Clinical Science, Umeå University, Sweden
Correspondence: Inga Dennhag (inga.dennhag@umu.se)
Objective
Parental involvement in the treatment of anorexia nervosa has been shown to be extremely important, especially for adolescents. This study investigated whether parental caregiving burden changed during adjunct multi-family therapy of adolescent anorexia nervosa and eating disorders not otherwise specified (EDNOS), and whether caregiver burden at baseline and changes in caregiver burden during treatment were associated with treatment outcome.
Methods
Twenty-four females, 13 to 16 years old, and their parents participated in the study. Caregiver burden was measured with the Eating Disorders Symptom Impact Scale, completed by mothers (n=23) and fathers (n=22). Treatment outcome was measured by adolescent body mass index, level of global functioning, and self-rated eating disorder symptoms on the Eating Disorders Examination Questionnaire 4.0.
Results
All patient outcomes improved and overall caregiver burden decreased significantly during treatment. When caregiver burden was broken down into its aspects, the decrease in parental perceived isolation was found to be associated with improvement in BMI and in Children’s Global Assessment Scale scores. When analyzing fathers and mothers separately, we found that maternal feelings of guilt and paternal perceived burden of dysregulated behaviors at baseline were correlated with treatment outcome.
Conclusions
Multi-Family Therapy shows preliminary effectiveness as an adjunct treatment for anorexia nervosa and eating disorders not otherwise specified. Fathers might be more important in treatment than previously recognized, especially in their participation in Multi-Family Therapy. Caregiver burden can be a potential mediator of treatment results in the future.

114-P.
Measuring function in a multidisciplinary Osteogenesis Imperfecta clinic Maureen Donohoe, Cristina McGreal, Jeanne M. Franzone, Richard W. Kruse, Michael B. Bober, Kenneth Rogers, Robert Wellmon All authors: Nemours/Alfred I. duPont Hospital for Children, Wilmington, DE Correspondence: Maureen Donohoe (Reenee.Donohoe@nemours.org) Objective Our objective is to report on early results of data collected during multidisciplinary clinic visits using PROMIS, functional mobility scores (FMS), and BMI, identifying relationships between type of Osteogenesis Imperfecta (OI) and function. Methods This is a single-center retrospective review of OI patients attending a clinic visit including Genetics, Orthopaedics, and Physical Therapy between January 2016 and October 2019. Demographic, clinical, and operative data, PROMIS dimensions including physical mobility, upper extremity function, pain interference, fatigue, and peer relationships (pediatric) or social participation (adult), and FMS were collected. Individuals’ presentations were classified as mild, moderate, or severe, and BMI was categorized as ideal, overweight, or obese. Results Forty-nine patients met criteria and were grouped based on OI severity. OI severity was associated with higher BMI and lower levels of function on the PROMIS Physical Mobility and Upper Extremity Function dimensions. BMI was negatively associated with PROMIS Physical Mobility score. Individuals with OI who scored higher on PROMIS Physical Mobility and Upper Extremity Function had lower levels of Pain and Fatigue based on reported scores. Between-group differences were statistically significant for BMI and for PROMIS Physical Mobility and Upper Extremity Function scores. Participants with mild or moderate OI severity had significantly lower BMI than those with severe OI. 
PROMIS Physical Mobility: participants with mild and moderate OI had significantly higher scores than those with severe OI; individuals with mild OI also scored significantly higher than those with moderate OI severity. PROMIS Upper Extremity Function: participants with mild OI had significantly higher scores than those with moderate or severe OI. Conclusions Patient reported outcome (PRO) measures are helpful in understanding individuals’ functional levels and identifying needs. Individuals with a mild OI presentation tend to have lower BMI and greater activity as noted on PROMIS. Fatigue and Pain Interference on PROMIS did not have a significant relationship with severity of OI or BMI. Individuals with a severe presentation of OI tend to have higher BMI and less physical activity and upper extremity function on PROMIS. Past year fracture history, surgical intervention, and bisphosphonate use had no statistically significant impact on PRO across this population. 115-O. Cross-walking PROMIS-29 to the Roland-Morris Disability Questionnaire and Oswestry Disability Index for chronic back pain Maria Orlando Edelen1, Anthony Rodriguez1, Patricia M. Herman1, Ron D. Hays1,2 1RAND, 2UCLA Correspondence: Maria Orlando Edelen (orlando@rand.org) Objective There is extensive literature on the effectiveness of pharmaceutical and nonpharmacologic interventions for chronic low back pain (CLBP) based on different samples and outcome measures. The NIH Research Task Force (RTF) on CLBP noted that these differences make it difficult to compare studies of similar or competing interventions. These differences limit the usefulness of the results in answering questions such as ‘Which therapies work best? And for whom?’ This study reports empirical links of the PROMIS-29 with the Roland-Morris Disability Questionnaire (RMDQ) and the Oswestry Disability Index (ODI) to enable comparisons across more studies. 
Methods We conducted secondary analyses of three datasets: 1) RAND Center of Excellence for the Appropriateness of Care (CERC) data (n=1677) were collected on chiropractic patients being treated for CLBP and chronic neck pain (CNP); 2) Assessment of Chiropractic Treatment for Low Back Pain (ACT) data (n=750) were collected on active military personnel participating in chiropractic clinical trials for low back pain (LBP); and 3) Amazon Mechanical Turk (MTurk) data were obtained from a general population sample (n=5755) that included a subgroup that reported CLBP (n=1444). The PROMIS-29 was administered in all three datasets, the RMDQ in the ACT, and the ODI in the CERC and MTurk datasets. We developed ordinary least squares regression equations to predict the RMDQ and the ODI from PROMIS-29 scales. Results R² values ranged from 54 to 61%, with normalized mean absolute error (NMAE) ranging from 0.51 to 0.53 standard deviations, in regression models predicting the RMDQ from the PROMIS-29. Physical function, pain interference, and sleep disturbance were consistently retained. For predicting the ODI, R² values ranged from 65 to 67% in CERC data and were 63% in MTurk data, with NMAE ranging from 0.43 to 0.47 in CERC data and 0.46 in MTurk data. Physical function, social function, sleep disturbance, and average pain intensity were consistently retained. Conclusions The RMDQ and ODI “legacy” scores can be predicted from the PROMIS-29 with sufficient accuracy for group-level comparisons. These crosswalks enable comparisons of studies that use legacy measures with those that administer the PROMIS-29. In addition, these results can be used for the harmonization required for individual patient data meta-analyses. 116-O. 
Dutch reference values for the PROMIS Scale v1.2 – Global Health Ellen BM Elsman1, Leo D Roorda2, Martine HP Crins2, Maarten Boers1, Caroline B Terwee1 1 Amsterdam UMC, Vrije Universiteit Amsterdam, Department of Epidemiology and Biostatistics, Amsterdam Public Health research institute, Amsterdam, the Netherlands 2 Amsterdam Rehabilitation Research Center | Reade, Amsterdam, the Netherlands Correspondence: Ellen Elsman (e.elsman@amsterdamumc.nl) Objective In order to add context to the health impact of diseases and conditions, it is important to interpret and compare patient-reported outcomes across studies and populations. This study aims to estimate and evaluate Dutch reference values for the Patient-Reported Outcomes Measurement Information System Global Health (PROMIS-GH) scale. Methods The PROMIS-GH v1.2 was administered through a web-based survey to 4370 Dutch persons, representative of the Dutch general population in 2016. T-scores for the mental health (GMH) and physical health (GPH) subscales, and their shorter two-item subscales, were calculated for the entire population and by age group and gender. T-scores for GMH and GPH were compared to the US reference population, which has a mean T-score of 50 and a standard deviation of 10, and to age-range and gender subpopulation reference scores. US reference population T-scores are representative of the 2000 US general population. Results The Dutch population had a GMH T-score of 44.7 and a GPH T-score of 45.2, both substantially lower, and thus worse, than the US reference population T-score of 50. Lower T-scores for the Dutch general population were found for both age-range and gender subpopulations compared to US subpopulation reference values. T-scores of the Dutch general population showed a pattern similar to US reference values: T-scores worsened with increasing age, but improved again for the oldest age groups; males scored better than females. 
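To illustrate the T-score metric used in the Results above, the following Python sketch converts a T-score into standard-deviation units relative to a reference population with mean 50 and SD 10, and back again. The function names are hypothetical and are not part of any PROMIS scoring software; PROMIS T-scores themselves come from IRT-based scoring, so this only shows how to read a reported T-score, not how to produce one from item responses.

```python
REFERENCE_MEAN = 50.0  # mean T-score of the reference population
REFERENCE_SD = 10.0    # standard deviation of the reference population


def t_to_sd_units(t_score: float) -> float:
    """Express a T-score as a distance from the reference mean in SD units."""
    return (t_score - REFERENCE_MEAN) / REFERENCE_SD


def sd_units_to_t(z: float) -> float:
    """Convert a distance in SD units back to the T-score metric."""
    return REFERENCE_MEAN + REFERENCE_SD * z


# Dutch general population means reported in the abstract
dutch_gmh = t_to_sd_units(44.7)
dutch_gph = t_to_sd_units(45.2)
print(f"GMH: {dutch_gmh:+.2f} SD, GPH: {dutch_gph:+.2f} SD")
```

Read this way, the reported Dutch means sit roughly half a standard deviation below the US reference mean of 50.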
Conclusions This study reports reference values for the PROMIS-GH scale for the Dutch general population, including age-range and gender subpopulations. PROMIS can improve the assessment of physical and mental health, but appropriate population reference values are essential for their interpretation. This study provides these values for the Netherlands; they are notably worse than the US reference values of 2000; perhaps the US data are outdated and no longer representative of the current US health status. Nevertheless, this study fuels the discussion on whether or not we should anchor the mean and standard deviation of PROMIS scales on the US population. 117-P. PROMIS in English speaking countries – A systematic review of the evidence for measurement invariance of PROMIS tools Alex Matthews, Jonathan P Evans, Jose Valderas All authors: Health Services and Policy Research Group, University of Exeter Medical School, Exeter, UK Correspondence: Jonathan P Evans (j.p.evans2@exeter.ac.uk) Objective Measurement invariance across different populations defined in terms of language and culture must be quantified and confirmed to ensure that Patient Reported Outcome Measures (PROMs) maintain their metric properties. The Patient Reported Outcomes Measurement Information System (PROMIS) was designed and tested on a US reference population. Assumptions of validity and cross-cultural equivalence in other English-speaking countries are based on a universal translation approach, but remain untested and should be confirmed alongside evaluation of other psychometric properties such as reliability and responsiveness. We aimed to investigate the use of PROMIS instruments in non-USA English speaking countries, and the evidence of measurement invariance within these populations. Methods We performed a systematic search of MEDLINE and Embase for contemporary literature from 2017 onwards. 
Articles were included if they provided evidence of use or assessment of metric properties of PROMIS instruments in UK, Australian or New Zealand populations. Secondary searches of published abstracts from conference proceedings and trial registries were also undertaken. Results Twenty-two articles met our inclusion criteria and 12 (55%) used a PROMIS instrument as an outcome measure without any evaluation of their metric properties in the target populations. The remaining 10 articles analysed the metric properties of PROMIS tools. Six Australian psychometric analyses focused on mental health metrics for the Depression, Anxiety and Emotional Distress item banks. Three studies provided evidence to support validity, responsiveness to change was confirmed in two and measurement invariance was assessed in one. Only four studies including UK populations studied either the validity, responsiveness or invariance. Sixty-nine registered clinical trials were identified. The majority planned to use PROMIS tools to assess outcomes. There was no evidence of cross-cultural adaptation or testing for cross-cultural equivalence of PROMIS item banks. Conclusion Evidence on the measurement properties of PROMIS instruments in populations from English speaking countries outside of the US and Canada is sparse. Lack of confirmation of measurement invariance places the interpretation of PROMIS instruments at risk. There is a pressing need for the evaluation of cross-cultural validation amongst English speaking populations to ensure appropriate interpretation and acceptance of the PROMIS instruments. 118-O. Monotonic polynomials to model flexible item response curves for PROMIS Physical Function Carl F. 
Falk1, Felix Fischer2 1Department of Psychology, McGill University, Montréal, Québec, Canada; 2Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany Correspondence: Felix Fischer (felix.fischer@charite.de) Objective The PROMIS Instrument Development and Validation Scientific Standards suggest investigating each item’s measurement properties by inspecting initial probability functions from non-parametric IRT models. Typically, items are excluded when their response function misfits a parametric model. Monotonic polynomials allow aberrant response curves to be modeled parametrically, so that such items can be retained in the measurement model. We investigated the suitability of this approach in the PROMIS Physical Function item bank. Methods Using PROMIS Wave 1 data (N = 15,725) for Physical Function, we fitted a monotonic polynomial model as well as the standard graded response model. We compared both models in terms of overall model fit, latent trait estimates, and item as well as test information. We investigated item-level differences between both models using common measures of differential item functioning and simulated the impact of model differences on scoring of 5- and 10-item tests. Results The monotonic polynomial model showed better fit to the data, indicated by a significant likelihood ratio test and a lower AIC (but higher BIC), compared to the graded response model. The difference in theta estimates between both models was less than 0.12 in 95% of the cases, but the monotonic polynomial model had higher information in the lower ranges of the construct. The high concordance between both models could be due to the fact that items with aberrant response curves have not been included in the PROMIS Physical Function item bank. 
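The Results above report a lower AIC but a higher BIC for the more flexible model. The Python sketch below shows how that can happen using the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L. Only the sample size is taken from the abstract; the log-likelihoods and parameter counts are hypothetical values chosen for illustration, not the fitted values from the study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


N_OBS = 15725  # sample size reported in the abstract

# Hypothetical fit results, for illustration only
graded = {"loglik": -100000.0, "k": 500}     # simpler model
monotonic = {"loglik": -99700.0, "k": 600}   # more flexible model

# Likelihood ratio statistic for the nested comparison (hypothetical values)
lr_stat = 2 * (monotonic["loglik"] - graded["loglik"])
extra_params = monotonic["k"] - graded["k"]

aic_graded = aic(graded["loglik"], graded["k"])
aic_monotonic = aic(monotonic["loglik"], monotonic["k"])
bic_graded = bic(graded["loglik"], graded["k"], N_OBS)
bic_monotonic = bic(monotonic["loglik"], monotonic["k"], N_OBS)

print("LR statistic:", lr_stat, "on", extra_params, "extra parameters")
print("AIC prefers flexible model:", aic_monotonic < aic_graded)
print("BIC prefers flexible model:", bic_monotonic < bic_graded)
```

With a sample this large, ln(n) is close to 9.7, so BIC charges each extra parameter almost five times what AIC does; a gain in log-likelihood can therefore be large enough to lower AIC yet too small to lower BIC.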
Conclusions Monotonic polynomials, as flexible intermediates between parametric and non-parametric models, appear to be a useful addition to PROMIS developers’ toolbox. 119-P. Development and pilot testing a self-reported pediatric PROMIS app for young children aged 5-7 years Wenjun Gao, Changrong Yuan, Yuchen Zou, Huan Lin All authors: School of Nursing, Navy Medical University, Shanghai, China. Correspondence: Wenjun Gao (zerowenjun@163.com) Objective The aims of this study are threefold: first, to use state-of-the-science PROMIS (Patient-Reported Outcomes Measurement Information System) methods to develop a smartphone application to monitor emotional distress in young children aged 5-7 years; second, to test the usability of this application; and third, to determine the level of agreement between parent reports and young children's self-reports. Methods A multidisciplinary research team, made up of senior pediatric nurses and doctors, a team of software engineers, and pediatric health researchers, worked together to develop this application. Three phases of stakeholder and user studies were conducted. Phase 1 focused on prototype development; Phase 2 involved cognitive interviews and usability testing; Phase 3 focused on pilot testing of the application. Results We included the original parent proxy version of the PROMIS emotional distress measures in the application, as well as an animated self-report version for young children. After many rounds of modification, all participants felt that this application was easy to use and that the animated items were easy to understand for young children aged 5-7 years. Correlations between parent and child reports were significant and moderate; parents underestimated child depression and overestimated child anger and anxiety compared to child self-report. 
Conclusions This smartphone application and its Web-based administration portal demonstrate good usability and are well accepted by young children aged 5-7 years, and can be used to promote young children's participation in reporting or assessing symptoms. Parent reports cannot be substituted for child reports, and evaluations of pediatric patients' perspectives regarding treatment outcomes should be included in pediatric clinics. This animated application can be used as a smart measurement tool to investigate symptoms in young children aged 5-7 years, so as to amplify young children's voice in clinical care. 120-P. Difficulties in conducting online surveys among children and adolescents using translated Short Form of the PROMIS Ped SF v2.0 – Depressive symptoms 8b and modified Korovessis questionnaire Emilia Wołyniec1, Bożena Glinkowska2, Wojciech Glinkowski3,4,5 1Department of Rehabilitation, Medical University of Warsaw, Poland; 2Department of Sports and Physical Education, Medical University of Warsaw, Poland; 3Polish Telemedicine and eHealth Society, Warsaw, Poland; 4Center of Excellence "TeleOrto" for Telediagnostics and Treatment of Injuries and Disorders of the Locomotor System, Medical University of Warsaw, Warszawa, Poland; 5Polish PROMIS National Center, Warsaw, Poland Correspondence: Wojciech Glinkowski (w.glinkowski@gmail.com) Introduction Surveys are one of the basic and most commonly used measuring tools for describing a phenomenon of interest. An online form that is quick and straightforward to implement facilitates the entire process and shortens the time it takes. Tests and research instruments must meet the criterion of reliability. The study aimed to determine the difficulties that should be considered when conducting surveys among children and adolescents based on questionnaires about body posture, physical activity, back pain, and symptoms of depression. 
Aim of the study The research was carried out at randomly selected schools in Warsaw and Tczew. The study involved 85 teenagers attending elementary school classes. Material and methods The study was conducted in 2 groups (32 participants, average age 12.3 years, and 53 participants, average age 11.8 years). The study was conducted using the internet "mini-questionnaire" http://mini-ankieta.azurewebsites.net/ with the consent of the Bioethics Committee. In both groups, the study was conducted twice, with the second administration after a one-week break, as recommended for reliability studies. The questionnaire consisted of 53 items. Questions covered age, weight, and height as well as carrying a backpack/school bag, school and sports activity, and the presence of posture defects (Korovessis, Glinkowska). In addition, the PROMIS Ped SF v2.0 – Depressive Symptoms 8b (eight items) was used to assess the participants' mood in relation to back and neck pain. The groups differed in the information provided during the informed consent procedure (standard vs. enriched with additional instructions and an introduction to the problem of back and neck pain and problems with posture). Test-retest reliability testing was performed, and Cronbach's alpha values were calculated using Medcalc version 19.1 software. Results In both groups, questions about sex, body build, and basic data from everyday life (e.g., backpack weight, number of hours spent on various school activities and outside school) showed good Cronbach's alpha results (> 0.7). In the standard-procedure group, Cronbach's alpha values were insufficient (from 0.1 to 0.56), especially for questions about sadness, weakness, fatigue, and exhaustion. Student information about themselves was highly consistent (height: alpha 0.97; weight: alpha 0.86). In the second group, data from 53 students about themselves were good (Cronbach's alpha > 0.7). 
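For readers unfamiliar with the statistic reported above, the following Python sketch computes Cronbach's alpha from its standard formula, alpha = k/(k−1) × (1 − sum of item variances / variance of total scores). It is a generic illustration with made-up responses; the abstract does not describe how the Medcalc analysis was configured, so the function name, data layout, and values here are all assumptions.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][j] is the score of respondent j on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents
    item_variances = [variance(scores) for scores in items]
    # Total score per respondent, then the variance of those totals
    totals = [sum(per_respondent) for per_respondent in zip(*items)]
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 3 items answered by 5 respondents
responses = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Values above 0.7, the threshold quoted in the Results, are conventionally read as acceptable consistency; the made-up data here were chosen to land well above that threshold.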
Discussion Providing too little information before testing children and adolescents may result in low agreement between test and retest responses, which could affect the reliability of the research instrument. Analysis of potential causes suggests that the reasons may have included problems with children's and adolescents' motivation to answer questions scrupulously and faithfully. Conclusions The research confirms the need to inform children and young people accurately about issues related to surveys; otherwise, there is a risk of unreliable research. 121-O. Feasibility using PROMIS-CAT in a sports medicine center and regenerative medicine registry in outpatient setting Marc Gruner, Mark Nyman, Kelsey Wolff, Jacob Sellon, Karina Gonzalez All authors: Mayo Clinic Rochester, MN Correspondence: Marc Gruner (grunerm@gmail.com) Background PROMIS-CAT is a patient reported outcome (PRO) tool used to assess the health status of patients. Prior to this pilot, many challenges existed for PRO collection. For example, there was no universal PRO for patients coming to the sports medicine center. Additionally, a regenerative medicine registry existed at the sports medicine center for ambulatory procedures via third party software (TPS). The regenerative registry had a low follow-up outcome response rate, and only select providers were able to utilize the registry. This pilot aimed to evaluate the completion rate of PROMIS questionnaires among patients presenting for outpatient evaluation at the sports medicine center. A second aim was to compare the completion rate of follow-up data under the PROMIS/Epic implementation with that of legacy measures/TPS. Methods PROMIS-CAT was implemented via the EHR using the patient online services (POS) portal. The first aim consisted of collecting PROMIS-CAT Pain Interference (PI) and Physical Function (PF) as the instruments for measuring outcomes on all patients coming to the sports medicine center. 
The second aim consisted of autonomously identifying ICD-10 codes prior to a regenerative procedure as lower body vs. upper body procedure. For a lower body procedure, PROMIS-CAT PI/PF was used. For an upper body procedure, PROMIS-CAT PI/UPF was used. Outcome measures were sent at baseline and, after a procedure, at 6 weeks, 3 months, 6 months, 1 year, and 2 years. Results A monthly review was performed to evaluate the first aim. 728/1028 patients seen in the sports medicine center completed PROMIS measures during one month for a completion percentage of 76%. The second aim compared regenerative registry data after a procedure via PROMIS/EHR to legacy measures/TPS. At baseline, 95% of patients completed PROMIS measures compared to 83% for TPS. At six weeks, the completion percentage was 61% for PROMIS compared to 52% for TPS. At 3 months, 53% completed PROMIS measures compared to 43% for TPS. The PROMIS registry collected 57 procedures compared to 7 in the TPS during a three-month review. Conclusion EHR-linked PROMIS had higher completion rates and allowed for tracking of significantly more procedures than the TPS. Using PROMIS-CAT via the EHR for registries can improve capture rate. 122-P. Digital application was superior to physical therapy for orthopedic knee injuries assessed by PROMIS® measures Marc Gruner, Jacob Sellon, Ike Hasley, Jared Hoffmann, Karina Gonzalez All authors: Mayo Clinic Rochester, MN Correspondence: Marc Gruner (grunerm@gmail.com) Background Knee pain is one of the most prevalent musculoskeletal disorders in the US. Physical therapy (PT) is often the initial treatment for conservative care. Efficacy of a PT exercise program delivered via a digital application (Limber Health app) compared to standard PT has not been thoroughly assessed. The use of PROMIS® measures for PT in orthopedic knee injuries is limited. 
The hypothesis was that a Digital Home-Exercise Therapy Application (DETA) would be superior to the standard of care (PT) after 8 weeks with respect to improvement in PROMIS® Pain Interference (PI) and Physical Function (PF) Computerized Adaptive Test (CAT) measures. Methods This was a multi-center, prospective, single-blind randomized clinical trial comparing PT to DETA. A total of 60 patients prescribed PT were randomly assigned. The PT group was assigned to therapy twice a week for 8 weeks. The DETA group was assigned to 15-25 minute videos 3 times a week for 8 weeks that were tailored based on the patient’s disability and health status. The DETA’s algorithm adjusted the intensity of the program progression based on results from a 4-week interim follow-up measuring changes in PROMIS® scores. The primary outcome was change in PROMIS® scores. Patients were reviewed at baseline and at 8 weeks. Results Thirty patients completed the 8-week intervention (17 control, 13 treatment) at the time of submission. No differences existed between the groups in age or gender (p>.05). Preliminary analysis suggests changes in PI (control: -1.8±7.8, Limber app: -6.3±6.7) and PF (control: 0.46±6.6, Limber app: 5.7±7.0). Independent t-tests revealed that absolute changes in PROMIS Physical Function were significantly greater in the DETA group compared with control, indicating a greater improvement in function; a large effect size was noted (p<.05, Hedges’ g = 0.77). Changes in Physical Function and Pain Interference surpassed the MCID in the Limber group, but not in the control group. Conclusion An 8-week DETA program was superior to the standard-of-care PT program at the time of submission. The study suggests that a DETA could achieve outcomes at least similar to those of PT with respect to pain and function. This study describes an innovative approach to risk-stratify patients to appropriate exercise based on their disability. 123-P. 
PROMIS and PROs in the Symptoms System - Visualizing health in clinical care Emelie Gustafson1, Martin Wohlin1, John Eric Chaplin2 1Uppsala University, Uppsala, Sweden; 2University of Gothenburg, Gothenburg, Sweden Correspondence: Emelie Gustafson (emelie.gustafson@symptoms.se) OBJECTIVES This presentation will explore means to: Equalize asymmetry between needs and expectations in health care using patients’ perceptions about symptoms, functions and quality of life; Balance knowledge and preferences in point-of-care interactions, leading to better outcomes and enhanced value in health care; Empower patients to take responsibility for quality of care with scientifically based methods to contribute to safer, more efficient and equal care; Implement PROMIS and other PROM instruments in a patient-driven digital system, where the combination and visualization of PROMIS measures together with other PROMs facilitates usage, with benefits for both clinical care and patients. METHODS An evaluation protocol designed according to universal and co-design principles will be described. This will explore how to visualize results and combine PROMIS measures with other PROs, facilitate long-term implementation, support patient empowerment and self-management, and improve clinical care. A mixed-methods approach will be used to explore patient and multidisciplinary perspectives on the visualization of data, and the feasibility of implementation in clinical care and for patient self-management. RESULTS Measuring patient reported outcomes (PROs) with standardized questionnaires is a scientifically sound method to gain insight into patients’ symptoms, functions and quality of life. In certain contexts, PRO collection has been linked to increased survival, improved symptom management, and good treatment results in randomized studies. PROMIS provides a set of person-centered measures that evaluate and monitor physical, mental, and social health. 
With its generic approach, and possibilities for modern methods of administration, it offers great advantages over historical paper questionnaires and facilitates use at many stages both for clinical care and patients. This protocol will explore how to combine and visualize PROMIS measures together with legacy questionnaires. Processes to visualize data for patients as well as clinicians, while upholding the quality of the data collected, will be explored. In the presentation we will illustrate the visualizations tested. CONCLUSIONS Equalizing asymmetry between needs and expectations of PROs visualization for clinicians and patients requires careful consideration of the overall purpose of the data and health management. 124-P. Do PROMIS measures correlate with fitness and satisfaction with social roles in participants of a university wellness clinic? Jeff Houck1, Dan Kang1, Mary Imboden2 1School of Physical Therapy, George Fox University, Newberg, Oregon; 2School of Exercise Science, George Fox University, Newberg, Oregon Correspondence: Jeff Houck (jhouck@georgefox.edu) Objective Studies determining the concurrent validity of patient reported outcomes and performance outcomes are useful for application to clinical care. The objective was to determine the correlation (bivariate and multivariate) between a set of biopsychosocial PROMIS measures and 1) a physiologic measure of fitness (VO2 max) and 2) Satisfaction with Social Roles in attendees of a University Wellness Clinic. Methods From January to March 2020, 44 of 58 attendees (age=23.7±9.6 y.o., VO2 max=42.6±8.3 ml/kg/min) of a University Wellness Clinic completed PROMIS computer adaptive tests (physical function [PF], pain interference [PI], fatigue, self-efficacy [SE] of managing emotions, SE of managing social, anxiety, depression and satisfaction with social roles [SSR]) and short forms (SE of daily activities [SF8]) in addition to physiologic testing (i.e. VO2 max). 
Univariate correlations and multivariate linear analysis were used to assess the convergence of age, gender, and different PROMIS measures with 1) VO2 max and 2) PROMIS SSR. Results Age (r= -0.31, p=0.02), PF (r=0.46, p<0.01) and fatigue (r=-0.40, p<0.01) showed significant univariate convergence with VO2 max. Younger age, higher physical function and lower fatigue correlated with higher VO2 max values. A multivariate model including age (p=0.05), PROMIS PF (p=0.05), fatigue (p<0.01), and PI (p=0.04) resulted in a r-value of 0.62 for predicting VO2 max. Age (r= -0.40, p<0.01), PROMIS PF (r=0.44, p<0.01), PI (r=-0.44, p<0.01) and SE daily activities (r=0.35, p=0.02) showed significant convergence with SSR. Younger age, higher physical function, lower PI and higher SE with daily activities correlated with higher SSR values. A multivariate model including PROMIS PF (p<0.01), depression (p=0.05), and SE of emotions (p=0.02) resulted in a r-value of 0.56 for predicting SSR. Conclusions Perceptions of function detected by PROMIS measures associated with physical health rather than psychosocial health show better convergence with fitness in mostly younger people attending a Wellness Clinic. In contrast, measures of physical health (PF) and mental health (depression and SE emotions) showed convergence with satisfaction with social roles. These outcomes support the use of PROMIS measures of physical health to counsel young participants seeking to improve fitness and a combination of physical and mental health measures when focusing on social roles. 125-P. Is unacceptable self-efficacy associated with unacceptable physical health domain function and symptoms? 
Jeff Houck, Dan Kang, Li-Zandre Philbrook, Ryan Jacobson All authors: George Fox University, Newberg, OR, United States Correspondence: Jeff Houck (jhouck@georgefox.edu) Objective Interpretation and application of the Patient-Reported Outcomes Measurement Information System (PROMIS) Self-Efficacy for Managing Symptoms (SEsx) for orthopedic physical therapy patients is unclear. Self-efficacy is theorized to mediate PROMIS physical domain measures such as pain interference (PI), physical function (PF) and fatigue. However, no current studies document the association between acceptable levels of physical domain measures and self-efficacy. Although there are several self-efficacy measures, managing symptoms is thought to be the most applicable to orthopedic patients. The purpose of this analysis was to evaluate the associations between unacceptable SEsx and physical health domain measures (PF, PI, and Fatigue). Methods PROMIS computer adaptive tests (PF, PI, Fatigue, SEsx) were administered at initial evaluation (n=199) for spine (44.7%), lower extremity (35.7%), upper extremity (17.6%) and other reasons (2.0%) in physical therapy. Unacceptable T-scores were coded (0,1): PF < 40, PI > 60, Fatigue > 55, SEsx < 45. Odds ratios (OR) and 95% confidence intervals (CI) were calculated to examine the associations of unacceptable SEsx with other unacceptable PROMIS measures. A logistic regression model including age, gender, unacceptable PROMIS PF, SEsx, and Fatigue was evaluated for ability to independently predict unacceptable PROMIS PI. Results Patient characteristics: age 42.5 (19.5) years, 60% female. The proportions of patients with unacceptable scores were: PF 33.5%; PI 52.5%; Fatigue 40.7%; and SEsx 46.7%. The proportion of patients with any unacceptable score was 69.7%. A total of 14.6% reported all measures at unacceptable levels. 
Unacceptable SEsx was significantly associated with unacceptable PI (OR = 8.3, CI 4.4 to 15.7), unacceptable PF (OR = 7.5, CI 3.8 to 14.9), and unacceptable Fatigue (OR = 3.5, CI 1.9 to 6.2). Logistic regression showed that unacceptable PF (OR 8.20, CI 2.23 to 30.86) and unacceptable SEsx (OR 4.5, CI 2.2 to 9.3) were independent predictors of unacceptable PI. Conclusion The strong association of SEsx with PF and PI, and the prevalence of unacceptable SEsx scores, suggest that providers should develop methods to address SEsx in patients whose physical health measures indicate unacceptable function and symptoms. This finding supports the theory that addressing patient confidence and beliefs (SEsx) may enhance care directed at physical health. 126-P. Estimating power for clinical trials with PROMIS endpoints using Item Response Theory Jinxiang Hu, Yu Wang 1University of Kansas Medical Center Correspondence: Jinxiang Hu (jhu2@kumc.edu) Background Patient reported outcomes (PRO) are important in patient-centered health outcomes research, epidemiological studies, quality of life (QOL) studies, and clinical trials. The Patient-Reported Outcomes Measurement Information System (PROMIS) is a set of standardized, generic PRO questionnaires developed for clinical and research purposes. In clinical trials, it is crucial to estimate power to avoid wasting resources while still being able to detect the treatment effect. However, for clinical trials with PROs as endpoints, Classical Test Theory (CTT) using observed scores (e.g. total/average scores) is routinely used for power estimation. The purpose of this project is to provide guidance for power and sample size estimation for clinical trials with PROMIS measures as endpoints using Item Response Theory (IRT). Methods Motivated by the PROMIS depression scales (4a, 6a, 8a), we conducted a simulation study in order to estimate power differences between IRT- and CTT-based scoring for a two-armed prospective randomized clinical trial (control vs active arm). 
We simulated data using various sample sizes, allocation ratios, numbers of items, effect sizes, and missing data conditions. Three models were fit to each simulation: IRT with MLE, IRT with Bayesian estimator, and CTT. Results and conclusion Our results showed that missing data, effect size, and sample size are important indicators of IRT power. Number of items is not significantly associated with power. For rare diseases or early stage trials, it is important to use the IRT framework for accurate power estimation. IRT and CTT both provide good power with large sample size and effect size. Future work can examine the IRT power for detecting change over time and non-normal distribution of latent scores. 127-O. Validation of PROMIS measure of itch impact and intensity in pediatric patients with Atopic Dermatitis Kathryn L Jackson, Jin-Shei Lai, Amy Paller, Cynthia Nowinski, Stephanie Rangel, Divya Ramachandran, Neha Puar, Vidhi Patel, Jonathan Silverberg, David Cella All authors: Northwestern University, Feinberg School of Medicine, Chicago, Illinois, USA Correspondence: Kathryn Jackson (kathryn.jackson1@northwestern.edu) Background Itch is the most common symptom of pediatric skin diseases, including atopic dermatitis (AD), and greatly affects patient quality of life (QOL). Assessments of itch exist, but lack comprehensiveness and psychometric validity. To fill this gap, we have developed the new PROMIS Itch Questionnaire (PIQ-C). The PIQ-C was developed using mixed-methods approaches and consists of 45 unidimensional items, calibrated using a graded response model based on item responses from 600+ children with itch conditions. Here, we report clinical validity of the PIQ-C using cross-sectional and longitudinal data. Methods Children aged 8-17 were recruited from Chicago-area dermatology clinics. Children completed the PIQ-C and additional clinical assessments of disease severity/QOL (Itch NRS, EASI, POEM, IGA, CDLQI) at baseline and 6-month follow-up.
Severity measures were categorized as mild/moderate/severe, and changes in severity from baseline to 6 and 12 months were calculated and categorized as improved/same/worse. Convergent validity was assessed by evaluating correlations of PIQ-C and an itch-related clinical measure at baseline. Known groups validity was assessed using one-way Analysis of Variance (ANOVA), modelling difference in PIQ-C score across severity group at baseline. Responsiveness to change was assessed using mixed linear regression; change in PIQ-C score from baseline to six months was compared between clinical change groups. Results 181 patients aged 8-17 completed baseline PIQ-C; 59 completed the 6-month follow-up. At baseline, PIQ-C was highly correlated with CDLQI (0.73), POEM (0.64), and moderately correlated with Itch NRS (0.54). A significant increase in PIQ-C score was found as severity of AD increased across all clinical measures used to define severity (p<0.05 for all). The PIQ-C was responsive to change across time; patients with improved clinical score also had a significantly improved PIQ-C, and the change in PIQ-C differed across improved/same/worse change groups in the expected direction (p<0.0001 for all). Conclusion The PIQ-C measure includes aspects of itch important to assessing overall symptoms and impact. Correlations with known measures, ability to distinguish among severity groups, and responsiveness across time suggest clinical validity. Next steps include evaluating replicability of results in patients from other clinics and validation in children with other itch conditions. 128-P. Investigating parameter stability in the presence of high slopes Aaron J Kaat1, Stein Arne Rimehaug2 1Northwestern University, Chicago IL; 2University of Oslo, Norway Correspondence: Aaron Kaat (aaron.kaat@northwestern.edu) Objective There is a growing recognition that large slopes in IRT models are not as desirable a trait as originally believed.
Larger slopes suggest greater information and thus higher reliability and a shorter computer adaptive testing experience; however, slopes may be inflated when the IRT model fails to account for locally-dependent item subsets, or when there is a preponderance of individuals at the floor or ceiling of the domain. The objective of this study was to investigate the sampling distribution of the PROMIS® Pain Interference 8-item short form (where slope inflation may be occurring) using data from PROMIS 1 Wave 1, using both a standard normal latent distribution and when estimating the latent distribution using Davidian curves. Methods We utilized general population data from PROMIS 1 Wave 1, for participants with item-level data on at least 5 of the 8 items from the Pain Interference short form. In order to investigate the effect of sample size on parameter stability, we conducted a bootstrap resampling of sample size 500, 750, and 1259 (i.e., the total eligible number of participants). The primary outcome was the slope estimates across replications. We utilized factorial analysis of variance to investigate whether the slopes were significantly different by latent density, sample size, and their interaction. Each item was analyzed separately. Results There was a main-effect for sample density in all 8 items, with higher slopes with DC-IRT models. The difference by sample size was less consistent, with only 3 items showing a difference in slopes by sample size. The interaction was nonsignificant for all items. Conclusions: Contrary to expectations, slopes were larger when the latent density was estimated using Davidian Curves. Additionally, there was a higher frequency of nonconvergence (even with 10,000 cycles) with DC-IRT models. The lack of significance for sample size was encouraging, insofar as it suggests the parameters are robust to sampling conditions. 
However, while the means were similar across sample sizes, the range varied more widely with the smaller sizes (as would be expected). Future research should evaluate whether a zero-inflated model would also provide consistent slope estimates as here. 129-P. Comparing PROMIS® Global Health-10 and EQ-5D: sensitivity to clinical cut-off scores for anxiety and depression. Kabakibi B.1, Chaplin JE.2,3, Wicksell R.4 1Dept. of Public and Global Health, Gothenburg University, Gothenburg, Sweden; 2Dept. of Paediatrics, Institute of Clinical Sciences, Sahlgrenska Academy at Gothenburg University, Gothenburg, Sweden; 3Swedish Association of Local Authorities and Regions (SALAR); 4Dept. of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden Correspondence: John Chaplin (john.chaplin@gu.se) Objective To investigate the psychometric properties of the Swedish translation of the GH-10 questionnaire. Methods PROMIS GH-10, EQ5D, GAD7 and PHQ9 were electronically collected from consecutive attendees of an emergency clinic from Sept 2018 to May 2019. Confirmatory factor analysis evaluated the two-factor structure of the GH-10: physical (PCS) and mental (MCS). Goodness-of-fit was defined as comparative fit index above .9 and standardized root mean square residual (SRMR) above 0.08. Internal consistency and discriminant validity were assessed. Analyses were repeated, stratified by cutoffs for clinical treatment, and sensitivity analysis was conducted using receiver operating characteristic (ROC) curves. Results Of 164 patients (58% female) aged 18–88 (mean: 49 years), 58% were in full-time employment; 56% were overweight or obese. The two-factor solution indicated acceptable CFI: .935, but the SRMR was .0567, thus below goodness-of-fit levels. Pain had the lowest factor score on PCS. Internal consistency for the two sub-domains was good: Cronbach’s alpha for PCS was 0.730, for MCS 0.862 and for the whole instrument 0.906. 
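For readers less familiar with the internal-consistency statistic reported here, Cronbach's alpha can be computed from item-level responses as in the sketch below. This is a generic illustration of the standard formula, assuming complete data (no missing responses); it is not the authors' code, and the function name and example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Assumes complete data: every respondent answered every item.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed score across respondents.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from four people on three 5-point items.
example = [[1, 2, 1], [3, 3, 4], [4, 5, 4], [2, 2, 3]]
alpha = cronbach_alpha(example)
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as a measure of internal consistency.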
Hypothesized relationships between GH10 subdomains and the other instruments were confirmed and in line with previous published reports. Pearson’s correlations showed strong correlations of the mental health subscale to the PHQ-9 (r=0.702) and the GAD-7 (r=0.704). Moreover, the physical subscale of the GH-10 showed a good correlation with the EQ-5D index (r=0.550) and with the EQ-5D VAS (r=0.565). The area under the curve (AUC) of the MCS and PCS was higher than for EQ-5D against the GAD-7 PHQ-9 cutoffs. Conclusions Taking into account the sample size, the Swedish version of the GH10 has good psychometric properties. The less well performing item concerning pain should be investigated further. 130-O. A Comparison of the measurement properties of the PROMIS-Fatigue (MS) 8b against legacy fatigue questionnaires Paul Kamudoni1, Jeffrey Johns2, Karon Cook5, Rana Salem4, Sam Salek 2, 3, Jana Raab1, Rod Middleton6, Christian Henke1, Dagmar Amtmann4 1Global Evidence & Value Development – R&D, Merck Healthcare KgaA, Darmstadt; 2School of Life and Medical Sciences, University of Hertfordshire, Hatfield, UK; 3Institute of Medicines Development, Cardiff, UK; 4Department of Rehabilitation Medicine, University of Washington, Seattle, USA: 5Feral Scholars, Broaddus, Texas, USA; 6UK MS Register, Swansea Medical School, Swansea, UK Correspondence: Paul Kamudoni (paul.kamudoni@merckgroup.com) Objectives Amidst the growing number of patient-reported outcome (PRO) measures of fatigue being used in MS clinical trials and clinics, evidence-based consensus on generalizable and the most appropriate measures across different settings would be beneficial for clinical research as well as patient care. 
We aimed to compare the validity and responsiveness of the PROMIS SF v1.0 - Fatigue (MS) 8b with the Fatigue Severity Scale (FSS) and the Modified Fatigue Impact Scale (MFIS), across US and UK populations. Methods Two observational studies were performed in MS populations, as part of a PRO measure development project, including a cross-sectional study in two tertiary MS centers in the US (n=296) (US sample) and a 96-week longitudinal study in the UK MS Register cohort (still ongoing) (n=384) (UK sample). Analyses included examination of: 1) relative validity based on ability to discriminate across patient subgroups according to fatigue or functional status at baseline [i.e., ANOVA F for PRO X ÷ ANOVA F for PROMIS-F(MS)8b]; and 2) relative responsiveness, based on baseline-to-week-52 score change (effect size) across fatigue or functional status response groups (UK sample only). Results The mean age was 44.5±11.2 / 50.7±9.4; and 74% / 75.9% were female (US sample / UK sample). The mean PROMIS-F(MS)8b T-score at baseline was 57.4±10.5 / 59.9±9.4 (US sample / UK sample). Compared with the PROMIS-F(MS)8b, relative validity (anchor: GHS fatigue global question) was 86% for MFIS symptom score, 87% for MFIS total score, and 42% for the FSS. Relative to the FSS, PROMIS-F(MS)8b scores were more sensitive to worsening (ES = -0.44 vs. -0.18) as well as improvement (ES = 0.5 vs. 0.2) in fatigue (>=1-point increase/decrease in GHS fatigue global question) over 52 weeks of follow-up. A similar pattern of score change was observed based on other anchors. Conclusion The PROMIS-F(MS)8b scores showed a higher precision when differentiating levels of fatigue than the FSS or the MFIS physical or total scores, and higher responsiveness to fatigue changes than the FSS. These differences have practical implications for the application of these questionnaires in both clinical practice and research settings, e.g. in sample size estimation in clinical trials. 131-P.
Validation of the PROMIS® Pediatric Item Banks Anxiety and Depressive Symptoms in a general Dutch population Leonie H. Klaufus1,2, Michiel A.J. Luijten3,4, Eva Verlinden1, Marcel F. van der Wal1, Caroline B. Terwee4, Pim Cuijpers5, Mai J.M. Chinapaw2, Lotte Haverman3 1Public Health Service Amsterdam, Department of Epidemiology, Health Promotion, and Health Care Innovation, Amsterdam, Netherlands; 2Amsterdam UMC, Vrije Universiteit Amsterdam, Department of Public and Occupational Health, Amsterdam Public Health research institute, Amsterdam, Netherlands; 3Amsterdam UMC, Emma Children’s Hospital, Psychosocial Department, Amsterdam, Netherlands; 4Amsterdam UMC, Vrije Universiteit Amsterdam, Department of Epidemiology and Biostatistics, Amsterdam Public Health research institute, Amsterdam, Netherlands; 5Vrije Universiteit Amsterdam, Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health research institute, Amsterdam, Netherlands Correspondence: Leonie H. Klaufus (LKlaufus@ggd.amsterdam.nl) Objective This study aims to validate the Dutch-Flemish PROMIS pediatric item banks v2.0 Anxiety and Depressive Symptoms in a general Dutch population. Methods Participants (N = 2,893, aged 8 - 18), recruited by two certified internet panel agencies, completed the PROMIS pediatric item banks v2.0 Anxiety and Depressive Symptoms online. Both item banks were assessed on unidimensionality, local dependence, monotonicity, Graded Response Model (GRM) item fit, and differential item functioning (DIF) across gender, age groups, region, ethnicity, and language. The PROMIS pediatric Anxiety and Depressive Symptoms short forms 8a and simulated computerized adaptive testings (CATs) were assessed on reliability and construct validity compared to the Revised Child Anxiety and Depression Scale short version (RCADS-22) subscales. 
Results The PROMIS pediatric item banks v2.0 Anxiety and Depressive Symptoms showed sufficient unidimensionality (Omega H = 0.83, 0.95; ECV = 0.79, 0.93, respectively), local independence (residual correlations < 0.2), and monotonicity (H = 0.61, 0.69, respectively). Both item banks showed sufficient GRM item fit (S-X2 p-value ≥ 0.001), except for the Depressive Symptoms items 2697R1r “I wanted to be by myself”, 7010 “I felt sad for no reason”, and 9001r “I felt too sad to eat”. No DIF was found for gender, age groups, region, ethnicity, and language, except for the Depressive Symptoms items 2697R1r “I wanted to be by myself” and 488R1r “I could not stop feeling sad”, which showed uniform DIF for language (McFadden pseudo R² change > 2%). Based on U.S. parameters, the PROMIS pediatric Anxiety and Depressive Symptoms short forms 8a showed a reliability of > 0.90 in 2% and 34%, and the CATs in 26% and 41% of the participants, respectively. Both short forms and CATs revealed high positive correlations (r > 0.70) with the corresponding RCADS-22 subscales and slightly lower correlations with the non-corresponding RCADS-22 subscales (r ≤ 0.70). Conclusions The Dutch-Flemish PROMIS pediatric item banks v2.0 Anxiety and Depressive Symptoms show sufficient psychometric properties, except for four Depressive Symptoms items that show DIF for language or poor GRM item fit; the short forms 8a and CATs seem valid, but are reliable for only a small percentage of children. 132-P. Evaluation of a patient-reported frailty tool in Systemic Lupus Erythematosus Sarah B. Lieber1, Stephen Paget1,2, Jessica R. Berman1,2, Medha Barbhaiya1,2, Lisa R. Sammaritano1,2, Kyriakos Kirou1,2, John A. Carrino1,2, Musarrat Nahid2, Mangala Rajan2, Dina Sheira1, Lisa A. Mandl1,2 1Hospital for Special Surgery, New York, NY; 2Weill Cornell Medicine, New York, NY Correspondence: Sarah B. Lieber, MD, MS (liebers@hss.edu) Objective Frailty is associated with disability in systemic lupus erythematosus (SLE).
To our knowledge, no phenotypic frailty tool including objective/subjective domains has been compared to a validated point-of-care frailty measure in SLE. We evaluated the point-of-care self-reported FRAIL scale (FS) versus the standard Fried phenotype (FP) by comparing the prevalence of frailty as measured by both tools in a cohort of women with SLE. We also evaluated the association of each frailty measure with several patient-reported outcomes (PROs), comparing associations in frail versus non-frail women. Methods Adult women <70 years old with validated SLE and mild/moderate disease enrolled from one center. Measures included: frailty (FP/FS); disease activity/damage; and PROs (PRO Measurement Information System (PROMIS) computerized adaptive tests (CATs) and Valued Life Activities (VLA) disability). Differences between frail and non-frail participants were evaluated using Fisher’s exact or Wilcoxon rank sum tests and the association of frailty with disability using logistic regression. Correlation between the FP and the FS was determined using Spearman’s correlation. Results 72 women enrolled; 67 (93%) completed the FS. 17% (FP) and 27% (FS) were frail. Frail women according to either definition had greater disease damage (FP: p=0.002; FS: p=0.0006) and worse PROMIS CATs, including mobility, physical function, pain behavior and interference, and fatigue (FP and FS: all p<0.01). Compared with non-frail women, frail women classified by the FP had greater comorbidity (p=0.02); when classified by the FS, frail women were older (p=0.04) with worse PROMIS CAT depression (p=0.02). Frailty according to either definition was associated with VLA disability after adjustment for age, comorbidity, and disease activity (FP: p=0.02; FS: p=0.0003), but this relationship was attenuated for the FP after adjustment for disease damage (p=0.08). There was moderate correlation between the FS and the FP (r=0.48; p<0.0001). 
Conclusions Prevalence of patient-reported frailty was high in this cohort of women with SLE. Frailty, measured with either metric, was associated with worse PROs, providing face validity for both definitions. The FS was associated with disability even after adjustment for multiple confounders. These data suggest that the FS may be an informative point-of-care tool to identify frail women with SLE. 133-O. Patient-Reported Outcomes Measurement Information System (PROMIS) - Translation and cultural adaptation of Chinese version of severity of substance use Yan Rong, Yun Ting, Shang Meimei, Xu Juan, Huang Ame All authors: Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences Correspondence: Yun Ting (yunting.love@qq.com) Introduction Patient-reported outcomes, and listening to the true feelings of the patient, have recently become a focus of cancer research both in China and abroad. Given the increase in misuse and abuse of prescription opioids, clinicians would clearly benefit from a standardized tool to screen for opioid overuse. In 2009, the International Society for Pharmacoeconomics and Outcomes Research (ISPOR), the FDA, the Health-related Quality of Life working group and the International Society for Quality of Life Research (ISOQOL) jointly proposed incorporating patient self-reported data into the evaluation system for clinical decision-making. Combined with a patient self-report measurement system, this can help clinicians to better detect and screen abnormal drug use behavior, and lay the foundation for early intervention. Objectives The present study developed a Chinese version of the Severity of Substance Use measure and incorporated it into the Patient-Reported Outcomes Measurement Information System to promote domestic opioid abuse screening, improve drug evaluation and promote clinical nursing and drug management.
Methods After applying for authorization from the American PROMIS data management center, the translation method of FACIT (Functional Assessment of Chronic Illness Therapy) was adopted. After simultaneous forward translations, reconciliation, back-translation, expert review and proofreading, the first translation draft was formed and submitted to the PNC-China center for quality review. On the basis of the review, cognitive interviews were conducted among 5 eligible cancer patients (at least 5 patients for each item), and the interviewees pointed out the items and phrases that were difficult to understand, as well as the possible difficulties in the answer process. The interviews with each patient were recorded with their consent. The head of the translation team sorted the patients' feedback, and the cultural mediator provided an appropriate translation plan with reference to the patients' opinions. After cultural adaptation, the final Chinese version of the drug use severity scale was formed. Results A Chinese version of the severity scale of drug use was formed. Conclusion We provide a culturally adapted Chinese version of a screening tool for drug abuse. The translation has gone through a standardized process and cultural adaptation, and can be used to screen for drug abuse in China. 134-P. A PROMISing prospect of measuring pediatric general health: A comparison of the PROMIS® pediatric Global Health scale (PGH-7) and the Pediatric Quality of Life Inventory (PedsQL™). Michiel A. J. Luijten1,2, Lotte Haverman1, Raphaële R.L. van Litsenburg3,4, Leo D. Roorda5, Martha A. Grootenhuis3, Caroline B.
Terwee2 1 Emma Children’s Hospital, Amsterdam UMC, University of Amsterdam, Psychosocial Department, Amsterdam, the Netherlands; 2 Amsterdam UMC, Vrije Universiteit, Epidemiology and Biostatistics, Amsterdam, the Netherlands; 3 Princess Máxima Center for Pediatric Oncology, Utrecht, the Netherlands; 4 Emma’s Children’s Hospital, Amsterdam UMC, Vrije Universiteit Amsterdam, 5Pediatric Oncology, Cancer Center Amsterdam, Amsterdam, the Netherlands Correspondence: Michiel A.J. Luijten (m.a.luijten@amc.nl) Objective On February 18th 2020 the International Consortium for Health Outcomes Measurement (ICHOM) announced the release of the Standard Set for overall pediatric health. This outcome set contains the Patient-Reported Outcomes Measurement Information System (PROMIS) Pediatric Scale v1.0 Global Health (PGH-7+2) for measuring overall physical, mental and social health. Our aim was to assess the psychometric properties of the PGH-7 in the Dutch population and to compare the performance of the PGH-7 with the Pediatric Quality of Life Inventory (PedsQLTM). Methods Children aged 8-18 years (n=2654), representative of the Dutch population on key demographics were asked to complete the PGH-7 (nitems=7) and the PedsQL (nitems=23). To assess structural validity of the PGH-7 a graded response model (GRM) was fitted to the data after assessing the following assumptions: Unidimensionality through CFA (CFI>.95, TLI>.95, RMSEA<.10), local independence by residual correlations (r<.20) and monotonicity by Mokken analysis (H>.50, H i >.30). Item fit of the GRM model was inspected with S-X2, where p<.001 indicates misfit. Additionally, convergent validity of the PGH-7 T-score with the PedsQL total score was assessed. A moderately strong correlation (>.50) was expected, as both instruments measure physical, mental and social domains. 
Percentage of participants reliably measured was assessed using the standard error of measurement (SEM) <0.32 as a criterion (which equals a reliability of 0.90). Relative efficiency was calculated as (1 - SEM²)/n_items to compare how well both instruments perform relative to the number of items administered. Results In total 1082 (response rate = 40.8%) children completed both questionnaires. All GRM assumptions were met. PGH-7 displayed good structural (no misfit) and convergent (r=.65) validity. Both questionnaires measured reliably (nPGH-7=74.5%, nPedsQL=76.6%) at the mean and 2SD in the clinically relevant direction. The relative efficiency of the PGH-7 was 2.6 in comparison to the PedsQL, indicating that, on average, the items in the PGH-7 are 2.6 times more informative than PedsQL items. Conclusions The PGH-7 displays sufficient reliability and validity in the general Dutch pediatric population. The scale measures more efficiently than the most commonly used legacy instrument (PedsQL). 135-P. How the COVID-19 pandemic impacts the psychosocial well-being of children and adolescents in the Netherlands Michiel A.J. Luijten1,2, Maud M. van Muilekom1, Lorynn Teela1, Hedy A. van Oers1, Kim J. Oostrom1, Lotte Haverman1 1Amsterdam UMC, University of Amsterdam, Department of Child and Adolescent Psychiatry, Pediatric Psychology and Psychosocial Care, Emma Children’s Hospital, Amsterdam Public Health, Meibergdreef 11, Amsterdam, the Netherlands; 2Amsterdam UMC, Vrije Universiteit Amsterdam, Department of Epidemiology and Biostatistics, Amsterdam Public Health, De Boelelaan 1117, Amsterdam, the Netherlands Correspondence: Michiel A.J. Luijten, MSc (m.a.luijten@amc.nl) Objective Recent measures of implementing social isolation and physical distancing as governmental reactions to the COVID-19 outbreak profoundly impact daily life, including that of children and adolescents.
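As an aside on abstract 134-P above, the reliability criterion and the relative-efficiency formula it reports can be written out as a short sketch. It assumes the SEM is expressed on a standardized latent metric (SD = 1), so that reliability = 1 - SEM²; the function names and the SEM values in the example are hypothetical and are not taken from the study.

```python
def reliability_from_sem(sem):
    """Reliability implied by a standard error of measurement on an SD = 1 metric."""
    return 1 - sem ** 2


def efficiency(sem, n_items):
    """Reliability obtained per administered item: (1 - SEM^2) / n_items."""
    return (1 - sem ** 2) / n_items


def relative_efficiency(sem_a, n_items_a, sem_b, n_items_b):
    """Efficiency of instrument A divided by the efficiency of instrument B."""
    return efficiency(sem_a, n_items_a) / efficiency(sem_b, n_items_b)


# The criterion SEM < 0.32 corresponds to a reliability of about 0.90.
criterion_reliability = reliability_from_sem(0.32)

# Hypothetical case: a 7-item and a 23-item instrument with the same SEM.
# With equal precision, the ratio reduces to 23 / 7.
ratio = relative_efficiency(0.32, 7, 0.32, 23)
```

A ratio above 1 means the first instrument delivers more reliability per item; the reported value of 2.6 is below 23/7, which is consistent with the shorter scale being somewhat less precise overall than the longer one.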
Suddenly children and adolescents were not allowed to go to school or participate in sports or other socializing activities anymore. It is therefore relevant to investigate the impact of these measures on psychosocial outcomes in children and adolescents in the general population. In this study we surveyed how the COVID-19 outbreak impacts psychosocial functioning in a sample of Dutch children and adolescents during the first months of lockdown in one of the largest public health crises of our time. Methods In April 2020, children and adolescents aged 8-18 years, representative of the Dutch population on key demographics, were asked to complete the following Patient-Reported Outcomes Measurement Information System (PROMIS®) computerized adaptive tests (CATs): anger, anxiety, depressive symptoms, peer relationships, sleep-related impairment and the global health scale, online using the KLIK PROM portal (www.hetklikt.nu). In addition, parents were asked to complete sociodemographic questions about themselves (age, ethnicity, education level) and their child (age, gender, education level and presence of chronic conditions). Finally, both children and parents answered COVID-19 specific questions such as consequences for employment, school and the atmosphere at home. Using independent-samples t-tests, PROMIS T-scores collected during the COVID-19 outbreak were compared to normative data that were collected in the general population pre-COVID (2018; n=1098). Additionally, the same data were gathered simultaneously in a sample of chronically ill children/adolescents and a sample of pediatric psychiatric patients. Results and Conclusion In total, 902/90/265 children and parents completed all questionnaires for respectively the general population/chronically ill/psychiatric samples. Preliminary results indicate that during the COVID quarantine, children scored significantly (p < 0.001) lower on all domains measured by the PROMIS CATs when compared to pre-quarantine normative data.
Children and families experience the quarantine differently, as some children indicate that the atmosphere at home has improved, while others indicate a decline in atmosphere. However, further analyses are required to compare groups on background characteristics and to determine possible relevant covariates that may impact psychosocial functioning. These results will be shown at the conference. 136-O. Integrating PROMIS CAT collection into Epic: tips for success Eric C. Makhni Henry Ford Health System, Detroit, MI USA Correspondence: Eric C. Makhni (ericmakhnimd@gmail.com) Objective There are many significant challenges in implementing PROMIS CAT collection for effective and efficient population health applications. One of the biggest challenges is in effectively integrating this platform with daily clinical operations through the electronic medical record. While third-party platforms offer numerous advantages with regard to customization that may be appealing to medical providers, they can be costly and do not fully integrate into the electronic medical record. The purpose of this presentation is to highlight key technical and practical steps to effectively developing a PROMIS CAT platform within a widely used electronic medical record (Epic, Verona, WI, USA). Methods A PROM platform was designed with the following objectives: 1) electronic questionnaire assignment fully integrated through the native EHR on a 2) population basis through the orthopedic department, such that all ambulatory patients (and not just surgical patients) received questionnaires. The primary outcome was questionnaire completion rate during an initial pilot implementation. Secondary outcomes included completion rates by questionnaire type, patient age (<45 years, 45-64 years, and 65+ years), and visit type (new or follow-up patient), along with psychometric data of included questionnaires.
Results An automated PROM platform was created through the native workflow and EHR, without the hiring of any additional personnel, utilizing National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) computer adaptive test (CAT) questionnaires. Among the first 1,930 ambulatory encounters and 8,383 questionnaires administered, there was an overall completion rate of 86%, with no questionnaire type completed less than 80% of the time. Questionnaire completion rate among the two youngest age groups (<45 and 45-64 years) was approximately 87%, compared to 83% among patients 65 and older. New patient questionnaire completion rate was 91%, compared to 81% for follow-up patients. There were favorable floor and ceiling effects for all PROMIS questionnaires, with the exception of PROMIS Depression, which had a high floor effect. Conclusions The results of this pilot study demonstrate feasibility of administering PROMs on a population basis through an EHR. The questionnaire completion rate (86%) exceeded the target for this pilot phase (60%) and for steady-state implementation (80%). This methodology can serve as a model for effective PROM collection. 137-P. Design, development, and implementation of an integrated and automated patient reported outcome measure platform through a native electronic health record: Results from the first 2,000 ambulatory encounters and 8,400 questionnaires administered Eric C. Makhni, Jason Davis, Michael Charters, Stephanie Muh, Kelechi Okoroha, Charles S. Day, Theodore Parsons All authors: Henry Ford Health System, Detroit, MI, USA Corresponding Author: Eric C. Makhni (ericmakhnimd@gmail.com) Background Patient reported outcome measures (PROMs) represent the gold standard for reporting patient-centric health state measures in orthopedics. However, routine collection of PROMs in the busy ambulatory setting is challenging due to a number of constraints.
The purpose of this study was to design and implement a successful PROM platform through a native electronic health record (EHR). Methods A PROM platform was designed with the following objectives: 1) electronic questionnaire assignment fully integrated through the native EHR on a 2) population basis through the orthopedic department, such that all ambulatory patients (and not just surgical patients) received questionnaires. The primary outcome was questionnaire completion rate during an initial pilot implementation. Secondary outcomes included completion rates by questionnaire type, patient age (<45 years, 45-64 years, and 65+ years), and visit type (new or follow-up patient), along with psychometric data of included questionnaires. Results An automated PROM platform was created through the native workflow and EHR, without the hiring of any additional personnel, utilizing National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) computer adaptive test (CAT) questionnaires. Among the first 1,930 ambulatory encounters and 8,383 questionnaires administered, there was an overall completion rate of 86%, with no questionnaire type completed less than 80% of the time. Questionnaire completion rate among the two youngest age groups (<45 and 45-64 years) was approximately 87%, compared to 83% among patients 65 and older. New patient questionnaire completion rate was 91%, compared to 81% for follow-up patients. There were favorable floor and ceiling effects for all PROMIS questionnaires, with the exception of PROMIS Depression, which had a high floor effect. Conclusions The results of this pilot study demonstrate feasibility of administering PROMs on a population basis using a native electronic health record. The questionnaire completion rate (86%) exceeded the target for this pilot phase (60%) and for steady-state implementation (80%). This methodology can serve as a model for effective PROM collection. 138-P.
Role of pre-operative PROMIS scores in predicting post-operative outcomes and likelihood of achieving MCID following arthroscopic rotator cuff repair Joseph S. Tramer, Sreten Franovic, Noah Kuhlmann, Colin Schlosser, Alex Pietroski, Vasilios Moutzouros, Stephanie J Muh, Eric C. Makhni All authors: Henry Ford Health System, Detroit, MI, USA Corresponding Author: Eric C. Makhni (ericmakhnimd@gmail.com) Background The Patient-Reported Outcomes Measurement Information System (PROMIS) has emerged as a valid and efficient means of collecting outcomes in patients with rotator cuff tears. The purpose of this study was to examine the role of pre-operative PROMIS computer adaptive test (CAT) scores in predicting post-operative PROMIS CAT scores, as well as likelihood of achieving minimal clinically important difference (MCID) following rotator cuff repair. We hypothesize that pre-operative PROMIS CAT scores will directly impact both post-operative PROMIS CAT scores and likelihood of achieving MCID. Methods Patients undergoing arthroscopic rotator cuff repair by one of three fellowship-trained surgeons were identified over a 12-month period. Only patients that completed both pre-operative and post-operative PROMIS CAT assessments were included in this cohort. PROMIS CAT forms for upper extremity physical function (PROMIS-UE), pain interference (PROMIS-PI), and depression (PROMIS-D) were utilized. MCID was calculated according to both distribution-based (db) and anchor-based (ab) methodology, and receiver operating characteristics (ROC) were utilized to determine if pre-operative scores were predictive of post-operative outcomes, with 95% specificity. Results One hundred and seventeen rotator cuff repair patients were included for statistical analysis with surveys completed an average of 29±36 days before and 243±117 days after surgery. PROMIS-UE improved from 30.3 to 38.7 (p<0.05), PROMIS-PI improved from 62.7 to 53.3 (p<0.05), and PROMIS-D improved from 47.4 to 44.3. 
The average change from pre-operative scores to post-operative scores in PROMIS-UE and PROMIS-PI exceeded their dbMCIDs of +3.3 and -2.8, respectively. Similarly, PROMIS-UE, PROMIS-PI, and PROMIS-D exceeded their abMCIDs of +3.1, -4.7, and -3.1, respectively. The percent of patients who met dbMCID for PROMIS-UE, PROMIS-PI and PROMIS-D was 67.8%, 75.4%, and 37.5%, respectively. After introduction of 95% specificity cutoffs, percentage of patients achieving dbMCID for PROMIS-UE, PROMIS-PI, and PROMIS-D increased to 86.7%, 88.9%, and 50.0%, respectively. Similarly, the cohort’s probability of achieving abMCID for PROMIS-UE, PROMIS-PI, and PROMIS-D was 66.7%, 64.7%, and 48.2%, respectively. When prognostic cutoffs were introduced, probability of achieving abMCID for PROMIS-UE, PROMIS-PI, and PROMIS-D all increased to 86.7%, 83.3%, and 66.7%, respectively. Conclusion Arthroscopic rotator cuff repair is an effective surgery for symptomatic patients with rotator cuff tears, resulting in improvements of PROMIS-UE, PROMIS-PI, and PROMIS-D. Pre-operative PROMIS CAT domain scores can be utilized to predict likelihood of achieving or failing to achieve significant improvement across all three health domains. 139-P. Presence of preoperative clinical depression does not hinder recovery after anterior cruciate ligament reconstruction Eric Guo, Austin Cross, Luke Hessburg, Dylan Koolmes, David Bernstein, Vasilios Moutzouros, Eric C. Makhni All authors: Henry Ford Health System, Detroit, MI, USA Correspondence: Eric C. Makhni (ericmakhnimd@gmail.com) Background Current literature suggests a link between psychosocial factors and poor surgical outcomes in patients with musculoskeletal complaints. However, there is a limited body of literature examining the effect of depression on outcomes after anterior cruciate ligament reconstruction (ACLR). 
The goal of this study is to determine the prevalence of depression in ACLR patients and evaluate its effect on patient-reported outcomes postoperatively. Methods In this single-center retrospective cohort study, 121 pediatric and adult patients who underwent ACLR were included. PROMIS Physical Function (PF), Pain Interference (PI) and Depression (D) scores collected preoperatively and six months postoperatively were reviewed. A PROMIS D ≥ 55 served as a validated threshold for clinical depression. Patients were separated into clinical depression (CD) and no clinical depression (NCD) groups based on preoperative PROMIS D score. Results 121 patients undergoing ACLR were included in this study. 24 (20%) patients met criteria for clinical depression. Preoperatively, the CD group reported lower mean PROMIS PF (34.6 vs. 40.2, p < 0.01) and higher PROMIS PI (65.1 vs. 59.1, p < 0.01) than those in the NCD group. Postoperatively, the mean PROMIS PF scores for the CD and NCD group were 48.7 and 51.0, respectively (p = 0.2). Mean postoperative PROMIS PI scores for the CD and NCD cohorts were 52.3 and 48.1, respectively (p = 0.04). After ACLR, there was substantial improvement in PROMIS PF and PROMIS PI in both the CD (+14.1 and -12.8, respectively) and NCD (+10.8 and -10.4, respectively) cohorts. Conclusion Prevalence of preoperative depression in ACLR patients could be as high as 20%. Despite high prevalence of depression preoperatively, there is a significant increase in PROMIS PF scores after ACLR, exceeding currently accepted MCID values, regardless of presence of preoperative clinical depre