The Opportunities and Shortcomings of Using Big Data and National Databases for Sarcoma Research

Abstract

The rarity and heterogeneity of sarcomas make performing appropriately powered studies challenging and magnify the significance of large databases in sarcoma research. Established large tumor registries and population-based databases have become increasingly relevant for answering clinical questions regarding sarcoma incidence, treatment patterns, and outcomes. However, the validity of large databases has been questioned and scrutinized because of inaccurate and widely variable coding practices and the absence of clinically relevant variables. Additionally, the use of large databases for the study of rare cancers like sarcoma may be particularly challenging because of known limitations of administrative data and poor overall data quality. Several large national cancer databases currently exist, including the Surveillance, Epidemiology, and End Results (SEER) database, the American College of Surgeons’ and American Cancer Society’s National Cancer Database (NCDB), and the Centers for Disease Control and Prevention (CDC) National Program of Cancer Registries (NPCR). These databases are often used for sarcoma research, but they are limited by a dependence on administrative or billing data, a lack of agreement between chart abstractors on diagnosis codes, and the use of preexisting documented hospital diagnosis codes for tumor registries, leading to significant underestimation of sarcomas in large datasets. Current and future initiatives to improve databases and big data applications for sarcoma research include increasing the use of sarcoma-specific registries and encouraging national initiatives to expand real-world evidence-based datasets.

Precis:

The main aim of this article is to demonstrate the limitations of large national cancer databases specifically for sarcoma research. We also describe current initiatives to improve the application of big data for rare malignancies.

Author and article information

Journal

Journal ID (nlm-journal-id): 0374236

Journal ID (pubmed-jr-id): 2771

Journal ID (nlm-ta): Cancer

Journal ID (iso-abbrev): Cancer

Title: Cancer

ISSN (Print): 0008-543X

ISSN (Electronic): 1097-0142

Publication date Nihms-submitted: 11 April 2019

Publication date (Electronic): 15 May 2019

Publication date (Print): 01 September 2019

Publication date PMC-release: 01 September 2020

Volume: 125

Issue: 17

Pages: 2926-2934

Affiliations

[1] Department of Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115

[2] Department of Emergency Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115

[3] Center for Sarcoma and Bone Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115

Author notes

Corresponding Author: Heather Lyu, MD, Brigham and Women’s Hospital, Department of Surgery, 75 Francis St, Boston, MA 02115, Phone: 703-965-9392, hlyu@bwh.harvard.edu

Author information

Dr. Heather Lyu http://orcid.org/0000-0001-7759-0799

Article

Accession ID: PMC6690764 Pmcid ID: PMC6690764 Pmc-uid ID: 6690764 Manuscript ID: nihpa1018980

DOI: 10.1002/cncr.32118

PMC ID: 6690764

PubMed ID: 31090929

SO-VID: 89df7aaa-3ddf-49ca-96c8-8e27036d5b4c
