      Performance and clinical utility of a new supervised machine-learning pipeline in detecting rare ciliopathy patients based on deep phenotyping from electronic health records and semantic similarity

      research-article

          Abstract

          Background

          Rare diseases affect approximately 400 million people worldwide, many of whom suffer from delayed diagnosis. Among these diseases, NPHP1-related renal ciliopathies need to be diagnosed as early as possible, as potential treatments have recently been investigated with promising results. Our objective was to develop a supervised machine learning pipeline for detecting NPHP1 ciliopathy patients among a large number of nephrology patients using electronic health records (EHRs).

          Methods and results

          We designed a pipeline combining a phenotyping module that re-uses unstructured EHR data, a semantic similarity module to address phenotype dependence, a feature selection step to deal with high dimensionality, an undersampling step to address class imbalance, and a classification step with multiple train-test splits to cope with the small number of rare cases. The pipeline was applied to 30 NPHP1 patients and 7231 controls and achieved good performance (sensitivity 86% with specificity 90%). A qualitative review of the EHRs of 40 misclassified controls showed that 25% had phenotypes belonging to the ciliopathy spectrum, which demonstrates the ability of our system to detect patients with similar conditions.
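
The undersampling and repeated train-test evaluation described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the classifier, the undersampling ratio, the number of splits, or the test fraction, so the one-dimensional threshold classifier, the values `ratio=5`, `n_splits=20` and `test_fraction=0.3`, the synthetic Gaussian scores, and all function names below are assumptions made purely for illustration. Only the cohort sizes (30 cases, 7231 controls) come from the text.

```python
import random
from statistics import mean


def split(items, test_fraction, rng):
    """Shuffle a copy of `items` and return (train, test)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def undersample(controls, n_keep, rng):
    """Randomly keep at most `n_keep` controls (the majority class)."""
    return rng.sample(controls, min(n_keep, len(controls)))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def evaluate(cases, controls, n_splits=20, test_fraction=0.3, ratio=5, seed=0):
    """Average sensitivity and specificity over repeated random splits."""
    rng = random.Random(seed)
    sens, spec = [], []
    for _ in range(n_splits):
        case_train, case_test = split(cases, test_fraction, rng)
        ctrl_train, ctrl_test = split(controls, test_fraction, rng)
        # Undersample the training controls only; the test set keeps its imbalance.
        ctrl_train = undersample(ctrl_train, ratio * len(case_train), rng)
        # Toy classifier (assumption): threshold halfway between the class means.
        threshold = (mean(case_train) + mean(ctrl_train)) / 2
        y_true = [1] * len(case_test) + [0] * len(ctrl_test)
        y_pred = [int(x > threshold) for x in case_test + ctrl_test]
        s, p = sensitivity_specificity(y_true, y_pred)
        sens.append(s)
        spec.append(p)
    return mean(sens), mean(spec)


def make_toy_data(seed=42):
    """Synthetic one-dimensional scores; only the cohort sizes follow the abstract."""
    rng = random.Random(seed)
    cases = [rng.gauss(2.0, 1.0) for _ in range(30)]
    controls = [rng.gauss(0.0, 1.0) for _ in range(7231)]
    return cases, controls


if __name__ == "__main__":
    cases, controls = make_toy_data()
    sensitivity, specificity = evaluate(cases, controls)
    print(f"mean sensitivity={sensitivity:.2f}, mean specificity={specificity:.2f}")
```

In this sketch the controls are undersampled only within each training split, so the reported metrics reflect the original class imbalance in the held-out data; whether the published pipeline makes the same choice is not stated in the abstract.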

          Conclusions

          Our pipeline reached very encouraging performance scores for pre-diagnosing ciliopathy patients. The identified patients could then undergo genetic testing. The same data-driven approach can be adapted to other rare diseases facing underdiagnosis challenges.

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s13023-024-03063-7.

                Author and article information

                Contributors
                carole.faviez@inserm.fr
                Journal
                Orphanet J Rare Dis
                Orphanet Journal of Rare Diseases
                BioMed Central (London)
                ISSN: 1750-1172
                10 February 2024
                Volume 19, article 55 (2024)
                Affiliations
                [1] GRID grid.417925.c, Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006 Paris, France
                [2] GRID grid.5328.c, ISNI 0000 0001 2186 3954, Inria, 75012 Paris, France
                [3] GRID grid.462336.6, Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015 Paris, France
                [4] GRID grid.508487.6, ISNI 0000 0004 7885 7602, Department of Pediatric Nephrology, APHP-Centre, Reference Center for Inherited Renal Diseases (MARHEA), Imagine Institute, Hôpital Necker-Enfants Malades, Université Paris Cité, 75015 Paris, France
                [5] GRID grid.508487.6, ISNI 0000 0004 7885 7602, Laboratory of Renal Hereditary Diseases, INSERM UMR 1163, Imagine Institute, Université Paris Cité, 75015 Paris, France
                [6] GRID grid.508487.6, ISNI 0000 0004 7885 7602, Nephrology and Transplantation Department, MARHEA, Hôpital Necker-Enfants Malades, AP-HP, Université Paris Cité, 75015 Paris, France
                [7] Département d’informatique Médicale, Hôpital Necker-Enfants Malades, AP-HP (https://ror.org/05tr67282), 75015 Paris, France
                Author information
                http://orcid.org/0000-0002-1500-0236
                Article
                Publisher ID: 3063
                DOI: 10.1186/s13023-024-03063-7
                PMC ID: 10858490
                PMID: 38336713
                1f47eabe-6eac-41b4-8155-1077d858dc79
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 29 August 2023
                : 3 February 2024
                Funding
                Funded by: Agence Nationale de la Recherche (FundRef http://dx.doi.org/10.13039/501100001665)
                Award ID: ANR-17-RHUS-0002
                Award ID: ANR-19-P3IA-0001
                Categories
                Research
                Custom metadata
                © Institut National de la Santé et de la Recherche Médicale (INSERM) 2024

                Infectious disease & Microbiology
                Keywords: diagnosis support, electronic health record, supervised machine learning, semantic similarity, imbalanced dataset, rare disease
