      A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles

          Abstract

          Background

          Myopathies are a heterogeneous collection of disorders characterized by dysfunction of skeletal muscle. Physicians encounter myopathies frequently, yet precise diagnosis remains a challenge in primary care. Molecular expression profiles have shown promise for diagnosis across a range of diseases. We propose a novel machine learning-based clinical tool that predicts muscle disease subtypes from multi-cohort microarray expression data.

          Materials and methods

          We analyzed muscle tissue samples from 1260 patients with muscle weakness. The data were curated from 42 independent cohorts with expression profiles in public microarray gene expression repositories and represent a broad range of patient ages and peripheral muscles. Cohorts were categorized into five muscle disease subtypes: immobility, inflammatory myopathies, intensive care unit acquired weakness (ICUAW), congenital, and chronic systemic disease. The dataset contains expression data for 34,099 genes. Data augmentation techniques were used to address class imbalance across the muscle disease subtypes. Support vector machine (SVM) models were trained on two-thirds of the 1260 samples, using the top-ranked gene signature selected by analysis of variance (ANOVA). The models were validated on the remaining samples using the area under the receiver operating characteristic curve (AUC). Gene enrichment analysis was used to identify biological functions enriched in the gene signature.
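
The abstract names the preprocessing steps but not how they were implemented. As a hedged illustration only, the sketch below shows two of them in plain Python: ranking genes by a one-way ANOVA F statistic across disease classes, and a SMOTE-style oversampler that creates synthetic minority-class samples by interpolating between nearest neighbours. The abstract says only that "data augmentation techniques" were used, so SMOTE-style interpolation is an assumption here; the helper names `anova_f`, `top_genes`, and `smote` are hypothetical, and the SVM itself is omitted.

```python
import math
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one gene's expression across classes.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-class variance: perfect separation, or a constant gene.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_genes(X, labels, n_top):
    """Indices of the n_top genes (columns of X) with the largest ANOVA F."""
    scores = [anova_f([row[j] for row in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n_top]


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (assumed technique; hypothetical helper).

    Each synthetic sample lies on the segment between a random minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Toy data: gene 0 separates classes "a" and "b"; gene 1 does not.
    X = [[1.0, 5.0], [1.1, 5.0], [0.9, 5.1], [3.0, 5.0], [3.1, 4.9], [2.9, 5.1]]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(top_genes(X, labels, 1))  # [0]
```

In this sketch both steps would be fitted on the training two-thirds only, so that neither the gene ranking nor the synthetic samples draw on the held-out samples used to compute the AUC; the abstract does not state the order in which the authors applied them.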

          Results

          In the observed, imbalanced data, AUCs ranged from 0.611 to 0.649. With the augmented data, chronic systemic disease was the best-predicted class overall, with an AUC of 0.872 (95% confidence interval (CI): 0.824–0.920). The least well discriminated classes were ICUAW, with an AUC of 0.777 (95% CI: 0.668–0.887), and immobility, with an AUC of 0.789 (95% CI: 0.716–0.861). Disease-specific gene set enrichment results showed that the gene signature was enriched in biological processes including neural precursor cell proliferation for ICUAW and aerobic respiration for the congenital subtype (false discovery rate q-value < 0.001).
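
The abstract reports per-class AUCs with 95% CIs and FDR q-values but does not say how they were computed. As a minimal sketch under stated assumptions, the code below computes a one-vs-rest AUC for a single subtype through its rank (Mann-Whitney) interpretation, a percentile bootstrap 95% CI, and Benjamini-Hochberg q-values for a list of enrichment p-values. The bootstrap and Benjamini-Hochberg choices are assumptions (DeLong's method, for example, is another common way to obtain an AUC CI), and the function names are hypothetical.

```python
import random


def auc(scores, is_positive):
    """One-vs-rest AUC: probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count as 1/2)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, is_positive, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) CI (assumed method)."""
    point = auc(scores, is_positive)  # raises early if a class is missing
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        labels = [is_positive[i] for i in idx]
        if all(labels) or not any(labels):
            continue  # AUC is undefined when a resample contains only one class
        stats.append(auc([scores[i] for i in idx], labels))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

For a five-class problem, `auc` would be called once per subtype, with that subtype as the positive class and the classifier's score for that subtype as `scores`.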

          Conclusion

          Our results present a well-performing molecular tool that classifies muscle diseases using the selected gene markers. In practice, this tool addresses an important gap in the literature on myopathies and is a potentially useful clinical tool for diagnosing muscle disease subtypes.

                Author and article information

                Contributors: dossantosC@unityhealth.to, pingzhao.hu@umanitoba.ca
                Journal: Journal of Translational Medicine (J Transl Med), BioMed Central, London; ISSN 1479-5876
                Published: 30 November 2020; volume 18, article 454
                Affiliations
                [1] Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada (GRID grid.17063.33; ISNI 0000 0001 2157 2938)
                [2] Keenan Research Center for Biomedical Science, St. Michael’s Hospital, Toronto, ON, Canada (GRID grid.415502.7)
                [3] Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada (GRID grid.17063.33; ISNI 0000 0001 2157 2938)
                [4] Interdepartmental Division of Critical Care, St. Michael’s Hospital, University of Toronto, 30 Bond Street, Room 4-008, Toronto, ON M5B 1W8, Canada (GRID grid.17063.33; ISNI 0000 0001 2157 2938)
                [5] Department of Biochemistry and Medical Genetics, University of Manitoba, 745 Bannatyne Avenue, Winnipeg, MB R3E 0J9, Canada (GRID grid.21613.37; ISNI 0000 0004 1936 9609)
                [6] Research Institute in Oncology and Hematology, Winnipeg, MB, Canada (GRID grid.470367.1)
                Author ORCID: http://orcid.org/0000-0002-9546-2245
                DOI: 10.1186/s12967-020-02630-3
                PMCID: PMC7708151
                PMID: 33256785
                © The Author(s) 2020

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History: received 21 October 2020; accepted 23 November 2020
                Category: Research
                Subject: Medicine
                Keywords: muscle diseases, machine learning, microarray, clinical tool, biomarker
