      Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences

      review-article

          Abstract

          Machine learning (ML) methodology used in the social and health sciences needs to fit the intended research purposes of description, prediction, or causal inference. This paper provides a comprehensive, systematic meta-mapping of research questions in the social and health sciences to appropriate ML approaches, incorporating the requirements of statistical analysis in these disciplines. We map the established classification into description, prediction, counterfactual prediction, and causal structural learning to common research goals, such as estimating the prevalence of adverse social or health outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes, and we explain common ML performance metrics. Such a mapping may help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and may help accelerate the uptake of ML applications to advance both basic and applied social and health sciences research.

          ML methods are comprehensively reviewed for their optimal use in epidemiology, demography, psychology, health care, and economics.

          Most cited references (123)

          SMOTE: Synthetic Minority Over-sampling Technique

          An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper, and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
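          The over-sampling step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual SMOTE idea of interpolating between a minority example and one of its nearest minority-class neighbours; the function name `smote_sketch` and its parameters are hypothetical and chosen only for illustration, and the under-sampling of the majority class is not shown.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority points (illustrative sketch only).

    Each synthetic point lies on the line segment between a randomly
    chosen minority example and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two distinct tuples of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority examples, sorted by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_sketch(minority_points, n_new=3, k=2):
        print(point)
```

          Because every synthetic point is a convex combination of two existing minority examples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.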
            Sparse inverse covariance estimation with the graphical lasso.

            We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm, the graphical lasso, that is remarkably fast: it solves a 1000-node problem (approximately 500,000 parameters) in at most a minute and is 30 to 4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
              Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

                Author and article information

                Contributors
                Roles: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing - original draft, Writing - review & editing
                Roles: Visualization, Writing - original draft, Writing - review & editing
                Roles: Methodology, Writing - review & editing
                Roles: Conceptualization
                Roles: Conceptualization, Supervision, Writing - review & editing
                Roles: Writing - review & editing
                Roles: Conceptualization, Visualization, Writing - original draft, Writing - review & editing
                Journal
                Science Advances (Sci Adv)
                American Association for the Advancement of Science
                ISSN: 2375-2548
                19 October 2022
                Volume 8, Issue 42, eabk1942
                Affiliations
                [1] Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg.
                [2] Department of Epidemiology and Population Health, Stanford University, Palo Alto, CA, USA.
                [3] Department of Engineering, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
                [4] Centre for Dementia Prevention, University of Edinburgh, Edinburgh, UK.
                [5] Ohio University, Athens, OH, USA.
                [6] School of Mathematics, University of Edinburgh, Edinburgh, UK.
                Author notes
                [*] Corresponding author. Email: anja.leist@uni.lu
                Author information
                https://orcid.org/0000-0002-5074-5209
                https://orcid.org/0000-0002-1629-482X
                https://orcid.org/0000-0001-7313-5481
                https://orcid.org/0000-0002-7597-6513
                https://orcid.org/0000-0002-6547-5555
                Article
                Publisher ID: abk1942
                DOI: 10.1126/sciadv.abk1942
                PMCID: PMC9581488
                PMID: 36260666
                Copyright © 2022 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY).

                This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 28 June 2021
                Accepted: 01 September 2022
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100010663, H2020 European Research Council;
                Award ID: 803239
                Funded by: FundRef http://dx.doi.org/10.13039/501100000332, Royal Society of Edinburgh;
                Award ID: 69938
                Categories
                Review
                Social and Interdisciplinary Sciences
                SciAdv reviews
                Research Methods
                Social Sciences
                Research Methods
                Custom metadata
                Nicole Falcasantos
