      Mapping complex traits using Random Forests

      research-article
      BMC Genetics
      BioMed Central
      Genetic Analysis Workshop 13: Analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors
      November 11–14 2002

          Abstract

          Random Forest is a prediction technique based on growing trees on bootstrap samples of data, in conjunction with a random selection of explanatory variables to define the best split at each node. In the case of a quantitative outcome, the tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with the sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With the knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach with a high proportion of true genes among the predictors while the second set represents a genome scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.
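The procedure described in the abstract (regression trees grown on bootstrap samples, with a random subset of predictors competing for the split at each node) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation or data: `simulate_pairs`, the toy pair-level response, the choice of locus 0 as the only causal locus, and all parameter values are hypothetical, and the ranking by total reduction in squared error is a simple stand-in for whatever importance measure the article itself uses.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(X, y, rows, mtry, rng, gain, depth=0, max_depth=4, min_size=5):
    """Grow one regression tree on the row indices in `rows`."""
    ys = [y[i] for i in rows]
    best = None
    if depth < max_depth:
        # Only a random subset of predictors competes for the split at this node.
        for j in rng.sample(range(len(X[0])), mtry):
            for t in sorted({X[i][j] for i in rows})[:-1]:
                left = [i for i in rows if X[i][j] <= t]
                right = [i for i in rows if X[i][j] > t]
                if min(len(left), len(right)) < min_size:
                    continue
                cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
    if best is None:
        return {"value": mean(ys)}  # leaf: the tree predictor is a numerical value
    cost, j, t, left, right = best
    gain[j] += sse(ys) - cost  # credit the squared-error reduction to predictor j
    return {"feature": j, "threshold": t,
            "left": grow_tree(X, y, left, mtry, rng, gain, depth + 1, max_depth, min_size),
            "right": grow_tree(X, y, right, mtry, rng, gain, depth + 1, max_depth, min_size)}

def random_forest(X, y, n_trees=30, mtry=2, seed=0):
    """Grow each tree on its own bootstrap sample of the sibling pairs."""
    rng = random.Random(seed)
    gain = [0.0] * len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(grow_tree(X, y, rows, mtry, rng, gain))
    return forest, gain

def predict_tree(node, x):
    while "value" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

def predict(forest, x):
    """Forest prediction: average of the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)

def simulate_pairs(n_pairs=300, n_loci=6, seed=1):
    """Hypothetical toy data: IBD sharing per locus and a pair-level response."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_pairs):
        # Sibling pairs share 0, 1/2 or 1 of their alleles IBD (prob. 1/4, 1/2, 1/4).
        ibd = [rng.choice([0.0, 0.5, 0.5, 1.0]) for _ in range(n_loci)]
        # Assumption: the response shrinks as sharing at locus 0 increases.
        y.append(4.0 - 3.0 * ibd[0] + rng.gauss(0, 0.5))
        X.append(ibd)
    return X, y

X, y = simulate_pairs()
forest, gain = random_forest(X, y)
ranking = sorted(range(len(gain)), key=lambda j: gain[j], reverse=True)
print("loci ranked by total squared-error reduction:", ranking)
```

In this toy setting only locus 0 affects the response, so it should head the ranking; with real linkage data the signal is far weaker and spread over correlated markers.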


          Most cited references (6)

          Parametric and nonparametric linkage analysis: a unified multipoint approach.

          In complex disease studies, it is crucial to perform multipoint linkage analysis with many markers and to use robust nonparametric methods that take account of all pedigree information. Currently available methods fall short in both regards. In this paper, we describe how to extract complete multipoint inheritance information from general pedigrees of moderate size. This information is captured in the multipoint inheritance distribution, which provides a framework for a unified approach to both parametric and nonparametric methods of linkage analysis. Specifically, the approach includes the following: (1) Rapid exact computation of multipoint LOD scores involving dozens of highly polymorphic markers, even in the presence of loops and missing data. (2) Non-parametric linkage (NPL) analysis, a powerful new approach to pedigree analysis. We show that NPL is robust to uncertainty about mode of inheritance, is much more powerful than commonly used nonparametric methods, and loses little power relative to parametric linkage analysis. NPL thus appears to be the method of choice for pedigree studies of complex traits. (3) Information-content mapping, which measures the fraction of the total inheritance information extracted by the available marker data and points out the regions in which typing additional markers is most useful. (4) Maximum-likelihood reconstruction of many-marker haplotypes, even in pedigrees with missing data. We have implemented NPL analysis, LOD-score computation, information-content mapping, and haplotype reconstruction in a new computer package, GENEHUNTER. The package allows efficient multipoint analysis of pedigree data to be performed rapidly in a single user-friendly environment.

            The investigation of linkage between a quantitative trait and a marker locus.


              Genetic Analysis Workshop 13: Simulated longitudinal data on families for a system of oligogenic traits

              The Genetic Analysis Workshop 13 simulated data aimed to mimic the major features of the real Framingham Heart Study data that formed Problem 1, but under a known inheritance model and with 100 replicates, so as to allow evaluation of the statistical properties of various methods. The pedigrees used were the 330 real pedigree structures (comprising 4692 individuals) with some minor changes to protect confidentiality. Fifty trait genes and 399 microsatellite markers were simulated by gene dropping on 22 autosomal chromosomes. Assuming random ascertainment of families, a system of eight longitudinal quantitative traits (designed to be similar to those in the real data) was generated with a wide range of heritabilities, including some pleiotropic and interactive effects. Genes could affect either the baseline level or the rate of change of the phenotype. Hypertension diagnosis and treatment were simulated with treatment availability, compliance, and efficacy depending on calendar year. Nongenetic traits of smoking and alcohol were generated as covariates for other traits. Death was simulated as a hazard rate depending upon age, sex, smoking, cholesterol, and systolic blood pressure. After the complete data were simulated, missing data indicators were generated based on logistic models fitted to the real data, involving the subject's history of previous missing values, together with that of their spouses, parents, siblings, and offspring, as well as marital status, only-child indicators, current value at certain simulated traits, and the data collection pattern on the cohort into which each subject was ascertained.

                Author and article information

                Conference
                Journal: BMC Genetics (BMC Genet)
                Publisher: BioMed Central (London)
                ISSN: 1471-2156
                Published: 31 December 2003
                Volume 4, Suppl 1, S64
                Affiliations
                [1] Genome Therapeutics Corporation, Waltham, Massachusetts, 02453, USA
                [2] Current address: School of Health Sciences, University of Lethbridge, Lethbridge, Alberta, T1K 3M4, Canada
                [3] Current address: Department of Biostatistics, Boston University, Boston, Massachusetts, 02215, USA
                [4] Department of Psychiatry, Harvard Medical School, Boston, Massachusetts, 02115, USA
                Article
                Publisher ID: 1471-2156-4-S1-S64
                DOI: 10.1186/1471-2156-4-S1-S64
                PMCID: 1866502
                PMID: 14975132
                2c9eddba-a1b6-4710-9710-f835dc603bd3
                Copyright © 2003 Bureau et al; licensee BioMed Central Ltd

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Genetic Analysis Workshop 13: Analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors
                New Orleans Marriott Hotel, New Orleans, LA, USA
                November 11–14 2002
                Categories
                Proceedings

                Genetics