      Gene-gene interaction filtering with ensemble of filters

      research-article
      BMC Bioinformatics
      BioMed Central
      The Ninth Asia Pacific Bioinformatics Conference (APBC 2011)
      11–14 January 2011


          Abstract

          Background

          Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies provide the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical to reduce the search space prior to the computationally intensive gene-gene interaction identification step. In this study, we show that two commonly used SNP-SNP interaction filtering algorithms, ReliefF and tuned ReliefF (TuRF), are sensitive to the order of the samples in the dataset, giving rise to unstable and suboptimal results. However, we observe that the ‘unstable’ results from multiple runs of these algorithms can provide valuable information about the dataset. We therefore hypothesize that aggregating results from multiple runs of the algorithm may improve the filtering performance.

          Results

          We propose a simple and effective ensemble approach in which the results from multiple runs of an unstable filter are aggregated based on the general theory of ensemble learning. The ensemble versions of the ReliefF and TuRF algorithms, referred to as ReliefF-E and TuRF-E, are robust to sample order dependency and enable a more informative investigation of data characteristics. Using simulated and real datasets, we demonstrate that both the ensemble of ReliefF and the ensemble of TuRF can generate a much more stable SNP ranking than the original algorithms. Furthermore, the ensemble of TuRF achieved the highest success rate in comparison to many state-of-the-art algorithms as well as traditional χ²-test and odds ratio methods in terms of retaining gene-gene interactions.
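
          The ensemble idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: `relief_scores` is a simplified, hypothetical ReliefF-style scorer for categorical SNP genotypes with a binary phenotype, and `ensemble_scores` assumes that runs are generated by randomly permuting the sample order and aggregated by averaging the per-SNP scores. The abstract does not state the aggregation rule, so averaging is an assumption, as are the parameter names `k`, `runs`, and `seed`.

```python
import random


def relief_scores(X, y, k=3):
    """Simplified ReliefF-style scores for categorical features.

    X is a list of samples (each a list of genotype codes), y a list of
    class labels. Ties between equally distant neighbours are broken by
    sample order, which makes the scores depend on that order.
    """
    n, m = len(X), len(X[0])
    scores = [0.0] * m
    for i in range(n):
        # Hamming distance from sample i to every other sample.
        dists = [
            (sum(a != b for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort(key=lambda t: t[0])  # stable sort: ties keep dataset order
        hits = [j for _, j in dists if y[j] == y[i]][:k]
        misses = [j for _, j in dists if y[j] != y[i]][:k]
        for f in range(m):
            for j in hits:  # differing from a same-class neighbour is penalised
                scores[f] -= (X[i][f] != X[j][f]) / (n * k)
            for j in misses:  # differing from an other-class neighbour is rewarded
                scores[f] += (X[i][f] != X[j][f]) / (n * k)
    return scores


def ensemble_scores(X, y, filter_fn, runs=10, seed=0):
    """Average the filter's scores over several random sample orderings."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    total = [0.0] * m
    for _ in range(runs):
        order = list(range(n))
        rng.shuffle(order)  # a new sample order for this run
        Xp = [X[i] for i in order]
        yp = [y[i] for i in order]
        run_scores = filter_fn(Xp, yp)
        for f in range(m):
            total[f] += run_scores[f]
    return [t / runs for t in total]
```

          Calling `ensemble_scores(X, y, relief_scores)` returns one averaged score per SNP; sorting SNPs by that score gives the filtered ranking. Any other order-sensitive filter with the same `(X, y)` signature could be passed as `filter_fn`.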

          Most cited references (20)

          Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems.

          Epistasis, or interactions between genes, has long been recognized as fundamentally important to understanding the structure and function of genetic pathways and the evolutionary dynamics of complex genetic systems. With the advent of high-throughput functional genomics and the emergence of systems approaches to biology, as well as a new-found ability to pursue the genetic basis of evolution down to specific molecular changes, there is a renewed appreciation both for the importance of studying gene interactions and for addressing these questions in a unified, quantitative manner.
            Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions.

            Polymorphisms in human genes are being described in remarkable numbers. Determining which polymorphisms and which environmental factors are associated with common, complex diseases has become a daunting task. This is partly because the effect of any single genetic variation will likely be dependent on other genetic variations (gene-gene interaction or epistasis) and environmental factors (gene-environment interaction). Detecting and characterizing interactions among multiple factors is both a statistical and a computational challenge. To address this problem, we have developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe the MDR approach and an MDR software package. We developed a program that integrates MDR with a cross-validation strategy for estimating the classification and prediction error of multifactor models. The software can be used to analyze interactions among 2-15 genetic and/or environmental factors. The dataset may contain up to 500 total variables and a maximum of 4000 study subjects. Information on obtaining the executable code, example data, example analysis, and documentation is available upon request. All supplementary information can be found at http://phg.mc.vanderbilt.edu/Software/MDR.
              Bayesian inference of epistatic interactions in case-control studies.

              Epistatic interactions among multiple genetic variants in the human genome may be important in determining individual susceptibility to common diseases. Although some existing computational methods for identifying genetic interactions have been effective for small-scale studies, we here propose a method, denoted 'bayesian epistasis association mapping' (BEAM), for genome-wide case-control studies. BEAM treats the disease-associated markers and their interactions via a bayesian partitioning model and computes, via Markov chain Monte Carlo, the posterior probability that each marker set is associated with the disease. Testing this on an age-related macular degeneration genome-wide association data set, we demonstrate that the method is significantly more powerful than existing approaches and that genome-wide case-control epistasis mapping with many thousands of markers is both computationally and statistically feasible.

                Author and article information

                Conference
                BMC Bioinformatics
                BioMed Central
                ISSN: 1471-2105
                15 February 2011
                Volume 12, Suppl 1, S10
                Affiliations
                [1 ]School of Information Technologies, University of Sydney, NSW 2006, Australia
                [2 ]School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia
                [3 ]National ICT Australia, Australian Technology Park, Eveleigh, NSW 2015, Australia
                [4 ]Centre for Distributed and High Performance Computing, University of Sydney, NSW 2006, Australia
                Article
                Article ID: 1471-2105-12-S1-S10
                DOI: 10.1186/1471-2105-12-S1-S10
                PMCID: PMC3044264
                PMID: 21342539
                13b5ec31-6bb6-4b08-b5db-73217ba2ac6e
                Copyright ©2011 Yang et al; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                The Ninth Asia Pacific Bioinformatics Conference (APBC 2011)
                Inchon, Korea
                11–14 January 2011
                Categories
                Research

                Bioinformatics & Computational biology
