
Gene Regulatory Network Inference from Multifactorial Perturbation Data Using both Regression and Correlation Analyses



Public Library of Science



      An important problem in systems biology is the reconstruction of gene regulatory networks (GRNs) from experimental data and other a priori information. The DREAM project offers several types of experimental data, including knockout, knockdown, and time series data. Among them, multifactorial perturbation data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice. In this article, a new algorithm is presented for the inference of GRNs using the DREAM4 multifactorial perturbation data. The GRN inference problem is decomposed into separate regression problems. In each regression problem, the expression level of a target gene is predicted solely from the expression level of one potential regulating gene. For each potential regulating gene, a weight for a specific target gene is constructed from the sum of squared residuals and the Pearson correlation coefficient. These weights are then normalized to reflect differences in the effort of regulating distinct genes. By appropriately choosing the parameters of the power law, we construct a 0–1 integer programming problem. Solving this problem yields an estimate of the direct regulating genes of an arbitrary gene. The normalized weight of a gene is then modified on the basis of the estimated existence of direct regulations to it. These normalized and modified weights are used to rank how likely each corresponding direct regulation is to exist. Computational results on the DREAM4 In Silico Size 100 Multifactorial subchallenge show that the suggested algorithm can outperform the best-performing team. On the real data provided by the DREAM5 Network Inference Challenge, its performance would rank third. Furthermore, the high precision of the most reliable predictions suggests that the algorithm may be helpful in guiding the design of biological experiments.
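      The pairwise decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the weight formula, so the combination used here (absolute Pearson correlation divided by the sum of squared residuals, normalized per target gene) is an assumption, as is the use of simple linear regression. The function names and the `eps` parameter are hypothetical, and the power-law and 0–1 integer programming steps are omitted.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ssr_simple_regression(x, y):
    """Sum of squared residuals of the least-squares fit y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx != 0 else 0.0
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def pairwise_weights(expr, eps=1e-9):
    """Map (regulator, target) -> weight, normalized so each target's weights sum to 1.

    expr: dict of gene name -> list of expression levels, one per experiment.
    ASSUMPTION: raw weight = |r| / (SSR + eps); the paper's formula may differ.
    """
    weights = {}
    for target in expr:
        raw = {}
        for regulator in expr:
            if regulator == target:
                continue
            r = pearson(expr[regulator], expr[target])
            ssr = ssr_simple_regression(expr[regulator], expr[target])
            raw[regulator] = abs(r) / (ssr + eps)
        total = sum(raw.values())
        for regulator, w in raw.items():
            weights[(regulator, target)] = w / total if total > 0 else 0.0
    return weights


def rank_edges(weights):
    """Candidate edges sorted from most to least likely under the weights."""
    return sorted(weights, key=weights.get, reverse=True)
```

      Calling `rank_edges(pairwise_weights(expr))` on a dictionary of expression profiles returns candidate regulator-target pairs in descending order of weight, which mirrors the ranked edge list that the DREAM challenges score.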


      Most cited references 15


      Modeling and simulation of genetic regulatory systems: a literature review.

       Hidde de Jong (2001)
      In order to understand the functioning of organisms on the molecular level, we need to know which genes are expressed, when and where in the organism, and to which extent. The regulation of gene expression is achieved through genetic regulatory systems structured by networks of interactions between DNA, RNA, proteins, and small molecules. As most genetic regulatory networks of interest involve many components connected through interlocking positive and negative feedback loops, an intuitive understanding of their dynamics is hard to obtain. As a consequence, formal methods and computer tools for the modeling and simulation of genetic regulatory networks will be indispensable. This paper reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, Boolean networks and their generalizations, ordinary and partial differential equations, qualitative differential equations, stochastic equations, and rule-based formalisms. In addition, the paper discusses how these formalisms have been used in the simulation of the behavior of actual regulatory systems.

        Scale-free networks in cell biology.

         Reka Albert (2005)
        A cell's behavior is a consequence of the complex interactions between its numerous constituents, such as DNA, RNA, proteins and small molecules. Cells use signaling pathways and regulatory mechanisms to coordinate multiple processes, allowing them to respond to and adapt to an ever-changing environment. The large number of components, the degree of interconnectivity and the complex control of cellular networks are becoming evident in the integrated genomic and proteomic analyses that are emerging. It is increasingly recognized that the understanding of properties that arise from whole-cell function require integrated, theoretical descriptions of the relationships between different cellular components. Recent theoretical advances allow us to describe cellular network structure with graph concepts and have revealed organizational features shared with numerous non-biological networks. We now have the opportunity to describe quantitatively a network of hundreds or thousands of interacting components. Moreover, the observed topologies of cellular networks give us clues about their evolution and how their organization influences their function and dynamic responses.

          Inferring Regulatory Networks from Expression Data Using Tree-Based Methods

          One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.

            Author and article information

            [1] Department of Automation, Tsinghua University, Beijing, China
            [2] Department of Automation and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
            CRS4, Italy
            Author notes

            Competing Interests: The authors have declared that no competing interests exist.

            Conceived and designed the experiments: JK TZ. Performed the experiments: JX. Analyzed the data: JX. Wrote the paper: JX TZ.

            Role: Editor
            PLoS ONE
            Public Library of Science (San Francisco, USA)
            21 September 2012
            Volume 7, Issue 9

            This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

            Pages: 13
            The reported work was financially supported in part by the 973 Program under Grants 2012CB316504 and 2009CB320602 and by the National Natural Science Foundation of China under Grants 61174122, 61021063, 60721003, and 60625305. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
            Research Article
            Computational Biology
            Genome Expression Analysis
            Biological Data Management
            Regulatory Networks
            Signaling Networks
            Gene Networks
            Applied Mathematics


