
      SILGGM: An extensive R package for efficient statistical inference in large-scale gene networks

      PLoS Computational Biology
      Public Library of Science


          Abstract

          Gene co-expression network analysis is extremely useful in interpreting complex biological processes. Recent droplet-based single-cell technology routinely generates much larger gene expression data sets, with thousands of samples and tens of thousands of genes. To analyze such large-scale gene-gene networks, remarkable progress has been made in rigorous statistical inference for high-dimensional Gaussian graphical models (GGMs). These approaches provide a formal confidence interval or p-value, rather than only a single point estimate, for the conditional dependence of a gene pair, and are therefore more desirable for identifying reliable gene networks. To promote their widespread use, we herein introduce an extensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference for high-dimensional GGMs. Unlike existing tools, SILGGM provides statistically efficient inference both on an individual gene pair and on all gene pairs at the whole-network scale. It implements a novel and consistent false discovery rate (FDR) procedure across all four methodologies. With its user-friendly design, it produces outputs compatible with multiple platforms for interactive network visualization. Furthermore, comparisons in simulation show that SILGGM is several orders of magnitude faster than the existing MATLAB implementation and further improves on the speed of the already very efficient R package FastGGM. Results on simulated data confirm the validity of all the approaches in SILGGM even in a very large-scale setting, with the number of variables (genes) on the order of ten thousand. We have also applied our package to a novel single-cell RNA-seq data set of pan T cells. The results show that the approaches in SILGGM significantly outperform the conventional ones in a biological sense. The package is freely available via CRAN at https://cran.r-project.org/package=SILGGM.
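To make the idea of edge-wise inference with FDR control concrete, the following is a minimal Python sketch of the classical low-dimensional recipe: partial correlations from the inverse sample covariance, a Fisher z-test per gene pair, and a Benjamini-Hochberg cutoff. It is an illustration only and is not one of the four approaches in SILGGM, nor its FDR procedure. It assumes far more samples than variables (the high-dimensional methods described above exist precisely because this assumption fails for single-cell data), and all function names here are hypothetical.

```python
import math
import random

def covariance(X):
    """Sample covariance matrix (p x p) of n rows with p values each."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
             for j in range(p)] for i in range(p)]

def invert(A):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned A only)."""
    p = len(A)
    M = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(A)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        scale = M[c][c]
        M[c] = [v / scale for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[p:] for row in M]

def edge_tests(X):
    """Return {(i, j): (partial_corr, p_value)} for every pair i < j.

    Assumption: n > p + 1, so the sample covariance is invertible and the
    Fisher z approximation (conditioning on the other p - 2 variables) applies.
    """
    n, p = len(X), len(X[0])
    omega = invert(covariance(X))  # estimated precision matrix
    out = {}
    for i in range(p):
        for j in range(i + 1, p):
            rho = -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
            rho = max(min(rho, 0.999999), -0.999999)
            z = math.atanh(rho) * math.sqrt(n - (p - 2) - 3)
            # Two-sided standard-normal p-value: 2 * (1 - Phi(|z|)).
            out[(i, j)] = (rho, math.erfc(abs(z) / math.sqrt(2.0)))
    return out

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule; returns rejection flags in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    cutoff = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    reject = [False] * m
    for k in order[:cutoff]:
        reject[k] = True
    return reject

def simulate_chain(n, seed=0):
    """Toy data: x0 -> x1 -> x2 form a chain, x3 is independent noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = 0.8 * x0 + rng.gauss(0, 1)
        x2 = 0.8 * x1 + rng.gauss(0, 1)
        rows.append([x0, x1, x2, rng.gauss(0, 1)])
    return rows

if __name__ == "__main__":
    tests = edge_tests(simulate_chain(500))
    pairs = sorted(tests)
    flags = benjamini_hochberg([tests[pr][1] for pr in pairs], alpha=0.05)
    for pr, keep in zip(pairs, flags):
        print(pr, "rho=%.3f p=%.3g" % tests[pr], "edge" if keep else "no edge")
```

In the toy chain, x0 and x2 are correlated but conditionally independent given x1, which is the distinction between a plain co-expression network and a GGM: only the pairs with a nonzero partial correlation are true edges.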


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing
                Roles: Conceptualization, Methodology, Resources, Supervision, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 13 August 2018
                Volume 14, Issue 8, e1006369
                Affiliations
                [1] Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
                [2] Division of Pulmonary Medicine, Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
                [3] Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
                bioinformatics, GERMANY
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-1163-8187
                http://orcid.org/0000-0001-8096-5248
                http://orcid.org/0000-0001-7196-8703
                Article
                Manuscript ID: PCOMPBIOL-D-18-00051
                DOI: 10.1371/journal.pcbi.1006369
                PMCID: PMC6107288
                PMID: 30102702
                © 2018 Zhang et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 11 January 2018
                Accepted: 17 July 2018
                Page count
                Figures: 4, Tables: 1, Pages: 14
                Funding
                Funded by: National Institutes of Health (funder ID: http://dx.doi.org/10.13039/100000002); Award ID: UL1TR001857
                This work was supported in part by the National Institutes of Health (https://www.nih.gov/) grant (Grant No. UL1TR001857) and the National Science Foundation (https://nsf.gov/) grant (Grant No. DMS-1812030) to ZR. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Computational Biology > Gene Regulatory Networks
                Biology and Life Sciences > Genetics > Gene Regulatory Networks
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Statistical Inference
                Physical Sciences > Mathematics > Statistics (Mathematics) > Statistical Methods > Statistical Inference
                Computer and Information Sciences > Network Analysis
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Test Statistics
                Physical Sciences > Mathematics > Statistics (Mathematics) > Statistical Methods > Test Statistics
                Biology and Life Sciences > Genetics > Gene Expression
                Research and Analysis Methods > Simulation and Modeling
                Biology and Life Sciences > Cell Biology > Cellular Types > Animal Cells > Blood Cells > White Blood Cells > T Cells
                Biology and Life Sciences > Cell Biology > Cellular Types > Animal Cells > Immune Cells > White Blood Cells > T Cells
                Biology and Life Sciences > Immunology > Immune Cells > White Blood Cells > T Cells
                Medicine and Health Sciences > Immunology > Immune Cells > White Blood Cells > T Cells
                Biology and Life Sciences > Genetics > Gene Identification and Analysis > Genetic Networks
                Computer and Information Sciences > Network Analysis > Genetic Networks
                Data availability
                All relevant data are within the paper and its Supporting Information files. The original microarray asthma data are available from the EMBL-EBI ArrayExpress database (accession number E-MTAB-1425). The original single-cell RNA-seq data with pan T cells are publicly available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/t_3k.

                Quantitative & Systems biology