      Orchestrating high-throughput genomic analysis with Bioconductor.

          Abstract

          Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

          Most cited references (10)

          Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

          Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.
            Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages

            It is now recognized that extensive expression heterogeneities among cells precede the emergence of lineages in the early mammalian embryo. To establish a map of pluripotent epiblast (EPI) versus primitive endoderm (PrE) lineage segregation within the inner cell mass (ICM) of the mouse blastocyst, we characterised the gene expression profiles of individual ICM cells. Clustering analysis of the transcriptomes of 66 cells demonstrated that initially they are non-distinguishable. Early in the segregation, lineage-specific marker expression exhibited no apparent correlation, and a hierarchical relationship was established only in the late blastocyst. Fgf4 exhibited a bimodal expression at the earliest stage analysed, and in its absence, the differentiation of PrE and EPI was halted, indicating that Fgf4 drives, and is required for, ICM lineage segregation. These data lead us to propose a model where stochastic cell-to-cell expression heterogeneity followed by signal reinforcement underlies ICM lineage segregation by antagonistically separating equivalent cells.
              Waste Not, Want Not: Why Rarefying Microbiome Data is Inadmissible

              (2013)
              The interpretation of count data originating from the current generation of DNA sequencing platforms requires special attention. In particular, the per-sample library sizes often vary by orders of magnitude from the same sequencing run, and the counts are overdispersed relative to a simple Poisson model. These challenges can be addressed using an appropriate mixture model that simultaneously accounts for library size differences and biological variability. This approach is already well-characterized and implemented for RNA-Seq data in R packages such as edgeR and DESeq. We use statistical theory, extensive simulations, and empirical data to show that variance stabilizing normalization using a mixture model like the negative binomial is appropriate for microbiome count data. In simulations detecting differential abundance, normalization procedures based on a Gamma-Poisson mixture model provided systematic improvement in performance over crude proportions or rarefied counts -- both of which led to a high rate of false positives. In simulations evaluating clustering accuracy, we found that the rarefying procedure discarded samples that were nevertheless accurately clustered by alternative methods, and that the choice of minimum library size threshold was critical in some settings, but with an optimum that is unknown in practice. Techniques that use variance stabilizing transformations by modeling microbiome count data with a mixture distribution, such as those implemented in edgeR and DESeq, substantially improved upon techniques that attempt to normalize by rarefying or crude proportions. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.

                Author and article information

                Journal: Nature Methods (Nat. Methods)
                ISSN: 1548-7105, 1548-7091
                Date: Feb 2015
                Volume: 12
                Issue: 2
                Affiliations
                [1] European Molecular Biology Laboratory, Heidelberg, Germany.
                [2] (1) Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA. (2) Harvard School of Public Health, Boston, Massachusetts, USA.
                [3] Genentech, South San Francisco, California, USA.
                [4] Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
                [5] Department of Medical Genetics, School of Medical Sciences, State University of Campinas, Campinas, Brazil.
                [6] Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
                [7] Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
                [8] Department of Biochemistry, University of Cambridge, Cambridge, UK.
                [9] Institute for Integrative Genome Biology, University of California, Riverside, Riverside, California, USA.
                [10] Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
                [11] Novartis Institutes for Biomedical Research, Basel, Switzerland.
                [12] (1) McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, USA. (2) Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA.
                [13] (1) Harvard School of Public Health, Boston, Massachusetts, USA. (2) Dana-Farber Cancer Institute, Boston, Massachusetts, USA.
                [14] Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, Washington, USA.
                [15] (1) Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia. (2) Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria, Australia.
                [16] School of Urban Public Health at Hunter College, City University of New York, New York, New York, USA.
                Article
                Publisher ID: nmeth.3252
                NIHMS ID: NIHMS661499
                DOI: 10.1038/nmeth.3252
                PMID: 25633503