      Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data


          Abstract

          Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Because GETS analysis evaluates a large number of genes, a useful strategy for summarizing biological functional processes and regulatory mechanisms is to cluster genes that present similar expression patterns over time. Traditional clustering methods usually ignore the challenges of GETS, such as the lack of data normality and the small number of temporal observations. Independent Component Analysis (ICA) is a statistical procedure that uses a transformation to convert raw time series data into sets of values of independent variables, which can be used for cluster analysis to identify sets of genes with similar temporal expression patterns. ICA allows clustering of short series of distribution-free data while accounting for the dependence between subsequent time points. Using simulated and real temporal RNA-seq data sets (the real data comprising four libraries of two pig breeds at 21, 40, 70 and 90 days of gestation), we present a methodology (ICAclust) that jointly considers ICA and a hierarchical method for clustering GETS. We compare ICAclust results with those obtained with K-means clustering. On average, ICAclust presented an absolute gain of 5.15% over the best K-means scenario. Considering the worst scenario for K-means, the gain was 84.85% when compared with the best ICAclust result. For the real data set, genes were grouped into six distinct clusters with 89, 51, 153, 67, 40, and 58 genes, respectively. In general, the six clusters presented very distinct expression patterns. Overall, the proposed two-step clustering method (ICAclust) performed well compared with K-means, a traditional method for cluster analysis of temporal gene expression data. In ICAclust, genes with similar expression patterns over time were clustered together.
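          The two-step idea described in the abstract (ICA first, then a hierarchical method on the resulting components) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is R code deposited on Zenodo. It assumes scikit-learn's `FastICA` and SciPy's Ward linkage as stand-ins, since the abstract does not name the ICA algorithm or the linkage. The toy data, the three temporal patterns, the Laplace noise, and the choice of two components and three clusters are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 groups of 30 genes, each following one temporal
# pattern over 6 time points, plus non-Gaussian (Laplace) noise.
patterns = np.array([
    [0, 1, 2, 3, 4, 5],   # steadily increasing
    [5, 4, 3, 2, 1, 0],   # steadily decreasing
    [0, 2, 5, 5, 2, 0],   # transient peak
], dtype=float)
genes_per_group = 30
true_groups = np.repeat(np.arange(3), genes_per_group)
noise = rng.laplace(scale=0.3, size=(true_groups.size, patterns.shape[1]))
expression = patterns[true_groups] + noise        # shape (90, 6)

# Step 1 (assumed setup): ICA with genes as observations and time points
# as features, so each gene is summarized by its scores on 2 components.
ica = FastICA(n_components=2, whiten="unit-variance",
              max_iter=1000, random_state=0)
scores = ica.fit_transform(expression)            # shape (90, 2)

# Step 2: hierarchical clustering of the genes in component space.
# Ward linkage is an assumption; the abstract only says "hierarchical".
tree = linkage(scores, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")  # labels start at 1

# Cross-tabulate the simulated groups against the recovered clusters.
for g in range(3):
    counts = np.bincount(labels[true_groups == g], minlength=4)[1:]
    print(f"simulated group {g}: genes per cluster = {counts}")
```

          In this sketch the number of components and the number of clusters are fixed by hand because the simulated structure is known. On real data both would have to be chosen, for example by inspecting the dendrogram.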


                Author and article information

                Contributors
                Role: Conceptualization, Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing
                Role: Conceptualization, Methodology, Writing – original draft
                Role: Conceptualization, Methodology
                Role: Formal analysis, Methodology, Software, Writing – original draft
                Role: Formal analysis
                Role: Formal analysis
                Role: Formal analysis
                Role: Investigation
                Role: Conceptualization, Methodology, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS One
                Public Library of Science (San Francisco, CA USA )
                ISSN: 1932-6203
                Published: 17 July 2017
                Volume: 12
                Issue: 7
                Article number: e0181195
                Affiliations
                [1 ] Department of Statistics, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
                [2 ] Department of Animal Science, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
                [3 ] Department of Exact Sciences, Federal University of Lavras, Lavras, Minas Gerais, Brazil
                [4 ] Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
                Tianjin University, CHINA
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0001-5886-9540
                Article
                Manuscript: PONE-D-17-03014
                DOI: 10.1371/journal.pone.0181195
                PMCID: PMC5513449
                PMID: 28715507
                fb4ffc71-05c1-46cb-9e1e-980fb8c3cad4
                © 2017 Nascimento et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 31 January 2017
                Accepted: 27 June 2017
                Page count
                Figures: 7, Tables: 0, Pages: 12
                Funding
                Funded by: CAPES
                Funded by: Fundação Arthur Bernardes (funder ID: http://dx.doi.org/10.13039/501100005636)
                Funded by: FAPEMIG
                The authors thank CAPES, FAPEMIG and FUNARBE for the financial support. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences
                Genetics
                Gene Expression
                Research and Analysis Methods
                Simulation and Modeling
                Physical Sciences
                Mathematics
                Statistics (Mathematics)
                Statistical Data
                Biology and life sciences
                Molecular biology
                Molecular biology techniques
                Sequencing techniques
                RNA sequencing
                Research and analysis methods
                Molecular biology techniques
                Sequencing techniques
                RNA sequencing
                Research and analysis methods
                Chemical synthesis
                Biosynthetic techniques
                Nucleic acid synthesis
                RNA synthesis
                Biology and life sciences
                Biochemistry
                Nucleic acids
                RNA
                RNA synthesis
                Biology and Life Sciences
                Organisms
                Animals
                Vertebrates
                Amniotes
                Mammals
                Swine
                Physical Sciences
                Mathematics
                Applied Mathematics
                Algorithms
                Research and Analysis Methods
                Simulation and Modeling
                Algorithms
                Biology and life sciences
                Molecular biology
                Molecular biology techniques
                Biomolecular isolation
                RNA isolation
                Research and analysis methods
                Molecular biology techniques
                Biomolecular isolation
                RNA isolation
                Custom metadata
                All data sets related to the simulation and the real data, as well as the R software code, are available from the Zenodo database (DOI: 10.5281/zenodo.571134).
