
      Challenges and advances for transcriptome assembly in non-model species

      research-article


          Abstract

          Analyses of high-throughput transcriptome sequences of non-model organisms are based on two main approaches: de novo assembly and genome-guided assembly, which maps reads to a reference to assign them prior to assembly. Because mapping performs poorly when the reference is highly divergent, as is frequently the case for non-model species, we evaluate whether blastn outperforms mapping methods for read assignment in such situations (>15% divergence). We demonstrate its high performance using simulated reads whose lengths correspond to those generated by the most common sequencing platforms, over a realistic range of genetic divergence (0% to 30%). Here we focus on gene identification rather than on resolving the whole set of transcripts (i.e. the complete transcriptome). For simulated datasets, the transcriptome-guided assembly based on blastn recovers 94.8% of genes at 0% divergence, irrespective of read length; however, the read assignment rate decreases as divergence increases and as read length decreases. Nevertheless, 92.6% of genes are still recovered at 30% divergence, irrespective of read length. This analysis also produces a categorization of genes according to how their reads are assigned, and suggests guidelines for data processing prior to comparative transcriptomics and gene expression analyses, to minimize potential inferential bias associated with incorrect transcript assignment. We also compare the performance of de novo assembly alone vs in combination with a transcriptome-guided assembly based on blastn, both via simulation and empirically, using data from a cyprinid fish species and from an oak species. In every simulated scenario, the transcriptome-guided assembly using blastn outperforms the de novo approach alone, including when the divergence level is beyond the reach of traditional mapping methods.
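
The blastn-based read assignment described above can be illustrated with a minimal sketch. It assumes reads were searched against a related reference transcriptome with blastn using BLAST+ tabular output (`-outfmt 6`, default 12 columns), keeps the highest-bitscore hit per read after an e-value cutoff, and groups reads by reference gene so that each group could then be assembled separately. The e-value threshold, the tie-breaking rule (first hit seen wins) and the function names `best_hits` and `group_by_gene` are hypothetical; this is not the authors' published pipeline, and it does not reproduce their categorization of genes by assignment.

```python
"""Hypothetical sketch: assign reads to reference genes from blastn tabular output."""
from collections import defaultdict

# Default column order of BLAST+ "-outfmt 6" tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-5):
    """Return {read_id: (gene_id, bitscore)}, keeping the top-scoring hit per read.

    max_evalue is an assumed cutoff, not a value taken from the article.
    On equal bitscores the first hit encountered is kept.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, line.split("\t")))
        if float(rec["evalue"]) > max_evalue:
            continue  # discard weak hits
        read, gene, score = rec["qseqid"], rec["sseqid"], float(rec["bitscore"])
        if read not in best or score > best[read][1]:
            best[read] = (gene, score)
    return best


def group_by_gene(best):
    """Invert read -> gene assignments into gene -> sorted list of read IDs."""
    groups = defaultdict(list)
    for read, (gene, _score) in best.items():
        groups[gene].append(read)
    return {gene: sorted(reads) for gene, reads in groups.items()}


# Usage with made-up hits: r1 hits two genes, r3 only has a weak hit.
example = [
    "r1\tgeneA\t85.0\t100\t15\t0\t1\t100\t11\t110\t1e-30\t120.0",
    "r1\tgeneB\t80.0\t100\t20\t0\t1\t100\t5\t104\t1e-20\t95.0",
    "r2\tgeneB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-35\t140.0",
    "r3\tgeneA\t70.0\t40\t12\t0\t1\t40\t1\t40\t0.01\t30.0",
]
groups = group_by_gene(best_hits(example))
print(groups)  # {'geneA': ['r1'], 'geneB': ['r2']}
```

Keeping only the single best hit is the simplest policy; reads whose top hits tie across paralogous genes would need separate handling in a real analysis.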
Combining de novo assembly with read assignment to a related reference transcriptome also addresses the bias and errors in contigs caused by relying on a related reference alone. Empirical data corroborate these findings for the two non-model organisms Parachondrostoma toxostoma (fish) and Quercus pubescens (plant). For the fish species, out of the 31,944 genes known from Danio rerio, the guided and de novo assemblies recover 20,605 and 20,032 genes, respectively, but the guided assembly performs much better on both the contiguity and completeness metrics. For the oak, out of the 29,971 genes known from Vitis vinifera, the transcriptome-guided and de novo assemblies display similar performance, but the new guided approach detects 16,326 genes whereas the de novo assembly detects only 9,385.
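
To put the empirical gene counts on a common scale, the detected gene numbers can be expressed as a percentage of the reference gene set. This is plain arithmetic on the figures quoted above, not a metric reported in the article, and the ratio ignores that some reference genes may not be expressed in the sampled tissues. The helper name `recovery_percent` is hypothetical.

```python
# Gene counts quoted above: (reference gene total, genes detected per assembly).
RESULTS = {
    "P. toxostoma vs D. rerio": (31_944, {"guided": 20_605, "de novo": 20_032}),
    "Q. pubescens vs V. vinifera": (29_971, {"guided": 16_326, "de novo": 9_385}),
}


def recovery_percent(detected, total):
    """Percentage of reference genes detected, rounded to one decimal place."""
    return round(100.0 * detected / total, 1)


for species, (total, counts) in RESULTS.items():
    for method, n in counts.items():
        print(f"{species}, {method}: {n}/{total} = {recovery_percent(n, total)}%")
# Fish: guided 64.5%, de novo 62.7%; oak: guided 54.5%, de novo 31.3%
```

On this scale the gap between methods is about 2 percentage points for the fish but about 23 for the oak.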


                Author and article information

                Contributors
                Roles: Formal analysis, Methodology, Software, Writing – review & editing
                Roles: Formal analysis, Methodology, Validation, Writing – review & editing
                Roles: Data curation, Investigation, Validation, Writing – original draft, Writing – review & editing
                Roles: Formal analysis, Validation, Visualization, Writing – review & editing
                Roles: Data curation, Resources, Writing – review & editing
                Roles: Funding acquisition, Resources
                Roles: Conceptualization, Formal analysis, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 20 September 2017
                Volume 12, Issue 9, e0185020
                Affiliations
                [1] UMR 7263, Équipe Évolution Génome Environnement, Aix Marseille Université, CNRS, IRD, IMBE, Marseille, France
                [2] UMR Centre de Biologie pour la Gestion des Populations, Montpellier SupAgro, Montferrier-sur-Lez, France
                [3] ESE, Ecology and Ecosystem Health, INRA, Agrocampus Ouest, Rennes, France
                [4] UMR 7263, Équipe Diversité Fonctionnement: des molécules aux écosystèmes, Aix Marseille Université, CNRS, IRD, IMBE, Marseille, France
                Wageningen UR Livestock Research, NETHERLANDS
                Author notes

                Competing Interests: AU was supported by Electricité de France, a commercial company. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

                Author information
                http://orcid.org/0000-0001-9176-4476
                Article
                Manuscript ID: PONE-D-17-03882
                DOI: 10.1371/journal.pone.0185020
                PMC ID: 5607178
                PubMed ID: 28931057
                © 2017 Ungaro et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 30 January 2017
                Accepted: 4 September 2017
                Page count
                Figures: 5, Tables: 0, Pages: 21
                Funding
                Funded by: Électricité de France (funder ID: http://dx.doi.org/10.13039/501100006289)
                AU was supported by a PhD grant from EDF (Electricité de France). We are grateful to the different departments of Electricité de France for their financial support of the present study: EDF-Recherche et Développement, Clamart, especially Dr Mathieu Le Brun and Laurence Tissot; EDF-Unité of Production Méditerranée, especially Dr Julie Mosseri; and EDF Centre d’Ingénierie Hydraulique Technolac, Chambéry, especially Dr Agnès Barillier and Frédéric Jacob. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences / Computational Biology / Genome Analysis / Transcriptome Analysis
                Biology and Life Sciences / Genetics / Genomics / Genome Analysis / Transcriptome Analysis
                Biology and Life Sciences / Computational Biology / Genome Analysis / Sequence Assembly Tools
                Biology and Life Sciences / Genetics / Genomics / Genome Analysis / Sequence Assembly Tools
                Research and Analysis Methods / Experimental Organism Systems / Model Organisms / Zebrafish
                Research and Analysis Methods / Model Organisms / Zebrafish
                Research and Analysis Methods / Experimental Organism Systems / Animal Models / Zebrafish
                Biology and Life Sciences / Organisms / Eukaryota / Animals / Vertebrates / Fish / Osteichthyes / Zebrafish
                Research and Analysis Methods / Database and Informatics Methods / Biological Databases / Sequence Databases
                Research and Analysis Methods / Database and Informatics Methods / Bioinformatics / Sequence Analysis / Sequence Databases
                Research and Analysis Methods / Simulation and Modeling
                Biology and Life Sciences / Agriculture / Animal Management / Animal Performance
                Biology and Life Sciences / Computational Biology / Genome Analysis
                Biology and Life Sciences / Genetics / Genomics / Genome Analysis
                Biology and Life Sciences / Computational Biology / Genome Analysis / Genome Annotation
                Biology and Life Sciences / Genetics / Genomics / Genome Analysis / Genome Annotation
                Custom metadata
                All scripts used are freely available at https://github.com/egeeamu/voskhod. All data acquired for this study are available from the NCBI Sequence Read Archive (SRA): the fish samples at https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP091996 (SRX2266500 to SRX2266509) and the plant sample at https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?run=SRR5410765.

