      Unraveling the complex genome of Saccharum spontaneum using Polyploid Gene Assembler

      research-article

          Abstract

          The Polyploid Gene Assembler (PGA), developed and tested in this study, represents a new strategy for gene-space assembly from complex genomes using low-coverage DNA sequencing. The pipeline integrates reference-assisted loci and de novo assembly strategies to construct high-quality sequences focused on gene content. The pipeline was validated with wheat (Triticum aestivum), a hexaploid species, using barley (Hordeum vulgare) as the reference, which resulted in the identification of more than 90% of genes and several new genes. Moreover, PGA was used to assemble the gene content of Saccharum spontaneum, a parental lineage of hybrid sugarcane cultivars. The S. spontaneum gene sequences obtained were used for reference-guided transcriptome analysis of six different tissues. A total of 39,234 genes were identified, 60.4% of which clustered into known grass gene families. Thirty-seven gene families were expanded compared with other grasses, three of which stand out for the number of gene copies potentially involved in initial development and stress response. In addition, 3,108 promoters (many showing tissue specificity) were identified in this work. In summary, PGA can reconstruct high-quality gene sequences from polyploid genomes, as shown for wheat and S. spontaneum, and it is more efficient than conventional genome assemblers when using low-coverage DNA sequencing.


                Author and article information

                Journal
                DNA Res
                DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
                Oxford University Press
                1340-2838
                1756-1663
                June 2019
                14 February 2019
                Volume: 26
                Issue: 3
                Pages: 205-216
                Affiliations
                [1 ]Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
                [2 ]Laboratório Central de Tecnologias de Alto Desempenho (LaCTAD), Universidade Estadual de Campinas, Campinas, SP, Brazil
                [3 ]Biocelere Agroindustrial Ltda, GranBio Investimentos S.A., Campinas, SP, Brazil
                [4 ]Laboratório Nacional de Ciência e Tecnologia do Bioetanol (CTBE), Centro Nacional de Pesquisas em Energia e Materiais (CNPEM), Campinas, SP, Brazil
                [5 ]Laboratório de Citogenética e Citometria, Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, MG, Brazil
                [6 ]Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
                Author notes
                To whom correspondence should be addressed. Tel. +55 19 3521 6237. Fax. +55 19 3521 6185. Email: goncalo@unicamp.br
                Article
                dsz001
                DOI: 10.1093/dnares/dsz001
                PMCID: PMC6589550
                PMID: 30768175
                © The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Received: 22 October 2018
                Accepted: 21 January 2019
                Page count
                Pages: 12
                Funding
                Funded by: the Brazilian National Council for Scientific and Technological Development
                Award ID: 350474/2013-3
                Award ID: 35081/2015-1
                Funded by: Sao Paulo Research Foundation
                Funded by: FAPESP 10.13039/501100001807
                Award ID: 2014/09638-0
                Award ID: 2012/05890-1
                Funded by: Center for Computational Engineering and Sciences—FAPESP/Cepid
                Award ID: 2013/08293-7
                Categories
                Full Papers

                Genetics
                sugarcane, genome assembly, transcriptome, gene discovery, new assembler
