      High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome


          Abstract

          Background

          Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effectively the sequencing of large numbers of short reads will support contig assembly and sequence annotation for species with essentially no prior gene sequence information.

          Results

          To generate the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool composed of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes, we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates at non-synonymous nucleotides were on average four times smaller than at synonymous nucleotides, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30, and its distribution was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection, in agreement with previous research.
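          The abstract does not spell out the modified θ used in the paper, so the following is only a minimal sketch of the classic Watterson estimator that such per-contig diversity measures are commonly built on, not the authors' method. The function name `watterson_theta` and the example numbers are hypothetical and chosen purely for illustration.

```python
def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int) -> float:
    """Per-site Watterson estimator: theta_W = S / (a_n * L).

    S   = number of segregating (polymorphic) sites in the alignment
    a_n = sum of 1/i for i = 1 .. n-1, where n is the number of sequences
    L   = number of aligned sites examined

    NOTE: this is the standard estimator, assumed here as a stand-in.
    The paper's modified theta for pooled cDNA reads is not reproduced.
    """
    if n_sequences < 2:
        raise ValueError("need at least two sequences to observe polymorphism")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    if segregating_sites < 0:
        raise ValueError("segregating_sites cannot be negative")

    # Harmonic-like correction for sample size.
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


if __name__ == "__main__":
    # Hypothetical contig: 11 SNPs, 4 sampled sequences, 1,000 aligned sites.
    print(watterson_theta(segregating_sites=11, n_sequences=4, n_sites=1000))
```

          Computing such an estimate separately for synonymous and non-synonymous sites, and comparing the two, gives the kind of contrast summarized in the Results; the pooled, uneven read depth of a multi-genotype cDNA library is what would require an adjustment to the sample-size term.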

          Conclusion

          By providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled at large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.


                Author and article information

                Journal
                BMC Genomics
                BioMed Central
                1471-2164
                2008
                30 June 2008
                Volume: 9
                Article number: 312
                Affiliations
                [1 ]School of Forest Resources and Conservation, University of Florida, PO Box 110410, Gainesville, USA
                [2 ]Plant Molecular and Cellular Biology, University of Florida, Gainesville, USA
                [3 ]Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
                [4 ]University of Florida Genetics Institute, University of Florida, Gainesville, USA
                [5 ]Graduate Program in Genomic Sciences and Biotechnology, Universidade Católica de Brasília, Brasília, Brazil
                [6 ]EMBRAPA Recursos Genéticos e Biotecnologia, Empresa Brasileira de Pesquisa Agropecuária, Brasília, Brazil
                [7 ]Department of Genetics, North Carolina State University, Raleigh, USA
                Article
                1471-2164-9-312
                10.1186/1471-2164-9-312
                2483731
                18590545
                1562d922-1f01-4088-9205-741581b02744
                Copyright © 2008 Novaes et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 29 February 2008
                : 30 June 2008
                Categories
                Research Article

                Genetics
