High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

Abstract

Background

The benefits of high-throughput sequencing with 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid gene sequence and SNP discovery with this novel sequencing technology provides a set of baseline tools for genome-level research. However, it is unclear how effectively large numbers of short reads can support contig assembly and sequence annotation for species with essentially no prior gene sequence information.

Results

To generate the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (ESTs). The ESTs were generated from a normalized cDNA pool composed of multiple tissues and genotypes, promoting the discovery of homologues to almost half of Arabidopsis genes and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes, we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates at non-synonymous nucleotides were on average fourfold lower than at synonymous nucleotides, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30, with a distribution skewed to the right, further supporting the conclusion that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection, in agreement with previous studies.
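
As a rough illustration of the two statistics named above, the sketch below computes the classic, unmodified Watterson estimator of θ per site and a simple Ka/Ks ratio. It is not the modified θ used in the article, whose adjustment for pooled cDNA sequencing is not described in this abstract. The function names and the input values in the usage note are hypothetical.

```python
from typing import Optional


def watterson_a(n_sequences: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, the sample-size correction term."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int) -> float:
    """Classic Watterson estimate of theta per site.

    Assumption: this is the textbook estimator, not the article's
    modified version for a multi-genotype cDNA pool.
    """
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return segregating_sites / (watterson_a(n_sequences) * n_sites)


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks
```

For example, with hypothetical inputs of 11 segregating sites among 4 sequences over 1,000 aligned sites, `watterson_theta(11, 4, 1000)` gives about 0.006. A Ka/Ks value below 1, such as `ka_ks_ratio(0.03, 0.10)`, is conventionally read as a sign of purifying selection.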

Conclusion

By providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome, which is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled at large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.

Author and article information

Journal

Journal ID (nlm-ta): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central

ISSN (Electronic): 1471-2164

Publication date (Collection): 2008

Publication date (Electronic): 30 June 2008

Volume: 9

Page: 312

Affiliations

[1] School of Forest Resources and Conservation, University of Florida, PO Box 110410, Gainesville, USA

[2] Plant Molecular and Cellular Biology, University of Florida, Gainesville, USA

[3] Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA

[4] University of Florida Genetics Institute, University of Florida, Gainesville, USA

[5] Graduate Program in Genomic Sciences and Biotechnology, Universidade Católica de Brasília, Brasília, Brazil

[6] EMBRAPA Recursos Genéticos e Biotecnologia, Empresa Brasileira de Pesquisa Agropecuária, Brasília, Brazil

[7] Department of Genetics, North Carolina State University, Raleigh, USA

Article

Publisher ID: 1471-2164-9-312

DOI: 10.1186/1471-2164-9-312

PMC ID: 2483731

PubMed ID: 18590545

SO-VID: 1562d922-1f01-4088-9205-741581b02744

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 29 February 2008

Date accepted: 30 June 2008
