Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers

Abstract

Background

Sesame is an important oil crop, but limited transcriptomic and genomic data are currently available. Such data are essential for clarifying the molecular mechanisms of fatty acid and lignan biosynthesis. In addition, a shortage of sesame molecular markers limits the efficiency and accuracy of genetic breeding. High-throughput transcriptome sequencing can generate the large sequence dataset needed for gene discovery and molecular marker development.

Results

Sesame transcriptomes from five tissues were sequenced using Illumina paired-end sequencing technology. The cleaned reads were assembled into a total of 86,222 unigenes with an average length of 629 bp. Of the unigenes, 46,584 (54.03%) had significant similarity with proteins in the NCBI nonredundant protein database and the Swiss-Prot database (E-value < 10^-5). Of these annotated unigenes, 10,805 and 27,588 were assigned to gene ontology categories and clusters of orthologous groups, respectively. In total, 22,003 (25.52%) unigenes were mapped onto 119 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway database. Furthermore, 44,750 unigenes showed homology to 15,460 Arabidopsis genes based on BLASTx analysis against The Arabidopsis Information Resource (TAIR, Version 10), indicating relatively high gene coverage. In total, 7,702 unigenes were converted into SSR markers (EST-SSRs). Dinucleotide SSRs were the dominant repeat type (67.07%, 5,166), followed by trinucleotide (24.89%, 1,917), tetranucleotide (4.31%, 332), hexanucleotide (2.62%, 202), and pentanucleotide (1.10%, 85) SSRs. AG/CT (46.29%) was the dominant repeat motif, followed by AC/GT (16.07%), AT/AT (10.53%), AAG/CTT (6.23%), and AGG/CCT (3.39%). Fifty EST-SSRs were randomly selected to validate amplification and to determine the degree of polymorphism in the genomic DNA pools. Forty primer pairs successfully amplified DNA fragments and detected substantial polymorphism among 24 sesame accessions.
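The motif classes reported above (for example AG/CT or AAG/CTT) can be illustrated with a minimal Python sketch of perfect-SSR detection. This is not the pipeline used in the study: the minimum repeat counts in `MIN_REPEATS` and the function names `find_ssrs` and `canonical_motif` are hypothetical choices made for illustration, and the abstract does not state which thresholds or software were used.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical thresholds,
# not taken from the abstract).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AGAG"),
            # since the shorter-unit pass already reports that locus.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


def canonical_motif(motif):
    """Collapse rotations and the reverse complement into one class label."""
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    rotations = [
        s[i:] + s[:i] for s in (motif, reverse_complement) for i in range(len(s))
    ]
    return min(rotations)


example = "TTT" + "AG" * 8 + "CCC" + "AAG" * 5 + "T"
for start, motif, count in find_ssrs(example):
    print(start, motif, count, canonical_motif(motif))
```

Grouping by `canonical_motif` is what allows AG, GA, CT, and TC repeats to be counted as a single AG/CT class, since they describe the same locus read in a different frame or on the opposite strand.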

Conclusions

This study demonstrates that Illumina paired-end sequencing is a fast and cost-effective approach to gene discovery and molecular marker development in non-model organisms. Our results provide a comprehensive sequence resource for sesame research.

Most cited references 58

Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.).

T Thiel, W Michalek, R. K. Varshney … (2003)

A software tool was developed for the identification of simple sequence repeats (SSRs) in a barley (Hordeum vulgare L.) EST (expressed sequence tag) database comprising 24,595 sequences. In total, 1,856 SSR-containing sequences were identified. Trimeric SSR repeat motifs appeared to be the most abundant type. A subset of 311 primer pairs flanking SSR loci have been used for screening polymorphisms among six barley cultivars, being parents of three mapping populations. As a result, 76 EST-derived SSR-markers were integrated into a barley genetic consensus map. A correlation between polymorphism and the number of repeats was observed for SSRs built of dimeric up to tetrameric units. 3'-ESTs yielded a higher portion of polymorphic SSRs (64%) than 5'-ESTs did. The estimated PIC (polymorphic information content) value was 0.45 ± 0.03. Approximately 80% of the SSR-markers amplified DNA fragments in Hordeum bulbosum, followed by rye, wheat (both about 60%) and rice (40%). A subset of 38 EST-derived SSR-markers comprising 114 alleles were used to investigate genetic diversity among 54 barley cultivars. In accordance with a previous, RFLP-based, study, spring and winter cultivars, as well as two- and six-rowed barleys, formed separate clades upon PCoA analysis. The results show that: (1) with the software tool developed, EST databases can be efficiently exploited for the development of cDNA-SSRs, (2) EST-derived SSRs are significantly less polymorphic than those derived from genomic regions, (3) a considerable portion of the developed SSRs can be transferred to related species, and (4) compared to RFLP-markers, cDNA-SSRs yield similar patterns of genetic diversity.
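For readers unfamiliar with the PIC statistic mentioned in this abstract, the following Python sketch computes it from allele counts at a single marker. It assumes the commonly used definition PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; the abstract does not say which formula the authors applied, and the function name `pic` is a hypothetical label.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphic information content for one marker.

    Assumes PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2,
    where p_i is the frequency of allele i.
    """
    total = sum(allele_counts)
    freqs = [count / total for count in allele_counts]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (a * a) * (b * b) for a, b in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


# Two equally frequent alleles give the maximum PIC for a biallelic marker.
print(pic([12, 12]))
```

A marker with a single allele yields a PIC of 0, and the value rises toward 1 as more alleles occur at similar frequencies.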

Genic microsatellite markers in plants: features and applications.

Rajeev Varshney, Andreas Graner, Mark E. Sorrells (2005)

Expressed sequence tag (EST) projects have generated a vast amount of publicly available sequence data from plant species; these data can be mined for simple sequence repeats (SSRs). These SSRs are useful as molecular markers because their development is inexpensive, they represent transcribed genes and a putative function can often be deduced by a homology search. Because they are derived from transcripts, they are useful for assaying the functional diversity in natural populations or germplasm collections. These markers are valuable because of their higher level of transferability to related species, and they can often be used as anchor markers for comparative mapping and evolutionary studies. They have been developed and mapped in several crop species and could prove useful for marker-assisted selection, especially when the markers reside in the genes responsible for a phenotypic trait. Applications and potential uses of EST-SSRs in plant genetics and breeding are discussed.

Microsatellites in different eukaryotic genomes: survey and analysis.

G Tóth, Z Gáspári, J Jurka (2000)

We examined the abundance of microsatellites with repeated unit lengths of 1-6 base pairs in several eukaryotic taxonomic groups: primates, rodents, other mammals, nonmammalian vertebrates, arthropods, Caenorhabditis elegans, plants, yeast, and other fungi. Distribution of simple sequence repeats was compared between exons, introns, and intergenic regions. Tri- and hexanucleotide repeats prevail in protein-coding exons of all taxa, whereas the dependence of repeat abundance on the length of the repeated unit shows a very different pattern as well as taxon-specific variation in intergenic regions and introns. Although it is known that coding and noncoding regions differ significantly in their microsatellite distribution, in addition we could demonstrate characteristic differences between intergenic regions and introns. We observed striking relative abundance of (CCG)(n)*(CGG)(n) trinucleotide repeats in intergenic regions of all vertebrates, in contrast to the almost complete lack of this motif from introns. Taxon-specific variation could also be detected in the frequency distributions of simple sequence motifs. Our results suggest that strand-slippage theories alone are insufficient to explain microsatellite distribution in the genome as a whole. Other possible factors contributing to the observed divergence are discussed.

Author and article information

Journal

Journal ID (nlm-ta): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central

ISSN (Electronic): 1471-2164

Publication date Collection: 2011

Publication date (Electronic): 19 September 2011

Volume: 12

Page: 451

Affiliations

[1] Key Laboratory of Oil Crops Biology of the Ministry of Agriculture, Sesame Germplasm and Genetic Breeding Laboratory, Oil Crops Research Institute of Chinese Academy of Agricultural Sciences (OCRI-CAAS), Wuhan, 430062, China

Article

Publisher ID: 1471-2164-12-451

DOI: 10.1186/1471-2164-12-451

PMC ID: 3184296

PubMed ID: 21929789

SO-VID: 56339d0e-8c1d-4b58-bdc6-76ded46245b1

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
