Comprehensive Transcriptome Assembly of Chickpea (Cicer arietinum L.) Using Sanger and Next Generation Sequencing Platforms: Development and Applications

Abstract

A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs) from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinum Transcriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201), comprises 46,369 transcript assembly contigs (TACs) and has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8%) of the TACs, and gene ontology assignments were determined for 21,471 (46.3%). The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies and with the chickpea genome. Comparative analysis of CaTA v2 against the transcriptomes of three legumes (Medicago, soybean and common bean) identified 27,771 TACs common to all three, indicating strong conservation of genes across legumes. CaTA v2 was also used to identify simple sequence repeats (SSRs) and intron spanning regions (ISRs) for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at the chromosomal level were identified using the transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig with an inferred map position, distributed across the eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups was validated on five chickpea genotypes and showed 20% polymorphism, with an average polymorphic information content (PIC) of 0.27.
In summary, the hybrid transcriptome assembly and the novel markers identified here can be used for applications such as gene discovery, marker-trait association and diversity analysis, advancing genetics research and breeding in chickpea and related legumes.
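
The abstract reports two summary statistics, the N50 contig length and the polymorphic information content (PIC), without spelling out how they are calculated. The sketch below shows one common way to compute each. It assumes the usual N50 definition (the contig length at which the length-sorted contigs first cover half of all assembled bases) and the Botstein et al. PIC formula; the abstract does not state which PIC formula the authors used, and the function names `n50` and `pic` are illustrative only.

```python
from itertools import combinations


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches half of all assembled bases; the length
    of the contig that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def pic(allele_freqs):
    """Return the PIC of one marker from its allele frequencies.

    Assumed formula (Botstein et al.):
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    sum_sq = sum(p * p for p in allele_freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1 - sum_sq - cross
```

For example, `n50([2, 3, 4, 5, 6])` gives 5, because the two longest contigs (6 + 5 = 11) already cover at least half of the 20 total bases. A marker with two equally frequent alleles gives `pic([0.5, 0.5])` of 0.375, the maximum for a biallelic marker under this formula.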

Author and article information

Contributors

Lewis Lukens: Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2014

Publication date (Electronic): 23 January 2014

Volume: 9

Issue: 1

Electronic Location Identifier: e86039

Affiliations

[1] Research Program on Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Andhra Pradesh, India

[2] National Research Council Canada (NRC-CNRC), Saskatoon, Saskatchewan, Canada

[3] Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

[4] Department of Agronomy, Iowa State University, Ames, Iowa, United States of America

[5] National Center for Genome Resources (NCGR), Santa Fe, New Mexico, United States of America

[6] United States Department of Agriculture–Agricultural Research Service (USDA–ARS), Corn Insects and Crop Genetics Research Unit (USDA-ARS-CICGRU), Ames, Iowa, United States of America

[7] CGIAR Generation Challenge Programme (GCP), c/o CIMMYT, Mexico DF, Mexico

University of Guelph, Canada

Author notes

* E-mail: r.k.varshney@cgiar.org

Competing Interests: The authors have declared that no competing interests exist.

Conceived and designed the experiments: RKV AGS BT ADF SC. Performed the experiments: HK SA CC RL. Analyzed the data: HK SA BD CC. Contributed reagents/materials/analysis tools: RKV ADF SC BT AGS. Wrote the paper: RKV HK SA ADF AGS SC.

Article

Publisher ID: PONE-D-13-19774

DOI: 10.1371/journal.pone.0086039

PMC ID: 3900451

PubMed ID: 24465857

SO-VID: 5d21ede8-9b8b-4cf4-ac73-82efef78459e

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 14 May 2013

Date accepted: 3 December 2013

Page count

Pages: 12

Funding

The authors are thankful to the DST-INSPIRE program of the Department of Science and Technology, Government of India, the Indo-German Science Technology Centre (IGSTC), the CGIAR Generation Challenge Programme (GCP) and the Saskatchewan Agriculture Development Fund (ADF) for financial support to undertake part of the research presented in this study. This work has been undertaken as part of the CGIAR Research Program on Grain Legumes. ICRISAT is a member of the CGIAR Consortium. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
