Annotation of the Transcriptome from Taenia pisiformis and Its Comparative Analysis with Three Taeniidae Species

Abstract

Background

Taenia pisiformis is one of the most common intestinal tapeworms of canines. Both the adult worm, which infects canines as definitive hosts, and the larval stage Cysticercus pisiformis, which infects rabbits as intermediate hosts, cause significant health problems in their hosts and, consequently, considerable socio-economic losses. No complete genomic data for T. pisiformis are currently available in public databases. RNA-seq offers an effective way to analyze a eukaryotic transcriptome and generate large functional gene datasets for further study.

Methodology/Principal Findings

In this study, 2.67 million clean sequencing reads and 72,957 unigenes were generated using RNA-seq. Based on sequence similarity searches against known proteins in four databases, a total of 26,012 non-redundant unigenes were identified after quality control. Overall, 15,920 unigenes were mapped to 203 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Analysis of the glycolysis/gluconeogenesis and axonal guidance pathways gave a more detailed picture of the biochemistry of T. pisiformis. Four unigenes were selected at random, and their full-length cDNA clones were obtained using RACE PCR. Functional distribution characteristics were obtained by comparing four cestode species (72,957 unigenes of T. pisiformis, 30,700 ESTs of T. solium, and 1,058 ESTs of Eg+Em [conserved ESTs between Echinococcus granulosus and Echinococcus multilocularis]) using the cluster of orthologous groups (COG) and gene ontology (GO) functional classification systems. Furthermore, the genes conserved across these four cestode species were identified and aligned against the KEGG database.
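As a rough illustration of what these counts imply, the short sketch below expresses the annotated and KEGG-mapped unigenes as shares of the full unigene set. The three counts come from the abstract; the variable names and the `percent` helper are assumptions introduced here, and the sketch assumes that both counts are measured against the same 72,957 unigenes.

```python
# Counts reported in the abstract above.
TOTAL_UNIGENES = 72_957
ANNOTATED_UNIGENES = 26_012    # matched to known proteins in four databases
KEGG_MAPPED_UNIGENES = 15_920  # assigned to 203 KEGG pathways


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place.

    Hypothetical helper, not taken from the paper.
    """
    return round(100.0 * part / whole, 1)


# Assumption: both counts are subsets of the same 72,957 unigenes.
annotated_pct = percent(ANNOTATED_UNIGENES, TOTAL_UNIGENES)
kegg_pct = percent(KEGG_MAPPED_UNIGENES, TOTAL_UNIGENES)

print(f"Annotated unigenes:   {annotated_pct}% of all unigenes")
print(f"KEGG-mapped unigenes: {kegg_pct}% of all unigenes")
```

Under that assumption, roughly a third of the unigenes have a database match and about a fifth are placed in a KEGG pathway.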

Conclusion

This study provides an extensive transcriptome dataset obtained by deep sequencing of T. pisiformis, a non-model organism without a complete genome sequence. The identification of conserved genes may suggest potential drug targets and vaccine candidates against cestode infections. The dataset should help accelerate research into the functional genomics, immunity and gene expression profiles of cestode species.

Most cited references 51

ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences.

C Iseli, C Jongeneel, P Bucher (1999)

One of the problems associated with the large-scale analysis of unannotated, low quality EST sequences is the detection of coding regions and the correction of frameshift errors that they often contain. We introduce a new type of hidden Markov model that explicitly deals with the possibility of errors in the sequence to analyze, and incorporates a method for correcting these errors. This model was implemented in an efficient and robust program, ESTScan. We show that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and is able to accurately correct frameshift errors. In the framework of genome sequencing projects, ESTScan could become a very useful tool for gene discovery, for quality control, and for the assembly of contigs representing the coding regions of genes.

High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

Evandro Novaes, Derek Drost, William Farmerie … (2008)

Background: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effective the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation.

Results: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. Non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous researches.

Conclusion: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition we demonstrated that SNPs sampled in large-scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.

Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

Cheng-Ying Shi, Hua Yang, Chao-Ling Wei … (2011)

Background: Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes.

Results: Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR).

Conclusions: An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.
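The abstract above reports both an average unigene length and an N50. A minimal sketch of how N50 is conventionally computed is given below: sort the sequence lengths from longest to shortest and return the length at which the running total first reaches half of all assembled bases. The function name `n50` and the toy lengths are hypothetical and are not data from either study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Hypothetical toy assembly, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
toy_mean = sum(toy_lengths) / len(toy_lengths)
toy_n50 = n50(toy_lengths)
print(f"mean length = {toy_mean:.0f} bp, N50 = {toy_n50} bp")
```

Because N50 weights each sequence by its length, it is typically larger than the mean length when an assembly contains many short sequences, which is consistent with the 355 bp average and 506 bp N50 quoted above.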

Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2012

Publication date (Electronic): 13 April 2012

Volume: 7

Issue: 4

Electronic Location Identifier: e32283

Affiliations

[1] Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Ya'an, China

[2] Department of Chemistry, College of Life and Basic Science, Sichuan Agricultural University, Ya'an, China

New England Biolabs, United States of America

Author notes

* E-mail: guangyou1963@yahoo.com.cn

Conceived and designed the experiments: GY. Performed the experiments: DY LC XN. Analyzed the data: DY XW YF SW XP. Contributed reagents/materials/analysis tools: XG YX HN. Wrote the paper: DY. Bioinformatics: NY RZ WZ.

Article

Publisher ID: PONE-D-11-20294

DOI: 10.1371/journal.pone.0032283

PMC ID: 3326008

PubMed ID: 22514598

SO-VID: e2e7261c-4d6b-4499-a3e0-49231e40e65e

Copyright © Yang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 3 October 2011

Date accepted: 24 January 2012

Page count

Pages: 11
