CANEapp: a user-friendly application for automated next generation transcriptomic data analysis

Abstract

Background

Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires use of high-performance servers.

Results

To address these needs, we developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a suite that combines a Graphical User Interface (GUI) with an automated, platform-independent server-side analysis pipeline suitable for any server architecture. The GUI runs on a PC or Mac and connects to the server to provide full graphical control of RNA-sequencing (RNA-seq) project analysis. The server-side pipeline is set up on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated: it performs differential gene expression analysis and novel noncoding RNA discovery through alternative workflows (Cuffdiff and the R packages edgeR and DESeq2). In a comparison with other similar tools, CANEapp significantly improves on previous developments. We experimentally validated CANEapp's performance by applying it to data derived from different experimental paradigms and confirming the results with quantitative real-time PCR (qRT-PCR). CANEapp adapts to any server architecture by effectively using available resources and thus handles large amounts of data efficiently. CANEapp is available free of charge at http://psychiatry.med.miami.edu/research/laboratory-of-translational-rna-genomics/CANE-app.
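
The abstract does not say how CANEapp divides server resources among samples. As an illustration only, the sketch below shows one simple way a pipeline could adapt to the number of available CPU cores. The function name plan_jobs and the parameter max_threads_per_job are hypothetical and are not taken from CANEapp.

```python
import os


def plan_jobs(n_samples, total_cores=None, max_threads_per_job=8):
    """Hypothetical helper: decide how many samples to process at once.

    Returns (concurrent_jobs, threads_per_job). This is an illustrative
    heuristic, not CANEapp's actual scheduling logic.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Use the caller's core count if given, otherwise ask the OS.
    cores = total_cores or os.cpu_count() or 1
    # Never give one job more threads than the machine has cores.
    threads_per_job = min(max_threads_per_job, cores)
    # Run as many jobs side by side as the cores allow, capped by sample count.
    concurrent_jobs = max(1, min(n_samples, cores // threads_per_job))
    return concurrent_jobs, threads_per_job


if __name__ == "__main__":
    jobs, threads = plan_jobs(n_samples=6)
    print(f"Run {jobs} sample(s) at a time with {threads} thread(s) each")
```

Under this heuristic, a small workstation would process samples one at a time using all of its cores, while a larger server would process several samples in parallel.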

Conclusions

We believe that CANEapp will serve both biologists with no computational experience and bioinformaticians as a simple, timesaving, yet accurate and powerful tool for analyzing large RNA-seq datasets, and that it will provide foundations for future development of integrated and automated high-throughput genomics data analysis tools. Owing to its inherently standardized pipeline and its combination of automated analysis and platform independence, CANEapp is ideal for large-scale collaborative RNA-seq projects between different institutions and research groups.

Author and article information

Contributors

Dmitry Velmeshev: dvelmeshev@med.miami.edu

Patrick Lally: p.lally1@umiami.edu

Marco Magistri: mmagistri@med.miami.edu

Mohammad Ali Faghihi: 305-243-7953, MFaghihi@med.miami.edu

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2164

Publication date (Electronic): 13 January 2016

Publication date PMC-release: 13 January 2016

Publication date Collection: 2016

Volume: 17

Electronic Location Identifier: 49

Affiliations

Department of Psychiatry, University of Miami Miller School of Medicine, Miami, FL 33136 USA

Department of Biochemistry & Molecular Biology, University of Miami Miller School of Medicine, Miami, FL 33136 USA

Department of Biomedical Engineering, University of Miami, Coral Gables, FL 33146 USA

Article

Publisher ID: 2346

DOI: 10.1186/s12864-015-2346-y

PMC ID: 4710974

PubMed ID: 26758513

SO-VID: d99ccebf-cb88-41a2-a5e9-04e3440bb576

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 16 September 2015

Date accepted: 22 December 2015

Funding

Funded by: FundRef http://dx.doi.org/10.13039/100000065, National Institute of Neurological Disorders and Stroke (US)

Award ID: R01NS081208-01A1

Award Recipient: Mohammad Ali Faghihi

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: RNA sequencing, user-friendly application, graphical user interface, automated pipeline, platform-independent, differential gene expression, long noncoding RNAs
