Advanced Applications of RNA Sequencing and Challenges

Abstract

Next-generation sequencing technologies have transformed sequence-based research through their high throughput, high sensitivity, and high speed. RNA-seq is now widely used to uncover multiple facets of the transcriptome in support of biological applications. However, the large-scale data analyses associated with RNA-seq pose challenges. In this study, we present a detailed overview of the applications of this technology and of the challenges that still need to be addressed beyond the achievements already made, including data preprocessing, differential gene expression analysis, alternative splicing analysis, variant detection and allele-specific expression, pathway analysis, co-expression network analysis, and applications that combine various experimental procedures. Specifically, we discuss the essential principles of the computational methods required to meet the key challenges of RNA-seq data analysis, the development of various bioinformatics tools, the challenges associated with RNA-seq applications, and examples that represent the advances made so far in characterizing the transcriptome.

Author and article information

Journal

Journal ID (nlm-ta): Bioinform Biol Insights

Journal ID (iso-abbrev): Bioinform Biol Insights

Journal ID (publisher-id): Bioinformatics and Biology Insights

Title: Bioinformatics and Biology Insights

Publisher: Libertas Academica

ISSN (Electronic): 1177-9322

Publication date (Collection): 2015

Publication date (Electronic): 15 November 2015

Volume: 9

Issue: Suppl 1

Pages: 29-46

Affiliations

[1] Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA.

[2] Bioinformatics and Systems Biology Core, National Heart Lung Blood Institute, National Institutes of Health, Rockville Pike, Bethesda, MD, USA.

[3] Leidos Biomedical Research, Inc., Basic Science Program, Frederick National Laboratory, Frederick, MD, USA.

[4] Department of Medicine, University of California, San Diego, La Jolla, CA, USA.

[5] Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.

Author notes

CORRESPONDENCE: yi-xing.han@nih.gov

Article

Publisher ID: bbi-suppl.1-2015-029

DOI: 10.4137/BBI.S28991

PMC ID: 4648566

PubMed ID: 26609224

SO-VID: fd095dd2-5a17-46dc-8716-a6263ab2330c

License:

This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.

History

Date received: 16 July 2015

Date revision received: 30 September 2015

Date accepted: 02 October 2015
