Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies

Abstract

The dramatic increase in heterogeneous types of biological data, in particular the abundance of new protein sequences, requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies in the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity (GPCRs and kinases from humans, and the crotonase superfamily of enzymes), we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. Compared with other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
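
As a rough illustration only, and not the authors' pipeline, a sequence similarity network can be sketched as a graph whose nodes are proteins and whose edges join pairs that score better than a chosen similarity cutoff. The sketch below assumes pairwise E-values were computed beforehand (for example by an all-versus-all BLAST search); the function names (build_ssn, connected_components, overlay_annotations), the sequence identifiers, the labels, and the thresholds are all hypothetical. "Overlaying orthogonal information" is modelled here simply as tallying an annotation within each cluster, so that a cluster with mixed labels stands out as a candidate place to look for outliers.

```python
# Hedged sketch of a sequence similarity network (SSN); not the authors' method.
# Assumption: pairwise E-values are precomputed (e.g., all-vs-all BLAST).
# All identifiers, labels, and thresholds below are hypothetical toy values.

def build_ssn(pairwise_evalues, threshold):
    """Return an adjacency map; an edge joins two sequences whose E-value <= threshold."""
    graph = {}
    for (a, b), evalue in pairwise_evalues.items():
        # Register both sequences as nodes, even if no edge survives the cutoff.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b and evalue <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into clusters using an iterative depth-first search."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def overlay_annotations(components, annotations):
    """Tally an orthogonal annotation (e.g., a functional label) within each cluster."""
    summary = []
    for component in components:
        counts = {}
        for node in component:
            label = annotations.get(node, "unannotated")
            counts[label] = counts.get(label, 0) + 1
        summary.append(counts)
    return summary


# Hypothetical toy input: five sequences and their pairwise E-values.
evalues = {
    ("P1", "P2"): 1e-40,
    ("P2", "P3"): 1e-25,
    ("P1", "P3"): 1e-5,
    ("P4", "P5"): 1e-30,
    ("P3", "P4"): 1e-3,
}
labels = {"P1": "A", "P2": "A", "P3": "B", "P4": "C", "P5": "C"}

strict = connected_components(build_ssn(evalues, 1e-20))
loose = connected_components(build_ssn(evalues, 1e-2))
print(strict)  # [['P1', 'P2', 'P3'], ['P4', 'P5']]
print(overlay_annotations(strict, labels))  # [{'A': 2, 'B': 1}, {'C': 2}]
print(len(loose))  # 1
```

In this toy case, relaxing the cutoff from 1e-20 to 1e-2 merges the two clusters into one, illustrating that cluster boundaries in such a network depend directly on the chosen threshold.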

Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2009

Publication date (Electronic): 3 February 2009

Volume: 4

Issue: 2

Electronic Location Identifier: e4345

Affiliations

[1] Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America

[2] Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California, United States of America

[3] Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America

[4] Department of Biopharmaceutical Sciences, University of California San Francisco, San Francisco, California, United States of America

Georgia Institute of Technology, United States of America

Author notes

* E-mail: babbitt@cgl.ucsf.edu

Conceived and designed the experiments: HJA PCB. Performed the experiments: HJA. Analyzed the data: HJA. Contributed reagents/materials/analysis tools: JHM TF. Wrote the paper: HJA PCB.

Article

Publisher ID: 08-PONE-RA-06316R1

DOI: 10.1371/journal.pone.0004345

PMC ID: 2631154

PubMed ID: 19190775

SO-VID: 46b94a38-6434-4216-87fa-1bf046dc6045

Copyright: This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

History

Date received: 10 September 2008

Date accepted: 10 December 2008

Page count

Pages: 14

Cited by 153
