Visualising biological data: a semantic approach to tool and database integration


Abstract

Motivation

In the biological sciences, the need to analyse vast amounts of information has become commonplace. Such large-scale analyses often involve drawing together data from a variety of different databases, held remotely on the internet or locally on in-house servers. Supporting these tasks are ad hoc collections of data-manipulation tools, scripting languages and visualisation software, which are often combined in arcane ways to create cumbersome systems that have been customised for a particular purpose and are consequently not readily adaptable to other uses. For many day-to-day bioinformatics tasks, the sizes of current databases, and the scale of the analyses necessary, now demand increasing levels of automation; nevertheless, the unique experience and intuition of human researchers are still required to interpret the end results in any meaningful biological way. Putting humans in the loop requires tools to support real-time interaction with these vast and complex data-sets. Numerous tools do exist for this purpose, but many do not have optimal interfaces, most are effectively isolated from other tools and databases owing to incompatible data formats, and many have limited real-time performance when applied to realistically large data-sets. Much of the user's cognitive capacity is therefore focused on controlling the software and manipulating esoteric file formats rather than on performing the research.

Methods

To confront these issues, harnessing expertise in human-computer interaction (HCI), high-performance rendering and distributed systems, and guided by bioinformaticians and end-user biologists, we are building reusable software components that together create a toolkit that is architecturally sound from a computing point of view and addresses both user and developer requirements. Key to the system's usability is its direct exploitation of semantics, which, crucially, gives individual components knowledge of their own functionality and allows them to interoperate seamlessly, removing many of the existing barriers and bottlenecks from standard bioinformatics tasks.

Results

The toolkit, named Utopia, is freely available from http://utopia.cs.man.ac.uk/.


Author and article information

Conference

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2009

Publication date (Electronic): 16 June 2009

Volume: 10

Issue: Suppl 6

Page: S19

Affiliations

[1] School of Computer Science, University of Manchester, Manchester, M13 9PL, UK

[2] School of Chemistry, University of Manchester, Manchester, M13 9PL, UK

[3] Faculty of Life Sciences, University of Manchester, Manchester, M13 9PL, UK

Article

Publisher ID: 1471-2105-10-S6-S19

DOI: 10.1186/1471-2105-10-S6-S19

PMC ID: 2697642

PubMed ID: 19534744

SO-VID: 952e659f-6cff-402e-809c-5df395cb607e

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Conference name: European Molecular Biology Network (EMBnet) Conference 2008: 20th Anniversary Celebration

Conference location: Martina Franca, Italy

Conference date: 18–20 September 2008
