Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives


Abstract

The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare is also increasingly demanding tighter integration with biomedical data in order to promote personalized medicine and provide better treatments. Efficient analysis and interpretation of Big Data open new avenues for exploring molecular biology, raise new questions about physiological and pathological states, and offer new ways to answer them. Such analyses lead to a better understanding of diseases and to the development of better, personalized diagnostics and therapeutics. However, this progress depends directly on the availability of new solutions for dealing with this huge amount of information. New paradigms are needed to store and access data, to annotate and integrate it, and finally to infer knowledge and make it available to researchers. Bioinformatics can be viewed as the "glue" for all these processes. A clear awareness of current high performance computing (HPC) solutions in bioinformatics, of Big Data analysis paradigms for computational biology, and of the issues still open in the biomedical and healthcare fields represents the starting point for winning this challenge.

Author and article information

Journal

Journal ID (nlm-ta): Biomed Res Int

Journal ID (iso-abbrev): Biomed Res Int

Journal ID (publisher-id): BMRI

Title: BioMed Research International

Publisher: Hindawi Publishing Corporation

ISSN (Print): 2314-6133

ISSN (Electronic): 2314-6141

Publication date (Print): 2014

Publication date (Electronic): 1 September 2014

Volume: 2014

Electronic Location Identifier: 134023

Affiliations

¹Bioinformatics Research Unit, Institute for Biomedical Technologies, National Research Council of Italy, Segrate, 20090 Milan, Italy

²Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Science Department, Universidad Católica San Antonio de Murcia (UCAM), 30107 Murcia, Spain

³Department of Computer Science and Engineering, Center for Research Computing, University of Notre Dame, P.O. Box 539, Notre Dame, IN 46556, USA

⁴Advanced Computing Systems and High Performance Computing Group, Institute of Applied Mathematics and Information Technologies, National Research Council of Italy, 16149 Genoa, Italy

Author notes

*Daniele D'Agostino: dagostino@ge.imati.cnr.it

Academic Editor: Carlo Cattani

Article

DOI: 10.1155/2014/134023

PMC ID: 4165507

PubMed ID: 25254202

SO-VID: 2ee58f60-f422-49a0-b274-6caf44870af5

License:

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 18 June 2014

Date accepted: 13 August 2014
