HIV-1 coreceptor usage prediction without multiple alignments: an application of string kernels

Abstract

Background

Human immunodeficiency virus type 1 (HIV-1) infects cells by means of ligand-receptor interactions. This lentivirus uses the CD4 receptor in conjunction with a chemokine coreceptor, either CXCR4 or CCR5, to enter a target cell. HIV-1 is characterized by high sequence variability. Nonetheless, within this extensive variability, certain features must be conserved to define functions and phenotypes. Determining the coreceptor usage of HIV-1 from its envelope protein sequence is an instance of a well-studied machine learning problem known as classification. The support vector machine (SVM), with string kernels, has proven to be very efficient for dealing with a wide class of classification problems ranging from text categorization to protein homology detection. In this paper, we investigate how well the SVM can predict HIV-1 coreceptor usage when it is equipped with an appropriate string kernel.
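As a rough illustration of what a string kernel computes, the sketch below implements a k-mer (spectrum) kernel, one of the simplest members of the family. It is not the distant segments kernel proposed in the paper, whose definition is not given in this abstract. The function names, the choice of k = 2, and the toy sequences are all assumptions made for the example; the sequences are made up and are not real envelope data.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 2) -> float:
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Only k-mers present in both strings contribute to the sum.
    return float(sum(n * ct[kmer] for kmer, n in cs.items()))


def normalized_kernel(s: str, t: str, k: int = 2) -> float:
    """Cosine-normalized kernel, so that sequence length does not dominate."""
    denom = sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0


def gram_matrix(seqs, k: int = 2):
    """Pairwise kernel values for a list of sequences; no alignment needed."""
    return [[normalized_kernel(a, b, k) for b in seqs] for a in seqs]


# Hypothetical toy sequences of unequal length (illustration only).
toy_sequences = ["CTRPNNNTRK", "CTRPNNNTRRS", "CIRPGNKTK"]
for row in gram_matrix(toy_sequences):
    print([round(value, 3) for value in row])
```

Because the kernel compares substring counts directly, sequences of different lengths can be compared without a multiple alignment. A Gram matrix of this kind could then be passed to any SVM implementation that accepts a precomputed kernel.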

Results

Three string kernels were compared. The SVM equipped with the distant segments kernel achieved accuracies of 96.35% (CCR5), 94.80% (CXCR4), and 95.15% (CCR5 and CXCR4) on a test set of 1425 examples, using a classifier built on a training set of 1425 examples. Our datasets are built from sequences in the Los Alamos National Laboratory HIV Databases. A web server is available at http://genome.ulaval.ca/hiv-dskernel.

Conclusion

We examined string kernels that have been used successfully for protein homology detection and propose a new one that we call the distant segments kernel. We also show how to extract the most relevant features for HIV-1 coreceptor usage. The SVM with the distant segments kernel is currently the best method described.

Author and article information

Journal

Journal ID (nlm-ta): Retrovirology

Title: Retrovirology

Publisher: BioMed Central

ISSN (Electronic): 1742-4690

Publication date (Collection): 2008

Publication date (Electronic): 4 December 2008

Volume: 5

Page: 110

Affiliations

[1] Centre de recherche du centre hospitalier de l'Université Laval, Québec (QC), Canada

[2] Département d'informatique et de génie logiciel, Université Laval, Québec (QC), Canada

Article

Publisher ID: 1742-4690-5-110

DOI: 10.1186/1742-4690-5-110

PMC ID: 2637298

PubMed ID: 19055831

SO-VID: 1c772b68-b32c-43bc-a6d7-2bf9dfbe0108

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
