Applying mathematics to biology is not new. Just as the invention of the microscope revolutionised biology by revealing previously invisible and unknown worlds, mathematics can be regarded as a form of microscope for visualising highly complex data. Now that mass spectrometry-based proteomic assays are becoming standard for identifying microorganisms at the genus and species level in many clinical microbiology laboratories, the time is right to find solutions to the classification problem. In our lab, we are developing one such solution by replacing numerous unrelated single-protein searches with what we call a “proteome fingerprint”. By applying novel mathematical models, we can reveal structure hidden within the forests of data generated by mass spectrometry-based proteomics. By dividing each protein into n-grams and building a vector of n-gram frequencies, we can represent a species' proteome using the term vector model. CUR matrix decomposition, used as a tool for exploratory data analysis, suggests that fewer than 1% of n-grams are particularly important or influential in the low-dimensional classification problem. We therefore propose an efficient dimensionality reduction-based algorithm to identify microorganisms at both the species and strain level. Singular Value Decomposition (SVD), commonly used in Latent Semantic Indexing (LSI), is used to obtain a reduced data representation, which addresses the sparsity and scalability issues. As a result, we observe significant improvements in computation time, together with higher-quality results and lower memory requirements. To demonstrate the efficiency of our algorithm, we applied it to bacterial proteomes downloaded from NCBI. The algorithm correctly assigned both strain and species in more than 95% of the tested samples. This work was funded by the HRZZ (Croatian Science Foundation) research project “Clinical proteomics of microorganisms”.
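The pipeline described in the abstract (n-gram counting, a term vector per proteome, and an SVD-based reduction in the style of LSI) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the choice of n = 3, the rank k = 2, the cosine-similarity matching step and the toy sequences are all assumptions made for the example, and the CUR analysis is not covered.

```python
from collections import Counter

import numpy as np


def ngram_counts(protein, n=3):
    """Count overlapping n-grams in one amino-acid sequence."""
    return Counter(protein[i:i + n] for i in range(len(protein) - n + 1))


def proteome_vector(proteins, vocabulary, n=3):
    """Term vector of a proteome: n-gram frequencies over a fixed vocabulary."""
    counts = Counter()
    for protein in proteins:
        counts.update(ngram_counts(protein, n))
    # n-grams outside the vocabulary are ignored; missing ones count as 0.
    return np.array([counts[g] for g in vocabulary], dtype=float)


def build_term_matrix(proteomes, n=3):
    """Build an (n-grams x proteomes) matrix from {label: [sequences]}."""
    vocabulary = sorted(
        {g for seqs in proteomes.values() for s in seqs for g in ngram_counts(s, n)}
    )
    labels = list(proteomes)
    matrix = np.column_stack(
        [proteome_vector(proteomes[label], vocabulary, n) for label in labels]
    )
    return matrix, vocabulary, labels


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


# Toy reference "proteomes" (made-up sequences, for illustration only).
references = {
    "species_a": ["MKTAYIAKQR", "MKTAYLLKQR"],
    "species_b": ["GGSGGSGGSW", "GSGGSGGSGG"],
}
A, vocabulary, labels = build_term_matrix(references)

# Truncated SVD: keep the k leading singular directions (LSI-style reduction).
k = 2  # assumed rank; a real data set would need this tuned
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

# Coordinates of each reference proteome in the reduced space.
reference_coords = U_k.T @ A

# Project an unseen sample into the same space and pick the closest reference.
query = proteome_vector(["MKTAYIAKQK"], vocabulary)
query_coords = U_k.T @ query
scores = [cosine(query_coords, reference_coords[:, j]) for j in range(len(labels))]
predicted = labels[int(np.argmax(scores))]
print(predicted)
```

In this sketch the comparison happens in the k-dimensional space rather than over the full n-gram vocabulary, which is where the reduction in memory and computation would come from on realistically sized proteomes.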

Author and article information

Conference

Title: ScienceOpen Posters

Publisher: ScienceOpen

Publication date: May 29 2015

Author information

Ena Melvan https://orcid.org/0000-0002-3437-2887

Article

DOI: 10.14293/P2199-8442.1.SOP-LIFE.P94XJX.v1

SO-VID: ec8f8be5-c9bb-4913-954d-571150f7e46b

License:

This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

Conference name: Microbiome R&D and Business Collaboration Forum

ScienceOpen disciplines: Mathematical software, Applied mathematics, Evolutionary Biology

Keywords: Data, species, algorithm, mathematics, proteomics, mass spectrometry

Mathematical insight of species

Abstract