Mathematical insight of species

ScienceOpen Posters

This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

Keywords: Data, species, algorithm, mathematics, proteomics, mass spectrometry

      Abstract

      The application of mathematics to biology is not new. Just as the invention of the microscope revolutionised biology by revealing otherwise invisible and previously unknown worlds, mathematics can be regarded as a kind of microscope for visualising highly complex data. Now that mass-spectrometry-based proteomic assays are becoming the standard for identifying microorganisms at the genus and species level in many clinical microbiology laboratories, it is the right time to find solutions to the classification problem. In our lab, we are developing one such solution, which replaces numerous unrelated single-protein searches with what we call a “proteome fingerprint”. By applying novel mathematical models, we can reveal the invisible worlds hidden within the forests of data generated by mass-spectrometry-based proteomics. By dividing each protein into n-grams (subsequences of n consecutive amino acids) and building a vector of n-gram frequencies, we can represent a species' proteome with the term vector model. CUR matrix decomposition, used as a tool for exploratory data analysis, suggests that fewer than 1% of the n-grams are particularly important or influential in the low-dimensional classification problem. We therefore propose an efficient algorithm based on dimensionality reduction to identify microorganisms at both the species and strain level. Singular Value Decomposition (SVD), as used in Latent Semantic Indexing (LSI), provides a reduced data representation and thereby addresses the sparsity and scalability issues. As a result, we observe significant reductions in computational time and memory requirements, together with improved quality of the returned results. To demonstrate the efficiency of our algorithm, we applied it to bacterial proteomes downloaded from NCBI. The algorithm correctly assigned both strain and species in more than 95% of the tested samples.
This work was funded by the HRZZ (Croatian Science Foundation) research project “Clinical proteomics of microorganisms”.
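
The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of the general technique, not the authors' implementation. It assumes NumPy, overlapping trigrams (n = 3) as terms, relative n-gram frequencies as term weights, and assignment of a query to the reference species with the highest cosine similarity in the reduced space; all function names and the toy sequences are hypothetical. It builds a species-by-n-gram term matrix, reduces it with a truncated SVD as in LSI, and computes normalized column leverage scores, the weights that leverage-based CUR decompositions use to pick influential columns (here, n-grams).

```python
from collections import Counter

import numpy as np


def ngrams(sequence, n):
    """Overlapping substrings of length n taken from one protein sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_vocabulary(proteomes, n):
    """Sorted list of every n-gram seen in the reference proteomes (the 'terms')."""
    vocab = set()
    for proteins in proteomes:
        for protein in proteins:
            vocab.update(ngrams(protein, n))
    return sorted(vocab)


def term_vector(proteins, vocab, n):
    """Relative n-gram frequencies of a proteome; n-grams outside vocab are dropped."""
    counts = Counter()
    for protein in proteins:
        counts.update(ngrams(protein, n))
    total = sum(counts[g] for g in vocab) or 1
    return np.array([counts[g] / total for g in vocab], dtype=float)


def fit_lsi(term_matrix, k):
    """Truncated SVD (as in LSI): returns V_k, whose k columns span the latent space."""
    _, singular_values, vt = np.linalg.svd(term_matrix, full_matrices=False)
    k = min(k, len(singular_values))
    return vt[:k].T  # shape: (number of n-grams, k)


def leverage_scores(vk):
    """Normalized column leverage scores (CUR column-selection weights); they sum to 1."""
    return (vk ** 2).sum(axis=1) / vk.shape[1]


def classify(query_vector, reference_latent, vk, labels):
    """Project a query into the latent space and return the most cosine-similar label."""
    q = query_vector @ vk
    norms = np.linalg.norm(reference_latent, axis=1) * np.linalg.norm(q) + 1e-12
    similarities = (reference_latent @ q) / norms
    return labels[int(np.argmax(similarities))]


# Toy usage with made-up sequences (hypothetical, not real NCBI proteomes).
references = {
    "species_A": ["MKVLAAGIVALLLAAGCSS", "MKKLLPTAAAGLLLLAAQPAMA"],
    "species_B": ["MSTNPKPQRKTKRNTNRRPQDVKFPGG", "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS"],
}
N = 3  # n-gram length (assumed)
labels = list(references)
vocab = build_vocabulary(references.values(), N)
term_matrix = np.vstack([term_vector(references[name], vocab, N) for name in labels])
vk = fit_lsi(term_matrix, k=2)
reference_latent = term_matrix @ vk  # each species as a k-dimensional point

scores = leverage_scores(vk)
top_ngrams = [vocab[j] for j in np.argsort(scores)[::-1][:5]]
print("most influential n-grams:", top_ngrams)

query = term_vector(["MKVLAAGIVALLL"], vocab, N)
print("predicted:", classify(query, reference_latent, vk, labels))  # predicted: species_A
```

With only two reference species and k = 2 the projection loses no information; on real proteomes k would be far smaller than both the number of species and the vocabulary size. The choice of n, k and the weighting scheme is left open here, and the sketch does not attempt to reproduce the reported results.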

      Author and article information

      DOI: 10.14293/P2199-8442.1.SOP-LIFE.P94XJX.v1