The non-negative matrix factorization toolbox for biological data mining

Abstract

Background

Non-negative matrix factorization (NMF) has been introduced as an important method for mining biological data. Although packages implemented in R and other programming languages currently exist, they either provide only a few optimization algorithms or focus on a specific application field. No complete NMF package exists that lets the bioinformatics community perform various data mining tasks on biological data.

Results

We provide a convenient MATLAB toolbox containing both the implementations of various NMF techniques and a variety of NMF-based data mining approaches for analyzing biological data. Data mining approaches implemented within the toolbox include data clustering and bi-clustering, feature extraction and selection, sample classification, missing values imputation, data visualization, and statistical comparison.

Conclusions

A series of analyses, such as molecular pattern discovery, biological process identification, dimension reduction, disease prediction, visualization, and statistical comparison, can be performed using this toolbox.
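
To make the factorization behind the abstract concrete, here is a minimal sketch of NMF, which approximates a non-negative matrix X by a product W H of two non-negative factors. It is written in Python rather than MATLAB and is not the toolbox's API: the function names (matmul, transpose, frobenius_error, nmf_multiplicative) are hypothetical, and the multiplicative update rule is only one standard NMF algorithm, chosen here for brevity. The abstract does not say which optimization algorithms the toolbox implements.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of the residual X - W H."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar)) ** 0.5


def nmf_multiplicative(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix x (features x samples) as W H.

    Assumption: plain multiplicative updates for the Frobenius objective,
    with eps added to denominators to avoid division by zero.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        numer = matmul(wt, x)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        numer = matmul(x, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


if __name__ == "__main__":
    # Toy data: two groups of samples (columns) with distinct active features.
    x = [[1.0, 2.0, 0.0, 0.0],
         [2.0, 4.0, 0.0, 0.0],
         [0.0, 0.0, 3.0, 1.0],
         [0.0, 0.0, 6.0, 2.0]]
    w, h = nmf_multiplicative(x, rank=2)
    # One simple clustering rule: assign each sample to its largest coefficient in H.
    labels = [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]
    print("reconstruction error:", frobenius_error(x, w, h))
    print("sample cluster labels:", labels)
```

In this sketch the columns of W play the role of basis vectors (for example, metagenes) and the columns of H give each sample's coefficients. Assigning a sample to its largest coefficient is one simple way to connect the factorization to the clustering task listed in the Results.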

Most cited references 10

Missing value estimation methods for DNA microarrays.

Annette Hastie, Allison Altman, John P. Brown … (2001)

Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data. We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
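
The idea of weighted K-nearest-neighbor imputation can be sketched in a few lines. This is a simplified illustration and not the authors' KNNimpute implementation. The function names (row_distance, knn_impute), the use of None to mark missing entries, the inverse-distance weighting, and the fallback to a row average when no neighbor is available are all assumptions made for this example.

```python
import math


def row_distance(a, b):
    """Root-mean-square difference over columns observed in both rows.

    Returns None when the two rows share no observed column.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def knn_impute(matrix, k=2, eps=1e-9):
    """Return a copy of matrix (rows = genes) with None entries filled in.

    Each missing entry is replaced by an inverse-distance-weighted average of
    the same column in the k nearest rows that have that column observed.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Collect (distance, value) pairs from rows that observe column j.
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                d = row_distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                weights = [1.0 / (d + eps) for d, _ in nearest]
                total = sum(w * v for w, (_, v) in zip(weights, nearest))
                filled[i][j] = total / sum(weights)
            else:
                # Assumed fallback: row average, or zero if the row is empty.
                observed = [v for v in row if v is not None]
                filled[i][j] = sum(observed) / len(observed) if observed else 0.0
    return filled


if __name__ == "__main__":
    expression = [[1.0, 2.0, 3.0],
                  [1.0, 2.0, None],
                  [10.0, 20.0, 30.0]]
    print(knn_impute(expression, k=1))
```

Distances are computed only over columns that both rows have observed, so a row with several gaps can still serve as a neighbor. The input matrix is left unmodified.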

Biclustering algorithms for biological data analysis: a survey.

Sara Madeira, Arlindo L Oliveira (2006)

A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method

Hyunsoo Kim, Haesun Park (2008)

Author and article information

Contributors

Yifeng Li

Alioune Ngom

Journal

Journal ID (nlm-ta): Source Code Biol Med

Journal ID (iso-abbrev): Source Code Biol Med

Title: Source Code for Biology and Medicine

Publisher: BioMed Central

ISSN (Electronic): 1751-0473

Publication date (Collection): 2013

Publication date (Electronic): 16 April 2013

Volume: 8

Page: 10

Affiliations

[1] School of Computer Science, University of Windsor, Windsor, Ontario, Canada

Article

Publisher ID: 1751-0473-8-10

DOI: 10.1186/1751-0473-8-10

PMC ID: 3736608

PubMed ID: 23591137

SO-VID: 697007ea-5be2-4467-90ad-41aa648cc74f

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
