Fast automated cell phenotype image classification


Abstract

Background

The genomic revolution has led to rapid growth in sequencing of genes and proteins, and attention is now turning to the function of the encoded proteins. In this respect, microscope imaging of a protein's sub-cellular localisation is proving invaluable, and recent advances in automated fluorescence microscopy allow protein localisations to be imaged in high throughput. Hence there is a need for large-scale automated computational techniques to efficiently quantify, distinguish and classify sub-cellular images. While image statistics have proved highly successful in distinguishing localisation, commonly used measures are relatively slow to compute and often require cells to be individually selected from experimental images, limiting both throughput and the range of potential applications. Here we introduce threshold adjacency statistics, the essence of which is to threshold the image and to count the number of above-threshold pixels with a given number of above-threshold pixels adjacent. These novel measures are shown to distinguish and classify images of distinct sub-cellular localisation with high speed and accuracy without image cropping.
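The counting step described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: the function name `threshold_adjacency_stats`, the single `threshold` parameter, treating pixels outside the image border as below threshold, and normalising by the number of above-threshold pixels are all assumptions. The abstract does not say how the threshold is chosen, so here it is simply passed in.

```python
def threshold_adjacency_stats(image, threshold):
    """Hypothetical sketch of threshold adjacency statistics for one threshold.

    image: 2-D list of pixel intensities (rows of equal length).
    Returns 9 fractions: entry k is the fraction of above-threshold pixels
    that have exactly k above-threshold neighbours (8-connectivity).
    """
    rows, cols = len(image), len(image[0])
    # Binarise: True where the pixel intensity exceeds the threshold.
    mask = [[value > threshold for value in row] for row in image]
    counts = [0] * 9
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Count above-threshold pixels among the 8 neighbours; positions
            # outside the image are treated as below threshold (assumption).
            n = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        n += 1
            counts[n] += 1
    total = sum(counts)
    # Normalise by the number of above-threshold pixels (assumption), so the
    # statistics do not depend on how many pixels pass the threshold.
    return [k / total for k in counts] if total else [0.0] * 9


# Toy 5x5 image: a bright 3x3 square on a dark background.
image = [[0] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        image[r][c] = 200

# Corners have 3 neighbours, edge pixels 5, the centre 8,
# giving fractions 4/9, 4/9 and 1/9 at indices 3, 5 and 8.
print(threshold_adjacency_stats(image, threshold=100))
```

Because each above-threshold pixel only inspects its eight neighbours, the cost is linear in the number of pixels and needs no cell segmentation, which is consistent with the speed and no-cropping claims above.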

Results

Threshold adjacency statistics are applied to the classification of protein sub-cellular localisation images. They are tested on two image sets (available for download): one in which fluorescently tagged proteins are endogenously expressed in 10 sub-cellular locations, and another in which transfected proteins are localised to 11 locations. For each image set, a support vector machine was trained and tested. Classification accuracies of 94.4% and 86.6% are obtained on the endogenous and transfected sets, respectively. Threshold adjacency statistics are found to provide comparable or higher accuracy than other commonly used statistics while being an order of magnitude faster to calculate. Further, threshold adjacency statistics in combination with Haralick measures give accuracies of 98.2% and 93.2% on the endogenous and transfected sets, respectively.
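The abstract reports results for threshold adjacency statistics alone and in combination with Haralick measures, but does not say how the two feature sets were merged or how accuracy was averaged. The sketch below assumes a simple approach: concatenate each image's two feature vectors, standardise every feature column to zero mean and unit variance (a common preprocessing step before training an SVM), and report overall accuracy as the fraction of correctly predicted labels. The helper names, class labels and feature values are hypothetical, and the SVM itself is omitted; any standard SVM library could be trained on the combined matrix.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale each feature column to zero mean and unit variance.

    Constant columns (standard deviation 0) are mapped to 0.0.
    """
    column_stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, column_stats)]
        for row in rows
    ]


def combine_features(tas_rows, haralick_rows):
    """Concatenate each image's TAS and Haralick vectors, then standardise.

    Hypothetical helper: the abstract does not specify how features were combined.
    """
    joined = [list(t) + list(h) for t, h in zip(tas_rows, haralick_rows)]
    return zscore_columns(joined)


def accuracy(true_labels, predicted_labels):
    """Overall accuracy: fraction of images whose predicted class is correct."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy example with made-up feature values for two images.
features = combine_features([[1.0, 0.0], [3.0, 0.0]], [[10.0], [20.0]])
print(features)  # [[-1.0, 0.0, -1.0], [1.0, 0.0, 1.0]]
print(accuracy(["nucleus", "golgi", "golgi", "actin"],
               ["nucleus", "golgi", "actin", "actin"]))  # 0.75
```

In a real evaluation, the column means and standard deviations should be computed on the training images only and then applied to the test images, so that no information from the test set leaks into training.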

Conclusion

Threshold adjacency statistics have the potential to greatly extend the scale and range of applications of image statistics in computational image analysis. They remove the need for cropping of individual cells from images, and are an order of magnitude faster to calculate than other commonly used statistics while providing comparable or better classification accuracy, both essential requirements for application to large-scale approaches.


Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Collection): 2007

Publication date (Electronic): 30 March 2007

Volume: 8

Page: 110

Affiliations

[1] ARC Centre in Bioinformatics, University of Queensland, Brisbane, Queensland 4072, Australia

[2] Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia

[3] Advanced Computational Modelling Centre, University of Queensland, Brisbane, Queensland 4072, Australia

Article

Publisher ID: 1471-2105-8-110

DOI: 10.1186/1471-2105-8-110

PMC ID: 1847687

PubMed ID: 17394669

SO-VID: f961b4fd-c64c-42a5-9145-7a5e8c6c6b82

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
