Improved analysis of bacterial CGH data beyond the log-ratio paradigm

Abstract

Background

Existing methods for analyzing bacterial CGH data from two-color arrays are based on log-ratios only, a paradigm inherited from expression studies. We propose an alternative approach, where microarray signals are used in a different way and sequence identity is predicted using a supervised learning approach.

Results

A data set of 32 hybridizations of sequenced versus sequenced genomes was used to test and compare methods. ROC analysis was performed to illustrate the ability to rank probes with respect to Present/Absent calls. Classification into Present and Absent was compared with that of a Gaussian mixture model.
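The abstract does not describe how the Gaussian mixture model is set up, so the following is only a minimal sketch of one common form of that baseline: a two-component univariate mixture fitted to probe log-ratios by EM, with the higher-mean component read as Present. The function names, the initialization, and the simulated log-ratios are assumptions made for illustration, not the authors' implementation or data.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM; returns (weights, means, sds)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialize the two means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    overall_mean = sum(values) / len(values)
    overall_var = sum((v - overall_mean) ** 2 for v in values) / len(values)
    overall_sd = math.sqrt(overall_var) or 1.0
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            d = [weights[k] * normal_pdf(v, means[k], sds[k]) for k in (0, 1)]
            total = d[0] + d[1]
            resp.append([d[0] / total, d[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: update weights, means, and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    return weights, means, sds


def posterior_present(x, weights, means, sds):
    """Posterior probability that x belongs to the higher-mean (assumed Present) component."""
    hi = 0 if means[0] > means[1] else 1
    d = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
    total = d[0] + d[1]
    return d[hi] / total if total > 0 else 0.5


# Simulated log-ratios (illustrative only): Absent probes near -3, Present probes near 0.
rng = random.Random(0)
log_ratios = ([rng.gauss(-3.0, 0.5) for _ in range(200)]
              + [rng.gauss(0.0, 0.3) for _ in range(200)])
weights, means, sds = fit_two_component_gmm(log_ratios)
calls = ["Present" if posterior_present(x, weights, means, sds) >= 0.5 else "Absent"
         for x in log_ratios]
```

The posterior can also serve as a ranking score, which is the kind of quantity a ROC analysis evaluates.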

Conclusion

The results indicate that our proposed method improves on existing methods with respect to ranking and classification of probes, especially for multi-genome arrays.

Most cited references 17

The meaning and use of the area under a receiver operating characteristic (ROC) curve.

J A Hanley, B J McNeil (1982)

A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
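The equivalence stated in this abstract can be made concrete with a short sketch: the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half, which is the normalized Wilcoxon (Mann-Whitney) statistic. The function name and the toy scores below are illustrative assumptions and do not come from the cited paper.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute one half, matching the Wilcoxon/Mann-Whitney statistic
    divided by the number of pairs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (illustrative): higher means stronger evidence for the positive class.
present = [0.9, 0.8, 0.35, 0.7]
absent = [0.1, 0.4, 0.35]
print(auc_mann_whitney(present, absent))  # 10.5 correct pairs out of 12 -> 0.875
```

The pairwise loop is quadratic in the number of cases; a rank-based computation gives the same value more efficiently on large data sets.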

Normalization of cDNA microarray data.

Gordon K. Smyth, Terry Speed (2003)

Normalization means to adjust microarray data for effects which arise from variation in the technology rather than from biological differences between the RNA samples or between the printed probes. This paper describes normalization methods based on the fact that dye balance typically varies with spot intensity and with spatial position on the array. Print-tip loess normalization provides a well-tested general purpose normalization method which has given good results on a wide range of arrays. The method may be refined by using quality weights for individual spots. The method is best combined with diagnostic plots of the data which display the spatial and intensity trends. When diagnostic plots show that biases still remain in the data after normalization, further normalization steps such as plate-order normalization or scale-normalization between the arrays may be undertaken. Composite normalization may be used when control spots are available which are known to be not differentially expressed. Variations on loess normalization include global loess normalization and two-dimensional normalization. Detailed commands are given to implement the normalization techniques using freely available software.

Microarrays reveal that each of the ten dominant lineages of Staphylococcus aureus has a unique combination of surface-associated and regulatory genes.

Jodi A. Lindsay, Catrin E. Moore, Nicholas Day … (2006)

Staphylococcus aureus is the most common cause of hospital-acquired infection. In healthy hosts outside of the health care setting, S. aureus is a frequent colonizer of the human nose but rarely causes severe invasive infection such as bacteremia, endocarditis, or osteomyelitis. To identify genes associated with community-acquired invasive isolates, regions of genomic variability, and the S. aureus population structure, we compared 61 community-acquired invasive isolates of S. aureus and 100 nasal carriage isolates from healthy donors using a microarray spotted with PCR products representing every gene from the seven S. aureus sequencing projects. The core genes common to all strains were identified, and 10 dominant lineages of S. aureus were clearly discriminated. Each lineage carried a unique combination of hundreds of "core variable" (CV) genes scattered throughout the chromosome, suggesting a common ancestor but early evolutionary divergence. Many CV genes are regulators of virulence genes or known or predicted to be expressed on the bacterial surface and to interact with the host during nasal colonization and infection. Within each lineage, isolates showed substantial variation in the carriage of mobile genetic elements and their associated virulence and resistance genes, indicating frequent horizontal transfer. However, we were unable to identify any association between lineage or gene and invasive isolates. We suggest that the S. aureus gene combinations necessary for invasive disease may also be necessary for nasal colonization and that community-acquired invasive disease is strongly dependent on host factors.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2009

Publication date (Electronic): 19 March 2009

Volume: 10

Page: 91

Affiliations

[1] Biostatistics, Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, Ås, Norway

[2] Laboratory of Microbial Gene Technology, Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, Ås, Norway

Article

Publisher ID: 1471-2105-10-91

DOI: 10.1186/1471-2105-10-91

PMC ID: 2679023

PubMed ID: 19298668

SO-VID: 516d4159-784e-4611-bfb4-25a93362ae7c

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
