Detection of divergent genes in microbial aCGH experiments

Abstract

Background

Array-based comparative genome hybridization (aCGH) is a tool for rapid comparison of genomes from different bacterial strains. The purpose of such an analysis is to detect genes that are highly divergent or absent in a sample strain compared to an index strain. Development of methods for analyzing aCGH data has primarily focused on copy number aberrations in cancer research. In microbial aCGH analyses, genes are typically ranked by log-ratios, and classification into divergent or present is done by choosing a cutoff log-ratio, either manually or from statistics calculated on the log-ratio distribution. As experimental settings vary considerably, it is not possible to develop a classical discriminant or statistical learning approach.
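The cutoff-based classification described above can be sketched as follows. This is only an illustration of the general idea, not code from the article: the function names, the toy intensities, and the cutoff of -1.0 are all assumptions chosen for the example.

```python
import math

def log_ratios(sample, index):
    """Return log2(sample / index) per gene; intensities must be positive."""
    return [math.log2(s / i) for s, i in zip(sample, index)]

def classify_by_cutoff(ratios, cutoff=-1.0):
    """Label a gene 'divergent' when its log-ratio is at or below the cutoff.

    The default cutoff is a hypothetical value, not one taken from the article.
    """
    return ["divergent" if m <= cutoff else "present" for m in ratios]

# Toy spot intensities for five genes (made up for illustration).
sample = [1000.0, 950.0, 120.0, 800.0, 60.0]
index = [1000.0, 1000.0, 1000.0, 900.0, 1100.0]

ratios = log_ratios(sample, index)
labels = classify_by_cutoff(ratios)

# Ranking genes from most to least divergent uses the log-ratio directly.
ranking = sorted(range(len(ratios)), key=lambda g: ratios[g])
```

The result depends entirely on where the cutoff is placed, which is the weakness the article sets out to address.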

Methods

We introduce a more efficient method for analyzing microbial aCGH data using a finite mixture model and a data rotation scheme. By averaging the posterior probabilities from the model fitted to log-ratios before and after rotation, we obtain a score for each gene, and we demonstrate its advantages for ranking and detecting divergent genes with improved specificity and sensitivity.
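A minimal sketch of how such a score could be computed is given here. The abstract does not specify the mixture components or the rotation, so everything concrete is an assumption: a two-component Gaussian mixture fitted by EM, a planar rotation of (average log-intensity, log-ratio) points about their centroid, a hand-picked rotation angle, and simulated data. The function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def divergent_posteriors(m, n_iter=100):
    """Fit a two-component Gaussian mixture by EM; return P(divergent | m_i)."""
    # Assumption: divergent genes form the component with low log-ratios.
    mu = [min(m), sorted(m)[len(m) // 2]]
    sd = [1.0, 1.0]
    w = [0.5, 0.5]
    post = [0.5] * len(m)
    for _ in range(n_iter):
        # E-step: responsibility of the divergent component for each gene.
        post = []
        for x in m:
            p_div = w[0] * normal_pdf(x, mu[0], sd[0])
            p_pre = w[1] * normal_pdf(x, mu[1], sd[1])
            post.append(p_div / (p_div + p_pre + 1e-300))
        # M-step: weighted updates of mixing weight, mean and spread.
        for k, resp in enumerate((post, [1.0 - p for p in post])):
            n_k = max(sum(resp), 1e-12)
            w[k] = n_k / len(m)
            mu[k] = sum(r * x for r, x in zip(resp, m)) / n_k
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, m)) / n_k
            sd[k] = max(math.sqrt(var), 1e-3)
    return post

def rotate_log_ratios(a, m, theta):
    """Rotate (a, m) points about their centroid; return the new m coordinates."""
    a_bar = sum(a) / len(a)
    m_bar = sum(m) / len(m)
    return [m_bar + math.sin(theta) * (ai - a_bar) + math.cos(theta) * (mi - m_bar)
            for ai, mi in zip(a, m)]

def divergence_scores(a, m, theta):
    """Average the posteriors obtained before and after rotation."""
    before = divergent_posteriors(m)
    after = divergent_posteriors(rotate_log_ratios(a, m, theta))
    return [(p + q) / 2.0 for p, q in zip(before, after)]

# Simulated data (assumption): 20 divergent and 180 present genes, with a
# linear intensity-dependent tilt that the rotation is chosen to undo.
random.seed(1)
slope = 0.1
is_divergent = [i < 20 for i in range(200)]
a = [random.uniform(8.0, 14.0) for _ in is_divergent]
m = [(random.gauss(-3.0, 0.5) if d else random.gauss(0.0, 0.3)) + slope * (ai - 11.0)
     for d, ai in zip(is_divergent, a)]

scores = divergence_scores(a, m, theta=-math.atan(slope))
```

Each score lies between 0 and 1, so genes can be ranked by it directly or classified by thresholding it, without committing to a log-ratio cutoff in advance.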

Results

The procedure is tested and compared to other approaches on simulated data sets, as well as on four experimental validation data sets for aCGH analysis on fully sequenced strains of Staphylococcus aureus and Streptococcus pneumoniae.

Conclusion

When tested on simulated data as well as on four experimental validation data sets from experiments with only fully sequenced strains, our procedure outperforms the standard procedure of using a simple log-ratio cutoff to classify genes as present or divergent.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date Collection: 2006

Publication date (Electronic): 30 March 2006

Volume: 7

Page: 181

Affiliations

[1] Biostatistics, Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway

[2] Department of Biology and Biochemistry/Bioinformatics, University of Potsdam, Germany

[3] Microbial Gene Technology, Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, Ås, Norway

[4] Institute of Medical Biometry and Statistics, University at Lübeck, Germany

Article

Publisher ID: 1471-2105-7-181

DOI: 10.1186/1471-2105-7-181

PMC ID: 1563484

PubMed ID: 16573812

SO-VID: f81bd412-ff46-4400-9933-8c0d886573c6

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
