Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays

Abstract

Background

Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples.

Results

We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log₂ units (6% of the mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identifies an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators.
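As a rough illustration of how variability figures of this kind can be derived, the sketch below computes a per-probe SD across replicate arrays on the log₂ scale and expresses it as a percentage of the mean log signal. It is a minimal Python sketch, not the R function described in the article; the function names, the use of the median as a summary statistic, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def interarray_sd(log2_signal):
    """Summarise replicate variability on the log2 scale.

    log2_signal: 2-D array, rows = probes, columns = replicate arrays.
    The median-of-per-probe-SD summary is an assumption, not the
    article's stated method.
    """
    values = np.asarray(log2_signal, dtype=float)
    per_probe_sd = np.std(values, axis=1, ddof=1)  # sample SD per probe
    per_probe_mean = np.mean(values, axis=1)
    median_sd = float(np.median(per_probe_sd))
    overall_mean = float(np.mean(per_probe_mean))
    return {
        "median_sd": median_sd,
        "sd_as_percent_of_mean": 100.0 * median_sd / overall_mean,
    }


def fold_change_for_sd(sd_log2):
    """Convert an SD in log2 units to a multiplicative factor on the raw scale."""
    return 2.0 ** sd_log2


# Toy data (hypothetical): three probes measured on two replicate arrays.
toy = np.array([[8.0, 9.0], [8.0, 9.0], [8.0, 9.0]])
summary = interarray_sd(toy)
print(summary["median_sd"], summary["sd_as_percent_of_mean"])
print(fold_change_for_sd(0.5))
```

An SD of 0.5 log₂ units corresponds to a factor of 2^0.5, about 1.41, on the raw intensity scale.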

Conclusions

This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
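The multidimensional scaling used for visualisation can be sketched as follows. This is classical (Torgerson) MDS on Euclidean distances between samples, written in Python with NumPy. The abstract does not state which MDS variant or distance measure was used, so both choices, and the function name `classical_mds`, are assumptions.

```python
import numpy as np


def classical_mds(samples, n_components=2):
    """Classical MDS embedding of samples.

    samples: 2-D array, rows = samples, columns = log2 expression per probe.
    Returns an array of shape (n_samples, n_components).
    """
    x = np.asarray(samples, dtype=float)
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(x ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors
    # Double-centre the squared distance matrix.
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy data (hypothetical): three samples described by two probes each.
toy_samples = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
coords = classical_mds(toy_samples, n_components=2)
print(coords)
```

Samples with similar expression profiles land close together in the embedding, so clusters in the plotted coordinates can then be compared with known biological variables.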

Related collections

REPO4EU WP2 Databases

Most cited references 10

Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms

Peter 't Hoen, Yavuz Ariyurek, Helene Thygesen … (2008)

The hippocampal expression profiles of wild-type mice and mice transgenic for δC-doublecortin-like kinase were compared with Solexa/Illumina deep sequencing technology and five different microarray platforms. With Illumina's digital gene expression assay, we obtained ∼2.4 million sequence tags per sample, their abundance spanning four orders of magnitude. Results were highly reproducible, even across laboratories. With a dedicated Bayesian model, we found differential expression of 3179 transcripts with an estimated false-discovery rate of 8.5%. This is a much higher figure than found for microarrays. The overlap in differentially expressed transcripts found with deep sequencing and microarrays was most significant for Affymetrix. The changes in expression observed by deep sequencing were larger than observed by microarrays or quantitative PCR. Relevant processes such as calmodulin-dependent protein kinase activity and vesicle transport along microtubules were found affected by deep sequencing but not by microarrays. While undetectable by microarrays, antisense transcription was found for 51% of all genes and alternative polyadenylation for 47%. We conclude that deep sequencing provides a major advance in robustness, comparability and richness of expression profiling data and is expected to boost collaborative, comparative and integrative genomics studies.

Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project.

Tucker A. Patterson, Edward K. Lobenhofer, Stephanie Fulmer-Smentek … (2006)

Microarray-based expression profiling experiments typically use either a one-color or a two-color design to measure mRNA abundance. The validity of each approach has been amply demonstrated. Here we provide a simultaneous comparison of results from one- and two-color labeling designs, using two independent RNA samples from the Microarray Quality Control (MAQC) project, tested on each of three different microarray platforms. The data were evaluated in terms of reproducibility, specificity, sensitivity and accuracy to determine if the two approaches provide comparable results. For each of the three microarray platforms tested, the results show good agreement with high correlation coefficients and high concordance of differentially expressed gene lists within each platform. Cumulatively, these comparisons indicate that data quality is essentially equivalent between the one- and two-color approaches and strongly suggest that this variable need not be a primary factor in decisions regarding experimental microarray design.

A variance-stabilizing transformation for gene-expression microarray data.

D Rocke, J Michael Hardin, B Durbin … (2001)

Standard statistical techniques often assume that data are normally distributed, with constant variance not depending on the mean of the data. Data that violate these assumptions can often be brought in line with the assumptions by application of a transformation. Gene-expression microarray data have a complicated error structure, with a variance that changes with the mean in a non-linear fashion. Log transformations, which are often applied to microarray data, can inflate the variance of observations near background. We introduce a transformation that stabilizes the variance of microarray data across the full range of expression. Simulation studies also suggest that this transformation approximately symmetrizes microarray data.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2010

Publication date (Electronic): 24 June 2010

Volume: 11

Page: 344

Affiliations

[1] Division of Infection and Immunity, UCL, London, UK

[2] Warwick HRI, University of Warwick, Warwick, UK

[3] Windeyer Building, 46 Cleveland St., UCL, W1F 4JT, London, UK

Article

Publisher ID: 1471-2105-11-344

DOI: 10.1186/1471-2105-11-344

PMC ID: 2909218

PubMed ID: 20576120

SO-VID: 34d24a66-dcc5-4130-be69-135894b594f8

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
