Quality Control and Robust Estimation for cDNA Microarrays With Replicates
Details
Serval ID
serval:BIB_A12CD43F89D1
Type
Article: article from journal or magazine.
Collection
Publications
Institution
Title
Quality Control and Robust Estimation for cDNA Microarrays With Replicates
Journal
Journal of the American Statistical Association
ISSN
0162-1459
1537-274X
Publication state
Published
Issued date
03/2006
Volume
101
Number
473
Pages
30-40
Language
English
Abstract
We consider robust estimation of gene intensities from cDNA microarray data with replicates. Several statistical methods for estimating gene intensities from microarrays have been proposed, but little work has been done on robust estimation. This is particularly relevant for experiments with replicates, because even one outlying replicate can have a disastrous effect on the estimated intensity for the gene concerned. Because of the many steps involved in the experimental process from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a Bayesian hierarchical model for robust estimation of cDNA microarray intensities. Outliers are modeled explicitly using a t-distribution, and our model also addresses such classical issues as design effects, normalization, transformation, and nonconstant variance. Parameter estimation is carried out using Markov chain Monte Carlo. By identifying potential outliers, the method provides automatic quality control of replicate, array, and gene measurements. The method is applied to three publicly available gene expression datasets and compared with three other methods: ANOVA-normalized log ratios, the median log ratio, and estimation after the removal of outliers based on Dixon's test. We find that the between-replicate variability of the intensity estimates is lower for our method than for any of the others. We also address the issue of whether the background should be subtracted when estimating intensities. It has been argued that this should not be done because it increases variability, whereas the arguments for doing so are that there is a physical basis for the image background, and that not doing so will bias downward the estimated log ratios of differentially expressed genes. 
We show that the arguments on both sides of this debate are correct for our data, but that by using our model one can have the best of both worlds: One can subtract the background without increasing variability by much.
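The abstract's point that a single outlying replicate can badly distort a gene's estimated intensity can be illustrated with a small sketch. This is not the authors' Bayesian hierarchical model; it only contrasts the mean log ratio with the median log ratio (one of the comparison methods named above). The intensity values and the function name `log_ratios` are hypothetical and chosen purely for illustration.

```python
import math
import statistics


def log_ratios(red, green):
    """Return base-2 log ratios for paired replicate intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


# Hypothetical replicate intensities for one gene (illustrative values only).
red = [1200.0, 1150.0, 1250.0, 1180.0]
green = [600.0, 590.0, 610.0, 605.0]

clean = log_ratios(red, green)

# Corrupt the last replicate, e.g. a dust speck inflating the red channel.
red_contaminated = red[:-1] + [40000.0]
contaminated = log_ratios(red_contaminated, green)

# How far each summary moves when one replicate becomes an outlier.
mean_shift = abs(statistics.mean(contaminated) - statistics.mean(clean))
median_shift = abs(statistics.median(contaminated) - statistics.median(clean))

print(f"shift in mean log ratio:   {mean_shift:.3f}")
print(f"shift in median log ratio: {median_shift:.3f}")
```

With these toy numbers the mean log ratio moves by more than one unit on the log2 scale, while the median barely changes. A heavy-tailed error model such as the t-distribution mentioned in the abstract pursues the same kind of resistance to outliers within a full probabilistic model.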
Keywords
Statistics, Probability and Uncertainty, Statistics and Probability
Create date
28/02/2022 11:45
Last modification date
23/03/2024 7:24