Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human.
Details
Download: BIB_FFE8D8F98971.P001.pdf (412.48 [KB])
State: Public
Version: author
Serval ID
serval:BIB_FFE8D8F98971
Type
Article: article from journal or magazine.
Collection
Publications
Institution
Title
Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human.
Journal
Bioinformatics
ISSN
1367-4811 (Electronic)
ISSN-L
1367-4803
Publication state
Published
Issued date
2012
Peer-reviewed
Yes
Volume
28
Number
14
Pages
1865-1872
Language
English
Notes
Publication types: Journal Article; Publication Status: ppublish
Abstract
MOTIVATION: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus, methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements.
RESULTS: It has been demonstrated that Pearson's distance, in contrast to Euclidean distance, leads to underestimation of the expression similarity between homologous genes with a conserved uniform pattern of expression. Here, we first extend this study to genes with a conserved but specific pattern of expression. Surprisingly, we find that both Pearson's and Euclidean distances, used as measures of expression similarity between genes, depend on the expression specificity of those genes. We also show that the Euclidean distance depends strongly on data normalization. Next, we show that the randomization procedure that is widely used to estimate the rate of neutral evolution is biased when broadly expressed genes are abundant in the data. To overcome this problem, we propose a novel randomization procedure that is unbiased with respect to expression profiles present in the datasets. Applying our method to the mouse and human gene expression data suggests significant gene expression conservation between these species.
CONTACT: marc.robinson-rechavi@unil.ch; sven.bergmann@unil.ch
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
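ILLUSTRATION: The two distance measures and the commonly used randomization step named in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: Pearson's distance is taken as one minus the Pearson correlation, the null model is the naive shuffling of gene pairings (not the corrected procedure proposed in the article, which the abstract does not detail), and all function names and toy expression values are hypothetical.

```python
import math
import random


def pearson_distance(x, y):
    """Assumed definition: 1 - Pearson correlation of two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a perfectly flat profile
    return 1 - cov / (sd_x * sd_y)


def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles (no normalization)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def shuffled_pair_distances(profiles_a, profiles_b, distance, seed=0):
    """Naive null model: pair each gene of species A with a random gene of species B."""
    rng = random.Random(seed)
    shuffled = list(profiles_b)
    rng.shuffle(shuffled)
    return [distance(a, b) for a, b in zip(profiles_a, shuffled)]


# Hypothetical toy profiles over four tissues (values are made up).
uniform_human = [10.0, 10.1, 9.9, 10.0]   # broadly expressed, small noise
uniform_mouse = [10.1, 9.9, 10.0, 10.0]
specific_human = [0.0, 0.0, 10.0, 0.0]    # expressed in one tissue only
specific_mouse = [0.0, 0.0, 9.0, 0.0]

d_pearson_uniform = pearson_distance(uniform_human, uniform_mouse)
d_euclid_uniform = euclidean_distance(uniform_human, uniform_mouse)
d_pearson_specific = pearson_distance(specific_human, specific_mouse)
d_euclid_specific = euclidean_distance(specific_human, specific_mouse)

null_distances = shuffled_pair_distances(
    [uniform_human, specific_human],
    [uniform_mouse, specific_mouse],
    euclidean_distance,
)
```

In this toy case the conserved uniform pair receives a Pearson's distance of about 1.5, because the correlation is driven only by noise, while its Euclidean distance stays small (about 0.24). The conserved tissue-specific pair receives a Pearson's distance of 0 and a Euclidean distance of 1. The example only illustrates the dependence on expression specificity described in the abstract; it does not reproduce the article's analysis.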
PubMed
Web of Science
Open Access
Yes
Create date
01/05/2012 16:37
Last modification date
20/08/2019 16:30