Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human.
Details
Download: BIB_FFE8D8F98971.P001.pdf (412.48 [KB])
Status: Public
Version: Author's version
Serval ID
serval:BIB_FFE8D8F98971
Type
Article: article from a journal or magazine.
Collection
Publications
Institution
Title
Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human.
Journal
Bioinformatics
ISSN
1367-4811 (Electronic)
ISSN-L
1367-4803
Publication status
Published
Publication date
2012
Peer-reviewed
Yes
Volume
28
Issue
14
Pages
1865-1872
Language
English
Notes
Publication types: Journal Article
Publication Status: ppublish
Abstract
MOTIVATION: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus, methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements.
RESULTS: It has been demonstrated that Pearson's distance, in contrast to Euclidean distance, leads to underestimation of the expression similarity between homologous genes with a conserved uniform pattern of expression. Here, we first extend this study to genes with a conserved but specific pattern of expression. Surprisingly, we find that both Pearson's and Euclidean distances, when used as measures of expression similarity between genes, depend on the expression specificity of those genes. We also show that the Euclidean distance depends strongly on data normalization. Next, we show that the randomization procedure that is widely used to estimate the rate of neutral evolution is biased when broadly expressed genes are abundant in the data. To overcome this problem, we propose a novel randomization procedure that is unbiased with respect to the expression profiles present in the datasets. Applying our method to mouse and human gene expression data suggests significant gene expression conservation between these species.
CONTACT: marc.robinson-rechavi@unil.ch; sven.bergmann@unil.ch
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
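The distance measures named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or their proposed randomization procedure. It is a toy model that assumes additive Gaussian noise, six tissues, and a random-pairing reference as one simple form of randomization. All function names, profiles, and parameter values are hypothetical.

```python
import numpy as np


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of two profiles."""
    return 1.0 - np.corrcoef(x, y)[0, 1]


def euclidean_distance(x, y):
    """Return the Euclidean distance between two profiles."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))


def simulate_orthologs(profile, n_genes, noise_sd, rng):
    """Simulate two species sharing one conserved profile (assumed noise model)."""
    shape = (n_genes, len(profile))
    species_a = profile + rng.normal(0.0, noise_sd, shape)
    species_b = profile + rng.normal(0.0, noise_sd, shape)
    return species_a, species_b


def mean_distance(species_a, species_b, distance):
    """Average a distance over matched (orthologous) gene pairs."""
    return float(np.mean([distance(a, b) for a, b in zip(species_a, species_b)]))


def random_pair_distance(species_a, species_b, distance, rng):
    """Average a distance over randomly paired genes (naive reference)."""
    shuffled = species_b[rng.permutation(len(species_b))]
    return mean_distance(species_a, shuffled, distance)


rng = np.random.default_rng(0)
uniform = np.full(6, 10.0)                               # broadly expressed gene
specific = np.array([10.0, 0.0, 0.0, 0.0, 0.0, 0.0])     # one-tissue gene

uni_a, uni_b = simulate_orthologs(uniform, 200, 0.5, rng)
spe_a, spe_b = simulate_orthologs(specific, 200, 0.5, rng)

# Both gene sets are equally conserved by construction, yet the
# Pearson distance differs sharply between them.
pearson_uniform = mean_distance(uni_a, uni_b, pearson_distance)
pearson_specific = mean_distance(spe_a, spe_b, pearson_distance)

# Among broadly expressed genes, random pairs are about as close as orthologs.
euclid_ortholog = mean_distance(uni_a, uni_b, euclidean_distance)
euclid_random = random_pair_distance(uni_a, uni_b, euclidean_distance, rng)

print("Pearson distance, uniform genes: ", round(pearson_uniform, 3))
print("Pearson distance, specific genes:", round(pearson_specific, 3))
print("Euclidean, orthologs vs random pairs:",
      round(euclid_ortholog, 3), round(euclid_random, 3))
```

In this toy setting, the uniformly expressed genes receive a large Pearson distance even though their pattern is fully conserved, and a random-pairing reference built only from broadly expressed genes is hard to tell apart from the ortholog distances. How the Euclidean distance behaves depends on the noise model and normalization chosen, which are assumptions here.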
Pubmed
Web of science
Open Access
Yes
Record created
01/05/2012 16:37
Record last modified
20/08/2019 16:30