Phylogenetic profiling: how much input data is enough?
Details
Download: BIB_BCB3501714F5.P001.pdf (1461.61 KB)
State: Public
Version: Final published version
Serval ID
serval:BIB_BCB3501714F5
Type
Article: article from a journal or magazine.
Collection
Publications
Institution
Title
Phylogenetic profiling: how much input data is enough?
Journal
PLOS ONE
ISSN
1932-6203 (Electronic)
ISSN-L
1932-6203
Editorial status
Published
Publication date
2015
Peer-reviewed
Yes
Volume
10
Issue
2
Pages
e0114701
Language
English
Notes
Publication types: Journal Article; Research Support, Non-U.S. Gov't
Publication status: epublish
Abstract
Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Most recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy into the contributions of additional genomes and of additional annotations, we observed diminishing returns from adding more than ∼100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings still hold in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.
Keywords
Genetic Variation, Genomics/methods, Molecular Sequence Annotation, Phylogeny
Open Access
Yes
Record created
17/02/2016 16:41
Record last modified
20/08/2019 15:30