Benchmarking gene ontology function predictions using negative annotations.

Details

Resource 1 Download: 32657372_BIB_7AA95FEF6B71.pdf (1784.62 KB)
Status: Public
Version: Final published version
Licence: CC BY-NC 4.0
Serval ID
serval:BIB_7AA95FEF6B71
Type
Article: article from a journal or magazine.
Collection
Publications
Institution
Title
Benchmarking gene ontology function predictions using negative annotations.
Journal
Bioinformatics
Authors
Warwick Vesztrocy A., Dessimoz C.
ISSN
1367-4811 (Electronic)
ISSN-L
1367-4803
Publication status
Published
Publication date
01/07/2020
Peer-reviewed
Yes
Volume
36
Issue
Supplement_1
Pages
i210-i218
Language
English
Notes
Publication types: Journal Article
Publication Status: ppublish
Abstract
With the ever-increasing number and diversity of sequenced species, the challenge to characterize genes with functional information is even more important. In most species, this characterization almost entirely relies on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provide the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations.
This article introduces a new, OWA-compliant, benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments.
All data, as well as code used for analysis, is available from https://lab.dessimoz.org/20_not.
Supplementary data are available at Bioinformatics online.
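The abstract's point about precision can be shown with a toy calculation. The sketch below is an illustration only, not the CAFA evaluation code or the authors' benchmark: the function name `precision`, the term labels and the example annotation sets are all hypothetical. It assumes that, under a closed-world reading, every predicted term lacking a positive annotation counts as a false positive, whereas under an open-world reading only predicted terms with an explicit negative annotation do.

```python
def precision(predicted, positives, negatives=None):
    """Precision of predicted terms for one protein (toy illustration).

    negatives=None: closed-world reading, every prediction that is not a
    known positive counts as a false positive.
    negatives=set:  open-world reading, only predictions that hit an
    explicit negative annotation count as false positives; predictions
    with unknown status are left out of the count.
    """
    true_pos = len(predicted & positives)
    if negatives is None:
        false_pos = len(predicted - positives)
    else:
        false_pos = len(predicted & negatives)
    total = true_pos + false_pos
    return true_pos / total if total else None


# Hypothetical annotations for a single protein (placeholder term labels).
predicted = {"term_a", "term_b", "term_c", "term_d"}
positives = {"term_a", "term_b"}   # experimentally supported
negatives = {"term_d"}             # explicitly annotated as NOT having the function

closed_world = precision(predicted, positives)
open_world = precision(predicted, positives, negatives)
```

Here `term_c` has no annotation either way. The closed-world count treats it as an error, so that precision is lower than the open-world value computed from the same predictions.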
PubMed
Web of Science
Open Access
Yes
Record created
24/07/2020 11:27
Record last modified
30/04/2021 7:12