Benchmarking gene ontology function predictions using negative annotations.

Details

Resource 1: 32657372_BIB_7AA95FEF6B71.pdf (1784.62 KB)
State: Public
Version: Final published version
License: CC BY-NC 4.0
Serval ID
serval:BIB_7AA95FEF6B71
Type
Article: article from journal or magazine.
Collection
Publications
Title
Benchmarking gene ontology function predictions using negative annotations.
Journal
Bioinformatics
Author(s)
Warwick Vesztrocy A., Dessimoz C.
ISSN
1367-4811 (Electronic)
ISSN-L
1367-4803
Publication state
Published
Issued date
01/07/2020
Peer-reviewed
Yes
Volume
36
Number
Supplement_1
Pages
i210-i218
Language
English
Notes
Publication types: Journal Article
Publication Status: ppublish
Abstract
With the ever-increasing number and diversity of sequenced species, the challenge of characterizing genes with functional information is even more important. In most species, this characterization relies almost entirely on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provides the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations.
This article introduces a new OWA-compliant benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments.
All data, as well as code used for analysis, is available from https://lab.dessimoz.org/20_not.
Supplementary data are available at Bioinformatics online.
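To illustrate why the open world assumption matters for precision, the following is a minimal sketch, not the authors' code. It contrasts a closed-world count, where every predicted (protein, GO term) pair that is not a known positive is a false positive, with an OWA-compliant count, where only predictions contradicting a known negative annotation are false positives and unlabelled pairs are ignored. The function names, protein IDs and toy annotations are hypothetical. For simplicity, precision is pooled over all pairs rather than averaged per protein as CAFA does, and GO propagation (positives to ancestors, negatives to descendants) is omitted.

```python
from typing import Set, Tuple

# A (protein ID, GO term ID) pair; IDs below are hypothetical toy data.
Pair = Tuple[str, str]


def closed_world_precision(predicted: Set[Pair], positives: Set[Pair]) -> float:
    """Closed-world view: any predicted pair not among the positives is a false positive."""
    if not predicted:
        return 0.0
    tp = len(predicted & positives)
    return tp / len(predicted)


def owa_precision(predicted: Set[Pair], positives: Set[Pair], negatives: Set[Pair]) -> float:
    """OWA view: only predictions that hit a known negative are false positives.

    Predictions outside the labelled test set (positives | negatives) are
    treated as unknown and do not affect precision.
    """
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    if tp + fp == 0:
        return 0.0
    return tp / (tp + fp)


# Balanced toy test set: two positive and two negative annotations.
positives = {("P1", "GO:0003677"), ("P2", "GO:0005524")}
negatives = {("P1", "GO:0016301"), ("P2", "GO:0003700")}

# Predictions: two correct, one contradicting a negative, one unlabelled.
predicted = {
    ("P1", "GO:0003677"),
    ("P1", "GO:0016301"),
    ("P2", "GO:0005524"),
    ("P2", "GO:0046872"),
}

print(closed_world_precision(predicted, positives))   # prints 0.5
print(owa_precision(predicted, positives, negatives))  # prints 0.6666666666666666
```

The unlabelled prediction for P2 lowers the closed-world precision but leaves the OWA-compliant precision unchanged, which mirrors the systematic underestimation of precision described in the abstract.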
Open Access
Yes
Create date
24/07/2020 11:27
Last modification date
30/04/2021 7:12