Computational prediction of inter-species relationships through omics data analysis and machine learning.
Details
Download: BMCBion.pdf (1521.71 [KB])
Status: Public
Version: Final published version
Serval ID
serval:BIB_41B8FD3269F6
Type
Article: article from a journal or magazine.
Collection
Publications
Institution
Title
Computational prediction of inter-species relationships through omics data analysis and machine learning.
Journal
BMC Bioinformatics
ISSN
1471-2105 (Electronic)
ISSN-L
1471-2105
Editorial status
Published
Publication date
2018
Peer-reviewed
Yes
Volume
19
Issue
Suppl 14
Pages
420
Language
English
Abstract
Antibiotic resistance and its rapid dissemination around the world threaten the efficacy of currently used medical treatments and call for novel, innovative approaches to manage multi-drug resistant infections. Phage therapy, i.e., the use of viruses (phages) to specifically infect and kill bacteria during their life cycle, is one of the most promising alternatives to antibiotics. It is based on the correct matching between a target pathogenic bacterium and the therapeutic phage. Nevertheless, correctly matching them is a major challenge. Currently, there is no systematic method to efficiently predict whether phage-bacterium interactions exist, and these pairs must be empirically tested in the laboratory. Herein, we present our approach for developing a computational model able to predict whether a given phage-bacterium pair can interact based on their genomes.
Based on public data from GenBank and phagesDB.org, we collected more than a thousand positive phage-bacterium interactions with their complete genomes. In addition, we generated putative negative (i.e., non-interacting) pairs. We extracted, from the collected genomes, a set of informative features based on the distribution of predictive protein-protein interactions and on their primary structure (e.g., amino-acid frequency, molecular weight and chemical composition of each protein). With these features, we generated multiple candidate datasets to train our algorithms. On this basis, we built predictive models exhibiting predictive performance of around 90% in terms of F1-score, sensitivity, specificity, and accuracy, obtained on the test set with 10-fold cross-validation.
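The primary-structure features named in the abstract (amino-acid frequency and molecular weight) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the mass table holds standard average residue masses, and the record does not say how the published features were actually computed.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Standard average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once per chain for the free termini


def amino_acid_frequencies(seq):
    """Return the relative frequency of each standard amino acid in seq.

    Symbols outside the 20 standard residues (e.g. X, *) are ignored.
    """
    counts = Counter(aa for aa in seq.upper() if aa in AVG_RESIDUE_MASS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def molecular_weight(seq):
    """Approximate average molecular weight (Da) of an unmodified chain."""
    residues = [aa for aa in seq.upper() if aa in AVG_RESIDUE_MASS]
    if not residues:
        return 0.0
    return sum(AVG_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


def protein_features(seq):
    """Hypothetical per-protein vector: 20 frequencies plus molecular weight."""
    freqs = amino_acid_frequencies(seq)
    return [freqs[aa] for aa in AMINO_ACIDS] + [molecular_weight(seq)]
```

A phage-bacterium pair would still need its per-protein vectors aggregated into one fixed-length input; how that aggregation was done is not stated in this record.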
These promising results reinforce the hypothesis that machine learning techniques may produce highly predictive models accelerating the search for interacting phage-bacteria pairs.
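For readers unfamiliar with the evaluation terms used in the abstract, the sketch below shows how F1-score, sensitivity, specificity, and accuracy follow from a binary confusion matrix, and how 10 folds can be laid out. It is an illustrative sketch only: the function names are hypothetical, labels are assumed to be 1 (interacting) and 0 (non-interacting), and the folds are contiguous and unshuffled, which may differ from the authors' protocol.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for labels encoded as 1 (positive) / 0 (negative)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Return the four scores named in the abstract as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)          # recall on interacting pairs
    specificity = _ratio(tn, tn + fp)          # recall on non-interacting pairs
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Each sample lands in exactly one test fold; shuffle the data beforehand
    if the original ordering is not random.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In practice each fold's test predictions would be scored with `binary_metrics` and the scores averaged across the 10 folds.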
Keywords
Algorithms, Bacteria/virology, Bacteriophages/genetics, Computational Biology/methods, Data Analysis, Genomics, Machine Learning, Proteins/chemistry, Species Specificity, Health, Machine learning, Phage-therapy, Supervised learning
PubMed
Web of Science
Open Access
Yes
Record created
27/11/2018 9:46
Record last modified
20/08/2019 13:42