Enabling semantic queries across federated bioinformatics databases.
Details
Download: 31697362_BIB_ED991CD0F569.pdf (25431.89 [KB])
Status: Public
Version: author's version
License: CC BY 4.0
Serval ID
serval:BIB_ED991CD0F569
Type
Article: article from a journal or magazine.
Collection
Publications
Institution
Title
Enabling semantic queries across federated bioinformatics databases.
Journal
Database
ISSN
1758-0463 (Electronic)
ISSN-L
1758-0463
Editorial status
Published
Publication date
01/01/2019
Peer-reviewed
Yes
Volume
2019
Language
English
Notes
Publication types: Journal Article; Research Support, Non-U.S. Gov't
Publication Status: ppublish
Abstract
Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases.
We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 orthology data store; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, are made available through a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface.
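The idea of intersection points (virtual links) described in the abstract can be illustrated with a toy in-memory join. This is only a minimal sketch and not the authors' implementation, which relies on SPARQL endpoints and relational-to-RDF mappings. The record classes, field names, identifiers and values below are all hypothetical stand-ins: a shared gene identifier is assumed to link the expression source to the orthology source, and a shared protein accession is assumed to link the orthology source to the protein source.

```python
from dataclasses import dataclass

# Everything below is made-up toy data; none of it is taken from
# Bgee, OMA or UniProtKB, and the field names are hypothetical.


@dataclass(frozen=True)
class ExpressionRecord:
    """Stand-in for a row of a gene expression source."""
    gene_id: str
    anatomical_entity: str


@dataclass(frozen=True)
class OrthologyRecord:
    """Stand-in for an entry of an orthology source."""
    gene_id: str            # assumed link to the expression source
    protein_accession: str  # assumed link to the protein source
    ortholog_group: str


@dataclass(frozen=True)
class ProteinRecord:
    """Stand-in for an entry of a protein knowledge base."""
    accession: str
    function: str


def join_sources(expression, orthology, proteins, anatomical_entity):
    """Return sorted (gene_id, ortholog_group, function) tuples for genes
    expressed in `anatomical_entity`, following both assumed links."""
    expressed = {
        r.gene_id for r in expression
        if r.anatomical_entity == anatomical_entity
    }
    function_by_accession = {p.accession: p.function for p in proteins}
    results = []
    for o in orthology:
        # Keep an entry only if both links resolve.
        if o.gene_id in expressed and o.protein_accession in function_by_accession:
            results.append(
                (o.gene_id, o.ortholog_group,
                 function_by_accession[o.protein_accession])
            )
    return sorted(results)


EXPRESSION = [
    ExpressionRecord("GENE_A", "brain"),
    ExpressionRecord("GENE_B", "liver"),
    ExpressionRecord("GENE_C", "brain"),
]
ORTHOLOGY = [
    OrthologyRecord("GENE_A", "P_0001", "GROUP_1"),
    OrthologyRecord("GENE_B", "P_0002", "GROUP_1"),
    OrthologyRecord("GENE_C", "P_0003", "GROUP_2"),
]
PROTEINS = [
    ProteinRecord("P_0001", "kinase activity"),
    ProteinRecord("P_0002", "kinase activity"),
    # P_0003 is deliberately missing, so GENE_C drops out of the join.
]

if __name__ == "__main__":
    for row in join_sources(EXPRESSION, ORTHOLOGY, PROTEINS, "brain"):
        print(row)
```

In a federated setting the three lists would stay in their own stores and the join would be expressed in the query itself, for example with the SERVICE keyword of SPARQL 1.1 Federated Query; the sketch only shows why shared identifiers are enough to combine the sources.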
Keywords
Biological Ontologies, Computational Biology, Databases, Factual, Semantic Web
PubMed
Open Access
Yes
Funding
Swiss National Science Foundation / Programmes / 407540_167149
Record created
15/11/2019 21:01
Record last modified
30/04/2021 6:16