INDREX : in-database distributional relation extraction

Kilias, T.; Löser, A.; Andritsos, P.

doi:10.1145/2513190.2513196


Serval ID

serval:BIB_44917269F31E

Type

Inproceedings: an article in a conference proceedings.

Collection

Publications

Institution

External production

Title

INDREX : in-database distributional relation extraction

Title of the conference

Proceedings of the sixteenth international workshop on Data warehousing and OLAP - DOLAP '13

Author(s)

Kilias T., Löser A., Andritsos P.

Publisher

ACM Press

Address

San Francisco, California

ISBN

9781450324120

Publication state

Published

Issued date

10/2013

Peer-reviewed

Yes

Series

Conference on Information and Knowledge Management

Pages

93-100

Language

English

Abstract

Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as SystemT by IBM or the open-source system GATE, solve this task with handcrafted rule sets that the system executes document by document. In these systems, the user must follow a highly interactive and iterative process: reading a document, expressing rules, testing these rules on the next document, and refining them. Until now, these systems have leveraged neither the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS, which would enable interactive rule refinement across documents and on the entire corpus. We propose the INDREX system, which for the first time enables a user to describe corpus-wide extraction tasks in a declarative language and to run interactive rule refinement queries. To enable this functionality, we extend a standard PostgreSQL with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and rules in the same RDBMS that already holds domain-specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction, and (3) the INDREX system can leverage the full power of the built-in indexing and query optimization techniques of the underlying RDBMS. In a preliminary study, we report on the feasibility of this disruptive approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.
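The core idea in the abstract is that the corpus, the extraction rules, and the domain data live in one database, so extraction becomes an ordinary query over the whole corpus. The following is a minimal sketch of that idea, not INDREX's actual implementation. It assumes Python's built-in `sqlite3` as a stand-in for PostgreSQL. The tables `sentences` and `companies`, the regex rule, and the UDF `extract_arg` are all hypothetical and chosen only for illustration.

```python
import re
import sqlite3

# Hypothetical handcrafted rule: "<X> acquired <Y>" yields an acquisition.
# This regex stands in for an extraction rule; it is NOT an INDREX rule.
ACQUISITION = re.compile(r"\b([A-Z][A-Za-z]+) acquired ([A-Z][A-Za-z]+)\b")


def extract_arg(sentence, group):
    """Return capture group `group` of the rule's first match, or None (SQL NULL)."""
    m = ACQUISITION.search(sentence)
    return m.group(group) if m else None


def build_db():
    conn = sqlite3.connect(":memory:")
    # Text corpus and domain-specific structured data share one database.
    conn.execute("CREATE TABLE sentences (doc_id INTEGER, text TEXT)")
    conn.execute("CREATE TABLE companies (name TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO sentences VALUES (?, ?)",
        [
            (1, "Acme acquired Globex in a cash deal."),
            (2, "Shares of Initech rose on Monday."),
            (3, "Umbrella acquired Hooli last year."),
        ],
    )
    conn.executemany(
        "INSERT INTO companies VALUES (?)",
        [("Acme",), ("Globex",), ("Umbrella",)],
    )
    # Expose the rule to SQL as a user-defined function (hypothetical name).
    conn.create_function("extract_arg", 2, extract_arg)
    return conn


def acquisitions(conn):
    # One declarative query runs the rule over every sentence in the corpus.
    # Joining against the domain table keeps only known companies, which is
    # how structured data can refine a rule without touching its code.
    query = """
        SELECT doc_id, acquirer, target
        FROM (
            SELECT doc_id,
                   extract_arg(text, 1) AS acquirer,
                   extract_arg(text, 2) AS target
            FROM sentences
        )
        WHERE acquirer IN (SELECT name FROM companies)
          AND target IN (SELECT name FROM companies)
        ORDER BY doc_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(acquisitions(build_db()))
```

In this toy corpus, sentence 2 does not match the rule, and sentence 3 is dropped because "Hooli" is not in `companies`. Only `(1, 'Acme', 'Globex')` remains. Refining the rule or the domain table and re-running the query corresponds to the corpus-wide, interactive refinement loop the abstract describes.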

DOI

10.1145/2513190.2513196

Create date

22/08/2017 13:21

Last modification date

20/08/2019 13:49
