An Efficient Type-Agnostic Approach for Finding Sub-sequences in Data

Details

Serval ID
serval:BIB_7EF8C37FC0BF
Type
Conference proceedings (part): an original contribution to the scientific literature, published for a scientific conference in a proceedings volume or in a special issue of a recognized journal.
Collection
Publications
Institution
Title
An Efficient Type-Agnostic Approach for Finding Sub-sequences in Data
Conference title
2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
Authors
Chapuis B., Garbinato B., Andritsos P.
Publisher
IEEE
ISBN
9781538625880
Publication status
Published
Publication date
12/2017
Peer-reviewed
Yes
Abstract
In this paper, we present an efficient type-agnostic approach for finding sub-sequences in data, such as text documents or GPS trajectories. Our approach relies on data deduplication to create an inverted index. In contrast with existing data deduplication techniques, which split raw sequences of characters arbitrarily, our approach preserves the semantics of the original sequence via the notion of a token and can be used to index normalized data. When compared to indexing methods that preserve the semantics and operate on normalized data, our method increases the relevance of the inverted index, reduces its size, and improves its performance. As data normalization is generally not used beyond the scope of textual data, we introduce a framework that helps identify the extent to which data should be normalized, regardless of its type. On this basis, we demonstrate with a dataset of GPS trajectories that our method can be used agnostically: it can index and query data of a completely different type. Finally, we show that the resulting spatial index discriminates better than classic spatial-indexing approaches.
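The general idea of a token-level inverted index over normalized data can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract describes a deduplication-based way of splitting token sequences, whereas this sketch swaps in fixed-length token n-grams for simplicity. All names here (`text_tokens`, `gps_tokens`, `build_index`, `find_subsequence`) and the normalization choices (lowercasing, rounding coordinates to three decimals) are assumptions made for illustration only.

```python
from collections import defaultdict


def text_tokens(text):
    """Hypothetical normalization for text: lowercase, split on whitespace."""
    return text.lower().split()


def gps_tokens(points, precision=3):
    """Hypothetical normalization for GPS data: snap each (lat, lon) to a grid cell."""
    return [(round(lat, precision), round(lon, precision)) for lat, lon in points]


def build_index(sequences, n=2):
    """Map every run of n consecutive tokens to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(set)
    for seq_id, tokens in sequences.items():
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].add((seq_id, i))
    return index


def find_subsequence(index, query, n=2):
    """Return ids of sequences that contain `query` as a contiguous run of tokens."""
    if len(query) < n:
        raise ValueError("query must contain at least n tokens")
    # Candidate start positions come from the first n-gram of the query.
    starts = set(index.get(tuple(query[:n]), ()))
    # Each following n-gram must occur at the matching shifted offset.
    for k in range(1, len(query) - n + 1):
        hits = index.get(tuple(query[k:k + n]), ())
        starts = {(sid, off) for sid, off in starts if (sid, off + k) in hits}
        if not starts:
            break
    return {sid for sid, _ in starts}


# The same index code serves both data types; only the tokenizer differs.
docs = {
    "a": text_tokens("The quick brown fox jumps"),
    "b": text_tokens("A quick brown dog"),
}
doc_index = build_index(docs)

trips = {
    "t1": gps_tokens([(46.5191, 6.5668), (46.5201, 6.5679), (46.5212, 6.5691)]),
}
trip_index = build_index(trips)
```

In this sketch the index code never inspects the tokens themselves, which is what makes it usable on both words and grid cells. How coarsely to normalize (here, the rounding precision) is the kind of choice the framework mentioned in the abstract is meant to inform.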
Record created
28/02/2018 11:51
Record last modified
21/08/2019 6:13