Data Driven Discovery of Attribute Dictionaries

Chiang, F.; Andritsos, P.; Miller, R.J.

doi:10.1007/978-3-662-49521-6_4

Serval ID

serval:BIB_8388977F1C03

Type

Conference proceedings (part): original contribution to the scientific literature, published on the occasion of a scientific conference, in a proceedings volume, or in a special issue of a recognized journal (conference proceedings).

Collection

Publications

Institution

UNIL/CHUV

Title

Data Driven Discovery of Attribute Dictionaries

Conference title

Transactions on Computational Collective Intelligence XXI - Special Issue on Keyword Search and Big Data

Authors

Chiang F., Andritsos P., Miller R.J.

Publisher

Springer Berlin Heidelberg

ISBN

9783662495209
9783662495216

ISSN

0302-9743
1611-3349

Publication status

Published

Publication date

2016

Peer-reviewed

Yes

Scientific editors

Nguyen N.T., Kowalczyk R., Rupino da Cunha P.

Volume

9630

Series

Lecture Notes in Computer Science (LNCS)

Pages

69-96

Language

English

Abstract

Online product search engines such as Google and Yahoo Shopping rely on extensive and complete product information to return accurate and timely search results. Given the expanding scope of products and updates to existing products, automated techniques are needed to ensure the underlying product dictionaries remain current and complete. Product search engines receive offers from merchants describing product-specific attributes and characteristics. These offers normally contain structured attribute-value pairs and unstructured (textual) descriptions of product characteristics and features. For example, a laptop offer may contain attribute-value pairs such as “model-X42” and “RAM-8 GB”, and a text description of the software, accessories, battery features, warranty, etc. Updating the product dictionaries using the textual descriptions is a more challenging task than using the attribute-value pairs, since the relevant attribute values must first be extracted. This task is difficult because the text descriptions often do not follow a predefined format, and the data in the descriptions vary across merchants and products. However, this information needs to be captured to ensure a comprehensive and complete product listing. In this paper, we present techniques that extract attribute values from textual product descriptions. We introduce an end-to-end framework that takes an input string record and parses its tokens to identify candidate attribute values. We then map these values to attributes. We take an information-theoretic approach to identify groups of tokens that represent an attribute value. We demonstrate the accuracy and relevance of our approach using a variety of real data sets.
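
The abstract does not specify which information-theoretic measure the authors use, so the following is only a minimal sketch of the general idea of grouping tokens into candidate attribute values. It assumes pointwise mutual information (PMI) between adjacent tokens as the grouping signal. The function names `adjacent_pmi` and `group_tokens`, the greedy merge rule, and the threshold are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def adjacent_pmi(records):
    """Score each adjacent token pair by pointwise mutual information.

    `records` is a list of token lists (one per product description).
    This scoring is an illustrative assumption, not the paper's method.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in records:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # PMI is high when the pair co-occurs more often than chance.
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def group_tokens(tokens, scores, threshold):
    """Greedily merge adjacent tokens whose PMI meets the threshold.

    Pairs never seen in the scored corpus are treated as unrelated.
    """
    if not tokens:
        return []
    groups = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if scores.get((prev, cur), float("-inf")) >= threshold:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return [" ".join(group) for group in groups]
```

In this sketch, each returned group would be a candidate attribute value (for instance, a number followed by its unit). Mapping those candidates to attributes, the second step the abstract describes, is not covered here.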

Keywords

Information extraction, Clustering, Dictionaries

OAI-PMH

oai:serval.unil.ch:BIB_8388977F1C03

DOI

10.1007/978-3-662-49521-6_4

Web of Science

000385733900005

Record created

21/08/2017 13:42

Record last modified

20/08/2019 15:43
