The impact of excluding common blocks for approximate matching
Details
Serval ID
serval:BIB_203F1FD91074
Type
Article: article in a journal or magazine.
Collection
Publications
Institution
Title
The impact of excluding common blocks for approximate matching
Journal
Computers & Security
ISSN
0167-4048
Editorial status
Published
Publication date
02/2020
Volume
89
Pages
101676
Language
English
Abstract
Approximate matching functions allow the efficient identification of similarity at the byte level by creating and comparing compact representations of objects (a.k.a. digests). However, many similarity matches are due to common data that repeats across many different files, consisting of inner structure, header and footer information, color tables, font specifications, etc.: data created by applications rather than generated by users. Most of the time, this sort of information is less relevant from an investigator's perspective and should be avoided. In this work, we show how the common data can be identified and filtered out by using approximate matching, as well as how it is spread over different file types and how frequently it occurs. We assess the impact of removing it on similarity (i.e., on the number of matches) and the effects on performance. Our results show that, for a small price in performance, a reduction of about 87% in the number of matches can be achieved when removing such data.
Keywords
General Computer Science, Law
Publisher's website
Record created
06/05/2021 11:01
Record last modified
06/05/2021 11:43