How Cuckoo Filter Can Improve Existing Approximate Matching Techniques

Détails

ID Serval
serval:BIB_7D7931246BFD
Type
Actes de conférence (partie): contribution originale à la littérature scientifique, publiée à l'occasion de conférences scientifiques, dans un ouvrage de compte-rendu (proceedings), ou dans l'édition spéciale d'un journal reconnu (conference proceedings).
Collection
Publications
Titre
How Cuckoo Filter Can Improve Existing Approximate Matching Techniques
Titre de la conférence
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Auteur⸱e⸱s
Gupta Vikas, Breitinger Frank
Editeur
Springer International Publishing
ISBN
9783319255118
9783319255125
ISSN
1867-8211
1867-822X
Statut éditorial
Publié
Date de publication
2015
Editeur⸱rice scientifique
James Joshua I., Breitinger Frank
Volume
157
Pages
39-52
Langue
anglais
Résumé
In recent years, approximate matching algorithms have become an important component in digital forensic research and have been adopted in some other working areas as well. Currently there are several approaches but especially sdhash and mrsh-v2 attract the attention of the community because of their good overall performance (runtime, compression and detection rates). Although both approaches have a quite different proceeding, their final output (the similarity digest) is very similar as both utilize Bloom filters. This data structure was presented in 1970 and thus has been around for a while. Recently, a new data structure was proposed and claimed to be faster and have a smaller memory footprint than Bloom filter – Cuckoo filter.
In this paper we analyze the feasibility of Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that by using Cuckoo filter there is a runtime improvement of approximately 37% and also a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than mrsh-net, while the compression rate is twice than Bloom filter based fingerprint.
Mots-clé
Approximate matching, Similarity hashing, Bloom filter, Cuckoo filter, Fuzzy hashing, Similarity hashing, mrsh-v2, mrsh-net
Création de la notice
06/05/2021 12:01
Dernière modification de la notice
06/05/2021 12:38
Données d'usage