A fuzzy hashing approach based on random sequences and hamming distance

Details

Serval ID
serval:BIB_E6C8B78E7F2B
Type
Conference proceedings (part): original contribution to the scientific literature, published on the occasion of scientific conferences, in a proceedings volume, or in a special issue of a recognized journal (conference proceedings).
Collection
Publications
Title
A fuzzy hashing approach based on random sequences and hamming distance
Conference title
Proceedings of the Conference on Digital Forensics, Security and Law
Authors
Breitinger Frank, Baier Harald
Publication status
Published
Publication date
2012
Pages
89-100
Language
English
Abstract
Hash functions are well-known methods in computer science that map arbitrarily large inputs to bit strings of a fixed length, which serve as unique input identifiers (fingerprints). A key property of cryptographic hash functions is that changing even a single bit of the input makes the output behave pseudo-randomly, so similar files cannot be identified. In computer forensics, however, it is also necessary to find similar files (e.g. different versions of a file), which calls for a similarity-preserving hash function, also called a fuzzy hash function.
In this paper we present a new approach to fuzzy hashing called bbHash. It is based on the idea of ‘rebuilding’ an input as well as possible using a fixed set of randomly chosen byte sequences, called building blocks, of byte length l (e.g. l = 128). The procedure is as follows: slide through the input byte by byte, read the current input byte sequence of length l, and compute the Hamming distance between each building block and the current input byte sequence. Each building block whose Hamming distance is smaller than a certain threshold contributes to the file’s bbHash. We discuss the advantages and disadvantages of bbHash compared with other fuzzy hashing approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.
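The sliding-window procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the seeded generator used to create the building blocks, the bit-level reading of the Hamming distance, and the choice to report matches as (offset, block index) pairs are all assumptions made for illustration. The abstract does not specify how matching blocks are encoded into the final bbHash, so that step is left out.

```python
import random


def make_building_blocks(count, length, seed=0):
    """Create a fixed set of random byte sequences (hypothetical helper).

    A seeded generator is assumed so that the same blocks are reused
    for every input; the abstract only says the set is fixed.
    """
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(length))
            for _ in range(count)]


def hamming_distance(a, b):
    """Count differing bits between two equal-length byte strings.

    Bit-level comparison is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def matching_blocks(data, blocks, threshold):
    """Slide over data byte by byte and report close building blocks.

    Returns (offset, block_index) pairs for every building block whose
    Hamming distance to the current window is below the threshold.
    """
    length = len(blocks[0])
    matches = []
    for offset in range(len(data) - length + 1):
        window = data[offset:offset + length]
        for index, block in enumerate(blocks):
            if hamming_distance(window, block) < threshold:
                matches.append((offset, index))
    return matches


# Usage: embed a slightly altered building block in padding bytes.
blocks = make_building_blocks(count=4, length=8)
altered = bytes([blocks[2][0] ^ 0x01]) + blocks[2][1:]  # flip one bit
data = b"\x00" * 5 + altered + b"\x00" * 5
matches = matching_blocks(data, blocks, threshold=2)
```

With a threshold of 2, the window at offset 5 differs from building block 2 by a single bit, so the pair (5, 2) is among the reported matches. The nested loop compares every window against every block, which is simple to follow but costly for large inputs and large block sets.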
Keywords
A fuzzy hashing approach based on random sequences and hamming distance
Record created
06/05/2021 12:01
Record last modified
06/05/2021 12:21