A fuzzy hashing approach based on random sequences and hamming distance

Details

Serval ID
serval:BIB_E6C8B78E7F2B
Type
Inproceedings: an article in a conference proceedings.
Collection
Publications
Title
A fuzzy hashing approach based on random sequences and hamming distance
Title of the conference
Proceedings of the Conference on Digital Forensics, Security and Law
Author(s)
Breitinger Frank, Baier Harald
Publication state
Published
Issued date
2012
Pages
89-100
Language
English
Abstract
Hash functions are well-known methods in computer science to map arbitrarily large input to bit strings of a fixed length that serve as unique input identifiers/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed, the output behaves pseudo-randomly, and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g. different versions of a file), which is why we need a similarity-preserving hash function, also called a fuzzy hash function.
In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea of 'rebuilding' an input as well as possible using a fixed set of randomly chosen byte sequences, called building blocks, of byte length l (e.g. l = 128). The procedure is as follows: slide through the input byte-by-byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with a Hamming distance smaller than a certain threshold contributes to the file's bbHash. We discuss the (dis-)advantages of bbHash compared with other fuzzy hashing approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.
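The sliding-window procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the seeded random generator, the bit-level Hamming distance, and the choice to record the index of each matching building block are all assumptions made for illustration, since the abstract does not specify how a building block "contributes" to the digest.

```python
import random


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings.

    Assumption: the distance is measured on bits; the abstract does not
    say whether bits or bytes are compared.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def make_building_blocks(count: int, l: int, seed: int = 0) -> list[bytes]:
    """Create a fixed set of `count` random byte sequences of length `l`.

    Hypothetical helper: a seeded generator keeps the set reproducible.
    """
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(l)) for _ in range(count)]


def bbhash_sketch(data: bytes, blocks: list[bytes], threshold: int) -> list[int]:
    """Slide over `data` byte-by-byte and record matching building blocks.

    Assumption: a block "contributes" by having its index appended to the
    digest whenever its distance to the current window is below `threshold`.
    """
    if not blocks:
        return []
    l = len(blocks[0])
    digest = []
    # No windows exist if the input is shorter than l (the range is empty).
    for offset in range(len(data) - l + 1):
        window = data[offset:offset + l]
        for index, block in enumerate(blocks):
            if hamming_distance(window, block) < threshold:
                digest.append(index)
    return digest


# Usage: embed one building block in the input and look for it.
blocks = make_building_blocks(count=4, l=8, seed=1)
data = b"xx" + blocks[2] + b"yy"
digest = bbhash_sketch(data, blocks, threshold=1)
```

With a threshold of 1, only a window identical to a building block matches, so the digest here includes index 2. Note that this naive version compares every window against every block, so its cost grows with input length times the number of blocks times l.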
Create date
06/05/2021 11:01
Last modification date
06/05/2021 11:21