Properties of a similarity preserving hash function and their realization in sdhash

Details

Serval ID
serval:BIB_4B0DF3C87990
Type
Conference proceedings (part): original contribution to the scientific literature, published on the occasion of a scientific conference, in a proceedings volume, or in a special issue of a recognized journal.
Collection
Publications
Title
Properties of a similarity preserving hash function and their realization in sdhash
Conference title
2012 Information Security for South Africa
Authors
Breitinger Frank, Baier Harald
Publisher
IEEE
ISBN
9781467321594
9781467321600
9781467321587
Publication status
Published
Publication date
08/2012
Language
English
Abstract
Finding similarities between byte sequences is a complex task and necessary in many areas of computer science, e.g., to identify malicious files or spam. Instead of comparing files against each other, one may apply a similarity preserving compression function (hash function) first and do the comparison for the hashes. Although we have different approaches, there is no clear definition / specification or needed properties of such algorithms available. This paper presents four basic properties for similarity preserving hash functions that are partly related to the properties of cryptographic hash functions. Compression and ease of computation are borrowed from traditional hash functions and define the hash value length and the performance. As every byte is expected to influence the hash value, we introduce coverage. Similarity score describes the need for a comparison function for hash values. We shortly discuss these properties with respect to three existing approaches and finally have a detailed view on the promising approach sdhash. However, we uncovered some bugs and other peculiarities of the implementation of sdhash. Finally we conclude that sdhash has the potential to be a robust similarity preserving digest algorithm, but there are some points that need to be improved.
Keywords
cryptography, file organisation, unsolicited e-mail, byte sequences, hash function, malicious files, sdhash, Digital forensics, fuzzy hashing, properties of similarity preserving hashing, similarity preserving hashing
Record created
06/05/2021 12:01
Record last modified
06/05/2021 12:24