An Empirical Study of the Usage of Checksums for Web Downloads

Details

Resource 1 Download: Bernard2023WWW.pdf (2355.70 KB)
Status: Public
Version: Final published version
Licence: CC BY 4.0
Serval ID
serval:BIB_DB6F0918594E
Type
Conference proceedings (part): original contribution to the scientific literature, published on the occasion of scientific conferences, in a proceedings volume, or in a special issue of a recognized journal (conference proceedings).
Collection
Publications
Institution
Title
An Empirical Study of the Usage of Checksums for Web Downloads
Conference title
Proceedings of the WebConference (WWW)
Authors
Bernard Gaël, Coudert Rémi, Chapuis Bertil, Huguenin Kévin
Publication status
Published
Publication date
04/2023
Peer-reviewed
Yes
Pages
2155-2165
Language
English
Abstract
Checksums, typically provided on webpages and generated from cryptographic hash functions (e.g., MD5, SHA256) or signature schemes (e.g., PGP), are commonly used on websites to enable users to verify that the files they download have not been tampered with when stored on possibly untrusted servers. In this paper, we shed light on the current practices regarding the usage of checksums for web downloads (hash functions used, visibility and validity of checksums, type of websites and files, presence of instructions, etc.), as this has been mostly overlooked so far. Using a snowball-sampling strategy for the 200,000 most popular domains of the Web, we first crawled a dataset of 8.5M webpages, from which we built, through an active-learning approach, a unique dataset of 277 diverse webpages that contain checksums. Our analysis of these webpages reveals interesting findings about the usage of checksums. For instance, it shows that checksums are used mostly to verify program files, that weak hash functions are frequently used and that a non-negligible proportion of the checksums provided on webpages do not match that of their associated files.
We make freely available our dataset and the code for collecting and analyzing it.
Finally, we complement our analysis with a survey of the webmasters of the considered webpages (26 complete responses), shedding light on the reasons behind the checksum-related choices they make.
Research data
Open Access
Yes
APC
700 USD
Funding
La Fondation Hasler / 19024
Record created
25/01/2023 23:37
Record last modified
10/10/2023 7:00