Throughput: A Key Performance Measure of Content-Defined Chunking Algorithms

Details

Serval ID
serval:BIB_5372662C2DAB
Type
Conference proceedings (part): original contribution to the scientific literature, published on the occasion of scientific conferences, in a proceedings volume, or in a special issue of a recognized journal (conference proceedings).
Collection
Publications
Institution
Title
Throughput: A Key Performance Measure of Content-Defined Chunking Algorithms
Conference title
2016 IEEE 36th International Conference on Distributed Computing Systems Workshops (ICDCSW)
Authors
Chapuis B., Garbinato B., Andritsos P.
Publisher
IEEE
ISBN
978-1-5090-3686-8
Publication status
Published
Publication date
06/2016
Peer-reviewed
Yes
Series
IEEE International Conference on Distributed Computing Systems Workshops
Pages
7-12
Language
English
Abstract
Data deduplication techniques are often used by cloud storage systems to reduce network bandwidth and storage requirements. As a consequence, the current research literature tends to focus most of its algorithmic efforts on improving the Duplicate Elimination Ratio (DER), which reflects the compression achieved using a given algorithm. Yet, the importance of this indicator tends to be overestimated, while another key indicator, namely throughput, tends to be underestimated. To substantiate this claim, we reimplement a selection of popular Content-Defined Chunking (CDC) algorithms and perform a detailed performance analysis. On this basis, we show that the gain brought by algorithms that focus aggressively on DER often comes at a significant cost in terms of throughput. As a consequence, we advocate for future optimizations taking throughput into account and for making balanced tradeoffs between DER and throughput.
Keywords
Content-defined chunking, Duplicate elimination ratio, Rolling hash function, Performance, Throughput
Record created
13/07/2017 15:21
Record last modified
20/08/2019 14:08