Throughput: A Key Performance Measure of Content-Defined Chunking Algorithms

Details

Serval ID
serval:BIB_5372662C2DAB
Type
Inproceedings: an article in a conference proceedings.
Collection
Publications
Institution
Title
Throughput: A Key Performance Measure of Content-Defined Chunking Algorithms
Title of the conference
2016 IEEE 36th International Conference on Distributed Computing Systems Workshops (ICDCSW)
Author(s)
Chapuis B., Garbinato B., Andritsos P.
Publisher
IEEE
ISBN
978-1-5090-3686-8
Publication state
Published
Issued date
06/2016
Peer-reviewed
Yes
Series
IEEE International Conference on Distributed Computing Systems Workshops
Pages
7-12
Language
English
Abstract
Data deduplication techniques are often used by cloud storage systems to reduce network bandwidth and storage requirements. As a consequence, the current research literature tends to focus most of its algorithmic efforts on improving the Duplicate Elimination Ratio (DER), which reflects the compression achieved using a given algorithm. Yet, the importance of this indicator tends to be overestimated, while another key indicator, namely throughput, tends to be underestimated. To substantiate this claim, we reimplement a selection of popular Content-Defined Chunking (CDC) algorithms and perform a detailed performance analysis. On this basis, we show that the gains brought by algorithms that aggressively focus on DER often come at a significant cost in terms of throughput. As a consequence, we advocate for future optimizations that take throughput into account and for making balanced tradeoffs between DER and throughput.
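To make the two indicators named in the abstract concrete, the following is a minimal sketch and not one of the algorithms evaluated in the paper. It assumes a toy rolling-sum chunker (the function names, window size, mask, and size limits are all hypothetical choices for illustration), takes DER as bytes before deduplication divided by bytes stored after deduplication, and takes throughput as bytes processed per second of wall-clock time.

```python
import hashlib
import time


def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3F,
                     min_size: int = 32, max_size: int = 1024):
    """Yield chunks whose cut points depend on content (toy rolling sum).

    Assumption: a plain byte sum over the last `window` bytes stands in
    for a real rolling hash; it is only meant to show the mechanism.
    """
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            # Drop the byte that just left the sliding window.
            rolling -= data[i - window]
        size = i - start + 1
        content_cut = size >= min_size and (rolling & mask) == mask
        if content_cut or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]


def measure(data: bytes):
    """Return (DER, throughput in bytes per second) for one pass over data."""
    t0 = time.perf_counter()
    unique = {}  # chunk fingerprint -> chunk length
    total = 0
    for chunk in chunk_boundaries(data):
        total += len(chunk)
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    elapsed = time.perf_counter() - t0
    stored = sum(unique.values())
    der = total / stored if stored else 1.0
    throughput = total / elapsed if elapsed > 0 else float("inf")
    return der, throughput


if __name__ == "__main__":
    sample = bytes(range(256)) * 64
    der, throughput = measure(sample + sample)
    print(f"DER: {der:.2f}, throughput: {throughput / 1e6:.2f} MB/s")
```

Under these assumptions, a costlier boundary test inside the loop could raise DER while lowering the measured throughput, which is the kind of tradeoff the abstract argues should be evaluated.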
Keywords
Content-defined chunking, Duplicate elimination ratio, Rolling hash function, Performance, Throughput
Create date
13/07/2017 16:21
Last modification date
20/08/2019 15:08