Information-theoretic software clustering

Details

Serval ID
serval:BIB_0B92A7A8C457
Type
Article: article from journal or magazine.
Collection
Publications
Title
Information-theoretic software clustering
Journal
IEEE Transactions on Software Engineering
Author(s)
Andritsos P., Tzerpos V.
ISSN
0098-5589
Publication state
Published
Issued date
02/2005
Peer-reviewed
Yes
Volume
31
Number
2
Pages
150-165
Language
English
Abstract
The majority of the algorithms in the software clustering literature utilize structural information to decompose large software systems. Approaches using other attributes, such as file names or ownership information, have also demonstrated merit. At the same time, existing algorithms commonly deem all attributes of the software artifacts being clustered as equally important, a rather simplistic assumption. Moreover, no method that can assess the usefulness of a particular attribute for clustering purposes has been presented in the literature. In this paper, we present an approach that applies information theoretic techniques in the context of software clustering. Our approach allows for weighting schemes that reflect the importance of various attributes to be applied. We introduce LIMBO, a scalable hierarchical clustering algorithm based on the minimization of information loss when clustering a software system. We also present a method that can assess the usefulness of any nonstructural attribute in a software clustering context. We applied LIMBO to three large software systems in a number of experiments. The results indicate that this approach produces clusterings that come close to decompositions prepared by system experts. Experimental results were also used to validate our usefulness assessment method. Finally, we experimented with well-established weighting schemes from information retrieval, Web search, and data clustering. We report results as to which weighting schemes show merit in the decomposition of software systems.
Keywords
information theory, reverse engineering, reengineering, architecture reconstruction, clustering
Create date
22/08/2017 13:29
Last modification date
20/08/2019 13:33