PrivaTree: Collaborative Privacy-Preserving Training of Decision Trees on Biomedical Data

El Zein, Yamane; Lemay, Mathieu; Huguenin, Kévin

doi:10.1109/TCBB.2023.3286274

File: ElZein2023TCBB.pdf (1592.20 KB)
State: Public
Version: Author's accepted manuscript
License: Not specified

Serval ID

serval:BIB_7F34A9FD2656

Type

Article: article from journal or magazine.

Collection

Publications

Institution

UNIL/CHUV

Title

PrivaTree: Collaborative Privacy-Preserving Training of Decision Trees on Biomedical Data

Journal

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Author(s)

El Zein, Yamane; Lemay, Mathieu; Huguenin, Kévin

ISSN

1545-5963 (print)
1557-9964 (electronic)

Publication state

Published

Issued date

02/2024

Peer-reviewed

Yes

Language

English

Abstract

Biomedical data generation and collection have become faster and more ubiquitous. Consequently, datasets are increasingly spread across hospitals, research institutions, or other entities. Exploiting such distributed datasets simultaneously can be beneficial; in particular, classification using machine learning models such as decision trees is becoming increasingly common and important. However, because biomedical data is highly sensitive, sharing data records across entities or centralizing them in one location is often prohibited due to privacy concerns or regulations. We design PrivaTree, an efficient and privacy-preserving protocol for the collaborative training of decision tree models on distributed, horizontally partitioned biomedical datasets. Although decision tree models may not always be as accurate as neural networks, they are more interpretable and help in decision-making processes, properties that are crucial for biomedical applications. PrivaTree follows a federated learning approach in which raw data is not shared: every data provider computes, on its private dataset, updates to the global decision tree model being trained. These updates are then aggregated in a privacy-preserving manner using additive secret sharing, in order to collaboratively update the model. We implement PrivaTree and evaluate its computational and communication efficiency on three different biomedical datasets, as well as the accuracy of the resulting models. Compared to the model trained centrally on all data records, the collaborative model shows a modest loss of accuracy, while consistently outperforming the local models trained separately by each data provider. Moreover, PrivaTree is more efficient than existing solutions, which makes it usable for training decision trees with numerous nodes on large, complex datasets with both continuous and categorical attributes, as often found in the biomedical field.
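The aggregation step described in the abstract relies on additive secret sharing. The following is a minimal sketch of that generic technique, not PrivaTree's actual protocol. It assumes that each provider's local update is a vector of non-negative integer counts (for example, hypothetically, per-class record counts at a candidate node) and uses an illustrative public modulus Q. The names `share` and `aggregate` are hypothetical. Everything runs in a single process here, so the sketch only illustrates the arithmetic and provides no actual privacy.

```python
import secrets

# Hypothetical public modulus (an assumption); all share arithmetic is mod Q.
Q = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n_parties random shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def aggregate(local_updates):
    """Sum per-provider count vectors while each party only sees random shares.

    local_updates[i][j] is provider i's j-th count (illustrative assumption,
    e.g. the number of its records of class j at a candidate tree node).
    """
    n = len(local_updates)
    length = len(local_updates[0])
    # received[k][j]: running sum of the shares of entry j held by party k.
    received = [[0] * length for _ in range(n)]
    for update in local_updates:
        for j, value in enumerate(update):
            # Each provider sends one share of each entry to every party.
            for k, s in enumerate(share(value, n)):
                received[k][j] = (received[k][j] + s) % Q
    # Parties publish only their partial sums; adding them yields the totals.
    return [sum(received[k][j] for k in range(n)) % Q for j in range(length)]
```

For example, `aggregate([[12, 30], [5, 7], [20, 1]])` returns `[37, 38]`, the same totals as pooling the three providers' counts directly, while each individual share, taken alone, is uniformly random modulo Q.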

Keywords

Decision Trees, Privacy-Preserving Machine Learning, Scalability, Biomedical Data, Decision-making

URN

urn:nbn:ch:serval-BIB_7F34A9FD26562

OAI-PMH

oai:serval.unil.ch:BIB_7F34A9FD2656

DOI

10.1109/TCBB.2023.3286274

Create date

03/06/2023 8:44

Last modification date

07/02/2024 7:18

SERVAL

Lausanne academic server
