USING ND-LAPLACE TO TRAIN PRIVACY-PRESERVING CLUSTERING ALGORITHMS ON DISTRIBUTED N-DIMENSIONAL DATA

  • Tjibbe van der Ende

Student thesis: Master's Thesis

Abstract

Data plays a central role in our lives, which makes data security critical, particularly because this data often contains sensitive and personal information. In response to this concern, privacy laws such as the US Privacy Act and the General Data Protection Regulation (GDPR) were enacted. However, these regulations pose challenges for data-mining techniques such as clustering, which require storing historical data for training and thereby increase the risk of data leaks. Because storing plain data on the server is not viable, differential privacy (DP) becomes necessary. DP is a technique that perturbs data with controlled noise, which protects privacy while maintaining clustering patterns. However, the original data must still be copied in its entirety for this method to work. Local differential privacy (LDP) was therefore introduced, in which the noise is added locally. This removes that privacy constraint, but increases the impact of the noise on the data.
Balancing utility and privacy in LDP for clustering poses known challenges, and existing work is often limited by information leaks and by clustering algorithms trained only on 2-dimensional and 3-dimensional datasets. Our research addresses these issues with realistic scenarios and data, aiming to establish a robust privacy framework for secure training of diverse clustering algorithms on distributed n-dimensional data.

The primary goal of this research is to enhance privacy and utility for data clustering by utilizing the Geo-indistinguishability (GI) framework, a distance-optimized framework for achieving differential privacy. The GI framework uses the 2D-Laplace and 3D-Laplace mechanisms to add noise to 2-dimensional and 3-dimensional data based on a privacy budget. In this thesis, we adapt these mechanisms for clustering and extend them to nD-Laplace for working with n-dimensional data. Therefore, the following research question is formulated:
How can the nD-Laplace mechanism be applied in training privacy-preserving clustering algorithms on distributed n-dimensional data?
This research addresses three sub-questions, including the adaptation of nD-Laplace for privacy-preserving clustering algorithms. It also investigates how dataset characteristics affect nD-Laplace’s utility and privacy, with hypotheses related to data shape, dimensions, and adaptability. To answer these questions, we conducted a literature study and experiments to collect quantitative data, comparing nD-Laplace with an existing privacy mechanism called Piecewise.
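As an illustration of the kind of noise a Laplace-style mechanism adds in n dimensions, the sketch below perturbs a point with noise whose density is proportional to exp(-epsilon * ||z||). This is a common textbook generalization of the planar (2D) Laplace construction and is not necessarily the exact nD-Laplace mechanism of the thesis. The function name `nd_laplace_noise` and its parameters are assumptions made for this example.

```python
import math
import random


def nd_laplace_noise(point, epsilon, rng=random):
    """Return `point` perturbed by isotropic noise with density ~ exp(-epsilon * ||z||).

    Hypothetical helper for illustration only; not the thesis implementation.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    n = len(point)
    # Direction: a normalized Gaussian vector is uniform on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction))
    # Radius: for this density the radial part follows Gamma(shape=n, scale=1/epsilon).
    radius = rng.gammavariate(n, 1.0 / epsilon)
    return [p + radius * d / norm for p, d in zip(point, direction)]


if __name__ == "__main__":
    rng = random.Random(42)
    original = [0.2, 0.5, 0.1, 0.9]
    for eps in (0.5, 2.0, 8.0):
        noisy = nd_laplace_noise(original, eps, rng)
        print(eps, round(math.dist(original, noisy), 3))
```

Under these assumptions the expected displacement is n / epsilon, so a larger privacy budget moves each point less (better utility, weaker privacy), and the same budget spreads points further as the number of dimensions grows.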

The results show that the nD-Laplace mechanism achieves high accuracy on specific datasets compared to the Piecewise mechanism. Additionally, the findings underscore the substantial impact of the privacy budget: higher budgets enhance utility at the expense of privacy. Furthermore, the privacy results show that the mechanisms sometimes exhibit high privacy leakage. In the research and experiments, we also included a solution for this issue as an extension of the nD-Laplace mechanism.

The research uncovered limitations in using nD-Laplace for specific clustering algorithms due to its sensitivity to specific data shapes. It also highlighted the need to refine privacy metrics, as the current ones failed to capture the most important privacy properties. Future work should delve deeper into adapting nD-Laplace to diverse data shapes, expanding dimensionality beyond current limits, and enhancing support for categorical and binary data for broader practical applications.


Date of Award: 12 Oct 2023
Original language: English
Supervisor: Mina Sheikh Alishahi (Examiner) & Clara Maathuis (Co-assessor)

Master's Degree

  • Master Software Engineering
