TY - JOUR
T1 - Add noise to remove noise
T2 - Local differential privacy for feature selection
AU - Alishahi, Mina
AU - Moghtadaiee, Vahideh
AU - Navidan, Hojjat
N1 - Funding Information:
This work was partially supported by the AMdEX Fieldlab project funded by Kansen Voor West EFRO (KVW00309) and the province of Noord-Holland.
Publisher Copyright:
© 2022
PY - 2022/12
Y1 - 2022/12
N2 - Feature selection has become significantly important for data analysis. It selects the most informative features describing the data to filter out the noise, complexity, and over-fitting caused by less relevant features. Accordingly, feature selection improves the predictors’ accuracy, enables them to be trained faster and more cost-effectively, and provides a better understanding of the underlying data. While plenty of practical solutions have been proposed in the literature to identify the most discriminating features describing a dataset, an understanding of feature selection over privacy-sensitive data in the absence of a trusted party is still missing. The design of such a framework is especially important in our modern society, where each individual, through accessing the Internet, can simultaneously play the role of a data provider and a data-analysis beneficiary. In this study, we propose a novel feature selection framework based on Local Differential Privacy (LDP), named LDP-FS, which estimates the importance of features over securely protected data while protecting the confidentiality of each individual's data before it leaves the user's device. The performance of LDP-FS in terms of scoring and ordering the features is assessed by investigating the impact of dataset properties, privacy mechanisms, privacy levels, and feature selection techniques on this framework. The accuracy of classifiers trained on the subset of features selected by LDP-FS is also presented. Our experimental results demonstrate the effectiveness and efficiency of the proposed framework.
AB - Feature selection has become significantly important for data analysis. It selects the most informative features describing the data to filter out the noise, complexity, and over-fitting caused by less relevant features. Accordingly, feature selection improves the predictors’ accuracy, enables them to be trained faster and more cost-effectively, and provides a better understanding of the underlying data. While plenty of practical solutions have been proposed in the literature to identify the most discriminating features describing a dataset, an understanding of feature selection over privacy-sensitive data in the absence of a trusted party is still missing. The design of such a framework is especially important in our modern society, where each individual, through accessing the Internet, can simultaneously play the role of a data provider and a data-analysis beneficiary. In this study, we propose a novel feature selection framework based on Local Differential Privacy (LDP), named LDP-FS, which estimates the importance of features over securely protected data while protecting the confidentiality of each individual's data before it leaves the user's device. The performance of LDP-FS in terms of scoring and ordering the features is assessed by investigating the impact of dataset properties, privacy mechanisms, privacy levels, and feature selection techniques on this framework. The accuracy of classifiers trained on the subset of features selected by LDP-FS is also presented. Our experimental results demonstrate the effectiveness and efficiency of the proposed framework.
KW - Feature ranking
KW - Feature selection
KW - Local differential privacy
KW - Machine learning
KW - Privacy preserving
U2 - 10.1016/j.cose.2022.102934
DO - 10.1016/j.cose.2022.102934
M3 - Article
AN - SCOPUS:85140000313
SN - 0167-4048
VL - 123
JO - Computers and Security
JF - Computers and Security
M1 - 102934
ER -