Privacy Aware Machine Learning

Biswas, Chandan

DSpace Home
→
Dissertation and Thesis
→
Theses
→
View Item

Privacy Aware Machine Learning

Biswas, Chandan

URI: http://hdl.handle.net/10263/7410

Date: 2023-08

Abstract:

Privacy preserving computation is of utmost importance in a cloud computing environ- ment where a client often requires to send sensitive data to servers, offering computing services, for computational purposes over untrusted networks. Sharing the raw or an ab- stract representation of a labelled or unlabelled dataset on cloud platforms can potentially expose sensitive information of the data to an adversary, e.g., in the case of an emotion classification task from text, an adversary-agnostic abstract representation of the text data may eventually lead an adversary to identify the demographics of the authors, such as their gender and age, etc. The leakage of sensitive information from the data may take place due to eavesdropping over the network or malware residing at the server. Privacy preserving computation workflows aim to prevent such leakage of sensitive information by introducing a suitable encoding transformation on sample data points. Such an encoding strategy has dual objectives, the first being that it should be difficult to reconstruct the original data in the absence of any knowledge of the encoding strategy and its parameters. Secondly, the computational results obtained using the encoded data should not be substantially different from those obtained using the same data in its original form. Standard encoding mechanisms, such as locality sensitive hashing (LSH), caters to the first objective of privacy preserving computation workflow, the second objective may not always be adequately satisfied. In this thesis, we focus on the second objective and the computational activity that we focus on is a supervised classification task in addition to the K-means clustering, which has been widely used for various data mining jobs. Here, we have addressed the problem of privacy preserving computation on the above two tasks in three different ways, Initially, we have proposed a new variant of the K-means algorithm which is capable of privacy preservation in the sense that it takes binary encoded data as input, and does not require access to the data in its original form at any stage of the computation. The proposed strategy is capable of producing the required number of clusters which are sufficiently close to the respective clusters computed from the original non-encoded data. The results of the proposed strategy on image or text data are either comparable or outperform the standard K-means clustering algorithm. Secondly, we have explored a deep metric learning approach to learn a parameterized encoding transformation with an objective of maximizing the alignment of the clusters obtained in the encoded space with the same obtained from the original data. To this end, we train a weakly supervised deep network using triplets constructed from the output of a clustering algorithm on a subset of the non-encoded data. Our proposed method of weakly- supervised approach yields more effective encoding in comparison to approaches where the encoding process is agnostic of the clustering objective. Finally, we propose a universal defense mechanism against malicious attempts of stealing sensitive information from data shared on cloud platforms. More specifically, our proposed method employs an informative subspace based multi-objective approach to produce a sensitive information aware encoding of the data representation. A number of experiments conducted on both standard text and image datasets demonstrate the ability of our proposed approach to reduce the effectiveness of the adversarial task without remarkably affecting the effectiveness of the primary task itself.