DSpace Repository

On Efficient Center-based Clustering: From Unsupervised Learning to Clustering under Weak Supervision

dc.contributor.author Gupta, Avisek
dc.date.accessioned 2021-11-24T09:54:00Z
dc.date.available 2021-11-24T09:54:00Z
dc.date.issued 2021-11
dc.identifier.citation 188p. en_US
dc.identifier.uri http://hdl.handle.net/10263/7199
dc.description Thesis under the supervision of Dr. Swagatam Das en_US
dc.description.abstract The problem of clustering aims to partition unlabeled data so as to reveal the natural affinities between data instances. Modern learning algorithms need to be applicable to larger datasets that may also be high-dimensional. While acquiring more features and instances can be beneficial, the addition of noisy and irrelevant features can obfuscate the true structure of the data; distance metrics can also fail in high dimensions. To address these challenges, complex mathematical structures can be used to model different aspects of a problem; however, they can also lead to algorithms with high computation costs, making them infeasible for larger datasets. Among existing classes of clustering methods, we focus on center-based clustering, which in general consists of low-cost methods that scale linearly with dataset size. We identify different factors that influence how effective center-based clustering methods can be. Estimating the number of clusters remains a challenge; we study existing approaches, which span a wide range of computation costs, and propose two low-cost approaches based on two possible definitions of a cluster. Selecting a suitable distance metric for clustering is also an important factor. We incorporate a kernel metric in a center-based clustering method and investigate its performance in the presence of a large number of clusters. Feature selection and feature extraction methods exist to identify which features can help estimate the clusters. We focus on sparse clustering methods and propose a significantly lower-cost approach that selects features while clustering. Another important factor is the nature of the clusters identified.
Hard clustering methods identify discrete clusters, whereas soft clustering methods assign data points partially to more than one cluster, allowing overlapping clusters to be identified. We propose a multi-objective evolutionary fuzzy clustering method that can identify partitions at different degrees of overlap. Clustering in fully unsupervised conditions comes with a serious limitation: rather than exploring a wide solution space without guidance, a method given some additional supervision can be biased toward clustering solutions that better fit a dataset. This motivates us to propose a transfer clustering method that learns a multiple kernel metric in a weakly supervised setting, and then transfers the learned metric to cluster a dataset in an unsupervised manner. Weak supervision requires far less effort than full supervision, while drastically boosting clustering performance. We recommend weakly supervised clustering as a promising new direction to overcome the inherent limitations of identifying clusters in an unsupervised manner. en_US
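The center-based methods the abstract surveys build on the alternating assign-and-update scheme of Lloyd's k-means; the specific algorithms proposed in the thesis are not reproduced here, but a minimal sketch of the underlying scheme (plain squared-Euclidean distances, random initialization) looks like this:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-center assignment and mean updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points;
        # keep the old center if a cluster has become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each iteration touches every point once per center, which is the source of the linear scaling in dataset size that the abstract highlights; the kernelized and sparse variants studied in the thesis replace the distance computation or add a feature-weighting step on top of this loop.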
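The soft assignments mentioned in the abstract can be illustrated with the standard fuzzy c-means membership update (a classical soft center-based rule, not the multi-objective evolutionary variant proposed in the thesis): each point receives a membership in every cluster, inversely weighted by its distance to that cluster's center and normalized to sum to one.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership update:
    u[i, j] is proportional to d(x_i, c_j) ** (-2 / (m - 1)),
    normalized so each point's memberships sum to 1.
    The fuzzifier m > 1 controls the degree of overlap."""
    d = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)
```

As m approaches 1 the memberships approach hard 0/1 assignments, and larger m yields more overlapped partitions, which is the dimension along which the thesis's multi-objective method explores different degrees of overlap.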
dc.language.iso en en_US
dc.publisher Indian Statistical Institute, Kolkata en_US
dc.relation.ispartofseries ISI Ph.D. Thesis;TH517
dc.subject Center-based Clustering en_US
dc.subject Data Clustering en_US
dc.subject Kernel Metric Learning en_US
dc.subject Multiple Kernel Metric en_US
dc.title On Efficient Center-based Clustering: From Unsupervised Learning to Clustering under Weak Supervision en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • Theses
    (ISI approved PhD theses)
