On efficient center-based clustering: from unsupervised learning to clustering under weak supervision / Avisek Gupta
- Call number: 000SA.072 G977
- Guided by Prof. Swagatam Das
Item type | Current library | Call number | Status | Notes | Barcode
---|---|---|---|---|---
THESIS | ISI Library, Kolkata | 000SA.072 G977 | Available | E-Thesis | TH517
Browsing ISI Library, Kolkata shelves:

- 000SA.072 B995 Cluster analysis /
- 000SA.072 C392 Partitional clustering algorithms /
- 000SA.072 G977 On efficient center-based clustering: from unsupervised learning to clustering under weak supervision /
- 000SA.072 H516 Handbook of cluster analysis /
- 000SA.072 R614 Robust cluster analysis and variable selection /
- 000SA.072 Su966 Dynamic mixed models for familial longitudinal data /
Thesis (Ph.D.) - Indian Statistical Institute, 2021
Includes bibliographical references
1. Introduction to center-based clustering
2. Fast automatic estimation of the number of clusters from the minimum inter-center distance for k-means clustering
3. On the unification of k-harmonic means and fuzzy c-means clustering problems under kernelization
4. Improved efficient model selection for sparse hard and fuzzy center-based clustering
5. Fuzzy clustering to identify clusters at different levels of fuzziness: an evolutionary multi-objective optimization approach
6. Transfer clustering using multiple kernel metrics learned under multi-instance weak supervision
7. Conclusion
The problem of clustering aims to partition unlabeled data so as to reveal the natural affinities between data instances. Modern learning algorithms need to be designed to be applicable to larger datasets that can also be high dimensional. While acquiring more features and instances can be beneficial, the addition of noisy and irrelevant features can obfuscate the true structure of the data, and distance metrics can fail in high dimensions. To address these challenges, complex mathematical structures can be used to model different aspects of a problem; however, they can also lead to algorithms with high computation costs, making those algorithms infeasible for larger datasets.
Among existing classes of clustering methods, we focus on the class of center-based clustering, which in general consists of methods with low computation costs that scale linearly with the size of the dataset. We identify several factors that influence how effective center-based clustering methods can be.
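For concreteness, here is a minimal sketch of Lloyd's k-means, the canonical center-based method (a textbook baseline, not one of the thesis's contributions); each iteration costs O(nkd) for n instances in d dimensions, i.e., linear in the dataset size:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: assign points to nearest centers, then
    move each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (empty clusters keep their previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```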
Estimating the number of clusters is still a challenge; we study existing approaches spanning a wide range of computation costs, and propose two low-cost approaches based on two possible definitions of a cluster.
Selecting a suitable distance metric for clustering is also an important factor. We incorporate a kernel metric into a center-based clustering method and investigate its performance in the presence of a large number of clusters.
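To illustrate what a kernel metric provides (a sketch of the standard kernel k-means identity, not necessarily the kernelized formulation derived in the thesis), feature-space distances to cluster means can be computed from the Gram matrix alone, without ever forming the feature map:

```python
import numpy as np

def kernel_cluster_distances(K, labels, k):
    """Squared feature-space distances ||phi(x_i) - m_c||^2, computed
    from the Gram matrix K alone (clusters assumed non-empty):

        K[i,i] - (2/|C|) * sum_{j in C} K[i,j]
               + (1/|C|^2) * sum_{j,l in C} K[j,l]
    """
    n = K.shape[0]
    D = np.empty((n, k))
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        D[:, c] = (np.diag(K)
                   - 2.0 * K[:, idx].mean(axis=1)
                   + K[np.ix_(idx, idx)].mean())
    return D
```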
Feature selection and feature extraction methods exist to identify which features help estimate the clusters. We focus on sparse clustering methods and propose an approach with a significantly lower computation cost that selects features while clustering.
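For context, one well-known formulation is Witten and Tibshirani's sparse k-means, whose feature-weight update is sketched below to fix ideas (the thesis proposes a cheaper model-selection approach, which this standard update does not represent):

```python
import numpy as np

def feature_weights(bcss, s):
    """Feature-weight update from Witten & Tibshirani's sparse k-means.

    Maximizes sum_j w_j * bcss_j subject to ||w||_2 <= 1, ||w||_1 <= s,
    w_j >= 0, where bcss_j is the between-cluster sum of squares of
    feature j (assumed nonnegative, not all zero) and 1 <= s <= sqrt(p).
    The solution soft-thresholds bcss and renormalizes; the threshold
    is found by binary search.
    """
    def soft(a, delta):
        return np.maximum(a - delta, 0.0)

    w = bcss / np.linalg.norm(bcss)
    if np.abs(w).sum() <= s:          # L1 constraint already inactive
        return w
    lo, hi = 0.0, bcss.max()
    for _ in range(60):               # binary search on the threshold
        mid = (lo + hi) / 2.0
        w = soft(bcss, mid)
        w = w / np.linalg.norm(w)
        if np.abs(w).sum() > s:
            lo = mid
        else:
            hi = mid
    return w
```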
Another important factor is the nature of the clusters identified. Hard clustering methods identify discrete clusters, whereas soft clustering methods assign each data point to more than one cluster with graded memberships, thereby allowing overlapping clusters to be identified. We propose a multi-objective evolutionary fuzzy clustering method that can identify partitions at different degrees of overlap.
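As a reference point, the standard fuzzy c-means membership update below makes the role of the fuzzifier m explicit; the proposed multi-objective method explores partitions at different degrees of overlap rather than fixing a single m (this sketch is the textbook update, not the proposed method):

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership update; rows of the result sum to one.

    The fuzzifier m > 1 controls the degree of overlap: as m -> 1 the
    assignments become hard, and larger m gives softer, more
    overlapped clusters.
    """
    # Squared distances from every point to every center, shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    # u[i, a] = 1 / sum_b (d_ia / d_ib)^(2 / (m - 1))
    ratio = d2[:, :, None] / d2[:, None, :]
    return 1.0 / (ratio ** (1.0 / (m - 1))).sum(axis=2)
```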
Clustering in fully unsupervised conditions comes with a serious limitation: rather than having a method explore a wide solution space completely unsupervised, some additional supervision can bias it toward clustering solutions that better fit a dataset. This motivates us to propose a transfer clustering method that learns a multiple kernel metric in a weakly supervised setting, and then transfers the learned metric to cluster a dataset in an unsupervised manner. Providing weak supervision requires far less effort than full supervision, while still drastically boosting clustering performance. We recommend weakly supervised clustering as a promising new direction for overcoming the inherent limitations of identifying clusters in an unsupervised manner.
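As a toy illustration of the idea (a simple pairwise-constraint heuristic; the thesis learns its metric under multi-instance weak supervision, which this sketch does not model), base kernels can be weighted by how well they agree with must-link and cannot-link pairs:

```python
import numpy as np

def mk_weights(kernels, must_link, cannot_link):
    """Score each base kernel by how much more similar it makes
    must-link pairs than cannot-link pairs (both lists assumed
    non-empty), clip negative scores, and normalize to sum to one."""
    scores = []
    for K in kernels:
        ml = np.mean([K[i, j] for i, j in must_link])
        cl = np.mean([K[i, j] for i, j in cannot_link])
        scores.append(max(ml - cl, 0.0))
    scores = np.asarray(scores)
    if scores.sum() == 0.0:           # no kernel fits the constraints
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / scores.sum()

# The combined Gram matrix sum_t w_t * K_t can then be handed to a
# kernel-based center clustering method on a new, unlabeled dataset.
```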