Online Public Access Catalogue (OPAC)
Library, Documentation and Information Science Division

“A research journal serves that narrow borderland which separates the known from the unknown”

- P.C. Mahalanobis



On efficient center-based clustering: from unsupervised learning to clustering under weak supervision / Avisek Gupta

Material type: Text
Publication details: Kolkata: Indian Statistical Institute, 2021
Description: 175 pages
DDC classification:
  • 23 000SA.072 G977
Contents:
1 Introduction to center-based clustering -- 2 Fast automatic estimation of the number of clusters from the minimum inter-center distance for k-means clustering -- 3 On the unification of k-harmonic means and fuzzy c-means clustering problems under kernelization -- 4 Improved efficient model selection for sparse hard and fuzzy center-based clustering -- 5 Fuzzy clustering to identify clusters at different levels of fuzziness: an evolutionary multi-objective optimization approach -- 6 Transfer clustering using multiple kernel metrics learned under multi-instance weak supervision -- 7 Conclusion
Production credits:
  • Guided by Prof. Swagatam Das
Dissertation note: Thesis (Ph.D.) - Indian Statistical Institute, 2021
Summary: The problem of clustering aims to partition unlabeled data so as to reveal the natural affinities between data instances. Modern learning algorithms need to be designed to be applicable to larger datasets that can also be high-dimensional. While acquiring more features and instances can be beneficial, the addition of noisy and irrelevant features can obfuscate the true structure of the data; distance metrics can also fail in high dimensions. To address these challenges, complex mathematical structures can be used to model different aspects of a problem; however, they can also lead to algorithms with high computation costs, making those algorithms infeasible for larger datasets. Among existing classes of clustering methods, we focus on the class of center-based clustering, which in general consists of methods with low computation costs that scale linearly with the size of the dataset. We identify different factors that influence how effective center-based clustering methods can be. Estimating the number of clusters is still a challenge, for which we study existing approaches with a wide range of computation costs, and propose two low-cost approaches based on two possible definitions of a cluster. Selecting a suitable distance metric for clustering is also an important factor. We incorporate a kernel metric in a center-based clustering method and investigate its performance in the presence of a large number of clusters. Feature selection and feature extraction methods exist to identify which features can help estimate the clusters. We focus on sparse clustering methods and propose a significantly lower-computation approach to simultaneously select features while clustering. Another important factor is the nature of the clusters identified.
Hard clustering methods identify discrete clusters, whereas soft clustering methods allow soft assignments of data points to more than one cluster, thereby allowing overlapping clusters to be identified. We propose a multi-objective evolutionary fuzzy clustering method that can identify partitions at different degrees of overlap. Clustering under fully unsupervised conditions comes with a serious limitation. Instead of exploring a wide solution space completely unsupervised, some additional supervision can bias the method toward clustering solutions that better fit a dataset. This motivates us to propose a transfer clustering method that learns a multiple-kernel metric in a weakly supervised setting, and then transfers the learned metric to cluster a dataset in an unsupervised manner. Less effort is required to provide weak supervision than full supervision, while clustering performance is drastically boosted. We recommend weakly supervised clustering as a promising new direction for overcoming the inherent limitations of identifying clusters in an unsupervised manner.
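The center-based clustering the abstract refers to is exemplified by k-means (Lloyd's algorithm), the subject of chapter 2: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points. The sketch below is a minimal illustration in pure Python, not code from the thesis; it assumes a naive first-k initialization, where practical implementations would use k-means++ or repeated random restarts.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest center and recomputing each center as its cluster mean."""
    centers = list(points[:k])  # naive init; k-means++ is preferred in practice
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return centers, clusters

# Two well-separated blobs; k-means recovers one center per blob.
data = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1),
        (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
centers, clusters = kmeans(data, k=2)
```

Both steps are linear in the number of points, which is the low, dataset-size-linear cost the abstract attributes to center-based methods; the harder question the thesis addresses is how to choose k, e.g. from quantities such as the minimum inter-center distance across runs with different k.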


Includes bibliographical references



Library, Documentation and Information Science Division, Indian Statistical Institute, 203 B T Road, Kolkata 700108, INDIA
Phone no. 91-33-2575 2100, Fax no. 91-33-2578 1412, ksatpathy@isical.ac.in