Abstract:
Clustering is an unsupervised machine learning task in which, given a collection
of unlabelled data, similar data items are grouped into the same cluster and
dissimilar items into different clusters. One common clustering algorithm is
K-Means, which initializes k centroids and then alternately updates the
assignments of data points and the centroids until they converge to exactly k
clusters.
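The alternating updates can be sketched as Lloyd's algorithm in plain Python. This is a minimal sketch, assuming the k initial centroids are passed in explicitly (how they are chosen, e.g. at random, is left out); `kmeans` is a hypothetical helper name.

```python
import math

def kmeans(X, centroids, iters=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    C = [list(c) for c in centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(C)), key=lambda j: math.dist(x, C[j])) for x in X]
        # Update step: each centroid moves to the mean of its assigned points.
        new_C = []
        for j in range(len(C)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                new_C.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_C.append(C[j])  # keep an empty cluster's centroid unchanged
        if new_C == C:
            break  # converged: centroids no longer move
        C = new_C
    return labels, C

labels, C = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])
```

Note that k is fixed by the length of `centroids`; the algorithm cannot discover a different number of clusters.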
In real-life scenarios, however, the number of clusters usually cannot be
determined in advance from arbitrary data. Robust Continuous Clustering [1]
addresses this by assigning a representative point to each data point. Initially,
each representative coincides with its data point. The representatives of data
points that are likely to belong to the same cluster are then pulled toward a
common point, so the representatives gradually move and collapse into easily
distinguishable clusters.
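The representative-point idea can be illustrated with a heavily simplified sketch. It assumes the general shape of the RCC objective as commonly described: a data-fidelity term plus a Geman-McClure penalty on differences between connected representatives, optimized by alternating closed-form line-process weights with representative updates. Simplifications (all assumptions, not the method of [1]): a fully connected graph with unit weights instead of a nearest-neighbour graph, a fixed scale `mu` instead of a graduated schedule, and Jacobi sweeps instead of an exact linear solve. `rcc_sketch`, `lam`, `mu` and `delta` are hypothetical names.

```python
import math
from itertools import combinations

def rcc_sketch(X, lam=10.0, mu=1.0, iters=200, delta=1.0):
    """Simplified continuous clustering (hypothetical helper). Returns (labels, U)."""
    n, dim = len(X), len(X[0])
    U = [list(x) for x in X]                  # representatives start at the data points
    pairs = list(combinations(range(n), 2))   # assumption: fully connected graph, w_pq = 1
    for _ in range(iters):
        # Line-process step: for rho(y) = mu*y^2/(mu+y^2) the optimal weight is
        # l_pq = (mu / (mu + ||u_p - u_q||^2))^2, which is tiny for distant pairs.
        l = {(p, q): (mu / (mu + math.dist(U[p], U[q]) ** 2)) ** 2 for p, q in pairs}
        # Representative step: one Jacobi sweep on (I + lam * L) U = X,
        # where L is the graph Laplacian weighted by l_pq.
        new_U = []
        for i in range(n):
            num, den = list(X[i]), 1.0
            for j in range(n):
                if j == i:
                    continue
                w = lam * l[(min(i, j), max(i, j))]
                den += w
                for k in range(dim):
                    num[k] += w * U[j][k]
            new_U.append([v / den for v in num])
        U = new_U
    # Clusters = connected components of the graph linking representatives closer than delta.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for p, q in pairs:
        if math.dist(U[p], U[q]) < delta:
            parent[find(p)] = find(q)
    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(n)]
    return labels, U

labels, U = rcc_sketch([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
```

No k appears anywhere: the number of clusters is read off afterwards as `len(set(labels))`.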
There is often more than one way of looking at data: the same dataset can be
expressed in multiple forms, as different data matrices. Such data are called
multi-view data [2]. For example, a sentence can be expressed in multiple
languages, and a person can be described by face images, personality traits, etc.
A single view is not enough to draw a conclusion about the clustering pattern;
all views must be considered together to get a complete picture of the structure
in the data.
Multi-View Continuous Subspace Clustering [3] initializes a consensus subspace
representation. By applying the continuous clustering algorithm to the subspace
representation of each view's data points and summing the objective functions of
all views, it arrives at a clustering structure that is consistent across views.
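To make "summing the objective functions of all views" concrete, here is a hypothetical sketch that evaluates an RCC-style objective (data fidelity plus a Geman-McClure penalty on connected representatives) for each view and adds the results. It deliberately does not model the consensus subspace representation of [3]; the per-view representatives `U`, the per-view edge lists and the names `view_objective` and `multiview_objective` are assumptions for illustration only.

```python
def gm(d2, mu):
    # Geman-McClure penalty rho(y) = mu * y^2 / (mu + y^2), written in terms of d2 = y^2.
    return mu * d2 / (mu + d2)

def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def view_objective(X, U, edges, lam=1.0, mu=1.0):
    """RCC-style objective for one view: 0.5 * fidelity + 0.5 * lam * robust pairwise term."""
    fidelity = sum(sqdist(x, u) for x, u in zip(X, U))
    pairwise = sum(w * gm(sqdist(U[p], U[q]), mu) for p, q, w in edges)
    return 0.5 * fidelity + 0.5 * lam * pairwise

def multiview_objective(views, lam=1.0, mu=1.0):
    # views: list of (X_v, U_v, edges_v); every view describes the same n items,
    # but each view may have its own feature dimension.
    return sum(view_objective(X, U, edges, lam, mu) for X, U, edges in views)

view_a = ([[0.0], [2.0]], [[0.0], [2.0]], [(0, 1, 1.0)])
view_b = ([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]], [(0, 1, 1.0)])
total = multiview_objective([view_a, view_b])
```

Minimizing such a sum forces a single optimization to balance evidence from all views rather than fitting any one view in isolation.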
High-dimensional data poses a challenge because the assumptions of many
clustering algorithms break down in high dimensions. One option is to project the
data into a lower-dimensional space and then obtain the cluster structure there.
However, this two-stage approach of first embedding the data in a suitable lower
dimension and then clustering has been shown to be less effective than performing
dimensionality reduction and clustering in the low-dimensional latent space
simultaneously at each step, as in Deep Continuous Clustering [4]. This method
considers the reconstruction loss and the clustering loss together in each
iteration and minimizes both jointly, which yields a more effective clustering of
the data points in the lower-dimensional latent space.
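The joint objective can be sketched as a single loss that adds a reconstruction term to a continuous-clustering term on the latent codes, so that one optimizer step reduces both. In this minimal sketch, linear maps `enc` (d x D) and `dec` (D x d) stand in for the deep encoder and decoder, and `Z` holds the representatives of the latent codes. The single scale `mu`, the 1/D and 1/d normalizations and all function names are simplifying assumptions, not the exact formulation of [4].

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def gm(d2, mu):
    # Geman-McClure penalty rho(y) = mu * y^2 / (mu + y^2), with d2 = y^2.
    return mu * d2 / (mu + d2)

def joint_loss(X, enc, dec, Z, edges, lam=1.0, mu=1.0):
    """Reconstruction loss + clustering loss on the latent codes (hypothetical sketch)."""
    D, d = len(X[0]), len(enc)
    Y = [matvec(enc, x) for x in X]                        # latent codes y_i = F(x_i)
    recon = sum(sqdist(x, matvec(dec, y)) for x, y in zip(X, Y)) / D
    fidelity = sum(gm(sqdist(y, z), mu) for y, z in zip(Y, Z))   # codes stay near representatives
    pairwise = sum(w * gm(sqdist(Z[p], Z[q]), mu) for p, q, w in edges)  # representatives merge
    return recon + (fidelity + lam * pairwise) / d

I2 = [[1.0, 0.0], [0.0, 1.0]]
loss = joint_loss([[1.0, 0.0], [0.0, 1.0]], I2, I2, [[1.0, 0.0], [0.0, 1.0]], [(0, 1, 1.0)])
```

In practice the encoder/decoder parameters and `Z` would be updated together by gradient descent on this loss, for example with an automatic-differentiation framework, rather than by fixing the embedding first and clustering afterwards.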
We propose to extend Deep Continuous Clustering to the multi-view setting by
combining the ideas of Deep Continuous Clustering and Multi-View Continuous
Subspace Clustering, in order to cluster high-dimensional multi-view data
efficiently without pre-specifying the number of clusters.