An approach to Multi View Deep Continuous Clustering using Subspace Projection

Mazumder, Agnip

Date: 2021-07

Abstract:

Clustering is an unsupervised machine learning task in which, given a collection of unlabelled data, similar items are grouped into the same cluster and dissimilar items are placed in different clusters. One common algorithm is K-Means clustering, which initializes k centroids and then alternately updates the point assignments and the centroids until they converge to k clusters. In real-life scenarios, however, the number of clusters usually cannot be determined in advance. Robust Continuous Clustering [1] addresses this by assigning a representative point to each data point. Initially each representative is the data point itself. The representatives of data points that are likely to belong to the same cluster then converge towards one point, so the representatives gradually move and collapse into easily distinguishable clusters. There is often more than one way of looking at data: the same data set can be expressed in multiple forms as different data matrices. Such data are called multi-view data [2]. We cannot draw a conclusion about the clustering pattern from a single view alone; considering all views is important for grouping the data. For example, a sentence can be expressed in multiple languages, and a person can be described by their face data, their personality, etc. We need to take all these views into consideration to get a fuller picture of the data pattern. In Multi-View Continuous Subspace Clustering [3], a consensus subspace representation is initialized. By applying the continuous clustering algorithm to the subspace of each view's data points and summing the objective functions of all views, we arrive at a consensus clustering structure across views. High-dimensional data poses a further challenge, as the assumptions of many algorithms do not hold in higher dimensions. We can use techniques that project the data into lower dimensions and then obtain the cluster structure there.
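The representative-point idea can be illustrated with a minimal sketch. It is not the published Robust Continuous Clustering solver: plain gradient descent is swapped in for that method's optimization scheme, and the k-nearest-neighbour graph, the Geman-McClure penalty, and the parameters `k`, `lam`, `mu`, `step` and `iters` are assumptions chosen for illustration only.

```python
import numpy as np


def knn_edges(X, k):
    """Return the undirected k-nearest-neighbour edges of X as an (m, 2) index array."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    nearest = np.argsort(dists, axis=1)[:, :k]
    edges = {(min(i, int(j)), max(i, int(j))) for i in range(len(X)) for j in nearest[i]}
    return np.array(sorted(edges))


def objective(X, U, edges, lam, mu):
    """Return (data term, pairwise term) of the illustrative objective."""
    diff = U[edges[:, 0]] - U[edges[:, 1]]
    sq = np.sum(diff ** 2, axis=1)
    data_term = 0.5 * np.sum((X - U) ** 2)               # keeps u_i close to x_i
    pair_term = 0.5 * lam * np.sum(mu * sq / (mu + sq))  # Geman-McClure penalty (assumed)
    return data_term, pair_term


def gradient(X, U, edges, lam, mu):
    """Gradient of data term + pairwise term with respect to the representatives U."""
    grad = U - X
    diff = U[edges[:, 0]] - U[edges[:, 1]]
    sq = np.sum(diff ** 2, axis=1)
    pull = lam * (mu ** 2 / (mu + sq) ** 2)[:, None] * diff
    np.add.at(grad, edges[:, 0], pull)       # accumulate over repeated indices
    np.subtract.at(grad, edges[:, 1], pull)
    return grad


def continuous_clustering(X, k=3, lam=1.0, mu=1.0, step=0.02, iters=500):
    """Start the representatives at the data points and let them move together."""
    edges = knn_edges(X, k)
    U = X.copy()
    for _ in range(iters):
        U -= step * gradient(X, U, edges, lam, mu)
    return U, edges


# Toy data: two well-separated blobs; the number of clusters is never passed in.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
U, edges = continuous_clustering(X)
```

In this sketch the pairwise term pulls neighbouring representatives together while the data term keeps each one near its original point. Clusters could then be read off, for example, as groups of representatives that end up close to one another.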
However, first embedding the data in a suitable lower dimension and then obtaining the cluster structure has been shown to be less effective than performing dimension reduction and clustering on the low-dimensional latent space simultaneously in each step, an approach known as Deep Continuous Clustering [4]. It considers the reconstruction loss and the clustering loss together in each iteration and reduces both simultaneously, which yields a more effective clustering of the data points in the lower-dimensional latent space. We propose to extend Deep Continuous Clustering to the multi-view setting by combining the ideas of Deep Continuous Clustering and Multi-View Continuous Subspace Clustering, in order to efficiently cluster high-dimensional multi-view data without pre-specifying the number of clusters.
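As a rough illustration of how a reconstruction loss and a clustering loss might be combined and summed over views, the sketch below evaluates such a joint loss on toy data. It is not the formulation proposed in this work. The tied-weight linear autoencoder (standing in for a deep network), the shared representatives `U`, the Geman-McClure penalty, and the names `view_loss`, `multi_view_loss`, `lam` and `mu` are all hypothetical choices made for the example.

```python
import numpy as np


def geman_mcclure(sq_dist, mu):
    """Robust penalty on squared distances (an assumed choice of penalty)."""
    return mu * sq_dist / (mu + sq_dist)


def view_loss(X, W, U, edges, lam=1.0, mu=1.0):
    """Joint loss for one view: reconstruction plus clustering in latent space.

    Hypothetical setup: encoder Z = X @ W and decoder X_hat = Z @ W.T.
    """
    Z = X @ W
    X_hat = Z @ W.T
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))  # reconstruction loss
    data = np.mean(np.sum((Z - U) ** 2, axis=1))       # latent codes vs. representatives
    diff = U[edges[:, 0]] - U[edges[:, 1]]
    pair = np.mean(geman_mcclure(np.sum(diff ** 2, axis=1), mu))  # pull neighbours together
    return recon + lam * (data + pair)


def multi_view_loss(views, weights, U, edges, lam=1.0, mu=1.0):
    """Sum the per-view joint losses, sharing one set of representatives U."""
    return sum(view_loss(X, W, U, edges, lam, mu) for X, W in zip(views, weights))


# Toy data: two views (6 and 4 features) generated from the same 2-D latent codes.
rng = np.random.default_rng(0)
Z0 = rng.normal(size=(5, 2))
weights = [np.linalg.qr(rng.normal(size=(d, 2)))[0] for d in (6, 4)]  # orthonormal columns
views = [Z0 @ W.T for W in weights]
edges = np.array([[0, 1], [1, 2], [3, 4]])  # hypothetical neighbour graph
loss = multi_view_loss(views, weights, Z0, edges)
```

Because each toy view is an exact linear image of `Z0`, the reconstruction and data terms are numerically zero here and only the pairwise term contributes. In a training loop, the network weights and the representatives would be updated together so that both loss components decrease in the same iteration.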