Abstract:
Multi-view learning is an emerging machine learning paradigm that focuses on discovering
patterns in data represented by multiple distinct views. One of the important
issues associated with real-life high-dimensional multi-view data is how to integrate
relevant and complementary information from multiple views, while generating discriminative
subspaces for analysis. Although the integration of multi-view data is
expected to provide an intrinsically more powerful model than its single-view counterpart,
it poses its own set of challenges. The most important problems associated
with multi-view data analysis are the presence of noisy, irrelevant, and heterogeneous views,
the high-dimension low-sample size nature of individual views, and the need to update databases
with new views.
In this regard, the thesis addresses the problem of multi-view data integration, for
both static and dynamic data sets, in the presence of high-dimensional noisy and redundant
views. The main contribution of the present work is the design of novel
algorithms, based on the theory of canonical correlation analysis (CCA), that extract
informative subspaces for multi-view classification, together with a theoretical analysis of the important
properties of these transformed spaces and of the new algorithms. The “curse of
dimensionality” problem arising from the “high-dimension low-sample size” characteristics of
real-life data is addressed by judiciously integrating CCA with the ridge regression
optimization technique. The relation between CCA and its regularized counterpart is
established; this relation enables sequential extraction of relevant and significant features
from bimodal data sets for classification and addresses the scalability issue of real-life
high-dimensional data.
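To make the combination of CCA and ridge regression concrete, the following is a minimal sketch of a textbook ridge-regularized CCA (often called canonical ridge), assuming two views X (n × p) and Y (n × q) and hypothetical ridge parameters lam_x and lam_y added to the diagonals of the within-view covariance matrices. It illustrates the general idea under these assumptions and is not the algorithm developed in this thesis; the function name regularized_cca and the toy data are hypothetical.

```python
import numpy as np


def regularized_cca(X, Y, lam_x=0.1, lam_y=0.1, k=2):
    """Ridge-regularized CCA sketch for views X (n x p) and Y (n x q).

    Returns projection matrices A (p x k), B (q x k) and the k leading
    regularized canonical correlations, in descending order.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Within-view covariances with a ridge term on the diagonal (hypothetical lam_x, lam_y)
    Cxx = Xc.T @ Xc / (n - 1) + lam_x * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + lam_y * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whitening via Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance M = Lx^{-1} Cxy Ly^{-T}
    M = np.linalg.solve(Lx, np.linalg.solve(Ly, Cxy.T).T)
    # Singular values of M are the regularized canonical correlations
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Map whitened directions back: a = Lx^{-T} u, b = Ly^{-T} v
    A = np.linalg.solve(Lx.T, U[:, :k])
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return A, B, s[:k]


# Toy high-dimension low-sample size example: n = 20 samples, p = 50 and q = 40 features
rng = np.random.default_rng(0)
n, p, q = 20, 50, 40
z = rng.standard_normal((n, 1))  # shared latent signal
X = z @ rng.standard_normal((1, p)) + 0.5 * rng.standard_normal((n, p))
Y = z @ rng.standard_normal((1, q)) + 0.5 * rng.standard_normal((n, q))
A, B, s = regularized_cca(X, Y, lam_x=1.0, lam_y=1.0, k=2)
print(A.shape, B.shape, s)
```

With lam_x = lam_y = 0 and p, q ≥ n, the sample covariance matrices are singular, so the Cholesky factorization can fail; this is one concrete face of the high-dimension low-sample size problem. The ridge terms make both matrices positive definite, at the cost of shrinking the estimated correlations.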
To integrate multi-view data using multiset CCA (MCCA), a new block matrix
representation is introduced. It facilitates generation of discriminative subspaces having
maximum pairwise correlation, and makes the MCCA model scalable to high-dimensional
multi-view data. Integration of MCCA with a multiset ridge regression
model addresses the “curse of dimensionality” problem of individual views. To
integrate dynamic multi-view data, a novel adaptive MCCA model is proposed,
which incrementally updates canonical variables when new views become available for the
analysis. The adaptive model ensures selection of relevant and complementary views
during data integration, while discarding irrelevant and redundant ones. To make
the adaptive framework scalable to high-dimensional data, a new model is introduced
under a common latent representation. Finally, a graph-based approach is judiciously
integrated with this adaptive model to exploit the underlying geometry of the data in
different views.
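The general idea of casting MCCA as a block matrix problem can be illustrated with one common formulation (a sum-of-correlations generalized eigenproblem): all pairwise covariance blocks C_ij form a single block matrix C, the ridge-regularized within-view blocks C_ii + lam I form a block-diagonal matrix D, and the canonical weights solve C w = rho D w. This is a minimal sketch under that assumption; it is not the new block matrix representation or the adaptive model proposed in this thesis, and the function name mcca and the parameter lam are hypothetical.

```python
import numpy as np


def mcca(views, lam=0.1, k=2):
    """Multiset CCA sketch as a block-matrix generalized eigenproblem.

    Solves C w = rho * D w, where C = [C_ij] stacks all (cross-)covariance
    blocks and D = blockdiag(C_ii + lam * I). Returns one (p_i x k) weight
    matrix per view and the k leading eigenvalues, in descending order.
    """
    centered = [V - V.mean(axis=0) for V in views]
    dims = [V.shape[1] for V in centered]
    n = centered[0].shape[0]
    Z = np.hstack(centered)          # n x (p_1 + ... + p_m)
    C = Z.T @ Z / (n - 1)            # full block covariance matrix [C_ij]
    D = np.zeros_like(C)             # block-diagonal, ridge-regularized
    start = 0
    for p in dims:
        stop = start + p
        D[start:stop, start:stop] = C[start:stop, start:stop] + lam * np.eye(p)
        start = stop
    # Reduce to a symmetric problem: D = L L^T, S = L^{-1} C L^{-T}
    L = np.linalg.cholesky(D)
    S = np.linalg.solve(L, np.linalg.solve(L, C).T)
    rho, V = np.linalg.eigh((S + S.T) / 2)
    order = np.argsort(rho)[::-1][:k]      # eigh sorts ascending; take largest k
    W = np.linalg.solve(L.T, V[:, order])  # map back: w = L^{-T} v
    return np.split(W, np.cumsum(dims)[:-1], axis=0), rho[order]


rng = np.random.default_rng(0)
z = rng.standard_normal((100, 1))  # latent signal shared by all three views
views = [z @ rng.standard_normal((1, p)) + 0.5 * rng.standard_normal((100, p))
         for p in (30, 40, 25)]
W, rho = mcca(views, lam=1.0, k=2)
print([w.shape for w in W])  # [(30, 2), (40, 2), (25, 2)]
```

For two views with lam = 0, the leading eigenvalue of this formulation equals 1 + rho_1, where rho_1 is the first canonical correlation, so the sketch reduces to ordinary CCA in that case; with m views, the eigenvalues lie between 0 and m.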