Abstract:
In this thesis, we identify a few gaps in the existing methods of dimensionality reduction
for data visualization and classification and propose solutions to them, as
summarized below.
Most data visualization methods do not learn an explicit function to project
high-dimensional data to a lower dimension. To overcome the difficulty associated with
the absence of an explicit map, in Chapter 2, we propose a framework to estimate
explicit maps for data visualization in a supervised setting. The quality of the output of
any regression-type system depends on the quality of the target data. However, even
for simple data, the target data for visualization may sometimes be severely distorted.
We present a framework that can significantly correct such distortions in the output
for data visualization.
For any supervised data visualization method, the availability of target data is indispensable,
which limits the applicability of such methods. Another problem with
most methods is that they always produce some output for any input, even
when the test input is far from the “sampling window” of the training data. In Chapter
3, using a fuzzy rule-based system (FRBS), we propose an unsupervised approach
to learn explicit maps for data visualization that addresses the previously mentioned
issues. The proposed method can project out-of-sample instances in a straightforward
manner. It can also refuse to project an out-of-sample instance when it is far away
from the sampling window of the training data. We demonstrate the generality
of the proposed framework using different objective functions for learning the FRBS.
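
As an illustration of how a rule-based map could decline to project a far-away input, the following is a minimal sketch, not the thesis's actual formulation. It assumes a zero-order Takagi-Sugeno system with Gaussian memberships and a product T-norm; the function names, the parameter layout, and the threshold `tau` are all hypothetical.

```python
import numpy as np

def firing_strengths(x, centers, spreads):
    """Gaussian rule firing strengths, combined over features by product."""
    z = (x - centers) / spreads              # shape: (n_rules, n_features)
    return np.exp(-0.5 * np.sum(z ** 2, axis=1))

def project_or_reject(x, centers, spreads, consequents, tau=0.01):
    """Return the projected point, or None when no rule fires above tau.

    tau is an assumed rejection threshold, not a value from the thesis.
    """
    w = firing_strengths(x, centers, spreads)
    if w.max() < tau:
        return None                          # input lies far from every rule
    # Firing-strength-weighted average of the rule consequents.
    return (w @ consequents) / w.sum()

# Two hypothetical rules mapping 2-D inputs to 2-D outputs.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
spreads = np.ones((2, 2))
consequents = np.array([[0.0, 0.0], [1.0, 1.0]])

near = project_or_reject(np.array([0.0, 0.0]), centers, spreads, consequents)
far = project_or_reject(np.array([10.0, 10.0]), centers, spreads, consequents)
```

Here `near` is a two-dimensional point, while `far` is `None` because every rule's firing strength falls below the assumed threshold.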
When a data set has significant differences between its class and cluster structure,
features selected considering only the discrimination between classes would lead to poor
clustering performance. Similarly, features selected considering only the preservation of
cluster structures would lead to poor classification performance. To address this issue,
in Chapter 4, we propose a neural network-based feature selection method that focuses
on both class discrimination and structure preservation. To reduce the computational
overhead for large data sets, we propose an effective sample-based method.
When a data set has class-specific characteristics, selecting a single feature subset
for the entire data set may not characterize the data correctly, although the classifier
performance may be satisfactory. To address this, in Chapter 5, we propose
class-specific feature selection (CSFS) schemes using feature modulators embedded in
a fuzzy rule-based classifier. The parameters of the modulators are tuned by minimizing
a loss function comprising classification error and a regularizer that makes the modulators
completely select or reject features in a class-specific manner. Our method avoids
the drawbacks of most existing CSFS methods, which stem from their use of a
one-vs-all strategy. We extend the CSFS scheme so that it can monitor class-specific
redundancy between selected features. Finally, because data from a particular class
may form multiple clusters, and different clusters may be effectively defined by different
subsets of features, we generalize our CSFS framework to a
rule-specific feature selection framework.
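
To make the idea of a modulator with a select-or-reject regularizer concrete, the following is a minimal sketch under stated assumptions, not the loss used in the thesis. It assumes a gate of the form 1 - exp(-lambda^2) applied as an exponent on membership values, a squared-error classification term, and a regularizer g(1 - g) that vanishes only when a gate is 0 or 1. All names and the weight `alpha` are hypothetical.

```python
import numpy as np

def gates(lam):
    """Map unconstrained parameters to gates in [0, 1): 0 rejects, near 1 selects."""
    return 1.0 - np.exp(-lam ** 2)

def modulated_firing(memberships, lam):
    """Per-class firing strength from gated memberships.

    memberships, lam: arrays of shape (n_classes, n_features).
    A gate of 0 turns a membership into 1, so that feature drops out
    of the product for that class.
    """
    return np.prod(memberships ** gates(lam), axis=1)

def selection_regularizer(lam):
    """Mean of g * (1 - g); zero only when every gate is 0 or 1."""
    g = gates(lam)
    return np.mean(g * (1.0 - g))

def loss(memberships, lam, target_onehot, alpha=1.0):
    """Assumed loss: squared classification error plus weighted regularizer."""
    err = np.sum((modulated_firing(memberships, lam) - target_onehot) ** 2)
    return err + alpha * selection_regularizer(lam)

# Hypothetical memberships for 2 classes and 3 features.
m = np.array([[0.9, 0.2, 0.7], [0.3, 0.8, 0.5]])
lam_closed = np.zeros((2, 3))                  # all gates 0: every feature rejected
lam_half = np.full((2, 3), np.sqrt(np.log(2.0)))  # all gates 0.5: undecided
```

In this sketch each class has its own row of gate parameters, which is what makes the selection class-specific; giving each rule its own row instead would correspond to the rule-specific generalization.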