Abstract:
Gaussian Processes (GPs) are a supervised learning method. They have long been used in
machine learning to model complex data sets on which parametric methods fail.
However, computing the Gaussian predictive distribution for a large training set of n
points requires inverting an n × n covariance matrix, which has time complexity O(n^3)
and space complexity O(n^2). Subset sampling is an important technique for improving
performance. One such method was described in the paper "Fast Gaussian Process
Regression for Big Data" by Sourish Das, Sasanka Roy and Rajiv Sambasivan. It described
an algorithm that combines estimates from models fit on subsets sampled uniformly at
random, much like bootstrap sampling. A drawback, however, is that the method does not
work well for all kinds of data, and its reported results were based on synthetic data
only. In this work we propose a different sampling technique.
We assign weights to the data points and sample subsets according to these weights. We
expect this to be a better approach when the weights are chosen well, and we provide
empirical results that support this idea.
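The contrast between uniform and weighted subset sampling can be illustrated with a minimal sketch. It is not the exact procedure of Das et al. or of this work: it assumes an RBF kernel with fixed, hand-picked hyperparameters and combines the subset models by simple averaging of their posterior means. The function names (`rbf_kernel`, `gp_predict`, `subset_ensemble_predict`) and the example weights are hypothetical, chosen only for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between rows of A (n x d) and B (m x d)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def gp_predict(X_train, y_train, X_test, lengthscale=1.0, signal_var=1.0, noise_var=1e-2):
    """Exact GP posterior mean; the Cholesky step costs O(n^3) time and O(n^2) memory."""
    K = rbf_kernel(X_train, X_train, lengthscale, signal_var)
    K[np.diag_indices_from(K)] += noise_var
    L = np.linalg.cholesky(K)
    # alpha = (K + noise_var * I)^{-1} y, computed via two solves with the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = rbf_kernel(X_train, X_test, lengthscale, signal_var)
    return K_star.T @ alpha


def subset_ensemble_predict(X, y, X_test, n_models, subset_size,
                            weights=None, seed=0, **gp_kwargs):
    """Average GP predictions from models fit on random subsets of the data.

    weights=None draws each subset uniformly at random (no replacement within a
    subset). Otherwise points with larger weights are more likely to be included.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    p = None if weights is None else np.asarray(weights, dtype=float) / np.sum(weights)
    preds = []
    for _ in range(n_models):
        idx = rng.choice(n, size=subset_size, replace=False, p=p)
        preds.append(gp_predict(X[idx], y[idx], X_test, **gp_kwargs))
    # Simple unweighted average of the per-subset posterior means (an assumption)
    return np.mean(preds, axis=0)


# Synthetic 1-D example: noisy samples of sin(x)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]

uniform_pred = subset_ensemble_predict(X, y, X_test, n_models=10, subset_size=100)

# Hypothetical weights, for illustration only: favour points whose targets lie far from the mean
w = np.abs(y - y.mean()) + 0.1
weighted_pred = subset_ensemble_predict(X, y, X_test, n_models=10, subset_size=100, weights=w)

truth = np.sin(X_test[:, 0])
print("uniform RMSE: ", np.sqrt(np.mean((uniform_pred - truth) ** 2)))
print("weighted RMSE:", np.sqrt(np.mean((weighted_pred - truth) ** 2)))
```

Each subset model inverts only a subset_size × subset_size matrix, so the cost per model drops from O(n^3) to O(subset_size^3). The only difference between the two strategies is the probability vector passed to `rng.choice`; how the weights should be chosen is the substantive question, and the weights above are placeholders rather than a recommendation.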