Abstract:
The last decade has seen an unprecedented adoption of Arti cial Intelligence
and Machine Learning Algorithms in various engineering applications such as
medical imaging, autonomous vehicles, web services, etc. Probabilistic Reasoning
lies at the heart of such algorithms. Probabilistic reasoning techniques
rely highly on sampling techniques and thus samplers that are able to generate
quality samples from a discrete distribution uniformly at random, are
in heavy demand. This has created a need for e cient samplers that come
with some correctness and performance guarantees, and indeed, a number of
proposals to this e ect have been developed in literature. This has also created
a need to nd an e cient and scalable way for testing the reliability of
a sampler with respect to the quality of the samples it generates. Unlike in
the eld of program testing, in which nding a single trace of execution is
su cient to prove the existence of a bug, the number of samples needed for
sampler testing is not one, in fact, most of the testing mechanisms existing in
literature, use exponential or sub-exponential number of samples to test for
uniformity of a given sampler. The objective of this work is to propose techniques
that can cut down on this complexity for certain classes of samplers.
To overcome the high sample complexity for property testing of distributions,
a framework for conditional sampling was proposed for checking properties
of distributions, with polynomial complexity in the number of dimensions of
the sample space. A recent framework named Barbarik that works on the
guidelines of conditional sampling has suggested a way to test uniformity of
samplers sampling from Boolean domains, using a constant number of samples.
Barbarik has been proposed in the context of uniformity testing of
samplers which claim to provide uniform witnesses for arbitrary Boolean formulae.
The objective of this thesis is to examine ways to develop similar
algorithms for testing other kind of samplers.
In the rst result presented in this thesis we propose a framework to harness
Barbarik to test uniformity of state-of-art samplers which are used in the
context of Horn formulae. Horn formulae is a subclass of general Boolean formulae
and have been quite popular in a variety of domains, due to their simple
yet powerful expressiveness, while being able to entail polynomial procedures
for solving. In this thesis, we propose an e cient procedure to test samplers
for Horn clause, and we demonstrate the e cacy of our testing framework
with experiments on large scale benchmarks.
In the second part of this thesis, using a di erent form of the conditional
sampling model, we investigate the problem of testing samplers in the context
of uniform Perfect Matching in a Graph. The problem of nding uniform
perfect matching in a graph is of importance in computer science and has a
large volume of research associated with it. Modern complex networks often
need to nd a uniform perfect matching. The problem of nding a uniform
perfect matching nds it's root in physical sciences like statistical mechanics
and chemistry. This has inspired research on methods for testing whether
the algorithm in use generates a perfect matching uniformly at random. Our
contribution in this thesis is a randomized algorithm using the framework of
sub-cube conditional sampling to test samplers for uniform perfect matching.