On Testing of Samplers and Related Problems

Chakraborty, Shayak

On Testing of Samplers and Related Problems

Chakraborty, Shayak

Date: 2019-07

Abstract:

The last decade has seen an unprecedented adoption of Arti cial Intelligence and Machine Learning Algorithms in various engineering applications such as medical imaging, autonomous vehicles, web services, etc. Probabilistic Reasoning lies at the heart of such algorithms. Probabilistic reasoning techniques rely highly on sampling techniques and thus samplers that are able to generate quality samples from a discrete distribution uniformly at random, are in heavy demand. This has created a need for e cient samplers that come with some correctness and performance guarantees, and indeed, a number of proposals to this e ect have been developed in literature. This has also created a need to nd an e cient and scalable way for testing the reliability of a sampler with respect to the quality of the samples it generates. Unlike in the eld of program testing, in which nding a single trace of execution is su cient to prove the existence of a bug, the number of samples needed for sampler testing is not one, in fact, most of the testing mechanisms existing in literature, use exponential or sub-exponential number of samples to test for uniformity of a given sampler. The objective of this work is to propose techniques that can cut down on this complexity for certain classes of samplers. To overcome the high sample complexity for property testing of distributions, a framework for conditional sampling was proposed for checking properties of distributions, with polynomial complexity in the number of dimensions of the sample space. A recent framework named Barbarik that works on the guidelines of conditional sampling has suggested a way to test uniformity of samplers sampling from Boolean domains, using a constant number of samples. Barbarik has been proposed in the context of uniformity testing of samplers which claim to provide uniform witnesses for arbitrary Boolean formulae. The objective of this thesis is to examine ways to develop similar algorithms for testing other kind of samplers. In the rst result presented in this thesis we propose a framework to harness Barbarik to test uniformity of state-of-art samplers which are used in the context of Horn formulae. Horn formulae is a subclass of general Boolean formulae and have been quite popular in a variety of domains, due to their simple yet powerful expressiveness, while being able to entail polynomial procedures for solving. In this thesis, we propose an e cient procedure to test samplers for Horn clause, and we demonstrate the e cacy of our testing framework with experiments on large scale benchmarks. In the second part of this thesis, using a di erent form of the conditional sampling model, we investigate the problem of testing samplers in the context of uniform Perfect Matching in a Graph. The problem of nding uniform perfect matching in a graph is of importance in computer science and has a large volume of research associated with it. Modern complex networks often need to nd a uniform perfect matching. The problem of nding a uniform perfect matching nds it's root in physical sciences like statistical mechanics and chemistry. This has inspired research on methods for testing whether the algorithm in use generates a perfect matching uniformly at random. Our contribution in this thesis is a randomized algorithm using the framework of sub-cube conditional sampling to test samplers for uniform perfect matching.