Master's thesis

This was my master's research, done at the University of Minnesota.

My research examined equivalences between probabilistic topic models. The most significant finding was that an inference method that works for one model does not necessarily carry over to an equivalent model, even one that is structurally very similar. In this case, approximate Bayesian inference could be applied to one model but was not tractable for the equivalent, structurally similar second model.

Approximate Bayesian Inference in Generative Topic Models

Synopsis:

A topic model in this paper is a generative model for documents: it specifies a simple probabilistic procedure by which documents can be generated. To make a new document, one chooses a distribution over topics. Then, for each word in that document, one chooses a topic at random according to this distribution and draws a word from that topic. These topic models work on the assumption that each document contains a mixture of topics and that, within each topic, certain words are more prevalent than others. Standard statistical techniques can be used to invert this process, inferring the set of topics that were responsible for generating a collection of documents. Although it is outside the scope of this work, the same methods can also be applied to other kinds of data, such as images, genetic data, video, and social networks.
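
The generative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the thesis: the Dirichlet prior on the topic distribution is an LDA-style choice (the text only says that one "chooses a distribution over topics"), and the vocabulary, the two topics, and the function name generate_document are hypothetical.

```python
import numpy as np

def generate_document(topic_word, alpha, n_words, rng):
    """Generate one document from a toy topic model.

    topic_word: (K, V) array; row k is topic k's distribution over V words.
    alpha: length-K Dirichlet parameter (an assumption, see the lead-in).
    """
    n_topics, vocab_size = topic_word.shape
    # Step 1: choose this document's distribution over topics.
    theta = rng.dirichlet(alpha)
    # Step 2: for each word position, choose a topic according to theta.
    topics = rng.choice(n_topics, size=n_words, p=theta)
    # Step 3: draw each word from the topic chosen for its position.
    words = np.array([rng.choice(vocab_size, p=topic_word[k]) for k in topics])
    return theta, topics, words

# Hypothetical toy vocabulary and topics, for illustration only.
vocab = ["gene", "dna", "cell", "ball", "team", "score"]
topic_word = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology" topic
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # a "sports" topic
])

rng = np.random.default_rng(0)
theta, topics, words = generate_document(
    topic_word, alpha=np.ones(2), n_words=8, rng=rng
)
print([vocab[w] for w in words])
```

Inference runs this procedure in reverse: given only the words of many documents, it estimates topic_word and each document's theta.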

This paper looks at two models, Gamma-Poisson (GaP) and Latent Dirichlet Allocation (LDA), as well as a model known as Discrete Component Analysis (DCA) that attempts to provide a general form equivalent to both. The methods discussed in this paper are described both graphically and algorithmically to help showcase the similarities and differences between the models.
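
To make the structural similarity concrete, here is a rough side-by-side sketch of how each model might generate the word counts of one document. It rests on assumptions rather than on the thesis's own formulation: GaP is taken to draw an independent gamma-distributed intensity per topic (unit scale) followed by Poisson word counts, LDA is taken to draw Dirichlet topic proportions followed by multinomial word counts, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def gap_counts(topic_word, shape, rng):
    """GaP-style draw: gamma topic intensities, then Poisson word counts."""
    # One independent gamma intensity per topic (unit scale is an assumption).
    x = rng.gamma(shape)
    # Each word's count is Poisson, with a rate mixed over the topics.
    return x, rng.poisson(x @ topic_word)

def lda_counts(topic_word, alpha, n_words, rng):
    """LDA-style draw: Dirichlet topic proportions, then multinomial counts."""
    theta = rng.dirichlet(alpha)
    return theta, rng.multinomial(n_words, theta @ topic_word)

# Hypothetical toy topics over a four-word vocabulary.
topic_word = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.3, 0.5],
])
rng = np.random.default_rng(0)
x, gap_n = gap_counts(topic_word, shape=np.array([2.0, 2.0]), rng=rng)
theta, lda_n = lda_counts(topic_word, alpha=np.array([2.0, 2.0]), n_words=20, rng=rng)

# Normalized GaP intensities play the role of LDA's topic proportions.
print(x / x.sum(), theta)
```

The main visible difference in this sketch is that the GaP draw leaves the document length random, while the LDA draw fixes it in advance.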