16 January 2017
notes on Stochastic Variational Inference (Hoffman et al., 2013).
understanding: 5/10
code: https://github.com/ajbc/lda-svi
things in this paper that were interesting to me:
the variational posterior \(q(x, y, z)\) is modelled with the mean-field assumption: it fully factorizes as \(q(x, y, z) = q(x)\, q(y)\, q(z)\).
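a standard consequence of this factorization, which the paper's coordinate-ascent updates build on: holding the other factors fixed, the optimal factor is the exponentiated expected log joint,

\[ q^{*}(x) \propto \exp\left( \mathbb{E}_{q(y)\, q(z)}\left[ \log p(\mathcal{D}, x, y, z) \right] \right), \]

and symmetrically for \(q(y)\) and \(q(z)\).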
there was a nice explanation of natural gradients: premultiply the gradient by the inverse of the fisher information matrix. the point is that euclidean distance in parameter space is a bad proxy for distance between distributions. for example, Normal(0, 10000) and Normal(10, 10000) are nearly identical distributions even though their parameters are 10 apart in euclidean distance, while Normal(0, 0.01) and Normal(0.1, 0.01) barely overlap even though their parameters are only 0.1 apart. the distance the natural gradient respects is the symmetrized kl divergence.
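a quick numeric check of that example (my own sketch; i'm reading the second parameter as a variance, as in the paper):

```python
import numpy as np

def kl_normal(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for scalar gaussians, v = variance."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def sym_kl(m1, v1, m2, v2):
    """symmetrized kl divergence: KL(p || q) + KL(q || p)."""
    return kl_normal(m1, v1, m2, v2) + kl_normal(m2, v2, m1, v1)

# nearly identical distributions, parameters far apart in euclidean distance
print(sym_kl(0.0, 10000.0, 10.0, 10000.0))  # 0.01
# very different distributions, parameters close in euclidean distance
print(sym_kl(0.0, 0.01, 0.1, 0.01))         # 1.0
```

relatedly: for a gaussian with fixed variance \(\sigma^2\), the fisher information of the mean is \(1/\sigma^2\), so premultiplying by its inverse rescales the gradient by \(\sigma^2\), which is exactly the correction the example above calls for.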
the elbo expectations could be computed analytically because everything was conjugate exponential-family. the stochastic part is about subsampling the dataset: each step samples a data point, optimizes its local variational parameters in closed form, and takes a noisy natural-gradient step on the global parameters.
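a minimal sketch of that update loop (my own toy, not the paper's lda setup: a conjugate gaussian-mean model, so the local step is trivial; the global update \(\lambda \leftarrow (1 - \rho_t)\,\lambda + \rho_t \hat{\lambda}\) and the schedule \(\rho_t = (t + \tau)^{-\kappa}\) are from the paper, the hyperparameter values are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
data = rng.normal(2.0, 1.0, size=N)   # x_i ~ Normal(theta=2, sigma=1)

# prior theta ~ Normal(0, 1), in natural parameters (m0/v0, -1/(2*v0))
prior = np.array([0.0, -0.5])
lam = prior.copy()                    # global variational natural parameters
tau, kappa = 1.0, 0.7                 # my choice of step-size hyperparameters

for t in range(10_000):
    i = rng.integers(N)               # subsample a single data point
    # intermediate estimate: as if x_i were replicated N times
    lam_hat = prior + np.array([N * data[i], -N / 2.0])
    rho = (t + tau) ** -kappa         # step size rho_t = (t + tau)^(-kappa)
    lam = (1.0 - rho) * lam + rho * lam_hat

# back to (mean, variance), compare against the exact conjugate posterior
v = -1.0 / (2.0 * lam[1])
m = lam[0] * v
v_exact = 1.0 / (1.0 + N)
m_exact = data.sum() * v_exact
print(m, v)                # noisy estimate, should be close to:
print(m_exact, v_exact)
```

with \(\kappa \in (0.5, 1]\) the robbins-monro conditions hold, and since \(\hat{\lambda}\) is an unbiased estimate of the full-data update in this conjugate toy, the iterates converge to the exact posterior.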
@article{hoffman2013stochastic,
  title   = {Stochastic variational inference},
  author  = {Hoffman, Matthew D and Blei, David M and Wang, Chong and Paisley, John William},
  journal = {Journal of Machine Learning Research},
  volume  = {14},
  number  = {1},
  pages   = {1303--1347},
  year    = {2013}
}