16 January 2017
notes on Stochastic Variational Inference (Hoffman et al., 2013).
understanding: 5/10
code: https://github.com/ajbc/lda-svi
things in this paper that were interesting to me:
the variational posterior \(q(x, y, z)\) is modelled with the mean-field assumption: it fully factorizes as \(q(x, y, z) = q(x)\, q(y)\, q(z)\).
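a standard consequence of this factorization, which the paper's coordinate-ascent updates build on: holding the other factors fixed, the optimal factor is the exponentiated expected log joint,

\[ q^{*}(x) \propto \exp\left( \mathbb{E}_{q(y)\, q(z)}\left[ \log p(\mathcal{D}, x, y, z) \right] \right), \]

and symmetrically for \(q(y)\) and \(q(z)\).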
there was a nice explanation of natural gradients: premultiply the gradient by the inverse of the fisher information matrix. the point is that euclidean distance in parameter space is a bad proxy for distance between distributions. for example, Normal(0, 10000) and Normal(10, 10000) are nearly identical distributions even though their parameters are 10 apart in euclidean distance, while Normal(0, 0.01) and Normal(0.1, 0.01) barely overlap even though their parameters are only 0.1 apart. the distance the natural gradient respects is the symmetrized kl divergence.
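a quick numeric check of that example (my own sketch; i'm reading the second parameter as a variance, as in the paper):

```python
import numpy as np

def kl_normal(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for scalar gaussians, v = variance."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def sym_kl(m1, v1, m2, v2):
    """symmetrized kl divergence: KL(p || q) + KL(q || p)."""
    return kl_normal(m1, v1, m2, v2) + kl_normal(m2, v2, m1, v1)

# nearly identical distributions, parameters far apart in euclidean distance
print(sym_kl(0.0, 10000.0, 10.0, 10000.0))  # 0.01
# very different distributions, parameters close in euclidean distance
print(sym_kl(0.0, 0.01, 0.1, 0.01))         # 1.0
```

relatedly: for a gaussian with fixed variance \(\sigma^2\), the fisher information of the mean is \(1/\sigma^2\), so premultiplying by its inverse rescales the gradient by \(\sigma^2\), which is exactly the correction the example above calls for.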
the elbo expectations could be computed analytically because everything was conjugate exponential-family. the stochastic part is about subsampling the dataset: each step samples a data point, optimizes its local variational parameters in closed form, and takes a noisy natural-gradient step on the global parameters.
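a minimal sketch of that update loop (my own toy, not the paper's lda setup: a conjugate gaussian-mean model, so the local step is trivial; the global update \(\lambda \leftarrow (1 - \rho_t)\,\lambda + \rho_t \hat{\lambda}\) and the schedule \(\rho_t = (t + \tau)^{-\kappa}\) are from the paper, the hyperparameter values are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
data = rng.normal(2.0, 1.0, size=N)   # x_i ~ Normal(theta=2, sigma=1)

# prior theta ~ Normal(0, 1), in natural parameters (m0/v0, -1/(2*v0))
prior = np.array([0.0, -0.5])
lam = prior.copy()                    # global variational natural parameters
tau, kappa = 1.0, 0.7                 # my choice of step-size hyperparameters

for t in range(10_000):
    i = rng.integers(N)               # subsample a single data point
    # intermediate estimate: as if x_i were replicated N times
    lam_hat = prior + np.array([N * data[i], -N / 2.0])
    rho = (t + tau) ** -kappa         # step size rho_t = (t + tau)^(-kappa)
    lam = (1.0 - rho) * lam + rho * lam_hat

# back to (mean, variance), compare against the exact conjugate posterior
v = -1.0 / (2.0 * lam[1])
m = lam[0] * v
v_exact = 1.0 / (1.0 + N)
m_exact = data.sum() * v_exact
print(m, v)                # noisy estimate, should be close to:
print(m_exact, v_exact)
```

with \(\kappa \in (0.5, 1]\) the robbins-monro conditions hold, and since \(\hat{\lambda}\) is an unbiased estimate of the full-data update in this conjugate toy, the iterates converge to the exact posterior.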
@article{hoffman2013stochastic,
  title   = {Stochastic variational inference},
  author  = {Hoffman, Matthew D and Blei, David M and Wang, Chong and Paisley, John William},
  journal = {Journal of Machine Learning Research},
  volume  = {14},
  number  = {1},
  pages   = {1303--1347},
  year    = {2013}
}