# importance weighted autoencoders

19 February 2017

## summary

understanding: 8/10
code: https://github.com/yburda/iwae

importance weighted autoencoders (iwae) improve on variational autoencoders. the main difference is the loss function. iwae maximizes: \begin{align} \mathcal L_K = \E_{x_1, \dots, x_K \sim q_{\phi}(x \given y)}\left[\log \frac{1}{K} \sum_{k = 1}^K w_k\right] \end{align} where $w_k = p_{\theta}(x_k, y) / q_{\phi}(x_k \given y)$ and the $x_k$ are sampled i.i.d. from the recognition model $q_{\phi}(x \given y)$. the vae objective is the special case $K = 1$. both objectives are lower bounds on $\log p_{\theta}(y)$.
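
to make the objective concrete, here is a minimal numpy sketch on a toy model where everything is tractable: prior $p(x) = \mathcal N(0, 1)$, likelihood $p(y \given x) = \mathcal N(x, 1)$, and a deliberately mismatched proposal $q(x \given y) = \mathcal N(y/2, 0.7^2)$. the model and constants are illustrative choices, not from the paper; the point is just to see the bound tighten toward $\log p(y)$ as $K$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(z, mean, var):
    """log density of N(mean, var) evaluated at z."""
    return -0.5 * (np.log(2 * np.pi * var) + (z - mean) ** 2 / var)

def iwae_bound(y, K, n_mc=10_000):
    """Monte Carlo estimate of L_K for the toy model above."""
    q_mean, q_var = y / 2, 0.7 ** 2              # deliberately imperfect proposal
    x = q_mean + np.sqrt(q_var) * rng.standard_normal((n_mc, K))
    log_w = (log_normal(x, 0.0, 1.0)             # log p(x)      (prior)
             + log_normal(y, x, 1.0)             # log p(y | x)  (likelihood)
             - log_normal(x, q_mean, q_var))     # log q(x | y)  (proposal)
    m = log_w.max(axis=1, keepdims=True)         # stable log (1/K) sum_k w_k
    log_avg_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    return log_avg_w.mean()                      # average over MC replicates

y = 1.5
for K in (1, 5, 50, 500):
    print(f"L_{K} = {iwae_bound(y, K):.4f}")
print(f"log p(y) = {log_normal(y, 0.0, 2.0):.4f}")  # exact: y ~ N(0, 1 + 1)
```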

iwae is better because:

• the lower bound tightens as $K$ increases, and converges to $\log p_{\theta}(y)$ as $K \to \infty$ (provided the weights $w_k$ are bounded).
• iwae appears to use the neural network's modelling capacity better: empirically it learns more active latent units.
• experimentally, iwae achieves better test log-likelihood estimates of $\log p_{\theta}(y)$ (obtained by importance sampling with 5000 particles) than vae; see the snippet after this list.
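
note that the paper's evaluation metric is the same estimator with many particles: $\mathcal L_K$ with $K = 5000$ is the importance-sampled estimate of $\log p_{\theta}(y)$. reusing the toy sketch above (my illustrative code, not the paper's):

```python
# L_K with K = 5000 doubles as the log-likelihood estimate used for evaluation
print(f"log p(y) estimate: {iwae_bound(y, K=5000):.4f}")
```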

## references

1. Burda, Y., Grosse, R., & Salakhutdinov, R. (2016). Importance Weighted Autoencoders. In International Conference on Learning Representations (ICLR).
@inproceedings{burda2016importance,
  title     = {Importance Weighted Autoencoders},
  author    = {Burda, Yuri and Grosse, Roger and Salakhutdinov, Ruslan},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2016}
}

