# Neural Variational Inference and Learning in Belief Networks (NVIL)

25 February 2018

The goal is to reduce the variance of gradient estimators for variational autoencoders, especially when the latent variables are discrete and one usually has to fall back on a REINFORCE gradient estimator.

Let $p_\theta(z, x)$ be a generative network of latent variables $z$ and observations $x$. Let $q_\phi(z \given x)$ be the inference network. Given a dataset $(x^{(n)})_{n = 1}^N$, we want to maximize $\sum_{n = 1}^N \mathrm{ELBO}(\theta, \phi, x^{(n)})$ where: \begin{align} \mathrm{ELBO}(\theta, \phi, x) = \int q_\phi(z \given x) [\log p_\theta(z, x) - \log q_\phi(z \given x)] \,\mathrm dz. \label{eq:elbo} \end{align}
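The following minimal Python sketch makes the ELBO concrete. It uses a toy model that we assume here and that is not part of the original note: $z \in \{0, 1\}$ with $p(z) = \mathrm{Bernoulli}(0.5)$, $p(x \given z) = \mathrm{Normal}(x; 2z - 1, 1)$, and $q_\phi(z = 1 \given x) = \mathrm{sigmoid}(\phi)$ with a single scalar logit for one fixed $x$. Because $z$ is discrete, the integral in \eqref{eq:elbo} becomes a sum over $z$, so a Monte Carlo estimate can be checked against the exact value. All function names are hypothetical.

```python
import math
import random

# Hypothetical toy model (an illustration, not from the original note):
#   p(z) = Bernoulli(0.5) with z in {0, 1}
#   p(x | z) = Normal(x; 2z - 1, 1)
#   q_phi(z = 1 | x) = sigmoid(phi), one scalar logit for a single fixed x


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def log_joint(z, x):
    # log p(z, x) = log p(z) + log Normal(x; 2z - 1, 1)
    return math.log(0.5) - 0.5 * math.log(2 * math.pi) - 0.5 * (x - (2 * z - 1)) ** 2


def log_q(z, phi):
    p1 = sigmoid(phi)
    return math.log(p1) if z == 1 else math.log(1.0 - p1)


def f(z, x, phi):
    # f(z, x) = log p(z, x) - log q_phi(z | x)
    return log_joint(z, x) - log_q(z, phi)


def elbo_exact(x, phi):
    # For discrete z, the integral over z becomes a sum over z in {0, 1}.
    return sum(math.exp(log_q(z, phi)) * f(z, x, phi) for z in (0, 1))


def elbo_mc(x, phi, num_samples, rng):
    # Monte Carlo estimate: average f(z, x) over z ~ q_phi(. | x).
    total = 0.0
    for _ in range(num_samples):
        z = 1 if rng.random() < sigmoid(phi) else 0
        total += f(z, x, phi)
    return total / num_samples


def log_marginal(x):
    # log p(x) = log sum_z p(z, x), exact for this two-state model.
    return math.log(sum(math.exp(log_joint(z, x)) for z in (0, 1)))
```

In this toy model the true posterior logit is $\log p(z = 1, x) - \log p(z = 0, x) = 2x$. Setting `phi = 2 * x` therefore makes $q_\phi$ equal to the posterior, and `elbo_exact(x, 2 * x)` matches `log_marginal(x)` up to floating-point error. Any other `phi` gives a strictly smaller ELBO.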

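If $z$ is not reparameterizable, we must use the REINFORCE gradient estimator to estimate gradients of \eqref{eq:elbo} with respect to $\phi$: \begin{align} f_{\theta, \phi}(z, x) \nabla_\phi \log q_\phi(z \given x) + \nabla_\phi f_{\theta, \phi}(z, x), \label{eq:reinforce} \end{align} where $z \sim q_\phi(\cdot \given x)$ and $f_{\theta, \phi}(z, x) := \log p_\theta(z, x) - \log q_\phi(z \given x)$. This estimator is unbiased because $\nabla_\phi q_\phi(z \given x) = q_\phi(z \given x) \nabla_\phi \log q_\phi(z \given x)$, which gives $\nabla_\phi \int q_\phi(z \given x) f_{\theta, \phi}(z, x) \,\mathrm dz = \int q_\phi(z \given x) [f_{\theta, \phi}(z, x) \nabla_\phi \log q_\phi(z \given x) + \nabla_\phi f_{\theta, \phi}(z, x)] \,\mathrm dz$. However, it has high variance, mainly due to the first term.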
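Here is a minimal sketch of the single-sample estimator \eqref{eq:reinforce}. It assumes a hypothetical toy model that is not from the original note: $p(z) = \mathrm{Bernoulli}(0.5)$, $p(x \given z) = \mathrm{Normal}(x; 2z - 1, 1)$, and $q_\phi(z = 1 \given x) = \mathrm{sigmoid}(\phi)$ for one fixed $x$. For a Bernoulli with logit $\phi$, $\nabla_\phi \log q_\phi(z \given x) = z - \mathrm{sigmoid}(\phi)$. Because $\log p_\theta(z, x)$ does not depend on $\phi$, the second term reduces to $\nabla_\phi f_{\theta, \phi}(z, x) = -\nabla_\phi \log q_\phi(z \given x)$. The sample mean is checked against a finite-difference derivative of the exact ELBO. The function names are illustrative.

```python
import math
import random

# Hypothetical toy model (an illustration, not from the original note):
#   p(z) = Bernoulli(0.5), p(x | z) = Normal(x; 2z - 1, 1),
#   q_phi(z = 1 | x) = sigmoid(phi) with a scalar logit phi for one fixed x.


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def log_joint(z, x):
    # log p(z, x) = log p(z) + log Normal(x; 2z - 1, 1)
    return math.log(0.5) - 0.5 * math.log(2 * math.pi) - 0.5 * (x - (2 * z - 1)) ** 2


def log_q(z, phi):
    p1 = sigmoid(phi)
    return math.log(p1) if z == 1 else math.log(1.0 - p1)


def f(z, x, phi):
    # f(z, x) = log p(z, x) - log q_phi(z | x)
    return log_joint(z, x) - log_q(z, phi)


def score(z, phi):
    # d/dphi log q_phi(z | x) for a Bernoulli with logit phi
    return z - sigmoid(phi)


def reinforce_sample(x, phi, rng):
    # One-sample REINFORCE estimate: f * d/dphi log q + d/dphi f,
    # where d/dphi f = -d/dphi log q because log p(z, x) does not depend on phi.
    z = 1 if rng.random() < sigmoid(phi) else 0
    s = score(z, phi)
    return f(z, x, phi) * s - s


def elbo_exact(x, phi):
    # Exact ELBO as a sum over z in {0, 1}
    return sum(math.exp(log_q(z, phi)) * f(z, x, phi) for z in (0, 1))


def grad_finite_difference(x, phi, eps=1e-5):
    # Central finite difference of the exact ELBO, used only as a reference value
    return (elbo_exact(x, phi + eps) - elbo_exact(x, phi - eps)) / (2 * eps)
```

For this toy model the exact gradient works out to $\mathrm{sigmoid}(\phi)(1 - \mathrm{sigmoid}(\phi))(2x - \phi)$, which is about 0.171 at $x = 0.5$, $\phi = 0.3$. Averaging many calls to `reinforce_sample(0.5, 0.3, rng)` approaches that value. The individual samples, however, are spread far more widely than their mean, and that spread is the variance problem NVIL targets.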

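The NVIL gradient estimator is as follows: \begin{align} (f_{\theta, \phi}(z, x) - C_\rho(x)) \nabla_\phi \log q_\phi(z \given x) + \nabla_\phi f_{\theta, \phi}(z, x), \label{eq:nvil} \end{align} where $C_\rho(x)$ is a control variate (often called a baseline) that depends on neither $z$ nor $\phi$. $C_\rho$ is a function parameterized by $\rho$. The parameters $\rho$ are obtained by minimizing $\E_{q_\phi(z \given x)}[(f_{\theta, \phi}(z, x) - C_\rho(x))^2]$ with stochastic gradient descent, concurrently with the ELBO maximization. When $C_\rho$ is flexible enough, the minimizer is $C_\rho(x) = \E_{q_\phi(z \given x)}[f_{\theta, \phi}(z, x)]$, so the baseline centers the learning signal $f_{\theta, \phi}(z, x)$. The NVIL gradient estimator is also unbiased, because the extra term $C_\rho(x) \nabla_\phi \log q_\phi(z \given x)$ has zero expectation under $q_\phi(\cdot \given x)$. Since $C_\rho(x)$ does not depend on $z$, $\E_{q_\phi(z \given x)}[C_\rho(x) \nabla_\phi \log q_\phi(z \given x)] = C_\rho(x) \int \nabla_\phi q_\phi(z \given x) \,\mathrm dz = C_\rho(x) \nabla_\phi \int q_\phi(z \given x) \,\mathrm dz = C_\rho(x) \nabla_\phi 1 = 0$.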
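The sketch below puts \eqref{eq:nvil} to work under assumptions that are ours, not the note's. The generative model is a hypothetical fixed toy model, $p(z) = \mathrm{Bernoulli}(0.5)$ and $p(x \given z) = \mathrm{Normal}(x; 2z - 1, 1)$, so $\theta$ is not learned. The inference network is $q_\phi(z = 1 \given x) = \mathrm{sigmoid}(\phi_0 + \phi_1 x)$, the control variate is linear, $C_\rho(x) = \rho_0 + \rho_1 x$, and the dataset `DATA` is made up. Each training step takes an NVIL ascent step on $\phi$ and, concurrently, an SGD step on $(f_{\theta, \phi}(z, x) - C_\rho(x))^2$ for $\rho$. A helper then compares REINFORCE and NVIL estimates at the learned parameters.

```python
import math
import random

# Hypothetical setup (an illustration, not the note's experiments):
#   generative model, held fixed: p(z) = Bernoulli(0.5), p(x | z) = Normal(x; 2z - 1, 1)
#   inference network:            q_phi(z = 1 | x) = sigmoid(phi[0] + phi[1] * x)
#   control variate (baseline):   C_rho(x) = rho[0] + rho[1] * x
DATA = [-1.5, -0.5, 0.5, 1.5]  # made-up observations


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def log_joint(z, x):
    # log p(z, x) = log p(z) + log Normal(x; 2z - 1, 1)
    return math.log(0.5) - 0.5 * math.log(2 * math.pi) - 0.5 * (x - (2 * z - 1)) ** 2


def baseline(x, rho):
    return rho[0] + rho[1] * x


def sample_terms(x, phi, rng):
    """Draw z ~ q_phi(. | x); return f(z, x) and the score d/dphi log q_phi(z | x)."""
    p1 = sigmoid(phi[0] + phi[1] * x)
    z = 1 if rng.random() < p1 else 0
    log_q = math.log(p1) if z == 1 else math.log(1.0 - p1)
    f = log_joint(z, x) - log_q
    score = [z - p1, (z - p1) * x]
    return f, score


def reinforce(f, score):
    # f * score + d/dphi f, where d/dphi f = -score because log p(z, x) has no phi
    return [(f - 1.0) * s for s in score]


def nvil(f, score, c):
    # (f - C_rho(x)) * score + d/dphi f
    return [(f - c - 1.0) * s for s in score]


def train(num_steps=20_000, lr_phi=0.05, lr_rho=0.01, seed=0):
    rng = random.Random(seed)
    phi, rho = [0.0, 0.0], [0.0, 0.0]
    for _ in range(num_steps):
        x = rng.choice(DATA)
        f, score = sample_terms(x, phi, rng)
        c = baseline(x, rho)
        # Stochastic gradient ascent on the ELBO using the NVIL estimator.
        phi = [p + lr_phi * g for p, g in zip(phi, nvil(f, score, c))]
        # SGD on (f - C_rho(x))^2, treating f as a constant with respect to rho.
        err = f - c
        rho = [rho[0] + lr_rho * 2.0 * err, rho[1] + lr_rho * 2.0 * err * x]
    return phi, rho


def mean_and_total_variance(vectors):
    # Per-component means and the sum of per-component variances
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dim)]
    total_var = sum(sum((v[i] - means[i]) ** 2 for v in vectors) / n for i in range(dim))
    return means, total_var


def compare(x, phi, rho, num_samples=50_000, seed=1):
    # Evaluate both estimators on the same z samples at fixed (phi, rho).
    rng = random.Random(seed)
    c = baseline(x, rho)
    r_grads, n_grads = [], []
    for _ in range(num_samples):
        f, score = sample_terms(x, phi, rng)
        r_grads.append(reinforce(f, score))
        n_grads.append(nvil(f, score, c))
    return mean_and_total_variance(r_grads), mean_and_total_variance(n_grads)
```

For example, `phi, rho = train()` followed by `compare(0.5, phi, rho)` returns the mean and total variance of each estimator. Both means estimate the same gradient, and the NVIL variance should come out noticeably smaller. A linear $C_\rho$ cannot represent $\E_{q_\phi(z \given x)}[f_{\theta, \phi}(z, x)]$ exactly in this setup. It does not need to: any $C_\rho(x)$ that does not depend on $z$ keeps the estimator unbiased, and a closer fit only lowers the variance further.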
