*25 February 2018*

The goal is to reduce variance for gradient estimators for variational autoencoders, especially when there are discrete latent variables and a REINFORCE gradient estimator must usually be used.

Let be a generative network of latent variables and observations . Let be the inference network. Given a dataset , we want to maximize where: \begin{align} \mathrm{ELBO}(\theta, \phi, x) = \int q_\phi(z \given x) [\log p_\theta(z, x) - \log q_\phi(z \given x)] \,\mathrm dz. \label{eq:elbo} \end{align}

If is not reparameterizable, we must use the REINFORCE gradient estimator to estimate gradients of \eqref{eq:elbo} with respect to : \begin{align} f_{\theta, \phi}(z, x) \nabla_\phi \log q_\phi(z \given x) + \nabla_\phi f_{\theta, \phi}(z, x), \label{eq:reinforce} \end{align} where and . Although this estimator is unbiased, itâ€™s high variance due to the first term.

The NVIL gradient estimator is as follows: \begin{align} (f_{\theta, \phi}(z, x) - C_\rho(x)) \nabla_\phi \log q_\phi(z \given x) + \nabla_\phi f_{\theta, \phi}(z, x), \label{eq:nvil} \end{align} where is a control variate, independent of and . is a function parameterized by which is obtained by minimizing using stochastic gradient descent concurrently with the ELBO maximization. The NVIL gradient estimator is also unbiased because the term has zero expectation under .

[back]