25 February 2018
The goal is to reduce the variance of gradient estimators for variational autoencoders, especially in the presence of discrete latent variables, where the REINFORCE gradient estimator usually must be used.
Let \(p_\theta(z, x)\) be a generative network of latent variables \(z\) and observations \(x\). Let \(q_\phi(z \given x)\) be the inference network. Given a dataset \((x^{(n)})_{n = 1}^N\), we want to maximize \(\sum_{n = 1}^N \mathrm{ELBO}(\theta, \phi, x^{(n)})\) where: \begin{align} \mathrm{ELBO}(\theta, \phi, x) = \int q_\phi(z \given x) [\log p_\theta(z, x) - \log q_\phi(z \given x)] \,\mathrm dz. \label{eq:elbo} \end{align}
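As a concrete numerical sketch (a hypothetical toy model, not from the note): take a single Bernoulli latent \(z\), a Gaussian observation \(x\), and a sigmoid-parameterized inference network. The ELBO integral then becomes a sum over \(z \in \{0, 1\}\), and its Monte Carlo estimate uses samples \(z \sim q_\phi(\cdot \given x)\). All parameterizations below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (hypothetical, for illustration):
#   z ~ Bernoulli(sigmoid(theta[0])),  x | z ~ N(theta[1] * z, 1),
#   inference network q_phi(z = 1 | x) = sigmoid(phi * x).
def log_p(z, x, theta):
    pz = 1.0 / (1.0 + np.exp(-theta[0]))                 # p(z = 1)
    log_pz = z * np.log(pz) + (1 - z) * np.log(1 - pz)
    log_px = -0.5 * (x - theta[1] * z) ** 2 - 0.5 * np.log(2 * np.pi)
    return log_pz + log_px

def log_q(z, x, phi):
    qz = 1.0 / (1.0 + np.exp(-phi * x))                  # q(z = 1 | x)
    return z * np.log(qz) + (1 - z) * np.log(1 - qz)

def elbo_estimate(x, theta, phi, num_samples=100_000):
    # Monte Carlo estimate of E_q[log p(z, x) - log q(z | x)]
    qz = 1.0 / (1.0 + np.exp(-phi * x))
    z = (rng.random(num_samples) < qz).astype(float)
    return np.mean(log_p(z, x, theta) - log_q(z, x, phi))
```

Since \(z\) is binary here, the ELBO can also be computed exactly by enumerating \(z \in \{0, 1\}\), which is handy for sanity-checking estimators.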
If \(z\) is not reparameterizable, we must use the REINFORCE gradient estimator to estimate gradients of \eqref{eq:elbo} with respect to \(\phi\): \begin{align} f_{\theta, \phi}(z, x) \nabla_\phi \log q_\phi(z \given x) + \nabla_\phi f_{\theta, \phi}(z, x), \label{eq:reinforce} \end{align} where \(z \sim q_\phi(\cdot \given x)\) and \(f_{\theta, \phi}(z, x) := \log p_\theta(z, x) - \log q_\phi(z \given x)\). Although this estimator is unbiased, it has high variance, due mostly to the first term.
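A minimal sketch of the estimator in \eqref{eq:reinforce} for the same kind of toy Bernoulli latent model (all names and parameterizations are illustrative assumptions, not from the note). Since \(\log p_\theta\) does not depend on \(\phi\), \(\nabla_\phi f = -\nabla_\phi \log q_\phi\) in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Toy model (illustrative): z ~ Bernoulli(sigmoid(theta[0])),
# x | z ~ N(theta[1] * z, 1), q_phi(z = 1 | x) = sigmoid(phi * x).
def f(z, x, theta, phi):
    # f(z, x) = log p(z, x) - log q(z | x)
    pz, qz = sigmoid(theta[0]), sigmoid(phi * x)
    log_pz = z * np.log(pz) + (1 - z) * np.log(1 - pz)
    log_px = -0.5 * (x - theta[1] * z) ** 2 - 0.5 * np.log(2 * np.pi)
    log_qz = z * np.log(qz) + (1 - z) * np.log(1 - qz)
    return log_pz + log_px - log_qz

def grad_log_q(z, x, phi):
    # d/dphi log q_phi(z | x) for the Bernoulli inference network
    return x * (z - sigmoid(phi * x))

def reinforce_grad(x, theta, phi, num_samples=1):
    # f(z, x) * grad_phi log q(z | x) + grad_phi f(z, x),
    # with grad_phi f = -grad_phi log q because log p does not depend on phi.
    z = (rng.random(num_samples) < sigmoid(phi * x)).astype(float)
    score = grad_log_q(z, x, phi)
    return np.mean(f(z, x, theta, phi) * score - score)
```

Note that the second term, \(\nabla_\phi f = -\nabla_\phi \log q_\phi\), has zero expectation under \(q_\phi(\cdot \given x)\), so in practice it is often dropped; the first term is the one responsible for the high variance.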
The NVIL gradient estimator is as follows: \begin{align} (f_{\theta, \phi}(z, x) - C_\rho(x)) \nabla_\phi \log q_\phi(z \given x) + \nabla_\phi f_{\theta, \phi}(z, x), \label{eq:nvil} \end{align} where \(C_\rho(x)\) is a control variate, independent of \(z\) and \(\phi\). \(C_\rho\) is a function, parameterized by \(\rho\), fit by minimizing \(\E_{q_\phi(z \given x)}[(f_{\theta, \phi}(z, x) - C_\rho(x))^2]\) with stochastic gradient descent, concurrently with the ELBO maximization. The NVIL gradient estimator remains unbiased because the added term \(C_\rho(x) \nabla_\phi \log q_\phi(z \given x)\) has zero expectation under \(q_\phi(\cdot \given x)\).
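The concurrent updates can be sketched on the same toy Bernoulli model (again an illustrative assumption, not from the note), with the minimal hypothetical choice of a constant baseline \(C_\rho(x) = \rho\): each step draws one sample \(z\), forms the NVIL gradient for \(\phi\), and takes an SGD step on \((f - \rho)^2\) for \(\rho\).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Toy model (illustrative): z ~ Bernoulli(sigmoid(theta[0])),
# x | z ~ N(theta[1] * z, 1), q_phi(z = 1 | x) = sigmoid(phi * x).
def f(z, x, theta, phi):
    pz, qz = sigmoid(theta[0]), sigmoid(phi * x)
    log_pz = z * np.log(pz) + (1 - z) * np.log(1 - pz)
    log_px = -0.5 * (x - theta[1] * z) ** 2 - 0.5 * np.log(2 * np.pi)
    log_qz = z * np.log(qz) + (1 - z) * np.log(1 - qz)
    return log_pz + log_px - log_qz

def nvil_step(x, theta, phi, rho, lr=0.05):
    # One concurrent step: a single-sample NVIL gradient estimate for phi,
    # plus an SGD update of rho minimizing (f - C_rho)^2, with the
    # hypothetical constant baseline C_rho(x) = rho.
    z = float(rng.random() < sigmoid(phi * x))
    fv = f(z, x, theta, phi)
    score = x * (z - sigmoid(phi * x))       # grad_phi log q(z | x)
    grad_phi = (fv - rho) * score - score    # grad_phi f = -score here
    rho_new = rho + lr * 2.0 * (fv - rho)    # SGD on (f - rho)^2
    return grad_phi, rho_new
```

After a few hundred steps \(\rho\) settles near \(\E_{q_\phi(z \given x)}[f_{\theta, \phi}(z, x)]\), the minimizer of the squared error, and in this toy example the \((f - C_\rho)\,\nabla_\phi \log q_\phi\) term has noticeably lower variance than its baseline-free counterpart.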