# Conditional Variational Autoencoders

27 June 2018

This is a perspective on the conditional variational autoencoder.

## Variational Autoencoders

In a typical variational autoencoder (VAE), we have

• a generative model $p_\theta(z, x)$ on latent variables $z$ and data $x$, parameterized by $\theta$ and
• an inference network $q_\phi(z \given x)$, parameterized by $\phi$, which maps data $x$ to a distribution that approximates the posterior $p_\theta(z \given x)$.

Goal: Given a true data distribution $p(x)$, we want to learn $(\theta, \phi)$ such that $p_\theta(x)$ approximates $p(x)$ and $q_\phi(z \given x)$ approximates $p_\theta(z \given x)$ for all $x$.

Let the evidence lower bound (ELBO) be defined as
\begin{align}
\mathrm{ELBO}(x, \theta, \phi) = \log p_\theta(x) - \KL{q_\phi(z \given x)}{p_\theta(z \given x)}.
\end{align}
Maximizing $\E_{p(x)}[\mathrm{ELBO}(x, \theta, \phi)]$ achieves our goal since it is equivalent to minimizing $\KL{p(x)}{p_\theta(x)} + \E_{p(x)}[\KL{q_\phi(z \given x)}{p_\theta(z \given x)}]$:
\begin{align}
\E_{p(x)}[\mathrm{ELBO}(x, \theta, \phi)] &= \E_{p(x)}[\log p_\theta(x) - \KL{q_\phi(z \given x)}{p_\theta(z \given x)}] \\
&= \E_{p(x)}[\log p_\theta(x) - \log p(x)] + \E_{p(x)}[\log p(x)] - \E_{p(x)}[\KL{q_\phi(z \given x)}{p_\theta(z \given x)}] \\
&= -\KL{p(x)}{p_\theta(x)} - \E_{p(x)}[\KL{q_\phi(z \given x)}{p_\theta(z \given x)}] + \E_{p(x)}[\log p(x)].
\end{align}
The last term, $\E_{p(x)}[\log p(x)]$, does not depend on $(\theta, \phi)$, so it can be ignored during optimization.
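The definition above can be sanity-checked against the more familiar form $\mathrm{ELBO} = \E_{q}[\log p_\theta(x \given z)] - \KL{q_\phi(z \given x)}{p(z)}$, since both differ from $\log p_\theta(x)$ by the same KL term. The sketch below verifies the two forms agree for a 1D Gaussian model; all numeric values are illustrative assumptions, not from the text:

```python
import math

# Illustrative 1D Gaussian model (an assumption for this sketch):
#   p(z) = Normal(0, 1),  p_theta(x | z) = Normal(z, sigma2)
# so p_theta(x) = Normal(0, 1 + sigma2) and the posterior is
#   p_theta(z | x) = Normal(x / (1 + sigma2), sigma2 / (1 + sigma2)).
sigma2 = 0.5
x = 1.3

# An arbitrary (suboptimal) Gaussian approximation q(z | x) = Normal(m, s2).
m, s2 = 0.4, 0.7

def kl_normal(m1, v1, m2, v2):
    """KL(Normal(m1, v1) || Normal(m2, v2)) with variances v1, v2."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def log_normal(y, mu, v):
    return -0.5 * (math.log(2 * math.pi * v) + (y - mu) ** 2 / v)

# Form 1: ELBO = E_q[log p_theta(x | z)] - KL(q || p(z)).
# For Gaussians, E_q[(x - z)^2] = (x - m)^2 + s2, so the expectation is closed form.
expected_loglik = -0.5 * (math.log(2 * math.pi * sigma2)
                          + ((x - m) ** 2 + s2) / sigma2)
elbo_form1 = expected_loglik - kl_normal(m, s2, 0.0, 1.0)

# Form 2 (the definition above): ELBO = log p_theta(x) - KL(q || p_theta(z | x)).
post_mean = x / (1 + sigma2)
post_var = sigma2 / (1 + sigma2)
elbo_form2 = log_normal(x, 0.0, 1 + sigma2) - kl_normal(m, s2, post_mean, post_var)

assert abs(elbo_form1 - elbo_form2) < 1e-10
```

Form 1 is the one actually used in training, since it avoids the intractable $\log p_\theta(x)$ and posterior.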

## Conditional Variational Autoencoders

In a conditional VAE, we have

• a conditional generative model $p_\theta(z, x \given c)$ on latent variables $z$ and data $x$, conditioned on $c$ and parameterized by $\theta$, and
• a conditional inference network $q_\phi(z \given x, c)$, conditioned on $c$ and parameterized by $\phi$.

Goal: Given a true conditional data distribution $p(x \given c)$ for all $c$, we want to learn $(\theta, \phi)$ such that

• $p_\theta(x \given c)$ approximates $p(x \given c)$ for all $c$ and
• $q_\phi(z \given x, c)$ approximates $p_\theta(z \given x, c)$ for all $x, c$.

Let the conditional ELBO be defined as
\begin{align}
\mathrm{ELBO}(x, \theta, \phi \given c) = \log p_\theta(x \given c) - \KL{q_\phi(z \given x, c)}{p_\theta(z \given x, c)}.
\end{align}
Given a distribution $p(c)$ whose support contains all $c$, maximizing $\E_{p(x \given c) p(c)}[\mathrm{ELBO}(x, \theta, \phi \given c)]$ with respect to $(\theta, \phi)$ achieves our goal since it is equivalent to minimizing $\E_{p(c)}[\KL{p(x \given c)}{p_\theta(x \given c)}] + \E_{p(x \given c) p(c)}[\KL{q_\phi(z \given x, c)}{p_\theta(z \given x, c)}]$:
\begin{align}
\E_{p(x \given c) p(c)}[\mathrm{ELBO}(x, \theta, \phi \given c)] &= \E_{p(x \given c) p(c)}[\log p_\theta(x \given c) - \KL{q_\phi(z \given x, c)}{p_\theta(z \given x, c)}] \\
&= \E_{p(x \given c) p(c)}[\log p_\theta(x \given c) - \log p(x \given c)] + \E_{p(x \given c) p(c)}[\log p(x \given c)] - \E_{p(x \given c) p(c)}[\KL{q_\phi(z \given x, c)}{p_\theta(z \given x, c)}] \\
&= -\E_{p(c)}[\KL{p(x \given c)}{p_\theta(x \given c)}] - \E_{p(x \given c) p(c)}[\KL{q_\phi(z \given x, c)}{p_\theta(z \given x, c)}] + \E_{p(x \given c) p(c)}[\log p(x \given c)].
\end{align}
As before, the last term does not depend on $(\theta, \phi)$.
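In practice the conditional ELBO is estimated by sampling $z \sim q_\phi(z \given x, c)$ and averaging $\log p_\theta(z \given c) + \log p_\theta(x \given z, c) - \log q_\phi(z \given x, c)$. The sketch below compares this Monte Carlo estimate against the closed form available for Gaussians; the model and every numeric value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative conditional model (assumed for this sketch):
#   p_theta(z | c) = Normal(c, 1.0),  p_theta(x | z, c) = Normal(z, 0.5)
# and an arbitrary Gaussian inference network
#   q_phi(z | x, c) = Normal(0.5 * x + 0.4 * c, 0.6).
c, x = 0.8, 1.5
prior_mean, prior_var = c, 1.0
lik_var = 0.5
q_mean, q_var = 0.5 * x + 0.4 * c, 0.6

def log_normal(y, mu, v):
    return -0.5 * (np.log(2 * np.pi * v) + (y - mu) ** 2 / v)

# Monte Carlo estimate with z ~ q_phi(z | x, c):
#   ELBO = E_q[log p(z | c) + log p(x | z, c) - log q(z | x, c)].
z = q_mean + np.sqrt(q_var) * rng.standard_normal(1_000_000)
elbo_mc = np.mean(log_normal(z, prior_mean, prior_var)
                  + log_normal(x, z, lik_var)
                  - log_normal(z, q_mean, q_var))

# Closed form for Gaussians: E_q[log p(x | z, c)] - KL(q || p(z | c)).
kl = 0.5 * (np.log(prior_var / q_var)
            + (q_var + (q_mean - prior_mean) ** 2) / prior_var - 1.0)
expected_loglik = -0.5 * (np.log(2 * np.pi * lik_var)
                          + ((x - q_mean) ** 2 + q_var) / lik_var)
elbo_exact = expected_loglik - kl

assert abs(elbo_mc - elbo_exact) < 0.02
```

Sampling $z$ as `q_mean + sqrt(q_var) * noise` is the reparameterization trick, which is what makes the estimator differentiable with respect to $\phi$ in an actual training loop.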

## Gaussian Unknown Mean Example

Let the conditional generative model be
\begin{align}
p_\theta(z \given c) &= \mathrm{Normal}(z \given \theta_1 + \theta_2 c, \sigma_0^2) \\
p_\theta(x \given z, c) &= \mathrm{Normal}(x \given z, \exp(\theta_3)),
\end{align}
where $\theta = (\theta_1, \theta_2, \theta_3)$, and the conditional inference network be
\begin{align}
q_\phi(z \given x, c) &= \mathrm{Normal}(z \given \phi_1 x + \phi_2 c + \phi_3, \exp(\phi_4)),
\end{align}
where $\phi = (\phi_1, \phi_2, \phi_3, \phi_4)$.

Let the true conditional data distribution $p(x \given c)$ be the marginal of $p(z \given c)\,p(x \given z, c)$, where
\begin{align}
p(z \given c) &= \mathrm{Normal}(z \given \mu_0 + c, \sigma_0^2) \\
p(x \given z, c) &= \mathrm{Normal}(x \given z, \sigma^2).
\end{align}
The posterior can be analytically derived as
\begin{align}
p(z \given x, c) &= \mathrm{Normal}\left(z \given \frac{1/\sigma^2}{1/\sigma_0^2 + 1/\sigma^2} x + \frac{1/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma^2} c + \frac{\mu_0/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma^2}, \frac{1}{1/\sigma_0^2 + 1/\sigma^2}\right).
\end{align}
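The posterior formula can be checked numerically by normalizing $p(z \given c)\,p(x \given z, c)$ on a dense grid and comparing moments; the parameter values below are illustrative assumptions:

```python
import numpy as np

# Illustrative parameters of the true model.
mu0, sigma0_sq, sigma_sq = 0.3, 2.0, 0.5
c, x = 1.0, 2.4

def log_normal(y, mu, v):
    return -0.5 * (np.log(2 * np.pi * v) + (y - mu) ** 2 / v)

# Unnormalized posterior on a grid: p(z | x, c) ∝ p(z | c) p(x | z, c).
z = np.linspace(-10, 10, 200_001)
log_unnorm = log_normal(z, mu0 + c, sigma0_sq) + log_normal(x, z, sigma_sq)
w = np.exp(log_unnorm - log_unnorm.max())
w /= w.sum()
grid_mean = np.sum(w * z)
grid_var = np.sum(w * (z - grid_mean) ** 2)

# Analytic posterior from the precision-weighted formula above.
prec = 1 / sigma0_sq + 1 / sigma_sq
post_mean = ((1 / sigma_sq) * x + (1 / sigma0_sq) * c + mu0 / sigma0_sq) / prec
post_var = 1 / prec

assert abs(grid_mean - post_mean) < 1e-6
assert abs(grid_var - post_var) < 1e-6
```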

Maximizing $\E_{p(x \given c) p(c)}[\mathrm{ELBO}(x, \theta, \phi \given c)]$ with respect to $(\theta, \phi)$ should yield:
\begin{align}
\theta_1^* &= \mu_0, \\
\theta_2^* &= 1, \\
\theta_3^* &= \log(\sigma^2), \\
\phi_1^* &= \frac{1/\sigma^2}{1/\sigma_0^2 + 1/\sigma^2}, \\
\phi_2^* &= \frac{1/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma^2}, \\
\phi_3^* &= \frac{\mu_0/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma^2}, \\
\phi_4^* &= \log\left(\frac{1}{1/\sigma_0^2 + 1/\sigma^2}\right).
\end{align}
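These claimed optima can be checked directly: at $\theta^*$ the learned generative model reproduces the true marginal $p(x \given c)$, and at $\phi^*$ the inference network matches the analytic posterior for every $x$. A sketch with illustrative parameter values:

```python
import math

# Illustrative true-model parameters.
mu0, sigma0_sq, sigma_sq = 0.3, 2.0, 0.5
prec = 1 / sigma0_sq + 1 / sigma_sq

# The claimed optima.
theta = (mu0, 1.0, math.log(sigma_sq))
phi = ((1 / sigma_sq) / prec, (1 / sigma0_sq) / prec,
       (mu0 / sigma0_sq) / prec, math.log(1 / prec))

# At theta*, p_theta(x | c) = Normal(theta1 + theta2 c, sigma0^2 + exp(theta3))
# equals the true marginal p(x | c) = Normal(mu0 + c, sigma0^2 + sigma^2).
c = 1.7
assert abs((theta[0] + theta[1] * c) - (mu0 + c)) < 1e-12
assert abs(math.exp(theta[2]) - sigma_sq) < 1e-12

# At phi*, q_phi(z | x, c) matches the analytic posterior for any x.
for x in (-1.0, 0.0, 2.4):
    q_mean = phi[0] * x + phi[1] * c + phi[2]
    post_mean = ((1 / sigma_sq) * x + (1 / sigma0_sq) * (mu0 + c)) / prec
    assert abs(q_mean - post_mean) < 1e-12
assert abs(math.exp(phi[3]) - 1 / prec) < 1e-12
```

Both KL terms in the objective then vanish, which is why these values maximize the conditional ELBO.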
