# Reparameterization trick

05 September 2016

Consider a $(\Omega_X, \mathcal F_X)$-valued random variable $X$ and a $(\Omega_Y, \mathcal F_Y)$-valued random variable $Y$, both defined on a common probability space $(\Omega, \mathcal F, \mathbb P)$. Let $f: (\Omega_X, \mathcal F_X) \to (\mathbb R, \mathcal B(\mathbb R))$ be a measurable function (i.e. for all $B \in \mathcal B(\mathbb R)$, the pre-image $f^{-1}(B) \in \mathcal F_X$). Let $g: (\Omega_Y, \mathcal F_Y) \to (\Omega_X, \mathcal F_X)$ be a measurable function such that $X = g \circ Y$ (i.e. $X(\omega) = g(Y(\omega)), \forall \omega \in \Omega$).

Hence we have, by definition,
\begin{align}
\E[f(X)] := \int_{\Omega} f(X(\omega)) \mathbb P(\mathrm d \omega) = \int_{\Omega} f(g(Y(\omega))) \mathbb P(\mathrm d \omega) =: \E[f(g(Y))]. \label{eq:reparam/exp}
\end{align}

Let $P_X := \mathbb P \circ X^{-1}$ and $P_Y := \mathbb P \circ Y^{-1}$ be the probability distributions of $X$ and $Y$. We have two Monte Carlo estimators of the same quantity in \eqref{eq:reparam/exp}:
\begin{align}
\E[f(X)] &\approx I_X^{MC} := \frac{1}{N} \sum_{i = 1}^N f(X^i), && X^i \sim P_X, i = 1, \dotsc, N \\
\E[f(g(Y))] &\approx I_Y^{MC} := \frac{1}{N} \sum_{i = 1}^N f \circ g (Y^i), && Y^i \sim P_Y, i = 1, \dotsc, N.
\end{align}
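As a concrete illustration, here is a minimal sketch of the two estimators. The specific choices are assumptions made for this example and not part of the setup above: $Y \sim \mathcal N(0, 1)$, $g(y) = \mu + \sigma y$ (so that $X \sim \mathcal N(\mu, \sigma^2)$), and $f(x) = x^2$, for which $\E[f(X)] = \mu^2 + \sigma^2$ in closed form. The function names are hypothetical.

```python
import random

# Illustrative assumptions: Y ~ N(0, 1), g(y) = mu + sigma * y, f(x) = x^2.
MU, SIGMA = 1.0, 2.0

def f(x):
    return x ** 2

def g(y, mu=MU, sigma=SIGMA):
    # Maps a standard normal sample to a N(mu, sigma^2) sample.
    return mu + sigma * y

def estimate_direct(n, seed=0):
    # I_X^MC: sample X^i directly from P_X = N(mu, sigma^2).
    rng = random.Random(seed)
    return sum(f(rng.gauss(MU, SIGMA)) for _ in range(n)) / n

def estimate_reparam(n, seed=1):
    # I_Y^MC: sample Y^i from P_Y = N(0, 1) and push it through g.
    rng = random.Random(seed)
    return sum(f(g(rng.gauss(0.0, 1.0))) for _ in range(n)) / n

i_x = estimate_direct(100_000)
i_y = estimate_reparam(100_000)
exact = MU ** 2 + SIGMA ** 2  # closed form of E[X^2] for a Gaussian

print(i_x, i_y, exact)
```

Both estimates should land close to the closed-form value, since they target the same expectation and differ only in how the samples are generated.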

## Why is it useful

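Let the distribution $P_{X, \theta}$ of $X$ be parameterized by $\theta$. We can’t directly evaluate $\frac{\partial I_X^{MC}}{\partial \theta}$, because $\theta$ enters $I_X^{MC}$ only through the sampling of $X^i \sim P_{X, \theta}$ and not through an explicit expression we can differentiate. However, if $P_Y$ does not depend on $\theta$ and $g_{\theta}$ is parameterized by $\theta$ such that $X = g_{\theta} \circ Y$, then, provided $f \circ g_{\theta}$ is differentiable in $\theta$, we can find $\frac{\partial I_Y^{MC}}{\partial \theta}$ by differentiating each term $f \circ g_{\theta}(Y^i)$ with the samples $Y^i$ held fixed.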
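A minimal sketch of this gradient computation follows, again under assumptions chosen only for illustration: $\theta = (\mu, \sigma)$, $Y \sim \mathcal N(0, 1)$, $g_\theta(y) = \mu + \sigma y$ and $f(x) = x^2$. The chain rule is applied by hand, giving $\frac{\partial}{\partial \mu} f(g_\theta(y)) = 2 g_\theta(y)$ and $\frac{\partial}{\partial \sigma} f(g_\theta(y)) = 2 g_\theta(y) \, y$. In practice an automatic differentiation library would typically do this step. The function name `grad_estimate` is hypothetical.

```python
import random

def grad_estimate(mu, sigma, n, seed=0):
    """Gradient of I_Y^MC w.r.t. (mu, sigma) for f(x) = x^2, g(y) = mu + sigma * y.

    The samples Y^i do not depend on (mu, sigma), so each term
    f(g(Y^i)) can be differentiated with Y^i held fixed.
    """
    rng = random.Random(seed)
    d_mu = 0.0
    d_sigma = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)   # Y^i ~ P_Y, independent of theta
        x = mu + sigma * y        # g_theta(Y^i)
        df_dx = 2.0 * x           # f'(x) for f(x) = x^2
        d_mu += df_dx * 1.0       # dg/dmu = 1
        d_sigma += df_dx * y      # dg/dsigma = y
    return d_mu / n, d_sigma / n

d_mu, d_sigma = grad_estimate(mu=1.0, sigma=2.0, n=100_000)
print(d_mu, d_sigma)
```

For this particular choice, $\E[f(X)] = \mu^2 + \sigma^2$, so the estimates can be checked against the exact gradients $2\mu$ and $2\sigma$.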

This is often used in variational autoencoders.
