07 November 2016
Notes on the Concrete distribution (continuous relaxations of discrete distributions) which was introduced by (Maddison et al., 2016) and (Jang et al., 2016). Notes are based on the former.
Given a random variable \(X: \Omega \to \mathcal X\) with distribution \(p_{\phi}(x)\) parameterized by \(\phi\); and a function \(f: \mathcal X \to \mathbb R\). The goal is to approximate \(\frac{\partial}{\partial \phi} \E[f(X)]\) using the reparameterization trick (instead of the REINFORCE trick) when \(X\) is discrete, i.e. when \(\mathcal X\) is countable.
The support of a \(K\)-dimensional Concrete distribution is the simplex \begin{align} \Delta^{K - 1} = \left\{x \in \mathbb R^K: x_k \in [0, 1], \sum_{k = 1}^K x_k = 1\right\}. \end{align} A sample \(X = (X_1, \dotsc, X_K)\) from the Concrete distribution—parameterized by location parameters \(\alpha = (\alpha_1, \dotsc, \alpha_K) \in (0, \infty)^K\), and temperature \(\lambda \in (0, \infty)\)—can be generated as follows:
For \(k = 1, \dotsc, K\):
The Concrete distribution is actually defined by its density (see equation (11) in (Maddison et al., 2016)). As \(\lambda\) is smaller, the approximation to the one-hot discrete random variable with unnormalized weights \(\alpha\) is more accurate in the sense of Proposition 1 in (Maddison et al., 2016).
Let \(Z = (Z_1, \dotsc, Z_K)\). Let \(Z_k, k = 1, \dotsc, K\) be distributed from \(\mathrm{Gumbel}(0, 1)\), and the \(k\)th output of the function \(g_{\phi}: \mathbb R^K \to \mathbb R^K\) be: \begin{align} \left[{g_{\phi}(Z)}\right]_k := \frac{\exp((\log \alpha_k + Z_k)/\lambda)}{\sum_{i = 1}^K \exp((\log \alpha_i + Z_i)/\lambda)}. \end{align}
Hence, \(g_{\phi}(Z)\) is a Concrete random variable and we can write:
\begin{align}
\E_{X \sim \mathrm{Concrete}}[f(X)] &= \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}[f(g_{\phi}(Z))].
\end{align}
Hence the derivatives of the objective will be:
\begin{align}
\frac{\partial}{\partial \phi} \E_{X \sim \mathrm{Concrete}}[f(X)]
&= \frac{\partial}{\partial \phi} \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}[f(g_{\phi}(Z))] \\
&= \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}\left[\frac{\partial}{\partial \phi} \left(f(g_{\phi}(Z))\right)\right].
\end{align}
This can be approximated by the Monte Carlo estimator:
\begin{align}
&\frac{1}{N} \sum_{n = 1}^N \frac{\partial}{\partial \phi} \left(f(g_{\phi}(Z^n))\right) \\
Z^n &:= (Z_1^n, \dotsc, Z_K^n) & n = 1, \dotsc, N \\
Z_k^n &\sim \mathrm{Gumbel}(0, 1) & k = 1, \dotsc, K, n = 1, \dotsc, N.
\end{align}
References
@article{maddison2016concrete, title = {The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables}, author = {Maddison, Chris J. and Mnih, Andriy and Teh, Yee Whye}, journal = {arXiv preprint arXiv:1611.00712}, year = {2016} }
@article{jang2016categorical, title = {Categorical Reparameterization with Gumbel-Softmax}, author = {Jang, Eric and Gu, Shixiang and Poole, Ben}, journal = {arXiv preprint arXiv:1611.01144}, year = {2016} }
[back]