07 November 2016
This is a note about a Monte Carlo estimation method under various names: REINFORCE trick (Williams, 1992), score function estimator (Fu, 2006), likelihood-ratio estimator (Glynn, 1990).
Consider a random variable \(X: \Omega \to \mathcal X\) whose distribution is parameterized by \(\phi\); and a function \(f: \mathcal X \to \mathbb R\). The goal is to approximate \(\frac{\partial}{\partial \phi} \E[f(X)]\).
Let \(p_{\phi}(x)\) be the density of \(X\) with respect to the base measure \(\mathrm dx\). Using the identity \(\frac{\partial}{\partial \phi} p_{\phi}(x) = p_{\phi}(x) \frac{\partial}{\partial \phi} \log p_{\phi}(x)\), we get:
\begin{align}
\frac{\partial}{\partial \phi} \E[f(X)] &= \frac{\partial}{\partial \phi} \int_{\mathcal X} f(x) p_{\phi}(x) \,\mathrm dx \\
&= \int_{\mathcal X} f(x) \frac{\partial}{\partial \phi} p_{\phi}(x) \,\mathrm dx \\
&= \int_{\mathcal X} f(x) \frac{\partial}{\partial \phi} \log p_{\phi}(x) p_{\phi}(x) \,\mathrm dx \\
&= \E\left[f(x) \frac{\partial}{\partial \phi} \log p_{\phi}(x)\right].
\end{align}
Hence, \(\frac{\partial}{\partial \phi} \E[f(X)]\) can be approximated by a Monte Carlo estimator: \begin{align} \frac{1}{N} \sum_{n = 1}^N f(X^n) \frac{\partial}{\partial \phi} \log p_{\phi}(X^n) && X_n \sim p_{\phi}, n = 1, \dotsc, N. \end{align}
Thus, we only need \(\log p_{\phi}(x)\) to be differentiable with respect to \(\phi\). This estimator applicable to a wide range of distributions of \(X\) but suffers from high variance (why?).
References
@article{williams1992simple, title = {Simple statistical gradient-following algorithms for connectionist reinforcement learning}, author = {Williams, Ronald J}, journal = {Machine learning}, volume = {8}, number = {3-4}, pages = {229--256}, year = {1992}, publisher = {Springer} }
@article{fu2006gradient, title = {Gradient estimation}, author = {Fu, Michael C}, journal = {Handbooks in operations research and management science}, volume = {13}, pages = {575--616}, year = {2006}, publisher = {Elsevier} }
@article{glynn1990likelihood, title = {Likelihood ratio gradient estimation for stochastic systems}, author = {Glynn, Peter W}, journal = {Communications of the ACM}, volume = {33}, number = {10}, pages = {75--84}, year = {1990}, publisher = {ACM} }
[back]