*07 November 2016*

This is a note about a Monte Carlo estimation method known under various names: the REINFORCE trick (Williams, 1992), the score function estimator (Fu, 2006), and the likelihood-ratio estimator (Glynn, 1990).

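Consider a random variable $X$ taking values in $\mathcal X$ whose distribution is parameterized by $\phi$, and a function $f: \mathcal X \to \mathbb R$. The goal is to approximate $\frac{\partial}{\partial \phi} \E[f(X)]$.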

Let $p_{\phi}(x)$ be the density of $X$ with respect to the base measure $\mathrm dx$. Using the identity $\frac{\partial}{\partial \phi} p_{\phi}(x) = p_{\phi}(x) \frac{\partial}{\partial \phi} \log p_{\phi}(x)$, and assuming differentiation and integration can be interchanged, we get: \begin{align} \frac{\partial}{\partial \phi} \E[f(X)] &= \frac{\partial}{\partial \phi} \int_{\mathcal X} f(x) p_{\phi}(x) \,\mathrm dx \\ &= \int_{\mathcal X} f(x) \frac{\partial}{\partial \phi} p_{\phi}(x) \,\mathrm dx \\ &= \int_{\mathcal X} f(x) \left(\frac{\partial}{\partial \phi} \log p_{\phi}(x)\right) p_{\phi}(x) \,\mathrm dx \\ &= \E\left[f(X) \frac{\partial}{\partial \phi} \log p_{\phi}(X)\right]. \end{align}

Hence, $\frac{\partial}{\partial \phi} \E[f(X)]$ can be approximated by a Monte Carlo estimator: \begin{align} \frac{1}{N} \sum_{n = 1}^N f(X^n) \frac{\partial}{\partial \phi} \log p_{\phi}(X^n), && X^n \sim p_{\phi}, \quad n = 1, \dotsc, N. \end{align}
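As a sanity check, here is a minimal sketch of this estimator in plain Python. The setup is an assumption chosen for illustration and is not part of the note: $X \sim \mathcal N(\phi, 1)$ and $f(x) = x^2$, so that $\E[f(X)] = \phi^2 + 1$ and the exact derivative is $2\phi$. The function name `score_function_gradient` is hypothetical.

```python
import random


def score_function_gradient(f, phi, num_samples, rng):
    """Estimate d/dphi E[f(X)] for X ~ Normal(phi, 1) (an illustrative assumption)."""
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(phi, 1.0)   # draw X^n ~ p_phi
        score = x - phi           # d/dphi log p_phi(x) for a unit-variance Gaussian
        total += f(x) * score     # one term of the Monte Carlo sum
    return total / num_samples


rng = random.Random(0)            # fixed seed so the run is repeatable
phi = 1.5
estimate = score_function_gradient(lambda x: x ** 2, phi, 200_000, rng)
exact = 2 * phi                   # d/dphi (phi^2 + 1)
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

Note that `f` is only ever evaluated, never differentiated; only the log-density is differentiated with respect to $\phi$.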

Thus, we only need $p_{\phi}(x)$ to be differentiable with respect to $\phi$; $f$ itself need not be differentiable. This estimator is applicable to a wide range of distributions of $X$ but suffers from high variance (why?).
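A rough numerical illustration of both points, under assumptions of my own choosing: $X \sim \mathrm{Bernoulli}(\phi)$ is discrete, so $f$ cannot be differentiated in $x$ at all, yet the estimator still applies because $\log p_{\phi}(x) = x \log \phi + (1 - x) \log(1 - \phi)$ is differentiable in $\phi$. Here the exact derivative is $f(1) - f(0)$. Adding a constant to $f$ leaves this derivative unchanged but inflates the variance of the individual terms, which is one hint (not a full answer) at where the high variance comes from. The helper `per_sample_terms` and the values of `f` are hypothetical.

```python
import random
import statistics


def per_sample_terms(f, phi, num_samples, rng):
    """Return the terms f(X^n) * d/dphi log p_phi(X^n) for X ~ Bernoulli(phi)."""
    terms = []
    for _ in range(num_samples):
        x = 1 if rng.random() < phi else 0
        score = x / phi - (1 - x) / (1 - phi)   # derivative of the Bernoulli log-pmf
        terms.append(f(x) * score)
    return terms


def f(x):
    # Arbitrary values on {0, 1}, chosen only for illustration.
    return 5.0 if x == 1 else 2.0


def f_shifted(x):
    # Same function plus a constant: the exact gradient is unchanged.
    return f(x) + 100.0


rng = random.Random(0)
phi = 0.3
terms = per_sample_terms(f, phi, 100_000, rng)
terms_shifted = per_sample_terms(f_shifted, phi, 100_000, rng)

exact = f(1) - f(0)                              # d/dphi [f(1) phi + f(0) (1 - phi)]
mean_plain = sum(terms) / len(terms)
var_plain = statistics.variance(terms)
var_shifted = statistics.variance(terms_shifted)
print(f"exact = {exact}, estimate = {mean_plain:.3f}")
print(f"per-sample variance: {var_plain:.1f} (f) vs {var_shifted:.1f} (f + 100)")
```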

