Tuan Anh Le

Importance Sampling

14 September 2017

Consider a probability distribution on a space $\mathcal X$ with density $p$. (We will interchangeably denote the probability distribution and its density by the same letter.) Given a function $f \colon \mathcal X \to \mathbb R$, our goal is to estimate \begin{align} I := \E[f(X)] := \int_{\mathcal X} f(x) p(x) \,\mathrm dx. \end{align}

Monte Carlo Estimator

We can estimate this using independent and identically distributed (iid) samples from $p$ as follows: \begin{align} X_n &\sim p \\ I_N^{\text{MC}} &= \frac{1}{N} \sum_{n = 1}^N f(X_n). \end{align}
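As a concrete sketch (the choices $p = \mathcal N(0, 1)$ and $f(x) = x^2$ are assumptions for illustration only; with them, $I = \E[X^2] = 1$):

```python
import random

def monte_carlo_estimate(f, sample_p, N, seed=0):
    """Plain Monte Carlo estimator: average f over N iid samples from p."""
    rng = random.Random(seed)
    return sum(f(sample_p(rng)) for _ in range(N)) / N

# Illustrative choices: p = N(0, 1) and f(x) = x^2, so I = E[X^2] = 1.
I_mc = monte_carlo_estimate(
    f=lambda x: x * x,
    sample_p=lambda rng: rng.gauss(0.0, 1.0),
    N=100_000,
)
```

With $N = 10^5$ samples, the estimate is typically within a few hundredths of $I = 1$.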

The expectation of $I_N^{\text{MC}}$ is equal to $I$: \begin{align} \E\left[I_N^{\text{MC}}\right] &= \E\left[\frac{1}{N} \sum_{n = 1}^N f(X_n)\right] \\ &= \frac{1}{N} \sum_{n = 1}^N \E\left[f(X_n)\right] \\ &= \frac{1}{N} \sum_{n = 1}^N I \\ &= I. \end{align}

The variance is: \begin{align} \Var\left[I_N^{\text{MC}}\right] &= \E\left[{I_N^{\text{MC}}}^2\right] - \E\left[I_N^{\text{MC}}\right]^2 \\ &= \E\left[\left(\frac{1}{N} \sum_{n = 1}^N f(X_n)\right)^2\right] - I^2 \\ &= \frac{1}{N^2} \E\left[\sum_{n = 1}^N f(X_n)^2 + \sum_{k, \ell, k \neq \ell} f(X_k) f(X_\ell)\right] - I^2 \\ &= \frac{1}{N^2} \left(\sum_{n = 1}^N \E\left[f(X_n)^2\right] + \sum_{k, \ell, k \neq \ell} \E\left[f(X_k) f(X_\ell)\right]\right) - I^2 \\ &= \frac{1}{N^2} \left(N \E\left[f(X)^2\right] + (N^2 - N) \E\left[f(X)\right]^2 \right) - I^2 \\ &= \frac{\E\left[f(X)^2\right] - \E\left[f(X)\right]^2}{N} \\ &= \frac{\Var[f(X)]}{N}. \end{align}
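The $\Var[f(X)]/N$ rate can be checked empirically (again assuming, for illustration, $p = \mathcal N(0, 1)$ and $f(x) = x^2$, for which $\Var[f(X)] = \E[X^4] - \E[X^2]^2 = 3 - 1 = 2$):

```python
import random

def mc_estimate(f, rng, N):
    """One realization of I_N^MC with p = N(0, 1)."""
    return sum(f(rng.gauss(0.0, 1.0)) for _ in range(N)) / N

def estimator_variance(N, repeats=2000, seed=0):
    """Empirical variance of I_N^MC over many independent replications."""
    rng = random.Random(seed)
    est = [mc_estimate(lambda x: x * x, rng, N) for _ in range(repeats)]
    mean = sum(est) / repeats
    return sum((e - mean) ** 2 for e in est) / repeats

# Var[f(X)] = 2 here, so we expect roughly 2/N:
v10 = estimator_variance(N=10, seed=1)    # approximately 0.2
v100 = estimator_variance(N=100, seed=2)  # approximately 0.02
```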

Importance Sampling

Instead of drawing samples from $p$, assume that we can evaluate the density $p$ pointwise, but can only sample from a proposal distribution $q$, whose density we can also evaluate, and which satisfies $q(x) > 0$ whenever $f(x) p(x) \neq 0$.

The importance sampling estimator is as follows: \begin{align} X_n &\sim q \\ W_n &= \frac{p(X_n)}{q(X_n)} \\ I_N^{\text{IS}} &= \frac{1}{N} \sum_{n = 1}^N W_n f(X_n). \end{align}
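A sketch of this estimator (the target $p = \mathcal N(0, 1)$, proposal $q = \mathcal N(0, 2^2)$, and $f(x) = x^2$ are illustrative assumptions; again $I = 1$):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_sampling(f, p_pdf, q_pdf, sample_q, N, seed=0):
    """I_N^IS = (1/N) sum_n W_n f(X_n), with X_n ~ q and W_n = p(X_n)/q(X_n)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        x = sample_q(rng)
        total += (p_pdf(x) / q_pdf(x)) * f(x)
    return total / N

# Illustrative choices: target p = N(0, 1), proposal q = N(0, 4), f(x) = x^2.
I_is = importance_sampling(
    f=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    sample_q=lambda rng: rng.gauss(0.0, 2.0),
    N=100_000,
)
```

Note that only samples from $q$ are drawn; $p$ is only ever evaluated pointwise.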

The expectation of $I_N^{\text{IS}}$ is equal to $I$: \begin{align} \E\left[I_N^{\text{IS}}\right] &= \E\left[\frac{1}{N} \sum_{n = 1}^N W_n f(X_n)\right] \\ &= \frac{1}{N} \sum_{n = 1}^N \E\left[W_n f(X_n)\right] \\ &= \frac{1}{N} \sum_{n = 1}^N \E\left[\frac{p(X_n)}{q(X_n)} f(X_n)\right] \\ &= \frac{1}{N} \sum_{n = 1}^N \int_{\mathcal X} \frac{p(x_n)}{q(x_n)} f(x_n) q(x_n) \,\mathrm dx_n \\ &= \frac{1}{N} \sum_{n = 1}^N \int_{\mathcal X} p(x) f(x) \,\mathrm dx \\ &= I. \end{align}

The variance of $I_N^{\text{IS}}$ is: \begin{align} \Var\left[I_N^{\text{IS}}\right] &= \E\left[{I_N^{\text{IS}}}^2\right] - I^2 \\ &= \E\left[\left(\frac{1}{N} \sum_{n = 1}^N W_n f(X_n)\right)^2\right] - I^2 \\ &= \frac{1}{N^2} \E\left[\sum_{n = 1}^N \left(W_n f(X_n)\right)^2 + \sum_{k, \ell, k \neq \ell} \left(W_k f(X_k)\right)\left(W_\ell f(X_\ell)\right)\right] - I^2 \\ &= \frac{1}{N^2} \sum_{n = 1}^N \E\left[\left(W_n f(X_n)\right)^2\right] + \frac{1}{N^2} \sum_{k, \ell, k \neq \ell} \E\left[\left(W_k f(X_k)\right)\left(W_\ell f(X_\ell)\right)\right] - I^2 \\ &= \frac{1}{N^2} \left(N \int_{\mathcal X} \left(\frac{p(x)}{q(x)} f(x)\right)^2 q(x) \,\mathrm dx + (N^2 - N) I^2\right) - I^2 \\ &= \frac{1}{N} \left(\int_{\mathcal X} \left(\frac{p(x)}{q(x)} f(x)\right)^2 q(x) \,\mathrm dx - I^2 \right) \\ &= \frac{1}{N} \left(\int_{\mathcal X} \frac{p(x)^2 f(x)^2}{q(x)} \,\mathrm dx - I^2 \right). \end{align}

The “optimal” proposal has the density \begin{align} q^{\text{opt}}(x) &= \frac{|f(x)| p(x)}{\int_{\mathcal X} |f(x')| p(x') \,\mathrm dx'}. \end{align} This proposal is optimal in the sense that the estimator’s variance is the lowest possible. The variance of the estimator using the optimal proposal $q^{\text{opt}}$ is no greater than the variance of an estimator using an arbitrary proposal $q$: \begin{align} \Var\left[I_N^{\text{IS, opt}}\right] &= \frac{1}{N} \left(\int_{\mathcal X} \frac{p(x)^2 f(x)^2}{\frac{|f(x)| p(x)}{\int_{\mathcal X} |f(x')| p(x') \,\mathrm dx'}} \,\mathrm dx - I^2 \right) \\ &= \frac{1}{N} \left(\int_{\mathcal X} |f(x)| p(x) \,\mathrm dx \cdot \int_{\mathcal X} \frac{p(x)^2 f(x)^2}{|f(x)| p(x)} \,\mathrm dx - I^2 \right) \\ &= \frac{1}{N} \left(\int_{\mathcal X} |f(x)| p(x) \,\mathrm dx \cdot \int_{\mathcal X} |f(x)| p(x) \,\mathrm dx - I^2 \right) \\ &= \frac{1}{N} \left(\left(\int_{\mathcal X} \frac{|f(x)| p(x)}{q(x)} \cdot 1 \cdot q(x) \,\mathrm dx\right)^2 - I^2 \right) \\ &\leq \frac{1}{N} \left(\left(\int_{\mathcal X} \left(\frac{|f(x)| p(x)}{q(x)}\right)^2 \cdot q(x) \,\mathrm dx\right) \cdot \left(\int_{\mathcal X} 1^2 \cdot q(x) \,\mathrm dx\right) - I^2 \right) & \text{(Cauchy-Schwarz ineq.)}\\ &= \frac{1}{N} \left(\int_{\mathcal X} \frac{f(x)^2 p(x)^2}{q(x)} \,\mathrm dx - I^2 \right) \\ &= \Var\left[I_N^{\text{IS}}\right]. \end{align}
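The zero-variance extreme of this result can be seen directly when $f \geq 0$: with $q^{\text{opt}} \propto f p$, every term $W_n f(X_n)$ equals $I$ exactly. A sketch (assuming, for illustration, $p = \mathcal N(0, 1)$ and $f(x) = x^2$, so that $q^{\text{opt}}(x) = x^2 \varphi(x)$, which can be sampled as a random sign times a chi random variable with 3 degrees of freedom):

```python
import math
import random

def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def q_opt_pdf(x):
    """Optimal proposal q_opt(x) = |f(x)| p(x) / I = x^2 phi(x), since I = 1 here."""
    return x * x * normal_pdf(x)

def sample_q_opt(rng):
    """Draw from q_opt: |X| is chi with 3 degrees of freedom, sign is uniform."""
    r = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
    return r if rng.random() < 0.5 else -r

rng = random.Random(0)
# Every term W_n f(X_n) = phi(x) x^2 / (x^2 phi(x)) = 1 = I, for each draw:
terms = [normal_pdf(x) * (x * x) / q_opt_pdf(x)
         for x in (sample_q_opt(rng) for _ in range(1000))]
```

Since every term is exactly $I$, the estimator has zero variance; in practice $q^{\text{opt}}$ is unusable, because its normalizing constant involves the very integral we want to compute.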

Self-Normalized Importance Sampling

Now, assume that we can evaluate the target density only up to a normalizing constant; that is, we can evaluate $\tilde p(x) = Z p(x)$, where the constant $Z > 0$ is unknown, and we can sample from a proposal $q$ as before.

The self-normalized importance sampling estimator is as follows: \begin{align} X_n &\sim q \\ \tilde W_n &= \frac{\tilde p(X_n)}{q(X_n)} \\ \tilde I_N^{\text{IS}} &= \frac{\sum_{n = 1}^N \tilde W_n f(X_n)}{\sum_{n = 1}^N \tilde W_n}. \end{align} This estimator is consistent (why? Hint: divide both the numerator and the denominator by $N$; by the law of large numbers, they converge almost surely to $ZI$ and $Z$ respectively).
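A sketch of the self-normalized estimator (the unnormalized density $\tilde p(x) = \exp(-x^2/2)$, i.e. $\mathcal N(0, 1)$ without its $1/\sqrt{2\pi}$ factor, the proposal $q = \mathcal N(0, 2^2)$, and $f(x) = x^2$ are illustrative assumptions; $I = 1$):

```python
import math
import random

def snis(f, p_tilde, q_pdf, sample_q, N, seed=0):
    """Self-normalized IS: only an unnormalized target density p_tilde is needed."""
    rng = random.Random(seed)
    xs = [sample_q(rng) for _ in range(N)]
    ws = [p_tilde(x) / q_pdf(x) for x in xs]  # unnormalized weights
    return sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)

# p_tilde = exp(-x^2/2) is N(0, 1) up to the unknown constant Z = sqrt(2*pi).
q_sigma = 2.0
I_snis = snis(
    f=lambda x: x * x,
    p_tilde=lambda x: math.exp(-0.5 * x * x),
    q_pdf=lambda x: math.exp(-0.5 * (x / q_sigma) ** 2) / (q_sigma * math.sqrt(2.0 * math.pi)),
    sample_q=lambda rng: rng.gauss(0.0, q_sigma),
    N=100_000,
)
```

The unknown constant $Z$ cancels in the ratio, which is why only $\tilde p$ ever needs to be evaluated.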

The “optimal” proposal differs from the one for the non-self-normalized importance sampler (Tom references Tim Hesterberg’s thesis): \begin{align} \tilde q^{\text{opt}}(x) &= \frac{p(x)|f(x) - I|}{\int_{\mathcal X} p(x') |f(x') - I| \,\mathrm dx'}. \end{align} (why?).
