Unbiasedness of the Sequential Monte Carlo Based Normalizing Constant Estimator

05 April 2017

The following is from an email exchange with Prof. Arnaud Doucet. A lot of stuff is taken from the particle Markov chain Monte Carlo paper (Andrieu et al., 2010).

Notation

Consider

a sequence of target densities \(\pi_t(x_{1:t}), t = 1, \dotsc, T\),
a sequence of unnormalized target densities \(\gamma_t(x_{1:t}), t = 1, \dotsc, T\),
a sequence of normalizing constants \(Z_t = \int \gamma_t(x_{1:t}) \,\mathrm dx_{1:t} = \gamma_t(x_{1:t}) / \pi_t(x_{1:t}), t = 1, \dotsc, T\),
initial proposal density \(q_1(x_1)\), and
proposal densities \(q_t(x_t \given x_{1:t - 1}), t = 2, \dotsc, T\).

We will usually denote random variables with capital letters and their realizations with the corresponding small letters.

SMC algorithm

Let \(K\) be the number of particles. The sequential Monte Carlo (SMC) algorithm runs as follows:

for \(t = 1\):
- sample \begin{align} X_1^k \sim q_1(\cdot), \end{align}
- compute weights \begin{align} W_1^k = \frac{\gamma_1(X_1^k)}{q_1(X_1^k)}, \end{align}
- normalize weights \begin{align} \bar W_1^k = \frac{W_1^k}{\sum_{\ell = 1}^K W_1^\ell}, \end{align}
for \(t = 2, \dotsc, T\):
- sample \begin{align} A_{t - 1}^k \sim \mathrm{Discrete}(W_{t - 1}^1, \dotsc, W_{t - 1}^K), \end{align}
- sample \begin{align} X_t^k \sim q_t\left(\cdot \given X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right), \label{eq:proposal-t} \end{align}
- compute weights \begin{align} W_t^k = \frac{\gamma_t\left(X_t^k, X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right)}{\gamma_{t - 1}\left(X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right) q_t\left(X_t^k \given X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right)}, \label{eq:weights-t} \end{align}
- normalize weights \begin{align} \bar W_t^k = \frac{W_t^k}{\sum_{\ell = 1}^K W_t^\ell}. \end{align}

Normalizing Constant Estimator

The estimator of the normalizing constant at the last time step, \(Z_T\) is \begin{align} \hat Z_T = \prod_{t = 1}^T \frac{1}{K} \sum_{k = 1}^K W_t^k. \label{eq:evidence-estimator} \end{align}

We want to show that \(\hat Z_T\) is an unbiased estimator of \(Z_T\), i.e. \(\E[\hat Z_T] = Z_T\).

Sampling distribution of SMC

Define the density of the sampling distribution of SMC over the random variables \((X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})\) as follows: \begin{align} Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \left( \prod_{k = 1}^K q_1(x_1^k) \right) \left( \prod_{t = 2}^T \prod_{k = 1}^K q_t\left(x_t^k \given x_{t - 1}^{a_{t - 1}^k}, x_{t - 2}^{a_{t - 2}^{a_{t - 1}^k}}, \dotsc, x_1^{a_1^{\rddots{a_{t - 1}^k}}}\right) \right) \left( \prod_{t = 1}^{T - 1} \prod_{k = 1}^K \bar w_t^{a_t^k} \right). \label{eq:Q} \end{align}

Note that \(Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) integrates to one over \((x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\): \begin{align} \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} = 1. \end{align}

Extension of the Sampling Distribution of SMC

Let \(Q^+\) be an extension of \(Q\), over the random variables \((M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})\) where \(M\) is a discrete random variable with the range \(\{1, \dotsc, K\}\). Define the density of this extension to be: \begin{align} Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus} \end{align}

Note that if we integrate \(Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) with respect to \(m\), we obtain \(Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) because \(\bar w_T^m\) is normalized: \begin{align} \int Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm &= \int \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm \\
&= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \int \bar w_T^m \,\mathrm dm \\
&= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus-marginal} \end{align}

Extension of the Target Distribution

We will introduce a distribution \(P^+\).

Particle Lineage

First, let us define a particle lineage. Let \(M\) be a random variable with the range \(\{1, \dotsc, K\}\). Define

\(B_T^M = M\),
\(B_t^M = A_t^{B_{t + 1}^M}\) for \(t = T - 1, \dotsc, 1\).

This makes it easier to talk about the particle lineage \(\left(X_T^M, X_{T - 1}^{A_{T - 1}^M}, X_{T - 2}^{A_{T - 2}^{A_{T - 1}^M}}, \dotsc, X_1^{A_1^{\rddots{A_{T - 1}^M}}}\right)\) which we can refer to as \(\left(X_T^{B_T^M}, X_{T - 1}^{B_{T - 1}^M}, X_{T - 2}^{B_{T - 2}^M}, \dotsc, X_1^{B_1^M}\right)\).

Note that this only allows us to easily talk about particle lineages ending at \(t = T\). To talk about particle lineages ending at \(t < T\) as has been done in equations \eqref{eq:proposal-t}, \eqref{eq:weights-t}, and \eqref{eq:Q}, we would have to do something else (?).

\(P^+\)

Let \(P^+\) be an extension of the target distribution \(\pi_T\) in the following sense. Let \(P^+\) be a distribution over random variables \((M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})\), i.e. over the same as \(Q^+\). Let the density of \(P^+\) be: \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}. \label{eq:P-plus} \end{align}

Note that

\({\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})} / {K^T}\) is a density over the random variables \((X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_T^M)\) which is the same as \((X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_{T - 1}^M, M)\) since \(B_T^M = M\),
\(Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) is a density over \((X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})\),
\(q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)\) is a density over \((X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_{T - 1}^M)\).

Since we have that \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{d_1(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m, m) d_2(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{d_3(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m)}, \end{align} where \(d_1, d_2, d_3\) are normalized densities with respect to their arguments, we can show that \(P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) is a normalized density.

Ratio \(P^+ / Q^+\)

We will show that \begin{align} \frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\hat z_T}{Z_T}. \label{eq:p-div-q} \end{align}

First, we need some identities \begin{align} \hat z_T &= \frac{1}{K^T} \prod_{t = 1}^T \sum_{k = 1}^K w_t^k &\text{(from \eqref{eq:evidence-estimator})} \\
\prod_{t = 1}^T \bar w_t^k &= \frac{\prod_{t = 1}^T w_t^k}{\prod_{t = 1}^T \sum_{\ell = 1}^K w_t^\ell} \\
&= \frac{\prod_{t = 1}^T w_t^k}{\hat z_T K^T} \label{eq:product-of-normalized-weights} \\
\prod_{t = 1}^T w_t^{b_t^m} &= w_1^{b_1^m} \cdot w_2^{b_2^m} \cdot \cdots \cdot w_T^{b_T^m} \\
&= \frac{\gamma_1(x_1^{b_1^m})}{q_1(x_1^{b_1^m})} \cdot \frac{\gamma_2(x_1^{b_1^m}, x_2^{b_2^m})}{\gamma_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m})} \cdot \cdots \cdot \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{\gamma_{T - 1}(x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m}) q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})} \\
&= \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{q_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m}) \cdots q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})}. \label{eq:product-of-unnormalized-weights} \end{align}

Now, we are ready: \begin{align} \frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}}{\bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &\text{(subst. \eqref{eq:Q-plus} and \eqref{eq:P-plus})} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right) \bar w_T^m} &\text{(cancel } Q \text{)} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T \bar w_t^{b_t^m} \right)} &(b_T^m = m) \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right) / \left( \hat z_T K^T \right)} &\text{(subst. \eqref{eq:product-of-normalized-weights})} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right)} &\text{(cancel } K^T \text{)} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})} &\text{(subst. \eqref{eq:product-of-unnormalized-weights})} \\
&= \frac{\hat z_T}{Z_T} &\text{(by definition of } \gamma_T / \pi_T \text{)}. \end{align}

Unbiasedness of \(\hat Z_T\)

Again, we want to prove \(\E[\hat Z_T] = Z_T\). start from equation \eqref{eq:p-div-q}: \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) Z_T &= Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T. \end{align}

Integrate both sides over \((m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\):

since \(P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) is a normalized density and \(Z_T\) is a constant, the left hand side integrates to \(Z_T\).
since \(\hat z_T\) is independent of \(m\) and from equation \eqref{eq:Q-plus-marginal}, the right hand side integrates to \(\int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K}\).

Hence, we are left with \begin{align} Z_T &= \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} \\
&= \E[\hat Z]. \end{align}

References

Andrieu, C., Doucet, A., & Holenstein, R. (2010). Particle markov chain monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269–342.

@article{andrieu2010particle,
  title = {Particle markov chain monte carlo methods},
  author = {Andrieu, Christophe and Doucet, Arnaud and Holenstein, Roman},
  journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)},
  volume = {72},
  number = {3},
  pages = {269--342},
  year = {2010},
  publisher = {Wiley Online Library}
}

[back]

Tuan Anh Le