# Unbiasedness of the Sequential Monte Carlo Based Normalizing Constant Estimator

05 April 2017

The following is from an email exchange with Prof. Arnaud Doucet. Much of the material is taken from the particle Markov chain Monte Carlo paper (Andrieu et al., 2010).

## Notation

Consider

• a sequence of target densities $$\pi_t(x_{1:t}), t = 1, \dotsc, T$$,
• a sequence of unnormalized target densities $$\gamma_t(x_{1:t}), t = 1, \dotsc, T$$,
• a sequence of normalizing constants $$Z_t = \int \gamma_t(x_{1:t}) \,\mathrm dx_{1:t} = \gamma_t(x_{1:t}) / \pi_t(x_{1:t})$$ (a ratio that is constant in $$x_{1:t}$$), $$t = 1, \dotsc, T$$,
• initial proposal density $$q_1(x_1)$$, and
• proposal densities $$q_t(x_t \given x_{1:t - 1}), t = 2, \dotsc, T$$.

We will usually denote random variables with capital letters and their realizations with the corresponding small letters.

## SMC algorithm

Let $$K$$ be the number of particles. The sequential Monte Carlo (SMC) algorithm runs as follows:

• for $$t = 1$$:
• sample \begin{align} X_1^k \sim q_1(\cdot), \end{align}
• compute weights \begin{align} W_1^k = \frac{\gamma_1(X_1^k)}{q_1(X_1^k)}, \end{align}
• normalize weights \begin{align} \bar W_1^k = \frac{W_1^k}{\sum_{\ell = 1}^K W_1^\ell}, \end{align}
• for $$t = 2, \dotsc, T$$:
• sample \begin{align} A_{t - 1}^k \sim \mathrm{Discrete}(\bar W_{t - 1}^1, \dotsc, \bar W_{t - 1}^K), \end{align}
• sample \begin{align} X_t^k \sim q_t\left(\cdot \given X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right), \label{eq:proposal-t} \end{align}
• compute weights \begin{align} W_t^k = \frac{\gamma_t\left(X_t^k, X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right)}{\gamma_{t - 1}\left(X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right) q_t\left(X_t^k \given X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right)}, \label{eq:weights-t} \end{align}
• normalize weights \begin{align} \bar W_t^k = \frac{W_t^k}{\sum_{\ell = 1}^K W_t^\ell}. \end{align}
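As a concrete sketch, here is the algorithm above run on a toy model; the model, the proposal scale, and all names below are assumptions made for illustration, not part of the note. Because the incremental weight in this factorized model depends only on $$x_t$$, the resampled ancestor indices are drawn but not otherwise used.

```python
import numpy as np

# Toy model (an assumption for illustration):
#   gamma_t(x_{1:t}) = prod_{s <= t} exp(-x_s^2 / 2), so Z_t = (2 pi)^(t / 2),
# with proposals q_t(. | past) = Normal(0, SIGMA_Q^2) that ignore the past.
SIGMA_Q = 1.5

def log_q(x):
    """Log density of the proposal N(0, SIGMA_Q^2)."""
    return -0.5 * x ** 2 / SIGMA_Q ** 2 - 0.5 * np.log(2 * np.pi * SIGMA_Q ** 2)

def log_mean_exp(logw):
    """Numerically stable log((1/K) * sum_k exp(logw_k))."""
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))

def smc(T, K, rng):
    """One SMC sweep; returns the log of the estimator hat Z_T."""
    x = rng.normal(0.0, SIGMA_Q, size=K)          # X_1^k ~ q_1
    logw = -0.5 * x ** 2 - log_q(x)               # W_1^k = gamma_1 / q_1
    log_z_hat = log_mean_exp(logw)                # log (1/K) sum_k W_1^k
    for t in range(2, T + 1):
        w = np.exp(logw - logw.max())
        a = rng.choice(K, size=K, p=w / w.sum())  # A_{t-1}^k ~ Discrete(bar W_{t-1})
        x = rng.normal(0.0, SIGMA_Q, size=K)      # X_t^k ~ q_t (a is unused here
        logw = -0.5 * x ** 2 - log_q(x)           # because the model factorizes)
        log_z_hat += log_mean_exp(logw)           # accumulate log (1/K) sum_k W_t^k
    return log_z_hat

T = 5
print(smc(T, K=2000, rng=np.random.default_rng(0)),
      "vs exact", 0.5 * T * np.log(2 * np.pi))
```

With a few thousand particles the single-run estimate of $$\log Z_T$$ lands close to the exact value $$\tfrac{T}{2} \log 2\pi$$.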

## Normalizing Constant Estimator

The estimator of the normalizing constant at the last time step, $$Z_T$$, is \begin{align} \hat Z_T = \prod_{t = 1}^T \frac{1}{K} \sum_{k = 1}^K W_t^k. \label{eq:evidence-estimator} \end{align}

We want to show that $$\hat Z_T$$ is an unbiased estimator of $$Z_T$$, i.e. $$\E[\hat Z_T] = Z_T$$.
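Before proving this, it can be checked empirically. The sketch below, on an assumed toy model with known $$Z_T$$ (a factorized Gaussian, not from the note), averages $$\hat Z_T$$ over many independent SMC runs with a deliberately tiny number of particles:

```python
import numpy as np

# Monte Carlo check of E[hat Z_T] = Z_T on a toy model (an assumption for
# illustration): gamma_t(x_{1:t}) = prod_{s <= t} exp(-x_s^2 / 2), so
# Z_T = (2 pi)^(T / 2), with proposals q_t = Normal(0, SIGMA_Q^2).
# K is deliberately tiny: the estimator is unbiased for any K >= 1.
SIGMA_Q, T, K, RUNS = 1.5, 3, 8, 4000

def z_hat(rng):
    """One SMC sweep; returns hat Z_T itself (not its log)."""
    est, logw = 1.0, None
    for t in range(T):
        if t > 0:                                  # resampling step; the drawn
            w = np.exp(logw)                       # ancestors have no effect in
            rng.choice(K, size=K, p=w / w.sum())   # this factorized model
        x = rng.normal(0.0, SIGMA_Q, size=K)       # propose X_t^k
        logw = (-0.5 * x ** 2                      # log gamma increment minus
                + 0.5 * x ** 2 / SIGMA_Q ** 2      # log proposal density
                + 0.5 * np.log(2 * np.pi * SIGMA_Q ** 2))
        est *= np.mean(np.exp(logw))               # times (1/K) sum_k W_t^k
    return est

rng = np.random.default_rng(1)
ratio = np.mean([z_hat(rng) for _ in range(RUNS)]) / (2 * np.pi) ** (T / 2)
print(ratio)  # close to 1 up to Monte Carlo error
```

Note that it is $$\hat Z_T$$, not $$\log \hat Z_T$$, that is unbiased; $$\E[\log \hat Z_T] \le \log Z_T$$ by Jensen's inequality.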

## Sampling distribution of SMC

Define the density of the sampling distribution of SMC over the random variables $$(X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$$ as follows: \begin{align} Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \left( \prod_{k = 1}^K q_1(x_1^k) \right) \left( \prod_{t = 2}^T \prod_{k = 1}^K q_t\left(x_t^k \given x_{t - 1}^{a_{t - 1}^k}, x_{t - 2}^{a_{t - 2}^{a_{t - 1}^k}}, \dotsc, x_1^{a_1^{\rddots{a_{t - 1}^k}}}\right) \right) \left( \prod_{t = 1}^{T - 1} \prod_{k = 1}^K \bar w_t^{a_t^k} \right). \label{eq:Q} \end{align}

Note that $$Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$ integrates to one over $$(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$: \begin{align} \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} = 1. \end{align}

## Extension of the Sampling Distribution of SMC

Let $$Q^+$$ be an extension of $$Q$$, over the random variables $$(M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$$ where $$M$$ is a discrete random variable with the range $$\{1, \dotsc, K\}$$. Define the density of this extension to be: \begin{align} Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus} \end{align}

Note that if we integrate $$Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$ with respect to $$m$$ (integration over the discrete $$m$$ is shorthand for summation), we obtain $$Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$ because $$\bar w_T^m$$ is normalized: \begin{align} \int Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm &= \int \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm \\
&= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \int \bar w_T^m \,\mathrm dm \\
&= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus-marginal} \end{align}

## Extension of the Target Distribution

We will introduce a distribution $$P^+$$.

### Particle Lineage

First, let us define a particle lineage. Let $$M$$ be a random variable with the range $$\{1, \dotsc, K\}$$. Define

• $$B_T^M = M$$,
• $$B_t^M = A_t^{B_{t + 1}^M}$$ for $$t = T - 1, \dotsc, 1$$.

This makes it easier to talk about the particle lineage $$\left(X_T^M, X_{T - 1}^{A_{T - 1}^M}, X_{T - 2}^{A_{T - 2}^{A_{T - 1}^M}}, \dotsc, X_1^{A_1^{\rddots{A_{T - 1}^M}}}\right)$$ which we can refer to as $$\left(X_T^{B_T^M}, X_{T - 1}^{B_{T - 1}^M}, X_{T - 2}^{B_{T - 2}^M}, \dotsc, X_1^{B_1^M}\right)$$.
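The backward recursion for $$B_t^M$$ takes only a few lines of code. The list layout (`ancestors[t - 1][k]` standing for $$a_t^k$$) and the 0-based particle indices below are assumptions made for this sketch:

```python
def lineage(ancestors, m):
    """Recover (b_1^m, ..., b_T^m) from the ancestor variables.

    ancestors: list of length T - 1, where ancestors[t - 1][k] plays
    the role of a_t^k (0-based particle indices).
    """
    b = [m]                          # b_T^m = m
    for a_t in reversed(ancestors):  # b_t^m = a_t^{b_{t+1}^m}
        b.append(a_t[b[-1]])
    return b[::-1]                   # (b_1^m, ..., b_T^m)

# Example with T = 3, K = 3:
print(lineage([[0, 1, 1], [2, 0, 1]], m=1))  # [0, 0, 1]
```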

Note that this only allows us to refer conveniently to particle lineages ending at $$t = T$$. To refer to lineages ending at $$t < T$$, as in equations \eqref{eq:proposal-t}, \eqref{eq:weights-t}, and \eqref{eq:Q}, we would need an analogous backward recursion started at each $$t$$, which we do not introduce here.

### $$P^+$$

Let $$P^+$$ be an extension of the target distribution $$\pi_T$$ in the following sense. Let $$P^+$$ be a distribution over random variables $$(M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$$, i.e. over the same as $$Q^+$$. Let the density of $$P^+$$ be: \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}. \label{eq:P-plus} \end{align}

Note that

• $${\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})} / {K^T}$$ is a density over the random variables $$(X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_T^M)$$ which is the same as $$(X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_{T - 1}^M, M)$$ since $$B_T^M = M$$,
• $$Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$ is a density over $$(X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$$,
• $$q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)$$ is a density over $$(X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_{T - 1}^M)$$.

Since we have that \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{d_1(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m, m) d_2(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{d_3(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m)}, \end{align} where $$d_1, d_2, d_3$$ are normalized densities with respect to their arguments, we can show that $$P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$ is a normalized density.
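To sketch that last step (following the decomposition in Andrieu et al., 2010): the ratio $$d_2 / d_3$$ is the sampling density of a conditional SMC algorithm in which the lineage $$(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m)$$ is held fixed, so it integrates to one over the remaining particles and ancestor indices. Integrating first over those remaining variables and then over the lineage and $$m$$, \begin{align} \int P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} &= \int d_1 \left( \int \frac{d_2}{d_3} \,\mathrm d(\text{rest}) \right) \mathrm d(\text{lineage}, m) = \int d_1 \,\mathrm d(\text{lineage}, m) = 1, \end{align} since $$d_1$$ is itself normalized.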

## Ratio $$P^+ / Q^+$$

We will show that \begin{align} \frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\hat z_T}{Z_T}. \label{eq:p-div-q} \end{align}

First, we need some identities \begin{align} \hat z_T &= \frac{1}{K^T} \prod_{t = 1}^T \sum_{k = 1}^K w_t^k &\text{(from \eqref{eq:evidence-estimator})} \\
\prod_{t = 1}^T \bar w_t^k &= \frac{\prod_{t = 1}^T w_t^k}{\prod_{t = 1}^T \sum_{\ell = 1}^K w_t^\ell} \\
&= \frac{\prod_{t = 1}^T w_t^k}{\hat z_T K^T} \label{eq:product-of-normalized-weights} \\
\prod_{t = 1}^T w_t^{b_t^m} &= w_1^{b_1^m} \cdot w_2^{b_2^m} \cdot \cdots \cdot w_T^{b_T^m} \\
&= \frac{\gamma_1(x_1^{b_1^m})}{q_1(x_1^{b_1^m})} \cdot \frac{\gamma_2(x_1^{b_1^m}, x_2^{b_2^m})}{\gamma_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m})} \cdot \cdots \cdot \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{\gamma_{T - 1}(x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m}) q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})} \\
&= \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{q_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m}) \cdots q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})}. \label{eq:product-of-unnormalized-weights} \end{align}

Now, we are ready: \begin{align} \frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}}{\bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &\text{(subst. \eqref{eq:Q-plus} and \eqref{eq:P-plus})} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right) \bar w_T^m} &\text{(cancel } Q \text{)} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T \bar w_t^{b_t^m} \right)} &(b_T^m = m) \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right) / \left( \hat z_T K^T \right)} &\text{(subst. \eqref{eq:product-of-normalized-weights})} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right)} &\text{(cancel } K^T \text{)} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})} &\text{(subst. \eqref{eq:product-of-unnormalized-weights})} \\
&= \frac{\hat z_T}{Z_T} &\text{(since } \gamma_T / \pi_T = Z_T \text{)}. \end{align}

## Unbiasedness of $$\hat Z_T$$

Again, we want to prove $$\E[\hat Z_T] = Z_T$$. Start from equation \eqref{eq:p-div-q}, rearranged: \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) Z_T &= Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T. \end{align}

Integrate both sides over $$(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$:

• Since $$P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$$ is a normalized density and $$Z_T$$ is a constant, the left-hand side integrates to $$Z_T$$.
• Since $$\hat z_T$$ does not depend on $$m$$, equation \eqref{eq:Q-plus-marginal} shows that the right-hand side integrates to $$\int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K}$$.

Hence, we are left with \begin{align} Z_T &= \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} \\
&= \E[\hat Z_T]. \end{align}

1. Andrieu, C., Doucet, A., & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269–342.

