# Unbiasedness of the Sequential Monte Carlo Based Normalizing Constant Estimator

05 April 2017

The following is from an email exchange with Prof. Arnaud Doucet. Much of the material follows the particle Markov chain Monte Carlo paper (Andrieu, Doucet, & Holenstein, 2010).

## Notation

Consider

• a sequence of target densities $\pi_t(x_{1:t}), t = 1, \dotsc, T$,
• a sequence of unnormalized target densities $\gamma_t(x_{1:t}), t = 1, \dotsc, T$,
• a sequence of normalizing constants $Z_t = \int \gamma_t(x_{1:t}) \,\mathrm dx_{1:t} = \gamma_t(x_{1:t}) / \pi_t(x_{1:t}), t = 1, \dotsc, T$,
• initial proposal density $q_1(x_1)$, and
• proposal densities $q_t(x_t \given x_{1:t - 1}), t = 2, \dotsc, T$.

We will usually denote random variables with capital letters and their realizations with the corresponding small letters.

## SMC algorithm

Let $K$ be the number of particles. The sequential Monte Carlo (SMC) algorithm runs as follows:

• for $t = 1$:
• sample \begin{align} X_1^k \sim q_1(\cdot), \end{align}
• compute weights \begin{align} W_1^k = \frac{\gamma_1(X_1^k)}{q_1(X_1^k)}, \end{align}
• normalize weights \begin{align} \bar W_1^k = \frac{W_1^k}{\sum_{\ell = 1}^K W_1^\ell}, \end{align}
• for $t = 2, \dotsc, T$:
• sample \begin{align} A_{t - 1}^k \sim \mathrm{Discrete}(\bar W_{t - 1}^1, \dotsc, \bar W_{t - 1}^K), \end{align}
• sample \begin{align} X_t^k \sim q_t\left(\cdot \given X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right), \label{eq:proposal-t} \end{align}
• compute weights \begin{align} W_t^k = \frac{\gamma_t\left(X_t^k, X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right)}{\gamma_{t - 1}\left(X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right) q_t\left(X_t^k \given X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right)}, \label{eq:weights-t} \end{align}
• normalize weights \begin{align} \bar W_t^k = \frac{W_t^k}{\sum_{\ell = 1}^K W_t^\ell}. \end{align}
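The steps above can be sketched in Python. This is a minimal sketch only: for brevity the proposals and weights see just the resampled previous particle rather than the whole lineage (a Markov simplification of the general algorithm), and the callables `sample_q1`, `sample_qt`, `weight1`, `weightt` are hypothetical stand-ins for a concrete model.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc(K, T, sample_q1, sample_qt, weight1, weightt):
    """One SMC sweep; returns the final particles, weights, and Z_hat."""
    x = sample_q1(K)                       # X_1^k ~ q_1
    w = weight1(x)                         # W_1^k = gamma_1(X_1^k) / q_1(X_1^k)
    zhat = w.mean()                        # running product of (1/K) sum_k W_t^k
    for t in range(2, T + 1):
        a = rng.choice(K, size=K, p=w / w.sum())  # A_{t-1}^k ~ Discrete(wbar)
        xprev = x[a]                       # resampled previous particles
        x = sample_qt(t, xprev)            # X_t^k ~ q_t(. | resampled history)
        w = weightt(t, x, xprev)           # incremental weight W_t^k
        zhat *= w.mean()
    return x, w, zhat

# Smoke check: with gamma_t(x_{1:t}) = prod_s N(x_s; 0, 1) and the prior as
# proposal, every weight equals one, so Z_hat must be exactly one.
x, w, zhat = smc(
    K=64, T=5,
    sample_q1=lambda K: rng.standard_normal(K),
    sample_qt=lambda t, xprev: rng.standard_normal(len(xprev)),
    weight1=lambda x: np.ones(len(x)),
    weightt=lambda t, x, xprev: np.ones(len(x)),
)
```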

## Normalizing Constant Estimator

The estimator of the normalizing constant at the last time step, $Z_T$, is \begin{align} \hat Z_T = \prod_{t = 1}^T \frac{1}{K} \sum_{k = 1}^K W_t^k. \label{eq:evidence-estimator} \end{align}

We want to show that $\hat Z_T$ is an unbiased estimator of $Z_T$, i.e. $\E[\hat Z_T] = Z_T$.

## Sampling distribution of SMC

Define the density of the sampling distribution of SMC over the random variables $(X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$ as follows: \begin{align} Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \left( \prod_{k = 1}^K q_1(x_1^k) \right) \left( \prod_{t = 2}^T \prod_{k = 1}^K q_t\left(x_t^k \given x_{t - 1}^{a_{t - 1}^k}, x_{t - 2}^{a_{t - 2}^{a_{t - 1}^k}}, \dotsc, x_1^{a_1^{\rddots{a_{t - 1}^k}}}\right) \right) \left( \prod_{t = 1}^{T - 1} \prod_{k = 1}^K \bar w_t^{a_t^k} \right). \label{eq:Q} \end{align}

Note that $Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$ integrates to one over $(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$, where integration over the discrete indices is with respect to the counting measure, i.e. summation: \begin{align} \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} = 1. \end{align}

## Extension of the Sampling Distribution of SMC

Let $Q^+$ be an extension of $Q$, over the random variables $(M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$ where $M$ is a discrete random variable with the range $\{1, \dotsc, K\}$. Define the density of this extension to be: \begin{align} Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus} \end{align}

Note that if we integrate $Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$ with respect to $m$, we obtain $Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$ because $\bar w_T^m$ is normalized: \begin{align} \int Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm &= \int \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm \\ &= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \int \bar w_T^m \,\mathrm dm \\ &= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus-marginal} \end{align}

## Extension of the Target Distribution

We will introduce a distribution $P^+$.

### Particle Lineage

First, let us define a particle lineage. Let $M$ be a random variable with the range $\{1, \dotsc, K\}$. Define

• $B_T^M = M$,
• $B_t^M = A_t^{B_{t + 1}^M}$ for $t = T - 1, \dotsc, 1$.

This makes it easier to talk about the particle lineage $\left(X_T^M, X_{T - 1}^{A_{T - 1}^M}, X_{T - 2}^{A_{T - 2}^{A_{T - 1}^M}}, \dotsc, X_1^{A_1^{\rddots{A_{T - 1}^M}}}\right)$ which we can refer to as $\left(X_T^{B_T^M}, X_{T - 1}^{B_{T - 1}^M}, X_{T - 2}^{B_{T - 2}^M}, \dotsc, X_1^{B_1^M}\right)$.

Note that this only allows us to easily talk about particle lineages ending at $t = T$. To talk about particle lineages ending at $t < T$, as has been done in equations \eqref{eq:proposal-t}, \eqref{eq:weights-t}, and \eqref{eq:Q}, we would have to introduce additional notation; this is not needed for the argument below.
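For a fixed final index $m$, the backward recursion defining $B_t^M$ is straightforward to implement. A minimal sketch (the layout `ancestors[t - 1]` holding $A_t^{1:K}$, and zero-based indices, are assumptions of this sketch):

```python
def trace_lineage(ancestors, m):
    """Backward recursion: B_T = m, B_t = A_t^{B_{t+1}} for t = T-1, ..., 1.

    ancestors[t - 1] holds the ancestor indices A_t^{1:K} for t = 1, ..., T-1;
    returns the lineage indices [B_1, ..., B_T] (zero-based).
    """
    T = len(ancestors) + 1
    b = [0] * T
    b[T - 1] = m                          # B_T = m
    for t in range(T - 2, -1, -1):        # t = T-1, ..., 1 (zero-based: T-2..0)
        b[t] = ancestors[t][b[t + 1]]     # B_t = A_t^{B_{t+1}}
    return b

# Example with K = 3, T = 3: A_1 = (2, 0, 1), A_2 = (1, 1, 0), M = 0
# gives B_3 = 0, B_2 = A_2^{B_3} = 1, B_1 = A_1^{B_2} = 0.
lineage = trace_lineage([[2, 0, 1], [1, 1, 0]], 0)
```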

### $P^+$

Let $P^+$ be an extension of the target distribution $\pi_T$ in the following sense. Let $P^+$ be a distribution over random variables $(M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$, i.e. over the same as $Q^+$. Let the density of $P^+$ be: \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}. \label{eq:P-plus} \end{align}

Note that

• ${\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})} / {K^T}$ is a density over the random variables $(X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_T^M)$ which is the same as $(X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_{T - 1}^M, M)$ since $B_T^M = M$,
• $Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$ is a density over $(X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})$,
• $q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)$ is a density over $(X_1^{B_1^M}, \dotsc, X_T^{B_T^M}, B_1^M, \dotsc, B_{T - 1}^M)$.

Since we have that \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{d_1(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m, m) d_2(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{d_3(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m)}, \end{align} where $d_1, d_2, d_3$ are normalized densities with respect to their arguments, we can show that $P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$ is a normalized density.

## Ratio $P^+ / Q^+$

We will show that \begin{align} \frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\hat z_T}{Z_T}. \label{eq:p-div-q} \end{align}

First, we need some identities \begin{align} \hat z_T &= \frac{1}{K^T} \prod_{t = 1}^T \sum_{k = 1}^K w_t^k &\text{(from \eqref{eq:evidence-estimator})} \\ \prod_{t = 1}^T \bar w_t^k &= \frac{\prod_{t = 1}^T w_t^k}{\prod_{t = 1}^T \sum_{\ell = 1}^K w_t^\ell} \\ &= \frac{\prod_{t = 1}^T w_t^k}{\hat z_T K^T} \label{eq:product-of-normalized-weights} \\ \prod_{t = 1}^T w_t^{b_t^m} &= w_1^{b_1^m} \cdot w_2^{b_2^m} \cdot \cdots \cdot w_T^{b_T^m} \\ &= \frac{\gamma_1(x_1^{b_1^m})}{q_1(x_1^{b_1^m})} \cdot \frac{\gamma_2(x_1^{b_1^m}, x_2^{b_2^m})}{\gamma_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m})} \cdot \cdots \cdot \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{\gamma_{T - 1}(x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m}) q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})} \\ &= \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{q_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m}) \cdots q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})}. \label{eq:product-of-unnormalized-weights} \end{align}

Now, we are ready: \begin{align} \frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}}{\bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &\text{(subst. \eqref{eq:Q-plus} and \eqref{eq:P-plus})}\\ &= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right) \bar w_T^m} &\text{(cancel } Q \text{)} \\ &= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T \bar w_t^{b_t^m} \right)} &(b_T^m = m)\\ &= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right) / \left( \hat z_T K^T \right)} &\text{(subst. \eqref{eq:product-of-normalized-weights})}\\ &= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right)} &\text{(cancel } K^T \text{)}\\ &= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})} &\text{(subst. \eqref{eq:product-of-unnormalized-weights})}\\ &= \frac{\hat z_T}{Z_T} &\text{(by definition of } \gamma_T / \pi_T \text{)}. \end{align}
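Because the chain of substitutions above is easy to get wrong, it is reassuring to verify \eqref{eq:p-div-q} by brute force. The sketch below enumerates every configuration $(m, x_{1:2}^{1:2}, a_1^{1:2})$ of a tiny discrete model (the particular $q$'s and $\gamma$'s, with $x \in \{0, 1\}$, $T = 2$, $K = 2$, are an arbitrary choice for illustration) and checks the ratio exactly, along with the normalization of $Q$:

```python
import itertools
import numpy as np

q1 = np.array([0.6, 0.4])                    # q_1(x_1)
q2 = np.array([[0.7, 0.3], [0.2, 0.8]])      # q_2(x_2 | x_1), rows sum to one
g1 = np.array([0.5, 1.5])                    # gamma_1(x_1)
g2 = np.array([[0.4, 1.1], [0.9, 0.6]])      # gamma_2(x_1, x_2)
Z = g2.sum()                                 # Z_2 = sum of gamma_2
K = 2

total_q = 0.0                                # should accumulate to one
max_err = 0.0                                # worst |P+/Q+ - zhat/Z| seen
for x1 in itertools.product(range(2), repeat=K):          # X_1^{1:K}
    w1 = g1[list(x1)] / q1[list(x1)]                      # W_1^k
    w1bar = w1 / w1.sum()
    for a1 in itertools.product(range(K), repeat=K):      # A_1^{1:K}
        for x2 in itertools.product(range(2), repeat=K):  # X_2^{1:K}
            w2 = np.array([g2[x1[a1[k]], x2[k]]
                           / (g1[x1[a1[k]]] * q2[x1[a1[k]], x2[k]])
                           for k in range(K)])            # W_2^k
            w2bar = w2 / w2.sum()
            Q = (np.prod(q1[list(x1)])                    # the density Q
                 * np.prod([w1bar[a1[k]] * q2[x1[a1[k]], x2[k]]
                            for k in range(K)]))
            total_q += Q
            zhat = w1.mean() * w2.mean()                  # evidence estimator
            for m in range(K):
                b1 = a1[m]                                # B_1^m; B_2^m = m
                q_plus = w2bar[m] * Q                     # Q+
                p_plus = (g2[x1[b1], x2[m]] / Z / K**2    # P+
                          * Q / (q1[x1[b1]] * q2[x1[b1], x2[m]] * w1bar[b1]))
                max_err = max(max_err, abs(p_plus / q_plus - zhat / Z))
```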

## Unbiasedness of $\hat Z_T$

Again, we want to prove $\E[\hat Z_T] = Z_T$. Start from equation \eqref{eq:p-div-q}, rearranged as: \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) Z_T &= Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T. \end{align}

Integrate both sides over $(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$:

• since $P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})$ is a normalized density and $Z_T$ is a constant, the left hand side integrates to $Z_T$.
• since $\hat z_T$ is independent of $m$ and from equation \eqref{eq:Q-plus-marginal}, the right hand side integrates to $\int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K}$.

Hence, we are left with \begin{align} Z_T &= \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} \\ &= \E[\hat Z_T], \end{align} where the expectation is with respect to the SMC sampling distribution $Q$. This completes the proof.
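The result can also be illustrated numerically by averaging $\hat Z_T$ over many independent SMC sweeps on a toy model where $Z_T$ is available in closed form. The model below (an arbitrary choice for illustration): $\gamma_t(x_{1:t}) = \prod_{s = 1}^t \mathcal N(x_s; 0, 1) \mathcal N(y_s; x_s, 1)$ for fixed pseudo-observations $y_t$, with the prior $\mathcal N(0, 1)$ as proposal, so that $W_t^k = \mathcal N(y_t; X_t^k, 1)$ and $Z_T = \prod_t \mathcal N(y_t; 0, 2)$. Since unbiasedness holds for any $K \geq 1$, even very few particles suffice:

```python
import numpy as np

rng = np.random.default_rng(1)

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def smc_zhat(y, K):
    """One SMC sweep on the toy model; returns Z_hat."""
    x = rng.standard_normal(K)                    # X_1^k ~ q_1 = N(0, 1)
    w = normal_pdf(y[0], x, 1.0)                  # W_1^k = N(y_1; X_1^k, 1)
    zhat = w.mean()
    for t in range(1, len(y)):
        a = rng.choice(K, size=K, p=w / w.sum())  # ancestors (the proposal
        x = rng.standard_normal(K)                # below ignores the past)
        w = normal_pdf(y[t], x, 1.0)              # incremental weight
        zhat *= w.mean()
    return zhat

y = np.array([0.5, -1.0, 2.0])                    # arbitrary pseudo-observations
true_z = np.prod(normal_pdf(y, 0.0, 2.0))         # Z_T = prod_t N(y_t; 0, 2)
estimate = np.mean([smc_zhat(y, K=8) for _ in range(4000)])
```

A single $\hat Z_T$ with $K = 8$ is noisy, but the average over independent sweeps concentrates on $Z_T$, as unbiasedness predicts.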

1. Andrieu, C., Doucet, A., & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269–342.

