05 April 2017
The following is from an email exchange with Prof. Arnaud Doucet. Much of the material is taken from the particle Markov chain Monte Carlo paper (Andrieu et al., 2010).
Consider a sequence of unnormalized target densities \(\gamma_t(x_{1:t})\) for \(t = 1, \dotsc, T\), with corresponding normalized densities \(\pi_t(x_{1:t}) = \gamma_t(x_{1:t}) / Z_t\), where \(Z_t = \int \gamma_t(x_{1:t}) \,\mathrm dx_{1:t}\) is the normalizing constant.
We will usually denote random variables with capital letters and their realizations with the corresponding small letters.
Let \(K\) be the number of particles. The sequential Monte Carlo (SMC) algorithm runs as follows. At time \(t = 1\), sample \(X_1^k \sim q_1(\cdot)\) and compute the weights \(W_1^k = \gamma_1(X_1^k) / q_1(X_1^k)\) for \(k = 1, \dotsc, K\). At each time \(t = 2, \dotsc, T\), sample the ancestor indices \(A_{t - 1}^k \sim \mathrm{Discrete}(\bar W_{t - 1}^1, \dotsc, \bar W_{t - 1}^K)\), where \(\bar W_t^k = W_t^k / \sum_{\ell = 1}^K W_t^\ell\) denotes the normalized weights, then propose \begin{align} X_t^k &\sim q_t\left(\cdot \given X_{t - 1}^{A_{t - 1}^k}, X_{t - 2}^{A_{t - 2}^{A_{t - 1}^k}}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right), \label{eq:proposal-t} \end{align} and compute the weights \begin{align} W_t^k &= \frac{\gamma_t\left(X_1^{A_1^{\rddots{A_{t - 1}^k}}}, \dotsc, X_{t - 1}^{A_{t - 1}^k}, X_t^k\right)}{\gamma_{t - 1}\left(X_1^{A_1^{\rddots{A_{t - 1}^k}}}, \dotsc, X_{t - 1}^{A_{t - 1}^k}\right) q_t\left(X_t^k \given X_{t - 1}^{A_{t - 1}^k}, \dotsc, X_1^{A_1^{\rddots{A_{t - 1}^k}}}\right)}. \label{eq:weights-t} \end{align}
The estimator of the normalizing constant at the last time step, \(Z_T\), is \begin{align} \hat Z_T = \prod_{t = 1}^T \frac{1}{K} \sum_{k = 1}^K W_t^k. \label{eq:evidence-estimator} \end{align}
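As a numerical sanity check of \eqref{eq:evidence-estimator}, the following is a minimal sketch (not the general algorithm) for a toy model where \(\gamma_t(x_{1:t}) = c \prod_{s = 1}^t \mathcal N(x_s; 0, 1)\), so \(Z_t = c\) for every \(t\), and the proposal \(q_t = \mathcal N(0, \sigma_q^2)\) ignores the past. The constants \(c\) and \(\sigma_q\) are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 3, 20
c = 2.5          # true normalizing constant Z_T of the toy model (made up)
sigma_q = 1.5    # proposal standard deviation (made up)

def log_normal_pdf(x, sigma):
    # log density of N(0, sigma^2) at x
    return -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)

def smc_log_evidence():
    log_z_hat = 0.0
    for t in range(T):
        x = rng.normal(0.0, sigma_q, size=K)                 # propose K particles
        logw = log_normal_pdf(x, 1.0) - log_normal_pdf(x, sigma_q)
        if t == 0:
            logw += np.log(c)                                # gamma_1 carries c
        m = logw.max()
        log_z_hat += m + np.log(np.mean(np.exp(logw - m)))   # log of (1/K) sum_k W_t^k
        # Resampling is omitted: since q_t ignores the past in this toy model,
        # the ancestor indices do not affect the estimator.
    return log_z_hat

estimates = np.exp([smc_log_evidence() for _ in range(4000)])
print(estimates.mean())   # close to c = 2.5
```

Averaging many independent runs recovers the true \(Z_T\), consistent with the unbiasedness claim proved below.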
We want to show that \(\hat Z_T\) is an unbiased estimator of \(Z_T\), i.e. \(\E[\hat Z_T] = Z_T\).
Define the density of the sampling distribution of SMC over the random variables \((X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})\) as follows: \begin{align} Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \left( \prod_{k = 1}^K q_1(x_1^k) \right) \left( \prod_{t = 2}^T \prod_{k = 1}^K q_t\left(x_t^k \given x_{t - 1}^{a_{t - 1}^k}, x_{t - 2}^{a_{t - 2}^{a_{t - 1}^k}}, \dotsc, x_1^{a_1^{\rddots{a_{t - 1}^k}}}\right) \right) \left( \prod_{t = 1}^{T - 1} \prod_{k = 1}^K \bar w_t^{a_t^k} \right). \label{eq:Q} \end{align}
Note that \(Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) integrates to one over \((x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\): \begin{align} \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} = 1. \end{align}
Let \(Q^+\) be an extension of \(Q\), over the random variables \((M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})\) where \(M\) is a discrete random variable with the range \(\{1, \dotsc, K\}\). Define the density of this extension to be: \begin{align} Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus} \end{align}
Note that if we integrate \(Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) over \(m\) (integrals over the discrete variables \(m\) and \(a_{1:T - 1}^{1:K}\) denote sums), we obtain \(Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) because the normalized weights \(\bar w_T^m\) sum to one:
\begin{align}
\int Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm &= \int \bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \,\mathrm dm \\
&= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \int \bar w_T^m \,\mathrm dm \\
&= Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}). \label{eq:Q-plus-marginal}
\end{align}
We will introduce a distribution \(P^+\).
First, let us define a particle lineage. Let \(M\) be a random variable with the range \(\{1, \dotsc, K\}\). Define \(B_T^M = M\) and, recursively for \(t = T - 1, \dotsc, 1\), \(B_t^M = A_t^{B_{t + 1}^M}\).
This makes it easier to talk about the particle lineage \(\left(X_T^M, X_{T - 1}^{A_{T - 1}^M}, X_{T - 2}^{A_{T - 2}^{A_{T - 1}^M}}, \dotsc, X_1^{A_1^{\rddots{A_{T - 1}^M}}}\right)\) which we can refer to as \(\left(X_T^{B_T^M}, X_{T - 1}^{B_{T - 1}^M}, X_{T - 2}^{B_{T - 2}^M}, \dotsc, X_1^{B_1^M}\right)\).
Note that this only allows us to easily talk about particle lineages ending at \(t = T\). To talk about particle lineages ending at \(t < T\), as in equations \eqref{eq:proposal-t}, \eqref{eq:weights-t}, and \eqref{eq:Q}, we would need to define analogous index variables for each ending time.
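As a concrete illustration, the backward recursion \(b_T^m = m\), \(b_t^m = a_t^{b_{t + 1}^m}\) can be traced in a few lines of Python (the ancestor array below is a made-up example, 0-indexed):

```python
# Trace the particle lineage indices b_1^m, ..., b_T^m backward from m.
# A[t][k] is the ancestor index a_{t+1}^k in 0-indexed form (made-up values).
T = 4
A = [[0, 2, 1],   # a_1^{1:K}
     [1, 1, 0],   # a_2^{1:K}
     [2, 0, 0]]   # a_3^{1:K}
m = 1
b = [None] * T
b[T - 1] = m                   # b_T^m = m
for t in range(T - 2, -1, -1):
    b[t] = A[t][b[t + 1]]      # b_t^m = a_t^{b_{t+1}^m}
print(b)                       # [2, 1, 0, 1]
```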
Let \(P^+\) be an extension of the target distribution \(\pi_T\) in the following sense. Let \(P^+\) be a distribution over random variables \((M, X_{1:T}^{1:K}, A_{1:T - 1}^{1:K})\), i.e. over the same as \(Q^+\). Let the density of \(P^+\) be: \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}. \label{eq:P-plus} \end{align}
Note that \(P^+\) is a normalized density. Indeed, since we can write \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) &= \frac{d_1(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m, m) d_2(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{d_3(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}, b_1^m, \dotsc, b_{T - 1}^m)}, \end{align} where \(d_1, d_2, d_3\) are normalized densities with respect to their arguments, we can show that \(P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\) integrates to one.
We will show that \begin{align} \frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\hat z_T}{Z_T}. \label{eq:p-div-q} \end{align}
First, we need some identities:
\begin{align}
\hat z_T &= \frac{1}{K^T} \prod_{t = 1}^T \sum_{k = 1}^K w_t^k &\text{(from \eqref{eq:evidence-estimator})} \\
\prod_{t = 1}^T \bar w_t^k &= \frac{\prod_{t = 1}^T w_t^k}{\prod_{t = 1}^T \sum_{\ell = 1}^K w_t^\ell} \\
&= \frac{\prod_{t = 1}^T w_t^k}{\hat z_T K^T} \label{eq:product-of-normalized-weights} \\
\prod_{t = 1}^T w_t^{b_t^m} &= w_1^{b_1^m} \cdot w_2^{b_2^m} \cdot \cdots \cdot w_T^{b_T^m} \\
&= \frac{\gamma_1(x_1^{b_1^m})}{q_1(x_1^{b_1^m})} \cdot \frac{\gamma_2(x_1^{b_1^m}, x_2^{b_2^m})}{\gamma_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m})} \cdot \cdots \cdot \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{\gamma_{T - 1}(x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m}) q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})} \\
&= \frac{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{q_1(x_1^{b_1^m}) q_2(x_2^{b_2^m} \given x_1^{b_1^m}) \cdots q_T(x_T^{b_T^m} \given x_1^{b_1^m}, \dotsc, x_{T - 1}^{b_{T - 1}^m})}. \label{eq:product-of-unnormalized-weights}
\end{align}
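The telescoping in \eqref{eq:product-of-unnormalized-weights} is easy to verify numerically, e.g. for \(T = 3\) with arbitrary made-up values for the \(\gamma_t\) and \(q_t\) evaluations along a lineage:

```python
# Numeric check of the telescoping product of unnormalized weights (T = 3):
# the intermediate gamma_1, gamma_2 factors cancel, leaving gamma_3 over
# the product of proposal densities. All values are arbitrary placeholders.
g = [1.7, 0.4, 2.9]   # gamma_1, gamma_2, gamma_3 along a lineage
q = [0.8, 1.3, 0.5]   # q_1, q_2, q_3 along the same lineage
w1 = g[0] / q[0]                  # w_1 = gamma_1 / q_1
w2 = g[1] / (g[0] * q[1])         # w_2 = gamma_2 / (gamma_1 q_2)
w3 = g[2] / (g[1] * q[2])         # w_3 = gamma_3 / (gamma_2 q_3)
lhs = w1 * w2 * w3
rhs = g[2] / (q[0] * q[1] * q[2])
print(abs(lhs - rhs) < 1e-12)     # True
```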
Now, we are ready:
\begin{align}
\frac{P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &= \frac{\frac{\pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})}{K^T} \frac{Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})}{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right)}}{\bar w_T^m Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})} &\text{(subst. \eqref{eq:Q-plus} and \eqref{eq:P-plus})} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^{T - 1} \bar w_t^{b_t^m} \right) \bar w_T^m} &\text{(cancel } Q \text{)} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T \bar w_t^{b_t^m} \right)} &(b_T^m = m) \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) / K^T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right) / \left( \hat z_T K^T \right)} &\text{(subst. \eqref{eq:product-of-normalized-weights})} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{q_1(x_1^{b_1^m}) \left( \prod_{t = 2}^T q_t(x_t^{b_t^m} \given x_1^{b_1^m}, \dotsc, x_{t - 1}^{b_{t - 1}^m}) \right) \left( \prod_{t = 1}^T w_t^{b_t^m} \right)} &\text{(cancel } K^T \text{)} \\
&= \frac{ \pi_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m}) \hat z_T }{\gamma_T(x_1^{b_1^m}, \dotsc, x_T^{b_T^m})} &\text{(subst. \eqref{eq:product-of-unnormalized-weights})} \\
&= \frac{\hat z_T}{Z_T} &\text{(by definition of } \gamma_T / \pi_T \text{)}.
\end{align}
Again, we want to prove \(\E[\hat Z_T] = Z_T\). Rearranging equation \eqref{eq:p-div-q} gives \begin{align} P^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) Z_T &= Q^+(m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T. \end{align}
Integrate both sides over \((m, x_{1:T}^{1:K}, a_{1:T - 1}^{1:K})\). On the left-hand side, \(Z_T\) is a constant and \(P^+\) is a normalized density, so the integral equals \(Z_T\). On the right-hand side, \(\hat z_T\) does not depend on \(m\), so integrating \(Q^+\) over \(m\) yields \(Q\) by \eqref{eq:Q-plus-marginal}. Hence, we are left with
\begin{align}
Z_T &= \int Q(x_{1:T}^{1:K}, a_{1:T - 1}^{1:K}) \hat z_T \,\mathrm dx_{1:T}^{1:K} \,\mathrm da_{1:T - 1}^{1:K} \\
&= \E[\hat Z_T].
\end{align}
@article{andrieu2010particle, title = {Particle {M}arkov chain {M}onte {C}arlo methods}, author = {Andrieu, Christophe and Doucet, Arnaud and Holenstein, Roman}, journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, volume = {72}, number = {3}, pages = {269--342}, year = {2010}, publisher = {Wiley Online Library} }