18 November 2022
Sequential Monte Carlo (SMC) is a general framework using which we can understand many inference algorithms such as particle filtering, resample-move versions of particle filtering, annealed importance sampling and more. Despite its simplicity, SMC is applicable to a wide class of models—universal probabilistic programs.
There are at least two ways to think about and formalize a general recipe for SMC. One based on a sequence increasing spaces (Doucet and Johansen, 2008; Section 3.5) and the other based on “SMC samplers” (SMCS) (Del Moral et al., 2006). They are equivalent in that we can derive one from the other and vice versa. While the increasing-spaces formulation is more intuitive as a generalization of particle filtering, the SMCS formulation exposes more flexibility for inference design. For a long time, this was not obvious to me, mainly due to the fact that the standard proof of correctness of the SMCS formulation relies on viewing it as a special case of the increasing-spaces formulation (see below).
We have a sequence of unnormalized target densities \(\{\gamma_t(z_{1:t})\}_{t = 1}^T\) with normalizing constants \(Z_t = \int \gamma_t(z_{1:t}) \,\mathrm dz_{1:t}\). We want to estimate the normalizing constants \(Z_t\) and expectations with respect to the (normalized) target distributions \(\pi_t(z_{1:t}) = \gamma_t(z_{1:t}) / Z_t\). To do this, we design a sequence of proposal distributions \(\{q_t(z_t \given z_{1:t - 1})\}_{t = 1}^T\).
Assuming a particle budget \(K\) and resampling at every step, the increasing-spaces SMC procedure is
At every timestep, we can estimate the normalizing constant and expectations as
\begin{align}
Z_t &\approx \prod_{\tau = 1}^t \left(\frac{1}{K} \sum_{k = 1}^K w_\tau^k \right), \\
\E_{\pi_t(z_{1:t})}[f(z_{1:t})] &\approx \frac{1}{\sum_{k = 1}^K w_t^k} \sum_{k = 1}^K w_t^k f(z_{1:t}^k).
\end{align}
We have a sequence of unnormalized target densities \(\{\gamma_t(z_t)\}_{t = 1}^T\) with normalizing constants \(Z_t = \int \gamma_t(z_t) \,\mathrm dz_t\). We want to estimate the normalizing constants \(Z_t\) and expectations with respect to the (normalized) target distributions \(\pi_t(z_t) = \gamma_t(z_t) / Z_t\). To do this, we design a sequence of forward kernels \(\{q_t(z_t \given z_{t - 1})\}_{t = 1}^T\) and a sequence of reverse kernels \(\{r_{t - 1}(z_{t - 1} \given z_t\}_{t = 2}^T\).
Assuming a particle budget \(K\) and resampling at every step, the SMCS procedure is
At every timestep, we can estimate the normalizing constant and expectations as
\begin{align}
Z_t &\approx \prod_{\tau = 1}^t \left(\frac{1}{K} \sum_{k = 1}^K w_\tau^k \right), \\
\E_{\pi_t(z_t)}[f(z_t)] &\approx \frac{1}{\sum_{k = 1}^K w_k} \sum_{k = 1}^K w_k f(z_t^k).
\end{align}
The standard way to justify the SMCS formulation is to view it as a special case of the increasing-spaces formulation.
To do this, we set
\begin{align}
\gamma_t^{\text{IS}}(z_{1:t}) &= \gamma_t^{\text{SMCS}}(z_t) \prod_{\tau = 1}^{t - 1} r_\tau(z_\tau \given z_{\tau + 1}) \\
q_t^{\text{IS}}(z_t \given z_{1:t - 1}) &= q_t^{\text{SMCS}}(z_t \given z_{t - 1})
\end{align}
which implies
\begin{align}
Z_t^{\text{IS}} &= \int \gamma_t^{\text{IS}}(z_{1:t}) \,\mathrm dz_{1:t} \\
&= \int \gamma_t^{\text{SMCS}}(z_t) \prod_{\tau = 1}^{t - 1} r_\tau(z_\tau \given z_{\tau + 1}) \,\mathrm dz_{1:t} \\
&= \int \gamma_t^{\text{SMCS}}(z_t) \,\mathrm dz_t \\
&= Z_t^{\text{SMCS}}, \\
w_t^{\text{IS}} &=\frac{\gamma_t^{\text{IS}}(z_{1:t})}{\gamma_{t - 1}^{\text{IS}}(z_{1:t - 1}) q_t^{\text{IS}}(z_t \given z_{1:t - 1})} \\
&= \frac{\gamma_t^{\text{SMCS}}(z_t) \prod_{\tau = 1}^{t - 1} r_\tau(z_\tau \given z_{\tau + 1})}{\left(\gamma_{t - 1}^{\text{SMCS}}(z_{t - 1}) \prod_{\tau = 1}^{t - 2} r_\tau(z_\tau \given z_{\tau + 1}) \right) q_t^{\text{SMCS}}(z_t \given z_{t - 1})} \\
&= \frac{\gamma_t^{\text{SMCS}}(z_t) r_{t - 1}(z_{t - 1} \given z_t)}{\gamma_{t - 1}^{\text{SMCS}}(z_{t - 1}) q_t^{\text{SMCS}}(z_t \given z_{t - 1})} \\
&= w_t^{\text{SMCS}}.
\end{align}
To view the increasing-spaces formulation as a special case of the SMCS formulation, we need to view a sequence \(z_{1:t}\) as a different object from \(z_{1:t - 1}\) rather than as a result of concatenating \(z_t\).
We will distinguish \(z_{1:t}^{(t)}\) from \(z_{1:t - 1}^{(t - 1)}\) using the “\((t)\)”/”\((t - 1)\)” superscript. Setting
\begin{align}
z_t^{\text{SMCS}} &= z_{1:t}^{(t), \text{IS}} \\
\gamma_t^{\text{SMCS}}(z_t^{\text{SMCS}}) &= \gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}})
\end{align}
implies
\begin{align}
Z_t^{\text{SMCS}} &= \int \gamma_t^{\text{SMCS}}(z_t^{\text{SMCS}}) \,\mathrm dz_t^{\text{SMCS}} \\
&= \int \gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}}) \,\mathrm dz_{1:t}^{(t), \text{IS}} \\
&= Z_t^{\text{IS}}.
\end{align}
To enforce \(z_{1:t - 1}^{(t), \text{IS}} = z_{1:t - 1}^{(t - 1), \text{IS}}\), we include copy kernels in both forward and reverse kernels:
\begin{align}
q_t^{\text{SMCS}}(z_t^{\text{SMCS}} | z_{t - 1}^{\text{SMCS}}) &= \delta_{z_{1:t - 1}^{(t - 1), \text{IS}}}(z_{1:t - 1}^{(t), \text{IS}}) q_t^{\text{IS}}(z_t^{(t), \text{IS}} | z_{1:t - 1}^{(t - 1), \text{IS}}), \\
r_{t - 1}^{\text{SMCS}}(z_{t - 1}^{\text{SMCS}} | z_t^{\text{SMCS}}) &= \delta_{z_{1:t - 1}^{(t), \text{IS}}}(z_{1:t - 1}^{(t - 1), \text{IS}}).
\end{align}
The “propose” step in SMCS will therefore only propose \(z_t^{(t), \text{IS}}\) given \(z_{1:t - 1}^{(t - 1), \text{IS}}\) as is done in the “propose” step in the increasing-spaces formulation.
The weights also match:
\begin{align}
w_t^{\text{SMCS}} &= \frac{\gamma_t^{\text{SMCS}}(z_t^{\text{SMCS}}) r_{t - 1}^{\text{SMCS}}(z_{t - 1}^{\text{SMCS}} | z_t^{\text{SMCS}})}{\gamma_{t - 1}^{\text{SMCS}}(z_{t - 1}^{\text{SMCS}}) q_t^{\text{SMCS}}(z_t^{\text{SMCS}} | z_{t - 1}^{\text{SMCS}})} \\
&= \frac{\gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}}) \delta_{z_{1:t - 1}^{(t), \text{IS}}}(z_{1:t - 1}^{(t - 1), \text{IS}})}{\gamma_{t - 1}^{\text{IS}}(z_{1:t - 1}^{(t - 1), \text{IS}}) \delta_{z_{1:t - 1}^{(t - 1), \text{IS}}}(z_{1:t - 1}^{(t), \text{IS}}) q_t^{\text{IS}}(z_t^{(t), \text{IS}} | z_{1:t - 1}^{(t - 1), \text{IS}})} \\
&= \frac{\gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}})}{\gamma_{t - 1}^{\text{IS}}(z_{1:t - 1}^{(t - 1), \text{IS}}) q_t^{\text{IS}}(z_t^{(t), \text{IS}} | z_{1:t - 1}^{(t - 1), \text{IS}})} \\
&= w_t^{\text{IS}}.
\end{align}
Mathematically, these formulations are equivalent. In practice, the SMCS formulation makes it easier to think about the design of a wider range of algorithms. This is because in the IS-first view, we are only free to choose the unnormalized targets and the proposal for the next step, given previous steps. To be able to propose changes to the whole sequence like we would do in SMCS, we would have to make the full SMCS state be part of the per-step states of the IS-view. Moreover, we would need to design the reverse kernels implicitly via the unnormalized target densities.
[back]