04 September 2016
Consider the following generative model for \(N\) \(D\)-dimensional data points \(x_{1:N}\):
\begin{align}
  \mu &\sim \mathrm{Normal}(\mu_0, \Sigma_0), \\
  x_n \mid \mu &\sim \mathrm{Normal}(\mu, \Sigma), && n = 1, \dotsc, N,
\end{align}
where \(\mu_0 \in \mathbb R^D, \Sigma_0 \in \mathbb R^{D \times D}\) are the prior mean and covariances and \(\Sigma \in \mathbb R^{D \times D}\) is the data covariance.
Then the joint density of \(x_{1:N}\) and \(\mu\) can be expressed as
\begin{align}
  p(x_{1:N}, \mu) &= p(\mu) p(x_{1:N} \mid \mu) \\
                  &= \frac{1}{Z_1} \exp\left(\frac{1}{2}(\mu - \mu_0)^T \Sigma_0^{-1} (\mu - \mu_0)\right) \cdot \prod_{n = 1}^N \frac{1}{Z_2} \exp\left(\frac{1}{2}(x_n - \mu)^T \Sigma^{-1} (x_n - \mu)\right) \\
                  &= \frac{1}{Z_1 Z_2} \exp\left(\frac{1}{2}\left(\mu^T \Sigma_0^{-1} \mu - 2\mu^T\Sigma_0^{-1}\mu_0 + \mu_0^T\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N x_n^T \Sigma^{-1} x_n - 2x_n^T\Sigma^{-1}\mu + \mu^T\Sigma^{-1}\mu\right)\right) \\
                  &= \frac{1}{Z_1 Z_2 Z_3} \exp\left(\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right),
\end{align}
where \(Z_1, Z_2\) are normalisation constants of the prior and likelihood normal densities, and \(Z_3\) is a constant with respect to \(\mu\).
The posterior density of \(\mu\) has the form of
\begin{align}
  p(\mu \mid x_{1:N}) &= \frac{p(x_{1:N}, \mu)}{p(x_{1:N})} \\
                      &= \frac{1}{Z_4} p(x_{1:N}, \mu) \\
                      &= \frac{1}{Z_1 Z_2 Z_3 Z_4} \exp\left(\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right). \label{eq:gaussian/quadratic}
\end{align}
We can see that this term has the form \(\exp(\mu^T A \mu + \mu^T B + C)\) for some \(A \in \mathbb R^{D \times D}\) positive semidefinite, \(B \in \mathbb R^D\), and \(C \in \mathbb R\). It must also integrate to one, i.e. \(\int p(\mu \mid x_{1:N}) \,\mathrm d\mu = 1\). Since this is a known form of a normal distribution, we can fit the parameters of the presupposed posterior normal density with posterior mean \(\mu_N\) and posterior covariance \(\Sigma_N\) with quantities in \eqref{eq:gaussian/quadratic} as follows:
\begin{align}
  p(\mu \mid x_{1:N}) &= \mathrm{Normal}(\mu; \mu_N, \Sigma_N) \\
  &= \frac{1}{Z_5} \exp\left(\frac{1}{2}\left(\mu^T \Sigma_N^{-1} \mu - 2\mu^T\Sigma_N^{-1}\mu_N + \mu_N^T\Sigma_N^{-1}\mu_N\right)\right) \\
  \implies
  \Sigma_N &= (\Sigma_0^{-1} + N\Sigma^{-1})^{-1} \\
  \mu_N &= \Sigma_N \left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right).
\end{align}
The marginal likelihood of this model is
\begin{align}
  p(x_{1:N}) &= \mathrm{Normal}\left(
    x_{1:N};
    \begin{bmatrix}
      \mu_0 \\
      \vdots \\
      \mu_0
    \end{bmatrix},
    \begin{bmatrix}
      \Sigma & &  \\
      & \ddots & \\
      & & \Sigma
    \end{bmatrix}
    +
    \begin{bmatrix}
      \Sigma_0 & \cdots & \Sigma_0 \\
      \vdots & \ddots & \vdots \\
      \Sigma_0 & \cdots & \Sigma_0
    \end{bmatrix}
  \right).
\end{align}
This can be derived using the Gaussian identity in equation 5 in Michael Osborneās note where we use the following substitutions:
\begin{align}
  \color{blue}{x} &\leftarrow \mu, \\
  \color{blue}{y} &\leftarrow x_{1:N}, \\
  \color{blue}{\mu} &\leftarrow \mu_0, \\
  \color{blue}{A} &\leftarrow \Sigma_0, \\
  \color{blue}{M} &\leftarrow
  \begin{bmatrix}
    I \\
    \vdots \\
    I
  \end{bmatrix}, \\
  \color{blue}{c} &\leftarrow 0, \\
  \color{blue}{L} &\leftarrow
  \begin{bmatrix}
    \Sigma & &  \\
    & \ddots & \\
    & & \Sigma
  \end{bmatrix}.
\end{align}
[back]