04 September 2016
Consider the following generative model for \(N\) \(D\)-dimensional data points \(x_{1:N}\):
\begin{align}
\mu &\sim \mathrm{Normal}(\mu_0, \Sigma_0), \\
x_n \mid \mu &\sim \mathrm{Normal}(\mu, \Sigma), && n = 1, \dotsc, N,
\end{align}
where \(\mu_0 \in \mathbb R^D\) and \(\Sigma_0 \in \mathbb R^{D \times D}\) are the prior mean and covariance, and \(\Sigma \in \mathbb R^{D \times D}\) is the data covariance.
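A minimal sketch of sampling from this generative model (the particular values of \(\mu_0, \Sigma_0, \Sigma\), and the dimensions are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

D, N = 2, 5
mu0 = np.zeros(D)        # prior mean (example value)
Sigma0 = np.eye(D)       # prior covariance (example value)
Sigma = 0.5 * np.eye(D)  # data covariance (example value)

# Draw mu from the prior, then the N data points given mu.
mu = rng.multivariate_normal(mu0, Sigma0)
x = rng.multivariate_normal(mu, Sigma, size=N)  # shape (N, D)
```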
Then the joint density of \(x_{1:N}\) and \(\mu\) can be expressed as
\begin{align}
p(x_{1:N}, \mu) &= p(\mu) p(x_{1:N} \mid \mu) \\
&= \frac{1}{Z_1 Z_2} \exp\left(-\frac{1}{2}(\mu - \mu_0)^T \Sigma_0^{-1} (\mu - \mu_0)\right) \cdot \prod_{n = 1}^N \exp\left(-\frac{1}{2}(x_n - \mu)^T \Sigma^{-1} (x_n - \mu)\right) \\
&= \frac{1}{Z_1 Z_2} \exp\left(-\frac{1}{2}\left(\mu^T \Sigma_0^{-1} \mu - 2\mu^T\Sigma_0^{-1}\mu_0 + \mu_0^T\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \left(x_n^T \Sigma^{-1} x_n - 2x_n^T\Sigma^{-1}\mu + \mu^T\Sigma^{-1}\mu\right)\right)\right) \\
&= \frac{1}{Z_1 Z_2 Z_3} \exp\left(-\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right),
\end{align}
where \(Z_1, Z_2\) are the normalisation constants of the prior and likelihood normal densities, and \(Z_3\) absorbs the remaining factors that are constant with respect to \(\mu\).
The posterior density of \(\mu\) has the form of
\begin{align}
p(\mu \mid x_{1:N}) &= \frac{p(x_{1:N}, \mu)}{p(x_{1:N})} \\
&= \frac{1}{Z_4} p(x_{1:N}, \mu) \\
&= \frac{1}{Z_1 Z_2 Z_3 Z_4} \exp\left(-\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right). \label{eq:gaussian/quadratic}
\end{align}
We can see that this term has the form \(\exp\left(-\frac{1}{2}\mu^T A \mu + \mu^T b + c\right)\) for some positive definite \(A \in \mathbb R^{D \times D}\), \(b \in \mathbb R^D\), and \(c \in \mathbb R\), and it must integrate to one, i.e. \(\int p(\mu \mid x_{1:N}) \,\mathrm d\mu = 1\). This is exactly the form of a normal density, so we can read off the posterior mean \(\mu_N\) and posterior covariance \(\Sigma_N\) by matching terms against \eqref{eq:gaussian/quadratic}:
\begin{align}
p(\mu \mid x_{1:N}) &= \mathrm{Normal}(\mu; \mu_N, \Sigma_N) \\
&= \frac{1}{Z_5} \exp\left(-\frac{1}{2}\left(\mu^T \Sigma_N^{-1} \mu - 2\mu^T\Sigma_N^{-1}\mu_N + \mu_N^T\Sigma_N^{-1}\mu_N\right)\right) \\
\implies
\Sigma_N &= (\Sigma_0^{-1} + N\Sigma^{-1})^{-1} \\
\mu_N &= \Sigma_N \left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right).
\end{align}
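As a numeric sanity check on these formulas (a sketch with arbitrary example values): the unnormalised joint \(p(\mu)\,p(x_{1:N} \mid \mu)\) should differ from \(\mathrm{Normal}(\mu; \mu_N, \Sigma_N)\) by a factor constant in \(\mu\), namely the marginal likelihood \(p(x_{1:N})\).

```python
import numpy as np

def log_normal(z, m, S):
    """Log density of a multivariate Normal(m, S) evaluated at z."""
    d = z - m
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(m) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))

rng = np.random.default_rng(0)
D, N = 2, 20
mu0, Sigma0 = np.zeros(D), np.eye(D)  # prior parameters (example values)
Sigma = 0.5 * np.eye(D)               # data covariance (example value)
x = rng.multivariate_normal(rng.multivariate_normal(mu0, Sigma0), Sigma, size=N)

# Posterior parameters from the derivation above.
Sigma0_inv, Sigma_inv = np.linalg.inv(Sigma0), np.linalg.inv(Sigma)
Sigma_N = np.linalg.inv(Sigma0_inv + N * Sigma_inv)
mu_N = Sigma_N @ (Sigma0_inv @ mu0 + Sigma_inv @ x.sum(axis=0))

# log p(mu) + log p(x | mu) - log Normal(mu; mu_N, Sigma_N) should be the
# same for every mu: it equals log p(x_{1:N}).
def log_evidence_at(mu):
    return (log_normal(mu, mu0, Sigma0)
            + sum(log_normal(xn, mu, Sigma) for xn in x)
            - log_normal(mu, mu_N, Sigma_N))

assert np.isclose(log_evidence_at(np.zeros(D)), log_evidence_at(np.ones(D)))
```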
The marginal likelihood of this model is
\begin{align}
p(x_{1:N}) &= \mathrm{Normal}\left(
x_{1:N};
\begin{bmatrix}
\mu_0 \\
\vdots \\
\mu_0
\end{bmatrix},
\begin{bmatrix}
\Sigma & & \\
& \ddots & \\
& & \Sigma
\end{bmatrix}
+
\begin{bmatrix}
\Sigma_0 & \cdots & \Sigma_0 \\
\vdots & \ddots & \vdots \\
\Sigma_0 & \cdots & \Sigma_0
\end{bmatrix}
\right).
\end{align}
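This stacked-Gaussian expression can also be checked numerically (again a sketch with made-up values): assemble the \(ND\)-dimensional mean and covariance directly, and compare against \(p(x_{1:N}) = p(\mu)\,p(x_{1:N} \mid \mu) / p(\mu \mid x_{1:N})\), which holds for any choice of \(\mu\).

```python
import numpy as np

def log_normal(z, m, S):
    """Log density of a multivariate Normal(m, S) evaluated at z."""
    d = z - m
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(m) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))

rng = np.random.default_rng(1)
D, N = 2, 4
mu0, Sigma0 = np.zeros(D), np.eye(D)  # prior parameters (example values)
Sigma = 0.5 * np.eye(D)               # data covariance (example value)
x = rng.multivariate_normal(mu0, Sigma0 + Sigma, size=N)  # any data works here

# Stacked (ND)-dimensional mean and covariance from the expression above:
# block-diagonal copies of Sigma plus Sigma_0 in every D x D block.
mean = np.tile(mu0, N)
cov = np.kron(np.eye(N), Sigma) + np.kron(np.ones((N, N)), Sigma0)
log_ml = log_normal(x.ravel(), mean, cov)

# Cross-check: p(x) = p(mu) p(x | mu) / p(mu | x), evaluated at mu = mu0.
Sigma0_inv, Sigma_inv = np.linalg.inv(Sigma0), np.linalg.inv(Sigma)
Sigma_N = np.linalg.inv(Sigma0_inv + N * Sigma_inv)
mu_N = Sigma_N @ (Sigma0_inv @ mu0 + Sigma_inv @ x.sum(axis=0))
log_ml_check = (log_normal(mu0, mu0, Sigma0)
                + sum(log_normal(xn, mu0, Sigma) for xn in x)
                - log_normal(mu0, mu_N, Sigma_N))
assert np.isclose(log_ml, log_ml_check)
```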
This can be derived using the Gaussian identity in equation 5 in Michael Osborne's note, where we use the following substitutions:
\begin{align}
\color{blue}{x} &\leftarrow \mu, \\
\color{blue}{y} &\leftarrow x_{1:N}, \\
\color{blue}{\mu} &\leftarrow \mu_0, \\
\color{blue}{A} &\leftarrow \Sigma_0, \\
\color{blue}{M} &\leftarrow
\begin{bmatrix}
I \\
\vdots \\
I
\end{bmatrix}, \\
\color{blue}{c} &\leftarrow 0, \\
\color{blue}{L} &\leftarrow
\begin{bmatrix}
\Sigma & & \\
& \ddots & \\
& & \Sigma
\end{bmatrix}.
\end{align}
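For reference, the identity these substitutions plug into appears to be the standard linear-Gaussian marginalisation result (stated here from the general result, so it should be checked against equation 5 of the note itself): if \(\color{blue}{x} \sim \mathrm{Normal}(\color{blue}{\mu}, \color{blue}{A})\) and \(\color{blue}{y} \mid \color{blue}{x} \sim \mathrm{Normal}(\color{blue}{M}\color{blue}{x} + \color{blue}{c}, \color{blue}{L})\), then
\begin{align}
\color{blue}{y} &\sim \mathrm{Normal}(\color{blue}{M}\color{blue}{\mu} + \color{blue}{c}, \color{blue}{L} + \color{blue}{M}\color{blue}{A}\color{blue}{M}^T).
\end{align}
With the substitutions above, \(M\mu_0\) stacks \(N\) copies of \(\mu_0\), and \(M\Sigma_0 M^T\) is the \(ND \times ND\) matrix whose every \(D \times D\) block is \(\Sigma_0\), which recovers the marginal likelihood above.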
[back]