# Gaussian unknown mean

04 September 2016

Consider the following generative model for $N$ $D$-dimensional data points $x_{1:N}$: \begin{align} \mu &\sim \mathrm{Normal}(\mu_0, \Sigma_0), \\ x_n \mid \mu &\sim \mathrm{Normal}(\mu, \Sigma), && n = 1, \dotsc, N, \end{align} where $\mu_0 \in \mathbb R^D$ and $\Sigma_0 \in \mathbb R^{D \times D}$ are the prior mean and covariance, and $\Sigma \in \mathbb R^{D \times D}$ is the known data covariance.
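To make the generative model concrete, here is a minimal NumPy sketch that first draws $\mu$ from the prior and then draws $N$ data points given $\mu$. The function name `sample_model` and the particular values of $\mu_0$, $\Sigma_0$ and $\Sigma$ are illustrative assumptions, not part of the model.

```python
import numpy as np

def sample_model(mu_0, Sigma_0, Sigma, N, rng):
    """Draw mu ~ Normal(mu_0, Sigma_0), then x_n | mu ~ Normal(mu, Sigma) for n = 1..N."""
    mu = rng.multivariate_normal(mu_0, Sigma_0)       # shape (D,)
    x = rng.multivariate_normal(mu, Sigma, size=N)    # shape (N, D), one row per data point
    return mu, x

# Illustrative parameter values (assumed, not from the text).
rng = np.random.default_rng(0)
D, N = 2, 5
mu_0 = np.zeros(D)
Sigma_0 = np.eye(D)
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

mu, x = sample_model(mu_0, Sigma_0, Sigma, N, rng)
print(mu.shape, x.shape)  # (2,) (5, 2)
```

Storing the data as an $N \times D$ array with one row per $x_n$ is a convenience choice; the later sketches assume the same layout.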

## The joint

The joint density of $x_{1:N}$ and $\mu$ can be expressed as \begin{align} p(x_{1:N}, \mu) &= p(\mu) p(x_{1:N} \mid \mu) \\ &= \frac{1}{Z_1} \exp\left(-\frac{1}{2}(\mu - \mu_0)^T \Sigma_0^{-1} (\mu - \mu_0)\right) \cdot \prod_{n = 1}^N \frac{1}{Z_2} \exp\left(-\frac{1}{2}(x_n - \mu)^T \Sigma^{-1} (x_n - \mu)\right) \\ &= \frac{1}{Z_1 Z_2^N} \exp\left(-\frac{1}{2}\left(\mu^T \Sigma_0^{-1} \mu - 2\mu^T\Sigma_0^{-1}\mu_0 + \mu_0^T\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \left(x_n^T \Sigma^{-1} x_n - 2x_n^T\Sigma^{-1}\mu + \mu^T\Sigma^{-1}\mu\right)\right)\right) \\ &= \frac{1}{Z_1 Z_2^N Z_3} \exp\left(-\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right), \end{align} where $Z_1, Z_2$ are the normalisation constants of the prior and likelihood normal densities, and $Z_3 = \exp\left(\frac{1}{2}\left(\mu_0^T\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N x_n^T\Sigma^{-1}x_n\right)\right)$ collects the terms that do not depend on $\mu$. The last step uses the symmetry of $\Sigma$, so that $x_n^T\Sigma^{-1}\mu = \mu^T\Sigma^{-1}x_n$.
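One way to sanity-check the expansion above is to confirm numerically that $\log p(x_{1:N}, \mu)$ and the exponent on the last line differ only by a constant that does not depend on $\mu$. The sketch below assumes NumPy; the helper names `log_normal_pdf`, `log_joint` and `quadratic_part` and the randomly generated parameters are hypothetical, chosen only for the check.

```python
import numpy as np

def log_normal_pdf(y, mean, cov):
    """Log density of Normal(y; mean, cov) for a single vector y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + len(y) * np.log(2 * np.pi) + logdet)

def log_joint(mu, x, mu_0, Sigma_0, Sigma):
    """log p(x_{1:N}, mu) = log p(mu) + sum_n log p(x_n | mu)."""
    return log_normal_pdf(mu, mu_0, Sigma_0) + sum(log_normal_pdf(x_n, mu, Sigma) for x_n in x)

def quadratic_part(mu, x, mu_0, Sigma_0, Sigma):
    """The mu-dependent exponent -1/2 (mu^T A mu - 2 mu^T b) from the last line."""
    A = np.linalg.inv(Sigma_0) + len(x) * np.linalg.inv(Sigma)
    # sum_n Sigma^{-1} x_n equals Sigma^{-1} (sum_n x_n)
    b = np.linalg.solve(Sigma_0, mu_0) + np.linalg.solve(Sigma, x.sum(axis=0))
    return -0.5 * (mu @ A @ mu - 2 * mu @ b)

# Random symmetric positive definite covariances and data (assumed, for the check only).
rng = np.random.default_rng(1)
D, N = 3, 4
mu_0 = rng.normal(size=D)
L0 = rng.normal(size=(D, D))
Sigma_0 = L0 @ L0.T + D * np.eye(D)
L1 = rng.normal(size=(D, D))
Sigma = L1 @ L1.T + D * np.eye(D)
x = rng.normal(size=(N, D))

# The difference should be the same for every mu.
diffs = [log_joint(m, x, mu_0, Sigma_0, Sigma) - quadratic_part(m, x, mu_0, Sigma_0, Sigma)
         for m in rng.normal(size=(5, D))]
print(np.allclose(diffs, diffs[0]))  # True
```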

## The posterior

The posterior density of $\mu$ has the form \begin{align} p(\mu \mid x_{1:N}) &= \frac{p(x_{1:N}, \mu)}{p(x_{1:N})} \\ &= \frac{1}{Z_4} p(x_{1:N}, \mu) \\ &= \frac{1}{Z_1 Z_2^N Z_3 Z_4} \exp\left(-\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right), \label{eq:gaussian/quadratic} \end{align} where $Z_4 = p(x_{1:N})$ does not depend on $\mu$. As a function of $\mu$, this has the form $\exp(\mu^T A \mu + \mu^T B + C)$ with $A = -\frac{1}{2}(\Sigma_0^{-1} + N\Sigma^{-1}) \in \mathbb R^{D \times D}$ negative definite, $B = \Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n \in \mathbb R^D$, and some $C \in \mathbb R$. It must also integrate to one, i.e. $\int p(\mu \mid x_{1:N}) \,\mathrm d\mu = 1$. The only normalised density of this form is a normal density, so the posterior is $\mathrm{Normal}(\mu; \mu_N, \Sigma_N)$ for some posterior mean $\mu_N$ and posterior covariance $\Sigma_N$. Expanding the exponent of this normal density and matching its quadratic and linear terms in $\mu$ with those in \eqref{eq:gaussian/quadratic} gives \begin{align} p(\mu \mid x_{1:N}) &= \mathrm{Normal}(\mu; \mu_N, \Sigma_N) \\ &= \frac{1}{Z_5} \exp\left(-\frac{1}{2}\left(\mu^T \Sigma_N^{-1} \mu - 2\mu^T\Sigma_N^{-1}\mu_N + \mu_N^T\Sigma_N^{-1}\mu_N\right)\right) \\ \implies \Sigma_N &= (\Sigma_0^{-1} + N\Sigma^{-1})^{-1}, \\ \mu_N &= \Sigma_N \left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right). \end{align}
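The closed-form posterior update translates directly into a few lines of NumPy. This is a sketch: `posterior` is a hypothetical helper name, the data are assumed to be stored as an $N \times D$ array, and explicit matrix inverses are used only to mirror the formulas.

```python
import numpy as np

def posterior(x, mu_0, Sigma_0, Sigma):
    """Return (mu_N, Sigma_N) such that p(mu | x_{1:N}) = Normal(mu; mu_N, Sigma_N)."""
    N = x.shape[0]
    Sigma_0_inv = np.linalg.inv(Sigma_0)
    Sigma_inv = np.linalg.inv(Sigma)
    # Posterior precision is Sigma_0^{-1} + N Sigma^{-1}.
    Sigma_N = np.linalg.inv(Sigma_0_inv + N * Sigma_inv)
    # sum_n Sigma^{-1} x_n equals Sigma^{-1} (sum_n x_n).
    mu_N = Sigma_N @ (Sigma_0_inv @ mu_0 + Sigma_inv @ x.sum(axis=0))
    return mu_N, Sigma_N

# One-dimensional example: mu_0 = 0, Sigma_0 = Sigma = 1, data 1, 2, 3.
x = np.array([[1.0], [2.0], [3.0]])
mu_N, Sigma_N = posterior(x, np.zeros(1), np.eye(1), np.eye(1))
print(mu_N, Sigma_N)  # [1.5] [[0.25]]
```

In this example the formulas give $\Sigma_N = (1 + 3)^{-1} = 0.25$ and $\mu_N = 0.25 \cdot (0 + 6) = 1.5$. For larger or ill-conditioned covariances, solving linear systems (for example with a Cholesky factorisation) is numerically preferable to forming inverses explicitly.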

## The marginal likelihood

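The marginal likelihood of this model, treating $x_{1:N}$ as a stacked vector in $\mathbb R^{ND}$, is \begin{align} p(x_{1:N}) &= \mathrm{Normal}\left( x_{1:N}; \begin{bmatrix} \mu_0 \\ \vdots \\ \mu_0 \end{bmatrix}, \begin{bmatrix} \Sigma & & \\ & \ddots & \\ & & \Sigma \end{bmatrix} + \begin{bmatrix} \Sigma_0 & \cdots & \Sigma_0 \\ \vdots & \ddots & \vdots \\ \Sigma_0 & \cdots & \Sigma_0 \end{bmatrix} \right). \end{align} This follows from the linear-Gaussian identity in equation 5 of Michael Osborne’s note: if $\color{blue}{x} \sim \mathrm{Normal}(\color{blue}{\mu}, \color{blue}{A})$ and $\color{blue}{y} \mid \color{blue}{x} \sim \mathrm{Normal}(\color{blue}{M}\color{blue}{x} + \color{blue}{c}, \color{blue}{L})$, then $\color{blue}{y} \sim \mathrm{Normal}(\color{blue}{M}\color{blue}{\mu} + \color{blue}{c}, \color{blue}{L} + \color{blue}{M}\color{blue}{A}\color{blue}{M}^T)$. We use the following substitutions: \begin{align} \color{blue}{x} &\leftarrow \mu, \\ \color{blue}{y} &\leftarrow x_{1:N}, \\ \color{blue}{\mu} &\leftarrow \mu_0, \\ \color{blue}{A} &\leftarrow \Sigma_0, \\ \color{blue}{M} &\leftarrow \begin{bmatrix} I \\ \vdots \\ I \end{bmatrix}, \\ \color{blue}{c} &\leftarrow 0, \\ \color{blue}{L} &\leftarrow \begin{bmatrix} \Sigma & & \\ & \ddots & \\ & & \Sigma \end{bmatrix}. \end{align} With these, $\color{blue}{M}\mu_0$ stacks $N$ copies of $\mu_0$, and $\color{blue}{M}\Sigma_0\color{blue}{M}^T$ is the $ND \times ND$ matrix whose every $D \times D$ block equals $\Sigma_0$.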
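The stacked-Gaussian form of the marginal likelihood can be evaluated directly by building the $ND$-dimensional mean and covariance. As a consistency check, the sketch below compares it with Bayes' rule rearranged as $\log p(x_{1:N}) = \log p(x_{1:N}, \mu) - \log p(\mu \mid x_{1:N})$, which holds for any $\mu$. It assumes NumPy; all function names and parameter values are illustrative, with `np.tile` building the stacked identity $M$ and `np.kron` building the block-diagonal matrix of $\Sigma$ blocks.

```python
import numpy as np

def log_normal_pdf(y, mean, cov):
    """Log density of Normal(y; mean, cov) for a single vector y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + len(y) * np.log(2 * np.pi) + logdet)

def log_marginal_likelihood(x, mu_0, Sigma_0, Sigma):
    """log p(x_{1:N}) from the stacked ND-dimensional normal."""
    N, D = x.shape
    M = np.tile(np.eye(D), (N, 1))      # N stacked D x D identities, shape (ND, D)
    L = np.kron(np.eye(N), Sigma)       # block diagonal with Sigma on each diagonal block
    mean = M @ mu_0                     # mu_0 repeated N times
    cov = L + M @ Sigma_0 @ M.T         # adds Sigma_0 to every D x D block
    return log_normal_pdf(x.reshape(-1), mean, cov)  # rows of x stacked as [x_1; ...; x_N]

def bayes_rule_log_evidence(x, mu_0, Sigma_0, Sigma, mu):
    """log p(x, mu) - log p(mu | x), which equals log p(x) for any mu."""
    N = x.shape[0]
    Sigma_N = np.linalg.inv(np.linalg.inv(Sigma_0) + N * np.linalg.inv(Sigma))
    mu_N = Sigma_N @ (np.linalg.solve(Sigma_0, mu_0) + np.linalg.solve(Sigma, x.sum(axis=0)))
    log_joint = log_normal_pdf(mu, mu_0, Sigma_0) + sum(log_normal_pdf(x_n, mu, Sigma) for x_n in x)
    return log_joint - log_normal_pdf(mu, mu_N, Sigma_N)

# Illustrative parameters and data (assumed, for the check only).
rng = np.random.default_rng(2)
N, D = 4, 2
mu_0 = rng.normal(size=D)
Sigma_0 = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma = np.array([[1.0, -0.2], [-0.2, 0.7]])
x = rng.normal(size=(N, D))

lml = log_marginal_likelihood(x, mu_0, Sigma_0, Sigma)
via_bayes = bayes_rule_log_evidence(x, mu_0, Sigma_0, Sigma, mu=np.zeros(D))
print(np.isclose(lml, via_bayes))  # True
```

The stacked covariance is $ND \times ND$, so this direct evaluation costs $O((ND)^3)$; it is meant as a check of the formula rather than an efficient implementation.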
