# Measure theory for probability (UNFINISHED)

The minimum amount of measure theory needed to understand the probability theory behind machine learning.

These notes are based on (Capinski & Kopp, 2013), (Rosenthal, 2006) and (Qian, 2016).

Definition ($$\sigma$$-algebra). Let $$\Omega$$ be a set. Then a $$\sigma$$-algebra $$\mathcal F$$ is a nonempty collection of subsets of $$\Omega$$ such that

1. $$\Omega \in \mathcal F$$.
2. If $$A$$ is in $$\mathcal F$$, then so is its complement $$\Omega \setminus A$$.
3. If $$A_1, A_2, \dots$$ is a sequence of elements of $$\mathcal F$$, then the countable union $$\bigcup_{n = 1}^\infty A_n$$ is in $$\mathcal F$$.

Call $$(\Omega, \mathcal F)$$ a measurable space. $$\square$$
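For a finite $$\Omega$$ the three axioms can be checked mechanically. The Python sketch below assumes $$\Omega$$ is finite, in which case closure under countable unions reduces to closure under pairwise unions: a sequence of subsets of a finite set has only finitely many distinct terms, so its union is a finite union. The helper name `is_sigma_algebra` is hypothetical, chosen for illustration.

```python
from itertools import combinations


def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a FINITE set omega.

    Assumption: omega is finite, so closure under countable unions is
    equivalent to closure under pairwise (hence all finite) unions.
    """
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # Axiom 1: Omega itself belongs to the family.
    if omega not in family:
        return False
    # Axiom 2: closed under complements.
    if any(omega - a not in family for a in family):
        return False
    # Axiom 3 (finite case): closed under pairwise unions.
    return all(a | b in family for a, b in combinations(family, 2))


omega = {1, 2, 3, 4}
print(is_sigma_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
print(is_sigma_algebra(omega, [set(), {1}, omega]))             # False: {2, 3, 4} is missing
```

The second family fails only because it is not closed under complements; adding $$\{2, 3, 4\}$$ turns it into a $$\sigma$$-algebra.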

Definition (Measure). Let $$(\Omega, \mathcal F)$$ be a measurable space. Let $$\mu: \mathcal F \to \bar{\mathbb R}$$ be a mapping, where $$\bar{\mathbb R}$$ denotes the set of extended real numbers. Then $$\mu$$ is called a measure on $$\mathcal F$$ if and only if it has the following properties:

1. For every $$F \in \mathcal F$$, $$\mu(F) \geq 0$$.
2. For every sequence of pairwise disjoint sets $$S_n \in \mathcal F$$: \begin{align} \mu\left(\bigcup_{n = 1}^\infty S_n \right) = \sum_{n = 1}^\infty \mu(S_n). \end{align} (that is, $$\mu$$ is countably additive)
3. $$\mu(\emptyset) = 0$$. $$\square$$
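A simple family of examples: on a finite $$\Omega$$ with the power set as $$\mathcal F$$, any nonnegative weight $$w(\omega)$$ per point defines a measure by $$\mu(A) = \sum_{\omega \in A} w(\omega)$$, and unit weights give the counting measure. A minimal Python sketch under that finiteness assumption (the name `point_measure` is hypothetical):

```python
from fractions import Fraction


def point_measure(weights):
    """Return mu with mu(A) = sum of weights[w] over w in A.

    Assumptions: Omega is the finite key set of `weights`, the sigma-algebra
    is the power set of Omega, and every weight is nonnegative.
    """
    if any(v < 0 for v in weights.values()):
        raise ValueError("weights must be nonnegative")

    def mu(A):
        # The empty sum is 0, so mu(empty set) = 0 automatically.
        return sum((weights[w] for w in A), Fraction(0))

    return mu


mu = point_measure({"a": 1, "b": 2, "c": 3})
A, B = {"a"}, {"b", "c"}  # disjoint sets
print(mu(A | B) == mu(A) + mu(B))  # True: additivity on disjoint sets
print(mu(set()))                   # 0
```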

Definition (Probability measure). Let $$(\Omega, \mathcal F)$$ be a measurable space. A measure $$P$$ on this space is called a probability measure if $$P(\Omega) = 1$$.

Call $$(\Omega, \mathcal F, P)$$ a probability triple. $$\square$$
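As an illustrative probability triple (an assumption, not from the sources above), take one roll of a fair six-sided die: $$\Omega = \{1, \dots, 6\}$$, $$\mathcal F$$ the power set of $$\Omega$$, and $$P(A) = |A| / 6$$. The sketch uses exact fractions so the normalization $$P(\Omega) = 1$$ holds exactly.

```python
from fractions import Fraction

# Illustrative triple: Omega = {1, ..., 6}, F = all subsets, P(A) = |A| / 6.
OMEGA = frozenset(range(1, 7))


def P(A):
    """Uniform probability measure on the power set of OMEGA."""
    A = frozenset(A)
    if not A <= OMEGA:
        raise ValueError("event must be a subset of Omega")
    return Fraction(len(A), len(OMEGA))


even = {2, 4, 6}
print(P(OMEGA))         # 1    (normalization: P(Omega) = 1)
print(P(even))          # 1/2
print(P(OMEGA - even))  # 1/2, so P(even) + P(odd) = P(Omega)
```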

Definition (Measurable function). Let $$(\Omega, \mathcal F)$$ be a measurable space. Let $$(\mathcal X, \mathcal E)$$ be another measurable space. Let $$f: \Omega \to \mathcal X$$ be a function. Define $$f^{-1}(E) := \{\omega: \omega \in \Omega, f(\omega) \in E\}$$ for $$E \in \mathcal E$$. $$f$$ is said to be $$\mathcal F$$-measurable if $$f^{-1}(E) \in \mathcal F$$ for all $$E \in \mathcal E$$. $$\square$$
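Measurability depends on $$\mathcal F$$, not only on $$f$$. The sketch below assumes finite spaces: $$\Omega = \{1, 2, 3, 4\}$$ with the coarse $$\sigma$$-algebra $$\{\emptyset, \{1, 2\}, \{3, 4\}, \Omega\}$$, and $$\mathcal X = \{0, 1\}$$ with its power set. The helper names `preimage` and `is_measurable` are hypothetical.

```python
def preimage(f, omega, E):
    """f^{-1}(E) = {w in omega : f(w) in E}."""
    return frozenset(w for w in omega if f(w) in E)


def is_measurable(f, omega, F, E_family):
    """True iff f^{-1}(E) lies in F for every E in E_family (finite case)."""
    F = {frozenset(a) for a in F}
    return all(preimage(f, omega, E) in F for E in E_family)


omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, omega]   # coarse sigma-algebra on Omega
E_family = [set(), {0}, {1}, {0, 1}]  # power set of X = {0, 1}

low_high = lambda w: 0 if w <= 2 else 1  # preimages: {}, {1,2}, {3,4}, Omega
parity = lambda w: w % 2                 # preimage of {0} is {2, 4}, not in F

print(is_measurable(low_high, omega, F, E_family))  # True
print(is_measurable(parity, omega, F, E_family))    # False
```

With the power set of $$\Omega$$ in place of `F`, every function on $$\Omega$$ would be measurable.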

Definition (Random variable). Let $$(\Omega, \mathcal F, P)$$ be a probability triple. Let $$(\mathcal X, \mathcal E)$$ be a measurable space. Then a function $$X: \Omega \to \mathcal X$$ is called a random variable if it is $$\mathcal F$$-measurable. $$\square$$

Definition (Probability distribution). Given a random variable $$X$$ on a probability triple $$(\Omega, \mathcal F, P)$$ and the output space $$(\mathcal X, \mathcal E)$$, the probability distribution of $$X$$ is $$P \circ X^{-1}$$. We write $$P_X := P \circ X^{-1}$$.

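Note that $$P_X$$ is a probability measure on $$(\mathcal X, \mathcal E)$$: measurability of $$X$$ guarantees $$X^{-1}(E) \in \mathcal F$$, so $$P_X(E)$$ is defined; taking preimages preserves countable disjoint unions; and $$X^{-1}(\mathcal X) = \Omega$$, so $$P_X(\mathcal X) = P(\Omega) = 1$$.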

We also call $$P_X$$ the law of $$X$$ and denote it by $$\mathcal L(X)$$. $$\square$$
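To see $$P_X = P \circ X^{-1}$$ concretely, consider an illustrative setup (an assumption, not from the sources): two fair dice, $$\Omega = \{1, \dots, 6\}^2$$ with the uniform $$P$$, and $$X(\omega) = \omega_1 + \omega_2$$ taking values in $$\{2, \dots, 12\}$$ with its power set. The hypothetical helper `law` evaluates $$P_X(E)$$ by pulling $$E$$ back to $$\Omega$$ and applying $$P$$.

```python
from fractions import Fraction
from itertools import product

# Illustrative setup: two fair dice with the uniform P on Omega = {1..6}^2.
OMEGA = frozenset(product(range(1, 7), repeat=2))


def P(A):
    return Fraction(len(A), len(OMEGA))


def X(w):
    """Random variable: sum of the two dice."""
    return w[0] + w[1]


def law(X, P, omega, E):
    """P_X(E) = P(X^{-1}(E)), the pushforward of P under X."""
    return P(frozenset(w for w in omega if X(w) in E))


print(law(X, P, OMEGA, {7}))               # 1/6
print(law(X, P, OMEGA, {2, 12}))           # 1/18
print(law(X, P, OMEGA, set(range(2, 13))))  # 1, so P_X is a probability measure
```

The full distribution of $$X$$ is then the table $$x \mapsto P_X(\{x\})$$ for $$x = 2, \dots, 12$$.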

Definition (Integration).

Definition (Expectation).

Definition (Product measures).

Theorem (Radon-Nikodym).

Definition (Probability density).

Definition (Conditional expectation).

Definition (Conditional probability).

Theorem (Bayes’ rule).

Theorem (Sum rule).

Theorem (Product rule).

References

1. Capinski, M., & Kopp, P. E. (2013). Measure, integral and probability. Springer Science & Business Media.
@book{capinski2013measure,
title = {Measure, integral and probability},
author = {Capinski, Marek and Kopp, Peter E},
year = {2013},
publisher = {Springer Science \& Business Media}
}

2. Rosenthal, J. S. (2006). A first look at rigorous probability theory. World Scientific.
@book{rosenthal2006first,
title = {A first look at rigorous probability theory},
author = {Rosenthal, Jeffrey Seth},
year = {2006},
publisher = {World Scientific}
}

3. Qian, Z. (2016). Lecture notes on the course “B8.1 Martingales through Measure Theory.” Mathematical Institute, University of Oxford.
@misc{qian2016martingales,
author = {Qian, Zhongmin},
title = {Lecture notes on the course ``B8.1 Martingales through Measure Theory''},
month = sep,
year = {2016},
publisher = {Mathematical Institute, University of Oxford},
link = {https://courses.maths.ox.ac.uk/node/124},
file = {../assets/pdf/qian2016martingales.pdf}
}
