# Measure theory for probability (UNFINISHED)

The goal is to cover the minimum amount of measure theory needed to understand the probability theory behind machine learning.

These notes are based on (Capinski & Kopp, 2013), (Rosenthal, 2006) and (Qian, 2016).

Definition ($\sigma$-algebra). Let $\Omega$ be a set. Then a $\sigma$-algebra $\mathcal F$ is a nonempty collection of subsets of $\Omega$ such that

1. $\Omega \in \mathcal F$.
2. If $A$ is in $\mathcal F$, then so is the complement of $A$.
3. If $A_1, A_2, \dots$ is a sequence of elements of $\mathcal F$, then the countable union $\bigcup_{n = 1}^\infty A_n$ is in $\mathcal F$.

Call $(\Omega, \mathcal F)$ a measurable space. $\square$
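On a finite set $\Omega$, countable unions reduce to finite unions, so the three axioms can be checked by brute force. The following is a minimal sketch under that finiteness assumption; the helper name `is_sigma_algebra` is illustrative and does not come from the references.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # Axiom 1: the whole space belongs to the family.
    if omega not in family:
        return False
    # Axiom 2: the family is closed under complement.
    if any(omega - a not in family for a in family):
        return False
    # Axiom 3: closed under union. For a finite family, closure under
    # pairwise unions implies closure under all countable unions.
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
trivial = [set(), omega]
generated = [set(), {1, 2}, {3, 4}, omega]
not_closed = [set(), {1}, omega]  # the complement {2, 3, 4} is missing

print(is_sigma_algebra(omega, trivial))     # True
print(is_sigma_algebra(omega, generated))   # True
print(is_sigma_algebra(omega, not_closed))  # False
```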

Definition (Measure). Let $(\Omega, \mathcal F)$ be a measurable space. Let $\mu: \mathcal F \to \bar{\mathbb R}$ be a mapping, where $\bar{\mathbb R}$ denotes the set of extended real numbers. Then $\mu$ is called a measure on $\mathcal F$ if and only if it has the following properties:

1. For every $F \in \mathcal F$, $\mu(F) \geq 0$.
2. For every sequence of pairwise disjoint sets $S_n \in \mathcal F$: \begin{align} \mu\left(\bigcup_{n = 1}^\infty S_n \right) = \sum_{n = 1}^\infty \mu(S_n) \end{align} (that is, $\mu$ is a countably additive function).
3. $\mu(\emptyset) = 0$. $\square$
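As a concrete instance, the counting measure $\mu(F) = |F|$ on a finite set satisfies all three properties. The sketch below checks additivity on one disjoint family. It is only an illustration on a finite space, where countable additivity reduces to finite additivity, and the helper names are hypothetical.

```python
def counting_measure(s):
    """mu(F) = number of elements of F; non-negative, and mu(empty set) = 0."""
    return len(s)

def is_additive_on(mu, sets):
    """Check mu(union of sets) == sum of mu(s) for one pairwise disjoint family."""
    sets = [frozenset(s) for s in sets]
    # Reject families that are not pairwise disjoint.
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a & b:
                raise ValueError("sets must be pairwise disjoint")
    union = frozenset().union(*sets)
    return mu(union) == sum(mu(s) for s in sets)

disjoint_family = [{1, 2}, {3}, {4, 5, 6}]
print(counting_measure(set()))                            # 0
print(is_additive_on(counting_measure, disjoint_family))  # True
```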

Definition (Probability measure). Let $(\Omega, \mathcal F)$ be a measurable space. A measure $P$ on this space is called a probability measure if $P(\Omega) = 1$.

Call $(\Omega, \mathcal F, P)$ a probability triple. $\square$
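A fair six-sided die gives a small probability triple: $\Omega = \{1, \dots, 6\}$, $\mathcal F$ the power set of $\Omega$, and $P$ the uniform measure. The sketch below assumes equal point masses of $1/6$ (an assumption made for this example only) and uses exact fractions so that $P(\Omega) = 1$ holds without rounding.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))
# Assumption for this example: every outcome has the same point mass 1/6.
point_mass = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability of an event (any subset of omega) as a sum of point masses."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("event must be a subset of omega")
    return sum((point_mass[w] for w in event), Fraction(0))

evens = {2, 4, 6}
print(P(omega))  # 1
print(P(evens))  # 1/2
print(P(set()))  # 0
```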

Definition (Measurable function). Let $(\Omega, \mathcal F)$ be a measurable space. Let $(\mathcal X, \mathcal E)$ be another measurable space. Let $f: \Omega \to \mathcal X$ be a function. Define $f^{-1}(E) := \{\omega: \omega \in \Omega, f(\omega) \in E\}$ for $E \in \mathcal E$. $f$ is said to be $\mathcal F$-measurable if $f^{-1}(E) \in \mathcal F$ for all $E \in \mathcal E$. $\square$
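On finite spaces, measurability can be checked directly by computing every preimage. The sketch below is illustrative only: the spaces, the functions `f` and `g`, and the helper names are chosen for this example. It shows one function that is $\mathcal F$-measurable and one that is not.

```python
def preimage(f, omega, E):
    """f^{-1}(E) = {w in omega : f(w) in E}."""
    return frozenset(w for w in omega if f(w) in E)

def is_measurable(f, omega, F, E_family):
    """True if the preimage of every set in E_family lies in F."""
    F = {frozenset(a) for a in F}
    return all(preimage(f, omega, E) in F for E in E_family)

omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, omega]    # a sigma-algebra on omega
E_family = [set(), {0}, {1}, {0, 1}]  # power set of the codomain {0, 1}

def f(w):
    return 0 if w <= 2 else 1  # constant on {1, 2} and on {3, 4}

def g(w):
    return w % 2               # separates 1 from 2, which F cannot do

print(is_measurable(f, omega, F, E_family))  # True
print(is_measurable(g, omega, F, E_family))  # False
```

The function `g` fails because $g^{-1}(\{0\}) = \{2, 4\}$ is not an element of $\mathcal F$.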

Definition (Random variable). Let $(\Omega, \mathcal F, P)$ be a probability triple. Let $(\mathcal X, \mathcal E)$ be a measurable space. Then a function $X: \Omega \to \mathcal X$ is called a random variable if it is $\mathcal F$-measurable. $\square$

Definition (Probability distribution). Given a random variable $X$ on a probability triple $(\Omega, \mathcal F, P)$ with output space $(\mathcal X, \mathcal E)$, the probability distribution of $X$ is $P \circ X^{-1}$. We write $P_X := P \circ X^{-1}$, that is, $P_X(E) = P(X^{-1}(E))$ for every $E \in \mathcal E$.

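Note that $P_X$ is a probability measure on $(\mathcal X, \mathcal E)$: it inherits countable additivity from $P$ because preimages preserve disjoint unions, and $P_X(\mathcal X) = P(X^{-1}(\mathcal X)) = P(\Omega) = 1$.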

We also call $P_X$ the law of $X$ and denote it by $\mathcal L(X)$. $\square$
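To make $P \circ X^{-1}$ concrete, consider two tosses of a fair coin and let $X$ count the heads. The sketch below assumes a uniform $P$ on the four outcomes and computes the distribution of $X$ by pulling each set $E$ back to its preimage in $\Omega$. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin, with the power set as sigma-algebra.
omega = list(product("HT", repeat=2))
# Assumption for this example: each of the four outcomes has probability 1/4.
point_mass = {w: Fraction(1, 4) for w in omega}

def P(event):
    """Probability of an event given as a collection of outcomes."""
    return sum((point_mass[w] for w in event), Fraction(0))

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def P_X(E):
    """Distribution of X: P_X(E) = P(X^{-1}(E))."""
    preimage = [w for w in omega if X(w) in E]
    return P(preimage)

print(P_X({0}))        # 1/4
print(P_X({1}))        # 1/2
print(P_X({0, 1, 2}))  # 1
```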

Definition (Integration).

Definition (Expectation).

Definition (Product measures).

Definition (Probability density).

Definition (Conditional expectation).

Definition (Conditional probability).

Theorem (Bayes’ rule).

Theorem (Sum rule).

Theorem (Product rule).

References

1. Capinski, M., & Kopp, P. E. (2013). Measure, integral and probability. Springer Science & Business Media.
@book{capinski2013measure,
title = {Measure, integral and probability},
author = {Capinski, Marek and Kopp, Peter E},
year = {2013},
publisher = {Springer Science \& Business Media}
}

2. Rosenthal, J. S. (2006). A first look at rigorous probability theory. World Scientific.
@book{rosenthal2006first,
title = {A first look at rigorous probability theory},
author = {Rosenthal, Jeffrey Seth},
year = {2006},
publisher = {World Scientific}
}

3. Qian, Z. (2016, September). Lecture notes on the course “B8.1 Martingales through Measure Theory.” Mathematical Institute, University of Oxford.
@misc{qian2016martingales,
author = {Qian, Zhongmin},
title = {Lecture notes on the course ``B8.1 Martingales through Measure Theory''},
month = sep,
year = {2016},
publisher = {Mathematical Institute, University of Oxford},