These notes cover the minimum amount of measure theory needed to understand the probability theory behind machine learning.
These notes are based on (Capinski & Kopp, 2013), (Rosenthal, 2006) and (Qian, 2016).
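Definition (\(\sigma\)-algebra). Let \(\Omega\) be a set. Then a \(\sigma\)-algebra \(\mathcal F\) is a nonempty collection of subsets of \(\Omega\) such that (i) if \(A \in \mathcal F\), then \(\Omega \setminus A \in \mathcal F\) (closure under complements), and (ii) if \(A_1, A_2, \dots \in \mathcal F\), then \(\bigcup_{i=1}^{\infty} A_i \in \mathcal F\) (closure under countable unions). In particular, \(\Omega \in \mathcal F\) and \(\emptyset \in \mathcal F\).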
Call \((\Omega, \mathcal F)\) a measurable space. \(\square\)
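On a finite \(\Omega\) the \(\sigma\)-algebra axioms can be checked by brute force, which may help build intuition. The sketch below assumes the standard axioms (nonempty, closed under complements, closed under countable unions). On a finite family, closure under countable unions reduces to closure under pairwise unions. The function name `is_sigma_algebra` and the example families are illustrative and do not come from the references.

```python
from itertools import combinations


def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    # The family must be nonempty and consist of subsets of omega.
    if not family or any(not s <= omega for s in family):
        return False
    # Closure under complements.
    if any(omega - s not in family for s in family):
        return False
    # Closure under unions; pairwise closure suffices for a finite family.
    return all(a | b in family for a, b in combinations(family, 2))


omega = {1, 2, 3, 4}
trivial = [set(), omega]                    # smallest sigma-algebra on omega
parity = [set(), {1, 3}, {2, 4}, omega]     # generated by the odd/even split
not_closed = [set(), {1}, omega]            # missing the complement {2, 3, 4}

print(is_sigma_algebra(omega, trivial))
print(is_sigma_algebra(omega, parity))
print(is_sigma_algebra(omega, not_closed))
```

The third family fails because it contains \(\{1\}\) but not its complement.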
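Definition (Measure). Let \((\Omega, \mathcal F)\) be a measurable space. Let \(\mu: \mathcal F \to \bar{\mathbb R}\) be a mapping, where \(\bar{\mathbb R}\) denotes the set of extended real numbers. Then \(\mu\) is called a measure on \(\mathcal F\) if and only if it has the following properties: (i) \(\mu(E) \geq 0\) for all \(E \in \mathcal F\) (non-negativity); (ii) \(\mu(\emptyset) = 0\); and (iii) for every sequence \(E_1, E_2, \dots \in \mathcal F\) of pairwise disjoint sets, \(\mu\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mu(E_i)\) (countable additivity). \(\square\)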
Definition (Probability measure). Let \((\Omega, \mathcal F)\) be a measurable space. A measure \(P\) on this space is called a probability measure if \(P(\Omega) = 1\).
Call \((\Omega, \mathcal F, P)\) a probability triple. \(\square\)
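As a concrete instance, a fair six-sided die gives a probability triple with \(\Omega = \{1, \dots, 6\}\), \(\mathcal F\) the power set of \(\Omega\), and \(P\) assigning each outcome weight \(1/6\). The minimal sketch below builds \(P\) from point weights and checks \(P(\emptyset) = 0\), \(P(\Omega) = 1\), and additivity on one pair of disjoint events. The helper `make_measure` is a hypothetical name introduced here, and defining a measure through point weights only works when \(\Omega\) is countable.

```python
from fractions import Fraction


def make_measure(weights):
    """Return the set function E -> sum of the point weights of outcomes in E."""
    def mu(event):
        return sum((weights[w] for w in event), Fraction(0))
    return mu


# Fair die: every outcome has weight 1/6 (an assumption of this example).
weights = {w: Fraction(1, 6) for w in range(1, 7)}
P = make_measure(weights)

omega = frozenset(weights)
even = frozenset({2, 4, 6})
one = frozenset({1})

print(P(frozenset()))                    # measure of the empty set
print(P(omega))                          # total mass
print(P(even | one), P(even) + P(one))   # additivity on disjoint events
```

Exact rational arithmetic via `Fraction` avoids floating-point error when comparing the two sides of the additivity identity.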
Definition (Measurable function). Let \((\Omega, \mathcal F)\) be a measurable space. Let \((\mathcal X, \mathcal E)\) be another measurable space. Let \(f: \Omega \to \mathcal X\) be a function. Define \(f^{-1}(E) := \{\omega: \omega \in \Omega, f(\omega) \in E\}\) for \(E \in \mathcal E\). \(f\) is said to be \(\mathcal F\)-measurable if \(f^{-1}(E) \in \mathcal F\) for all \(E \in \mathcal E\). \(\square\)
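Measurability depends on both \(\sigma\)-algebras, not on the function alone. The following sketch, for finite spaces only, computes preimages directly from the definition of \(f^{-1}(E)\) and tests whether each one lies in \(\mathcal F\). The names `preimage` and `is_measurable` and the two example functions are assumptions made for illustration.

```python
def preimage(f, omega, E):
    """Return f^{-1}(E) = {w in omega : f(w) in E} as a frozenset."""
    return frozenset(w for w in omega if f(w) in E)


def is_measurable(f, omega, F, E_family):
    """Check that f^{-1}(E) belongs to F for every E in E_family."""
    F = {frozenset(s) for s in F}
    return all(preimage(f, omega, E) in F for E in E_family)


omega = {1, 2, 3, 4}
# Sigma-algebra on omega that can only distinguish odd from even outcomes.
F = [set(), {1, 3}, {2, 4}, omega]
# Power set of the output space {0, 1}.
E_family = [set(), {0}, {1}, {0, 1}]


def parity(w):
    return w % 2


def above_one(w):
    return int(w > 1)


print(is_measurable(parity, omega, F, E_family))
print(is_measurable(above_one, omega, F, E_family))
```

Here `parity` is \(\mathcal F\)-measurable, while `above_one` is not, because the preimage of \(\{1\}\) under it is \(\{2, 3, 4\}\), which is not in \(\mathcal F\).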
Definition (Random variable). Let \((\Omega, \mathcal F, P)\) be a probability triple. Let \((\mathcal X, \mathcal E)\) be a measurable space. Then a function \(X: \Omega \to \mathcal X\) is called a random variable if it is \(\mathcal F\)-measurable. \(\square\)
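Definition (Probability distribution). Given a random variable \(X\) on a probability triple \((\Omega, \mathcal F, P)\) taking values in a measurable space \((\mathcal X, \mathcal E)\), the probability distribution of \(X\) is \(P \circ X^{-1}\). We write \(P_X := P \circ X^{-1}\), that is, \(P_X(E) = P(X^{-1}(E))\) for \(E \in \mathcal E\).
Note that \(P_X\) is a probability measure on \((\mathcal X, \mathcal E)\): it is well defined because \(X^{-1}(E) \in \mathcal F\) for all \(E \in \mathcal E\), and \(P_X(\mathcal X) = P(\Omega) = 1\).
We also call \(P_X\) the law of \(X\) and denote it by \(\mathcal L(X)\). \(\square\)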
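For a discrete example, the distribution \(P \circ X^{-1}\) can be computed by pushing each outcome's weight forward through \(X\). The sketch below assumes a fair die for \(P\) and takes \(X\) to be the indicator of rolling at least 5. Both choices, and the helper name `pushforward`, are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(weights, X):
    """Return the point weights of P_X, obtained by summing P over X^{-1}({x})."""
    law = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at 0
    for w, p in weights.items():
        law[X(w)] += p
    return dict(law)


# Fair die on Omega = {1, ..., 6} (an assumption of this example).
weights = {w: Fraction(1, 6) for w in range(1, 7)}


def X(w):
    """Indicator of the event 'the roll is at least 5'."""
    return 1 if w >= 5 else 0


law = pushforward(weights, X)
print(law[0], law[1])
print(sum(law.values()))
```

The weights of the law sum to 1, consistent with \(P_X\) being a probability measure on the output space.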
Definition (Integration).
Definition (Expectation).
Definition (Product measures).
Theorem (Radon-Nikodym).
Definition (Probability density).
Definition (Conditional expectation).
Definition (Conditional probability).
Theorem (Bayes’ rule).
Theorem (Sum rule).
Theorem (Product rule).
References
@book{capinski2013measure, title = {Measure, integral and probability}, author = {Capinski, Marek and Kopp, Peter E}, year = {2013}, publisher = {Springer Science \& Business Media} }
@book{rosenthal2006first, title = {A first look at rigorous probability theory}, author = {Rosenthal, Jeffrey Seth}, year = {2006}, publisher = {World Scientific} }
@misc{qian2016martingales, author = {Qian, Zhongmin}, title = {Lecture notes on the course ``B8.1 Martingales through Measure Theory''}, month = sep, year = {2016}, publisher = {Mathematical Institute, University of Oxford}, link = {https://courses.maths.ox.ac.uk/node/124}, file = {../assets/pdf/qian2016martingales.pdf} }