Tuan Anh Le

probabilistic view of ai

10 March 2017

this note formulates the problem of solving ai (the ai problem) from a perspective that combines probabilistic inference and decision theory. it tries to present a coherent picture of the ai problem, of how various subfields of machine learning fit into it, and of how they approximate solutions to it. the hope is to identify and prioritize interesting topics for research. this is heavily based on chapters 16, 17 and 21 of the ai book (Russell & Norvig, 2010).

1 non-sequential decision problem

graphical model of the non-sequential decision problem; squares denote decision nodes.

1.1 definitions

1.2 expected utility of an action

the expected utility of an action is a function $EU: \mathcal A \times \mathcal O \to \mathbb R$ defined as \begin{align} EU(a, o) = \E[\underbrace{U(S' \given A = a, O = o)}_{\text{random variable, } \Omega \to \mathbb R}]. \end{align}

1.3 maximum expected utility principle

the maximum expected utility principle says that, given an observation $o$, a rational agent should choose the action that maximizes the agent's expected utility: \begin{align} a^\ast = \argmax_{a} EU(a, o). \end{align}
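as a minimal sketch of this principle, consider a discrete problem in which the next state posterior and the utility function are already known; all states, actions, observations and numbers below are hypothetical, chosen only for illustration:

```python
# posterior over next states p(s' | a, o), assumed given here (hypothetical values)
posterior = {
    ("stay", "sunny"): {"good": 0.8, "bad": 0.2},
    ("move", "sunny"): {"good": 0.5, "bad": 0.5},
}

# utility U(s') of each next state (hypothetical values)
utility = {"good": 10.0, "bad": -5.0}

def expected_utility(action, observation):
    """EU(a, o) = sum over s' of U(s') p(s' | a, o)."""
    return sum(utility[s] * p for s, p in posterior[(action, observation)].items())

def best_action(observation, actions=("stay", "move")):
    """a* = argmax over a of EU(a, o)."""
    return max(actions, key=lambda a: expected_utility(a, observation))
```

here `best_action("sunny")` returns `"stay"`, since $EU(\text{stay}, \text{sunny}) = 0.8 \cdot 10 - 0.2 \cdot 5 = 7$ beats $EU(\text{move}, \text{sunny}) = 2.5$.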

why this principle makes sense is a philosophical question; see the discussion in the notes on utility theory.

the quantities needed to evaluate $EU(a, o)$ are the world model's prior $\mu(s)$, transition $f(s' \given a, s)$, and emission $g(o \given s)$, the agent's policy $\pi(a \given o)$, and the next state posterior $h(s' \given a, o)$:

1.4 world model (prior)

1.5 world model (transition)

1.6 world model (emission)

1.7 agent’s model (policy)

1.8 next state posterior

1.9 expected utility of an action

finally, \begin{align} EU(a, o) &= \E[U(S' \given A = a, O = o)] \\ &= \int_{\mathcal S'} U(s') h(s' \given a, o) \,\mathrm{d}s' \\ &= \int_{\mathcal S'} U(s') \left[ \int_{\mathcal S} f(s' \given a, s) \pi(a \given o) g(o \given s) \mu(s) \,\mathrm{d}s \right]\,\mathrm{d}s' \end{align}
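the nested integral above can be estimated by simple monte carlo: sample $s \sim \mu$, weight by $g(o \given s)\,\pi(a \given o)$, sample $s' \sim f(\cdot \given a, s)$, and average the weighted utilities. the sketch below does this for hypothetical one-dimensional gaussian choices of $\mu$, $f$ and $g$ (all distributions and parameters are illustrative assumptions, not part of the note):

```python
import math
import random

def normal_pdf(x, mean, std):
    """density of a univariate gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def expected_utility(a, o, utility, pi_a_given_o, num_samples=100_000, seed=0):
    """monte carlo estimate of the double integral defining EU(a, o).

    assumed (hypothetical) model:
      mu(s)        = Normal(0, 1)
      g(o | s)     = Normal(s, 0.5)
      f(s' | a, s) = Normal(s + a, 0.1)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        s = rng.gauss(0.0, 1.0)               # s  ~ mu(s)
        weight = normal_pdf(o, s, 0.5)        # g(o | s)
        s_next = rng.gauss(s + a, 0.1)        # s' ~ f(s' | a, s)
        total += utility(s_next) * weight * pi_a_given_o
    return total / num_samples
```

note that, like the equation it mirrors, this averages against the unnormalized joint, so the estimate is proportional to the posterior-expected utility; the normalizer is common to all actions and so does not affect the argmax.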

now we turn to the full sequential problem.

2 sequential decision problem

graphical model of the sequential decision problem; squares denote decision nodes.

2.1 definitions

we’ll define

2.2 world model (prior)

2.3 world model (transition)

2.4 world model (emission)

2.5 agent’s model (policy)

2.6 next state posterior

2.7 all states posterior

2.8 expected utility of an action

3 reinforcement learning

value iteration
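as a minimal sketch of value iteration, the code below repeatedly applies the bellman optimality backup $V(s) \leftarrow \max_a \sum_{s'} P(s' \given s, a)\,[R(s, a, s') + \gamma V(s')]$ to a tiny hypothetical two-state mdp (the mdp and all rewards are made up for illustration):

```python
# transitions[s][a] = list of (probability, next_state, reward); hypothetical mdp
transitions = {
    "A": {"left": [(1.0, "A", 0.0)], "right": [(1.0, "B", 1.0)]},
    "B": {"left": [(1.0, "A", 0.0)], "right": [(1.0, "B", 2.0)]},
}

def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """iterate the bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            new_v = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V
```

for this mdp the fixed point is $V(B) = 2 / (1 - 0.9) = 20$ and $V(A) = 1 + 0.9 \cdot 20 = 19$.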

policy iteration

q-learning
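a minimal tabular q-learning sketch, using the standard update $Q(s, a) \leftarrow Q(s, a) + \alpha\,[r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$ with epsilon-greedy exploration; the two-state environment and all hyperparameters are hypothetical:

```python
import random

def q_learning(step, states, actions, episodes=2000, horizon=50,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """learn Q(s, a) from interaction with a step(s, a) -> (s', r) environment."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(horizon):
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r = step(s, a)
            # temporal-difference update toward the bootstrapped target
            target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# deterministic two-state environment (hypothetical): "right" is always better
def step(s, a):
    if a == "right":
        return ("B", 1.0 if s == "A" else 2.0)
    return ("A", 0.0)
```

after training, `q_learning(step, ["A", "B"], ["left", "right"])` prefers `"right"` in both states, matching the optimal policy of the corresponding mdp.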

td-learning

policy gradients

actor critic

4 bayesian (model-based) reinforcement learning


references

  1. Russell, S. J., & Norvig, P. (2010). Artificial Intelligence (A Modern Approach). Prentice Hall.
