Reading Notes: Game Theory: An Introduction - 02 - Introducing Uncertainty and Time

Date: 2019-11-10

Preface

These are my study notes on Game Theory: An Introduction by Steven Tadelis.

Terminology

Probability distribution function
A simple lottery over the outcomes $X = \{x_1, x_2, \cdots, x_n\}$, induced by an action $a \in A$, is a probability distribution
\[ p = (p(x_1|a), p(x_2|a), \cdots, p(x_n|a)), \]
where $p(x_k|a) \geq 0$ is the probability that $x_k$ occurs when action $a$ is taken, and $\sum_{k=1}^n p(x_k|a) = 1$.
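A minimal Python sketch of the two conditions above, assuming a lottery is stored as a dict from outcome $x_k$ to $p(x_k|a)$. The representation, the helper name `is_valid_distribution`, and the sample payoffs are hypothetical.

```python
import math
from typing import Dict

def is_valid_distribution(p: Dict[float, float], tol: float = 1e-9) -> bool:
    """Check p(x_k|a) >= 0 for every outcome and that the probabilities sum to 1."""
    if any(prob < 0 for prob in p.values()):
        return False
    # Compare with a tolerance, since floating-point sums are rarely exact.
    return math.isclose(sum(p.values()), 1.0, abs_tol=tol)

# Hypothetical lottery over X = {0, 10, 20} induced by some action a.
lottery_a = {0.0: 0.25, 10.0: 0.5, 20.0: 0.25}
print(is_valid_distribution(lottery_a))             # True
print(is_valid_distribution({0.0: 0.6, 1.0: 0.6}))  # False: sums to 1.2
```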
Cumulative distribution function
For a simple lottery induced by an action $a \in A$ over the outcome interval $X = [\underline{x}, \overline{x}]$, the cumulative distribution function is
\[ F : X \to [0, 1], \qquad F(\hat{x} | a) = \Pr\{x \leq \hat{x}\}, \]
i.e. the probability that the outcome is less than or equal to $\hat{x}$.
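As an illustration only: if we assume the outcome is uniformly distributed on $[\underline{x}, \overline{x}]$ (an assumption, not part of the definition), then $F(\hat{x}|a) = (\hat{x} - \underline{x}) / (\overline{x} - \underline{x})$ inside the interval. The function name `uniform_cdf` is hypothetical.

```python
def uniform_cdf(x_hat: float, lo: float, hi: float) -> float:
    """F(x_hat | a) = Pr{x <= x_hat}, assuming x is uniform on [lo, hi]."""
    if hi <= lo:
        raise ValueError("need lo < hi")
    if x_hat <= lo:
        return 0.0  # no probability mass below the interval
    if x_hat >= hi:
        return 1.0  # all probability mass is at or below x_hat
    return (x_hat - lo) / (hi - lo)

print(uniform_cdf(25.0, 0.0, 100.0))  # 0.25
```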
Expected payoff from a lottery
For a simple lottery induced by an action $a \in A$ over the finite outcome set $X = \{x_1, x_2, \cdots, x_n\}$, the expected payoff is
\[ E[u(x)|a] = \sum_{k=1}^n p_k u(x_k), \]
where $u(x)$ is the payoff function and $p = (p_1, p_2, \cdots, p_n)$, with $p_k = p(x_k|a)$, is the probability distribution over outcomes.
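The sum above translates directly into Python. This sketch reuses the hypothetical dict representation of a lottery; `expected_payoff` and the sample numbers are illustrative, not from the book.

```python
import math
from typing import Callable, Dict

def expected_payoff(p: Dict[float, float], u: Callable[[float], float]) -> float:
    """E[u(x)|a] = sum over k of p_k * u(x_k)."""
    return sum(prob * u(x) for x, prob in p.items())

# Hypothetical lottery: p(0) = 0.25, p(10) = 0.5, p(20) = 0.25.
lottery_a = {0.0: 0.25, 10.0: 0.5, 20.0: 0.25}
print(expected_payoff(lottery_a, lambda x: x))  # 10.0 with u(x) = x
print(expected_payoff(lottery_a, math.sqrt))    # about 2.699 with u(x) = sqrt(x)
```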
Continuous case: expected payoff from a lottery
For a simple lottery induced by an action $a \in A$ over the outcome interval $X = [\underline{x}, \overline{x}]$, the expected payoff is
\[ E[u(x) | a] = \int_{\underline{x}}^{\overline{x}} u(x) f(x|a) \, dx, \]
where $u(x)$ is the payoff function and $f(x|a)$ is the probability density function, the derivative of the cumulative distribution function $F(x|a)$.
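Textbook examples usually evaluate this integral in closed form. The sketch below instead approximates it numerically with the midpoint rule, a swapped-in technique used only for illustration. It assumes a uniform density $f(x|a) = 1/100$ on $[0, 100]$; the function names are hypothetical.

```python
import math
from typing import Callable

def expected_payoff_continuous(
    u: Callable[[float], float],
    f: Callable[[float], float],
    lo: float,
    hi: float,
    n: int = 10_000,
) -> float:
    """Approximate the integral of u(x) f(x|a) over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += u(x) * f(x)
    return total * h

def uniform_density(x: float) -> float:
    """Assumed density: x uniform on [0, 100]."""
    return 1.0 / 100.0 if 0.0 <= x <= 100.0 else 0.0

print(expected_payoff_continuous(lambda x: x, uniform_density, 0.0, 100.0))  # about 50.0
print(expected_payoff_continuous(math.sqrt, uniform_density, 0.0, 100.0))    # about 20/3
```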
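Homo economicus (rational decision maker), revisited
We call a person rational if he or she chooses the action that maximizes expected payoff:
\[ \text{choose } a^* \in A \iff v(a^*) = E[u(x)|a^*] \geq E[u(x)|a] = v(a) \quad \text{for all } a \in A. \]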
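A small sketch of this choice rule: compute $v(a)$ for each action and pick the largest. The action names and lotteries are hypothetical; the example shows that the same two lotteries can lead to different choices under different payoff functions.

```python
import math
from typing import Callable, Dict, Tuple

Lottery = Dict[float, float]

def expected_payoff(p: Lottery, u: Callable[[float], float]) -> float:
    """E[u(x)|a] for a discrete lottery p."""
    return sum(prob * u(x) for x, prob in p.items())

def best_action(
    lotteries: Dict[str, Lottery], u: Callable[[float], float]
) -> Tuple[str, Dict[str, float]]:
    """Return a* maximizing v(a) = E[u(x)|a], plus v(a) for every action."""
    values = {a: expected_payoff(p, u) for a, p in lotteries.items()}
    a_star = max(values, key=values.get)  # ties go to the first action listed
    return a_star, values

# Hypothetical actions: a sure 10, or a 50/50 gamble between 0 and 30.
lotteries = {"safe": {10.0: 1.0}, "risky": {0.0: 0.5, 30.0: 0.5}}
print(best_action(lotteries, lambda x: x)[0])  # risky (15 > 10)
print(best_action(lotteries, math.sqrt)[0])    # safe (about 2.74 < 3.16)
```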

Taking Sequence and Time into Account

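Backward induction
Also referred to as dynamic programming.
In a multi-stage decision problem with random events, we work from the last stage back to the first: at each stage we find the action that maximizes expected payoff, replace that part of the problem by its maximal expected payoff, and use this value when solving the stage before it.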
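A small sketch of backward induction on a decision tree. The tuple encoding of nodes, the node names, and the payoffs are all hypothetical. Leaf values are taken to already be payoffs $u(x)$, so each chance node is replaced by its expected payoff and each decision node by the value of its best action, working from the leaves back to the root.

```python
from typing import Dict, Union

# Hypothetical node encoding:
#   a number                          -> terminal payoff u(x)
#   ("chance", [(prob, child), ...])  -> lottery over subtrees
#   ("choice", name, {action: child}) -> decision node
Node = Union[float, tuple]

def solve(node: Node, policy: Dict[str, str]) -> float:
    """Value of `node` by backward induction; records the best action of each decision node in `policy`."""
    if isinstance(node, (int, float)):
        return float(node)
    if node[0] == "chance":
        _, branches = node
        # Children are solved first, then the lottery is replaced by its expected payoff.
        return sum(prob * solve(child, policy) for prob, child in branches)
    if node[0] == "choice":
        _, name, options = node
        values = {action: solve(child, policy) for action, child in options.items()}
        best = max(values, key=values.get)
        policy[name] = best
        return values[best]
    raise ValueError(f"unknown node type: {node[0]!r}")

# Stage 1: invest, or keep 10. If you invest, the market turns out good or bad (50/50),
# and in stage 2 you either sell or hold for another 50/50 gamble.
good = ("choice", "good", {"sell": 30, "hold": ("chance", [(0.5, 60), (0.5, 20)])})
bad = ("choice", "bad", {"sell": 0, "hold": ("chance", [(0.5, 10), (0.5, -20)])})
root = ("choice", "start", {"invest": ("chance", [(0.5, good), (0.5, bad)]), "keep": 10})

policy: Dict[str, str] = {}
print(solve(root, policy))  # 20.0
print(policy)               # {'good': 'hold', 'bad': 'sell', 'start': 'invest'}
```

The stage-2 choices are fixed before the stage-1 choice is evaluated, which is what makes the procedure "backward".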
Discounted sum of future payoffs
\[ v(x_1, x_2, \cdots, x_T) = \sum_{t=1}^{T} \delta^{t-1} u(x_t), \]
where $T$ is the number of periods, $u(x)$ is the payoff function of outcome $x$, and $\delta$ is the discount factor, with $0 < \delta < 1$.
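A direct Python transcription of the formula, assuming the per-period payoffs $u(x_1), \ldots, u(x_T)$ have already been computed; `discounted_sum` is a hypothetical helper name.

```python
from typing import Sequence

def discounted_sum(payoffs: Sequence[float], delta: float) -> float:
    """v = sum_{t=1}^{T} delta^(t-1) * u(x_t), with payoffs[t-1] = u(x_t)."""
    # enumerate starts at 0, which is exactly the exponent t - 1 for period t = 1.
    return sum(delta ** k * u_t for k, u_t in enumerate(payoffs))

print(discounted_sum([10, 10, 10], 0.5))  # 17.5 = 10 + 5 + 2.5
```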

Risk Attitudes

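Risk neutral
Values options with the same expected return equally: indifferent between a lottery and receiving its expected value for sure.
Risk averse
Prefers a certain payoff: strictly prefers receiving a lottery's expected value for sure to an uncertain option with the same expected return.
Risk loving
Strictly prefers the gamble to receiving its expected value for sure.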
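One way to check these definitions numerically, as a sketch: compare the payoff of receiving a lottery's expected value for sure, $u(E[x])$, with the lottery's expected payoff $E[u(x)]$. The helper `risk_attitude` and the 50/50 lottery over 0 and 100 are hypothetical. A single lottery only reveals the attitude toward that lottery; a linear $u$ is risk neutral for every lottery, while a strictly concave (strictly convex) $u$ is risk averse (risk loving) for every non-degenerate lottery.

```python
import math
from typing import Callable, Dict

def risk_attitude(
    u: Callable[[float], float], lottery: Dict[float, float], tol: float = 1e-9
) -> str:
    """Classify the attitude toward one lottery by comparing u(E[x]) with E[u(x)]."""
    mean = sum(p * x for x, p in lottery.items())       # E[x]
    sure_thing = u(mean)                                # payoff of receiving E[x] for sure
    gamble = sum(p * u(x) for x, p in lottery.items())  # E[u(x)]
    if math.isclose(sure_thing, gamble, abs_tol=tol):
        return "risk neutral"
    return "risk averse" if sure_thing > gamble else "risk loving"

coin_flip = {0.0: 0.5, 100.0: 0.5}
print(risk_attitude(lambda x: x, coin_flip))       # risk neutral
print(risk_attitude(math.sqrt, coin_flip))         # risk averse
print(risk_attitude(lambda x: x ** 2, coin_flip))  # risk loving
```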

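Up to this point (expected-payoff maximization, backward induction, discounted payoffs), the material is essentially the framework that reinforcement learning builds on.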

References

Steven Tadelis, Game Theory: An Introduction
Reading Notes: Game Theory: An Introduction - 01 - Single-Person Decision Problems