正在加载图片...
Strategies implemented by machines are Markovian; informally, they are"simple"strate- Nash Folk Theorem for the Discounting Criterion Definition 3 Fix a normal-form game G. The minmax payoff for player i, vi, is defined by t=mina-∈A,maxa;∈At1(a1,a-i) Definition 4 Fix a normal-form game G. A payoff profile u E R is feasible if u aea D u(a) for some set of rational coefficients(, (Ba)aeA) with aea Ba =7. It is en- forceable ift≥t(r U2) for all i∈N It should be clear that no payoff profile w such that wi vi for some player i E N can be achieved in a Nash equilibrium of the infinite repetition of G Proposition 0.1 OR 144.1/If s is a Nash equilibrium of the infinitely repeated game t, then the resulting payoff profile is enforceable The proof is in OR. In particular, in a two-player game, if the machine M2=(Q2, f2, 92, T2) implements a strategy for Player 2, then the machine M1=(Q2, f1, 92, T2)with f1(q2)E rI(2(q2)) yields at least u to Player 1. This is OR Exercise [144.2 The main result in this lecture is known as the Nash Folk Theorem with Discounting Proposition 0.2 /OR 145.2 For any feasible, strictly enforceable payoff profile w, and for every E>0, there exists d l and a feasible, strictly enforceable payoff profile w such that Ww-w< e and, for 8>8, w is a Nash equilibrium payoff profile of r The intuition is easiest to grasp if w=u(a) for some a E lien ai: for each player, we construct a strategy which conforms to a as long as no other player has deviated, and plays an action which minmaxes the first deviator for every history following a deviation. For high enough discount factor, the benefit from a single-stage deviation is outweighted by the fact that the deviator receives his minmax payoff forever after For arbitrary feasible payoffs, we must use cycles. If the payoff aggregation criterion was limit-of-means, we would be home free, because(1)the value of a finite stream in which each ui(a)is repeated exactly Ba times is precisely 2eA Baui(a)=w; and(2)the value of ar infinite repetition of the finite stream just defined is again w. This is OR, Theorem 144.3 However, for discounting we need to do some extra bookkeeping. The value of the finite stream defined in(1)is not exactly w, but it will be close to w for high enough 8: this will be our w. The trick is then to measure time in terms of y-cycles, redefining the discount factor appropriately.Strategies implemented by machines are Markovian; informally, they are “simple” strate￾gies. Nash Folk Theorem for the Discounting Criterion Definition 3 Fix a normal-form game G. The minmax payoff for player i, vi , is defined by vi = mina−i∈A−i maxai∈Ai ui(ai , a−i). Definition 4 Fix a normal-form game G. A payoff profile u ∈ RN P is feasible if u = a∈A βa γ u(a) for some set of rational coefficients (γ,(βa)a∈A) with P a∈A βa = γ. It is en￾forceable (resp. strictly enforceable) if ui ≥ vi (resp. ui > vi) for all i ∈ N It should be clear that no payoff profile w such that wi < vi for some player i ∈ N can be achieved in a Nash equilibrium of the infinite repetition of G: Proposition 0.1 [OR 144.1] If s is a Nash equilibrium of the infinitely repeated game Γ, then the resulting payoff profile is enforceable. The proof is in OR. In particular, in a two-player game, if the machine M2 = (Q2, f2, q0 2 , τ2) implements a strategy for Player 2, then the machine M1 = (Q2, f1, q0 2 , τ2) with f1(q2) ∈ r1(f2(q2)) yields at least v1 to Player 1. This is OR Exercise [144.2]. The main result in this lecture is known as the Nash Folk Theorem with Discounting: Proposition 0.2 [OR 145.2] For any feasible, strictly enforceable payoff profile w, and for every  > 0, there exists δ < 1 and a feasible, strictly enforceable payoff profile w 0 such that |w − w 0 | <  and, for δ > δ, w 0 is a Nash equilibrium payoff profile of Γ. The intuition is easiest to grasp if w = u(a) for some a ∈ Q i∈N ai : for each player, we construct a strategy which conforms to a as long as no other player has deviated, and plays an action which minmaxes the first deviator for every history following a deviation. For high enough discount factor, the benefit from a single-stage deviation is outweighted by the fact that the deviator receives his minmax payoff forever after. For arbitrary feasible payoffs, we must use cycles. If the payoff aggregation criterion was limit-of-means, we would be home free, because (1) the value of a finite stream in which each ui(a) is repeated exactly βa times is precisely P a∈A βa γ ui(a) = w; and (2) the value of an infinite repetition of the finite stream just defined is again w. This is OR, Theorem 144.3. However, for discounting we need to do some extra bookkeeping. The value of the finite stream defined in (1) is not exactly w, but it will be close to w for high enough δ: this will be our w 0 . The trick is then to measure time in terms of γ-cycles, redefining the discount factor appropriately. 4
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有