Note: OR actually allows for arbitrary state spaces. For most proofs, finite state spaces are enough. There is one exception, which I will note below.

Definition 2 Fix a normal-form game $G$. The minmax payoff for player $i$, $v_i$, is defined by $v_i = \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i})$.

We are not going to worry about implementing mixtures of action profiles in this lecture, so I just remind you of the definition of enforceability:

Definition 3 Fix a normal-form game $G$. A payoff profile $u \in \mathbb{R}^N$ is enforceable (resp. strictly enforceable) if $u_i \ge v_i$ (resp. $u_i > v_i$) for all $i \in N$.

Perfect Folk Theorems

The strategies in the proof of the Nash folk theorem (say, with discounting) call for indefinite punishment following a deviation from the equilibrium. This is fine in games such as the Prisoner's Dilemma, in which the minmax actions are actually an equilibrium (indeed, in PD, they are the only equilibrium!). That is, the threat of indefinite punishment is indeed credible.

        A      D
  A    2,3    1,6
  D    0,1    0,1

Figure 1: Indefinite Punishment is not credible

However, consider the game in Figure 1. The Row player can hold the Column player down to his minimum payoff by choosing D, but D is strictly dominated for her. Hence, while (2,3) can be enforced as a Nash equilibrium payoff profile in the infinitely repeated version of the game by assuming that players switch to (D,D) following a deviation, this would not work if we required subgame perfection, regardless of the payoff aggregation criterion.

A warm-up exercise: Perfect Folk Theorems for Limit-of-Means

Thus, we must be smarter than that.
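The claims made about the Figure 1 game can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the payoff table is transcribed from Figure 1, the names PAYOFFS, payoff, pure_minmax and strictly_dominated are illustrative, and only pure actions are considered, in line with the decision above not to deal with mixtures.

```python
ACTIONS = ("A", "D")

# Figure 1 payoffs: (row action, column action) -> (row payoff, column payoff)
PAYOFFS = {
    ("A", "A"): (2, 3),
    ("A", "D"): (1, 6),
    ("D", "A"): (0, 1),
    ("D", "D"): (0, 1),
}


def payoff(player, own, other):
    """Payoff to `player` (0 = Row, 1 = Column) from playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def pure_minmax(player):
    """Return (v_i, opponent's punishing action), restricted to pure actions."""
    best = None
    for other in ACTIONS:
        # Best the player can do if the opponent commits to `other`.
        best_reply_value = max(payoff(player, own, other) for own in ACTIONS)
        if best is None or best_reply_value < best[0]:
            best = (best_reply_value, other)
    return best


def strictly_dominated(player, action):
    """True if some other pure action does strictly better against every opponent action."""
    return any(
        all(payoff(player, alt, other) > payoff(player, action, other)
            for other in ACTIONS)
        for alt in ACTIONS
        if alt != action
    )


if __name__ == "__main__":
    for player, name in enumerate(("Row", "Column")):
        print(name, "pure minmax:", pure_minmax(player))
    print("D strictly dominated for Row:", strictly_dominated(0, "D"))
```

Under these assumptions, each player's pure minmax payoff is 1, obtained when the opponent plays D, and D is strictly dominated for Row. So (2,3) is strictly enforceable, yet Row's part of the punishment profile (D,D) is never a best reply in the stage game.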
A key intuition is that, after all, a deviator need not be punished forever, but only long enough to wipe out any gains from his deviation.

Formally, fix a game $G = (N, (A_i, u_i)_{i \in N})$ and let $M = \max_{i \in N,\, a \in A} u_i(a)$. Fix an outcome $a^*$ of $G$; clearly, a one-time deviation by Player $i$ cannot yield more than $M - u_i(a^*)$. Thus, if Player $i$ deviates, we need only punish him so as to wipe out this payoff differential; clearly, how long this will be depends on the payoff aggregation criterion.

Intuitively, limit-of-means should make things very simple, because, following a deviation, punishers face only a finite number of periods in which they must forego their equilibrium