payoff stream in order to punish Player i. Under limit-of-means aggregation, finite periods do not matter, so punishers are indifferent between punishing and not punishing.

There is only one subtlety: of course, the same argument applies to Player i: no one-time deviation can be profitable for her! So, what exactly are we trying to deter?

The answer is that, although no finite deviation from the infinite repetition of $a^*$ is profitable for Player i, the following strategy could be. Suppose that, in the putative equilibrium we wish to construct, a deviation is followed by $L$ rounds of punishment (you can think of $L' = 1$ if you wish). Thus, if Player i deviates once, she gets an extra payoff of (at most) $M - u_i(a^*)$, but then loses $u_i(a^*) - v_i$ utils in each of the subsequent $L$ periods.

Now suppose $L$ is small, so that $M - u_i(a^*) > L[u_i(a^*) - v_i]$. For example, in the game of Figure 1, suppose that i is the Column player, that $a^* = (A, A)$, and that $L = 1$. Then a deviation yields 3 utils, whereas one round of punishment costs 2 utils. Then Player i can adopt the following strategy: deviate, then play a best response to $p_{-i}$ (in Figure 1, play D) for $L$ periods; then, as soon as play is supposed to return to $a^*$, deviate and best-respond to the minmax profile, and so on. This is a profitable deviation. [Observe that it is also a neat example of a game in which the one-deviation property holds!]

Thus, we must choose $L$ large enough so that
$$\forall i \in N, \qquad M - u_i(a^*) < L\,[u_i(a^*) - v_i].$$
In Figure 1, it is enough to choose $L' = 2$.
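To see the arithmetic concretely, plug in the two Figure 1 numbers quoted above for the Column player (a one-shot deviation gain of $M - u_i(a^*) = 3$ and a per-period punishment loss of $u_i(a^*) - v_i = 2$):
$$
\begin{aligned}
L = 1 &: \quad 3 > 1 \cdot 2, \quad \text{so each deviate-then-be-punished cycle nets } 3 - 2 = 1 \text{ util over two periods, i.e. } +\tfrac12 \text{ per period on average;}\\
L = 2 &: \quad 3 < 2 \cdot 2, \quad \text{so each cycle nets } 3 - 4 = -1 \text{ util over three periods, and the long-run average falls below } u_i(a^*).
\end{aligned}
$$
Under limit-of-means aggregation only these per-period averages matter, which is why $L' = 2$ deters the Column player while $L = 1$ does not.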
To complete the argument, we must specify what happens if more than one player deviates, or if somebody deviates from the punishment stage. As in the proof of the Nash Folk Theorem, multiple deviations lead to the lowest-index player being punished (they will not occur in equilibrium anyway, so players cannot hope to get away with deviating because somebody else will deviate, too).

Finally, if one or more punishers fail to punish, we simply disregard these further (unprofitable) deviations: again, in equilibrium they are not going to occur, so players cannot count on them to improve their predicament after a first deviation.

This proves:

Proposition 0.1 [OR, Proposition 146.2] Fix a game G. Any feasible, strictly enforceable payoff profile of G is a subgame-perfect equilibrium payoff profile of the limit-of-means infinitely repeated version of G.

Machines

You will undoubtedly notice that describing these strategies verbally is awkward; doing so formally (as we have tried to do in the proof of the Nash Folk Theorem) is even worse. Machines can help. For instance, here is a set of machines that players can use to implement the strategies in the proof of Proposition 0.1: Player i uses machine $M_i = (Q, q^0, f_i, \tau)$ (where $Q$, $q^0$, and $\tau$ are common to all players) defined as follows.
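As a sketch of one way these components can be specified (the state names $\mathrm{Norm}$ and $P(j,t)$, read "punish player $j$ for $t$ more periods", are illustrative labels rather than notation fixed above, and $L$ is any integer large enough that the displayed inequality holds for every $i$):
$$
\begin{aligned}
Q &= \{\mathrm{Norm}\} \cup \{P(j,t) : j \in N,\ 1 \le t \le L\}, \qquad q^0 = \mathrm{Norm},\\[2pt]
f_i(\mathrm{Norm}) &= a_i^*, \qquad f_i(P(j,t)) = (p_{-j})_i \ \text{for } i \neq j, \qquad f_j(P(j,t)) = \text{a best response of } j \text{ to } p_{-j},\\[2pt]
\tau(q,a) &=
\begin{cases}
\mathrm{Norm} & \text{if } q = \mathrm{Norm} \text{ and } a = a^*,\\
P(j,L) & \text{if } q = \mathrm{Norm},\ a \neq a^*, \text{ and } j \text{ is the lowest-index player with } a_j \neq a_j^*,\\
P(j,t-1) & \text{if } q = P(j,t) \text{ with } t > 1 \text{ (regardless of } a\text{)},\\
\mathrm{Norm} & \text{if } q = P(j,1) \text{ (regardless of } a\text{)}.
\end{cases}
\end{aligned}
$$
In state $\mathrm{Norm}$ the machines play $a^*$; a deviation sends every machine to $P(j,L)$, where $j$ is the lowest-index deviator, and the countdown back to $\mathrm{Norm}$ ignores any further deviations along the way, matching the verbal description of the strategies above.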