First, Q = {N, P(j, t)}. N is the normal state, in which a* is played; P(j, t) is the state in which player j is punished, and t more rounds of punishment are required. Second, q^0 = N. Third, f_i(N) = a*_i, f_i(P(j, t)) = p_{−j,i} if j ≠ i, and f_j(P(j, t)) = r_j(p_{−j}). This should be obvious. Finally, τ(·, ·) is such that we remain in N if nobody deviates, we switch from N to P(j, L) if j is the lowest-index deviator, we always move from P(j, t) to P(j, t − 1) if t ≠ 0, and we always move from P(j, 0) back to N. Easy!
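Since the construction is stated abstractly, a small sketch may help. Below is a minimal Python rendering of the four components just listed; the concrete representation (player indices 0, …, n − 1, tuple action profiles, and the caller-supplied tables a_star, p, and r) is my illustrative choice, not part of the notes.

    # Sketch of the punishment automaton: states N and P(j, t), initial
    # state N, output function f, transition tau. Assumed representation:
    #   a_star[i]  - player i's component of the target profile a*
    #   p[j][i]    - player i's component of the profile p_{-j} punishing j
    #   r[j]       - j's best reply r_j(p_{-j}) while being punished

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Normal:
        """State N: the target profile a* is played."""

    @dataclass(frozen=True)
    class Punish:
        """State P(j, t): player j is punished; t more rounds remain."""
        j: int
        t: int

    def f(state, i, a_star, p, r):
        """Output function: the action player i takes in `state`."""
        if isinstance(state, Normal):
            return a_star[i]                      # f_i(N) = a*_i
        if i != state.j:
            return p[state.j][i]                  # f_i(P(j, t)) = p_{-j,i}
        return r[state.j]                         # f_j(P(j, t)) = r_j(p_{-j})

    def tau(state, deviators, L):
        """Transition: `deviators` is the set of players who deviated."""
        if isinstance(state, Normal):
            if deviators:
                return Punish(min(deviators), L)  # lowest-index deviator
            return Normal()                       # nobody deviated: stay in N
        if state.t != 0:
            return Punish(state.j, state.t - 1)   # always count down
        return Normal()                           # P(j, 0) -> back to N

Note that tau deliberately ignores deviations while in a punishment state (punishment "always" counts down, per the construction above); this is exactly the feature that gets the profile into trouble under discounting, as discussed next.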
Discounting

The strategy profile thus constructed is not a subgame-perfect equilibrium of the game in Figure 1 if payoffs are aggregated via discounting, for any discount factor. Suppose the Column player deviates: the Row player is then supposed to choose D for 2 periods, and hence receive a flow payoff of 0; she will then receive 2 forever after. However, if she deviates to A, nothing happens: the Column player continues to choose D, so again the Row player can choose A. Obviously, she prefers (1, 1, 2, 2, …) to (0, 0, 2, 2, …)! (A numerical check of this comparison appears at the end of the section.)

Thus, the key intuition is that we must somehow ensure that punishers are willing to punish. In principle, one could think of punishing punishers, but this may fail to work with discounting: essentially, second deviations might require longer punishment periods (because the burden of carrying out the first punishment lasts for L periods, not just one), third deviations might require even longer punishments, and so on. This is certainly the case in the game of Figure 1.

The alternative is to reward punishers. This leads to OR's Proposition 151.1. I am only going to offer a few comments on the proof. The "conciliation" states C(j) serve to reward punishers, in a way. However, this is subtle: we never go back to the Nash state C(0). What happens is, if j deviates and i punishes him, then after punishment we move to C(j), which i prefers to C(i). Otherwise (i.e., if i fails to punish), we go to C(i) and stay there until somebody else deviates.[1] In my opinion, this is also a punishment of sorts, but of course you are welcome to differ.

Also: the remark about the first condition on δ being sufficient has to do with the fact that, after L periods of punishment, we move to C(j); since u_i(a(j)) < u_i(a(0)) by assumption, this is actually a further punishment (which, er, actually reinforces my interpretation, but never mind that). The point is that this punishment may not be enough, or may come too late.

[1] Again, let me repeat this because I first got it wrong in class (but then you guys spotted me!): whenever we are in state C(j), we remain there if nobody deviates, and move to the state P(k, L), in which Player k's punishment begins, if k deviates from a(j). Thus, what supports a(j) as a continuation equilibrium after a deviation by Player j is the threat of further punishment; a(j) need not be a Nash equilibrium per se.
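To make the "any discount factor" claim concrete: the two streams (1, 1, 2, 2, …) and (0, 0, 2, 2, …) share the same tail, so their present values differ by exactly 1 + δ > 0 for every δ ∈ (0, 1). A quick numerical check of this (the helper function pv and the sample values of δ are my illustrative choices, not from the notes):

    # The Row player's two options during the punishment phase:
    # deviate to A and get (1, 1, 2, 2, ...), or punish with D and
    # get (0, 0, 2, 2, ...). Deviating wins for every discount factor.

    def pv(head, tail, delta, horizon=10_000):
        """Present value of a stream: the finite `head`, then `tail`
        forever (truncated at `horizon`, harmless for delta < 1)."""
        value = sum(x * delta**t for t, x in enumerate(head))
        value += sum(tail * delta**t for t in range(len(head), horizon))
        return value

    for delta in (0.1, 0.5, 0.9, 0.99):
        deviate = pv([1, 1], 2, delta)  # play A for the 2 punishment rounds
        comply = pv([0, 0], 2, delta)   # punish with D as prescribed
        assert deviate > comply          # the gap is exactly 1 + delta
        print(f"delta={delta}: deviating gains {deviate - comply:.4f}")

Since the gap is positive no matter what δ is, no condition on the discount factor can rescue this strategy profile, which is why the construction has to be modified.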