gametheory《经济学理论》 lecture 11: Subgame perfection.pdf_大学文库

The normal form of the Entry Deterrence game has two pure Nash equilibria, namely (N, f) and(E, a). However, the standard reasoning goes, the first equilibrium is somewhat unreasonable: after all, if Player 1 were to Enter, Player 2 would not want to fight, ex-post Thus, the(n, f)equilibrium is supported by a non-credible threat. For this reason, we wish to rule it out Reinhard Selten proposed the following. First, note that any non-terminal history h in game with observable actions defines a subgame--the game defined by all histories of which h is a subhistory. For instance, in the simple game of Figure 1, there is a subgame starting at(E), corresponding the histories(E),(E, a) and(E, f). The only other subgame is the whole game, which may be viewed as beginning after 0, the initial history A subgame-perfect equilibrium of the original game is then a profile of strategies which induces Nash equilibria in every subgame. Thus, the (N, f) equilibrium is not subgame- perfect, because f is not a Nash equilibrium(trivially, an optimal strategy) in the subgame tarting at(E) We are ready for a few formal definitions Definition 1 Fix an extensive game with observable actions T=(N, A, H, Z, P, UiieN and a non-terminal history hH\Z. The subgame starting at h is the extensive game with observable actions r(h)=(N, A Zh, Plh, (Uiln)ieN), where Hn=h:(h, h)E H is the set of continuation histories, i.e. seque profiles h' such that(h, h) is a history in H; Zn is defined analogously Phh is the restriction of the player correspondence P to continuation histories: formally, for every h'E Hh, Pln(h)=P((h, h)); and each UiIn is defined analogously Similarly, for any strategy profiles=(si)ieN E S in the game r, shh=(siIn)ieN is the strategy profile induced by s in r(h) by letting sin(h)=si((h, h)) for every h'E H]h and i∈Ph(h Note that, according to the definition, the whole game r corresponds to the subgame r(0) Definition 2 Consider a game r with observable actions. A strategy profile s E s is subgame-perfect equilibrium of r iff, for every non-terminal history hE H\Z, sIh is a Nash equilibrium of r(h) Since r=r(0), any subgame-perfect equilibrium(or SPE for short)is necessarily also a Nash equilibrium. The example given above shows that the converse is false Whenever one introduces a solution concept, the first order of business is to establish the existence of a solution according to the concept under consideration. In finite games with perfect information, this is easy to achieve using the backward induction algorithm. Proposition 0.1 Every finite extensive game with perfect information has a SPE

The normal form of the Entry Deterrence game has two pure Nash equilibria, namely (N, f) and (E, a). However, the standard reasoning goes, the first equilibrium is somewhat unreasonable: after all, if Player 1 were to Enter, Player 2 would not want to fight, ex-post. Thus, the (N, f) equilibrium is supported by a non-credible threat. For this reason, we wish to rule it out. Reinhard Selten proposed the following. First, note that any non-terminal history h in a game with observable actions defines a subgame—the game defined by all histories of which h is a subhistory. For instance, in the simple game of Figure 1, there is a subgame starting at (E), corresponding the histories (E), (E, a) and (E, f). The only other subgame is the whole game, which may be viewed as beginning after ∅, the initial history. A subgame-perfect equilibrium of the original game is then a profile of strategies which induces Nash equilibria in every subgame. Thus, the (N, f) equilibrium is not subgameperfect, because f is not a Nash equilibrium (trivially, an optimal strategy) in the subgame starting at (E). We are ready for a few formal definitions. Definition 1 Fix an extensive game with observable actions Γ = (N, A, H, Z, P,(Ui)i∈N ) and a non-terminal history h ∈ H \ Z. The subgame starting at h is the extensive game with observable actions Γ(h) = (N, A, H|h, Z|h, P|h,(Ui |h)i∈N ), where: H|h = {h 0 : (h, h0 ) ∈ H} is the set of continuation histories, i.e. sequences of action profiles h 0 such that (h, h0 ) is a history in H; Z|h is defined analogously; P|h is the restriction of the player correspondence P to continuation histories: formally, for every h 0 ∈ H|h, P|h(h 0 ) = P((h, h0 )); and each Ui |h is defined analogously. Similarly, for any strategy profile s = (si)i∈N ∈ S in the game Γ, s|h = (si |h)i∈N is the strategy profile induced by s in Γ(h) by letting si |h(h 0 ) = si((h, h0 )) for every h 0 ∈ H|h and i ∈ P|h(h 0 ). Note that, according to the definition, the whole game Γ corresponds to the subgame Γ(∅). Definition 2 Consider a game Γ with observable actions. A strategy profile s ∈ S is a subgame-perfect equilibrium of Γ iff, for every non-terminal history h ∈ H \Z, s|h is a Nash equilibrium of Γ(h). Since Γ = Γ(∅), any subgame-perfect equilibrium (or SPE for short) is necessarily also a Nash equilibrium. The example given above shows that the converse is false. Whenever one introduces a solution concept, the first order of business is to establish the existence of a solution according to the concept under consideration. In finite games with perfect information, this is easy to achieve using the backward induction algorithm. Proposition 0.1 Every finite extensive game with perfect information has a SPE. 2

Proof: We argue by induction on the length of histories. Let L=maxe(h): hEH\Z (L) Pick any h with e(h)=L. Then T(h)is a single-person decision problem; let bi (h) denote the optimal choice of i= P(h)at H, and define o(h)=(h, bi (h)), the corresponding outcome, and c(h)=bi(h), the continuation history induced by bi at h (e) Assume that the procedure has been carried out for all histories of length (+1,.,L and consider a history h with ((h)=e. Consider the decision problem faced by Player i P(h)and defined by the action set A (h), and by the assumption that action a E A: (h) leads to outcome o(h, a)). Denote by bi(h) Player is choice at h, and define o(h)=o((h, bi(h)) and c(h)=(bi(h), c((h, bi(h))) The procedure ends in finitely many steps. It should be clear that the resulting profile b=(b2) eN is a SPE.■ Note that the only reason why the argument does not generalize to games with observable actions is that we insist on choosing pure actions in each subgame Backward induction is viewed as the natural extension of dynamic programming to in- teractive environments. From a purely formal standpoint, this view is of course correct However, the interpretation of the two procedures is rather different; we return to this point belo The analogy with dynamic programming suggests that a version of the one-deviation principle might hold. This is indeed the case A preliminary definition: say that a game has finite horizon iff it has no infinite histories Thus, a finite game has finite horizon, but not conversely Proposition 0.2 Consider an extensive game r with observable actions and finite horizon A strategy profile s is a Spe of r iff there exists no player i E N, history h such that iE P(h), and action a'+si(h)with the property that the continuation strategy s'h defined sAh(o and Wh'E Hn\0), sln(h)=si(h, h) is a profitable deviation in r(h) Thus, assume there is no profitable one-time deviation, but that s is not a SPE. in r(h) Proof: Clearly, if s is a SPE, sih cannot be a profitable deviation from si Then there must exist a player iE N, a history h and at least one continuation strategy h that sin is a profitable dey siIn in r(h) Note that any profitable deviation sIh must differ from silh at more than one history Thus, pick a deviation sh which differs from siIn the least, i. e. such that no other devia- tion differs from silh at fewer histories(if there is more than one such deviations, pick one arbitrarily). This is well-defined because the game has a finite horizon Now let h' be the longest(continuation) history of r(h)where sAh differs from silh, i such that sn(h)+siln(h"). Again, this is well-defined because the game has a finite horizon

Proof: We argue by induction on the length of histories. Let L = max{`(h) : h ∈ H \Z}. (L) Pick any h with `(h) = L. Then Γ(h) is a single-person decision problem; let bi(h) denote the optimal choice of i = P(h) at H, and define o(h) = (h, bi(h)), the corresponding outcome, and c(h) = bi(h), the continuation history induced by bi at h. (`) Assume that the procedure has been carried out for all histories of length `+1, . . . , L, and consider a history h with `(h) = `. Consider the decision problem faced by Player i = P(h) and defined by the action set Ai(h), and by the assumption that action a ∈ Ai(h) leads to outcome o((h, a)). Denote by bi(h) Player i’s choice at h, and define o(h) = o((h, bi(h)) and c(h) = (bi(h), c((h, bi(h)))). The procedure ends in finitely many steps. It should be clear that the resulting profile b = (bi)i∈N is a SPE. Note that the only reason why the argument does not generalize to games with observable actions is that we insist on choosing pure actions in each subgame. Backward induction is viewed as the natural extension of dynamic programming to interactive environments. From a purely formal standpoint, this view is of course correct. However, the interpretation of the two procedures is rather different; we return to this point below. The analogy with dynamic programming suggests that a version of the one-deviation principle might hold. This is indeed the case. A preliminary definition: say that a game has finite horizon iff it has no infinite histories. Thus, a finite game has finite horizon, but not conversely. Proposition 0.2 Consider an extensive game Γ with observable actions and finite horizon. A strategy profile s is a SPE of Γ iff there exists no player i ∈ N, history h such that i ∈ P(h), and action a 0 6= si(h) with the property that the continuation strategy s 0 i |h defined by s 0 i |h(∅) = a 0 and ∀h 0 ∈ H|h \ {∅}, s0 i |h(h 0 ) = si(h, h0 ) is a profitable deviation in Γ(h). Proof: Clearly, if s is a SPE, s 0 i |h cannot be a profitable deviation from si |h in Γ(h). Thus, assume there is no profitable one-time deviation, but that s is not a SPE. Then there must exist a player i ∈ N, a history h and at least one continuation strategy s 00 i |h such that s 00 i |h is a profitable deviation from si |h in Γ(h). Note that any profitable deviation s 00 i |h must differ from si |h at more than one history. Thus, pick a deviation s 0 i |h which differs from si |h the least, i.e. such that no other deviation differs from si |h at fewer histories (if there is more than one such deviations, pick one arbitrarily). This is well-defined because the game has a finite horizon. Now let h 0 be the longest (continuation) history of Γ(h) where s 0 i |h differs from si |h, i.e such that s 0 i |h(h 0 ) 6= si |h(h 0 ). Again, this is well-defined because the game has a finite horizon. 3

Here(did2, D) is not a SPE, but it is a Nash equilibrium. It fails the test of perfection because, upon being reached (contrary to the equilibrium prescription! Player 2"should" really expect Player 1 to continue with a2, not d2, because "Player 2 is certain that Player 1 is rational. But this argument is even more ambiguous Theorists have fiercely argued over these and similar examples. I cannot do justice to either side of the debate without a full-blown model of interactive beliefs specifically developed to account for dynamically evolving beliefs. However, for the time being, let me e a few informal but rigorous observation First, in a dynamic game, statements such as "Player 2 is certain that Player l is rational are meaningless. One must specify when, in the course of the game, that statement is assumed to be true-when, for example, Player 2 is certain that his opponent is rational, or Player 2 is certain that Player 1 expects Player 2 to choose D The reason is that, in a dynamic game(and especially in a game with observable actions! players acquire new information as the play unfolds. This information may lead them to update, i. e. "refine"their beliefs, or it may lead them to completely revise them For example, Player 2 might be certain, at the initial history, that Player 1 chooses di at the initial history. However, if Player 1 chooses al instead, Player 2 observes this, so it would be impossible for her to continue to be certain after the history(a,) that she chooses d1. Player 2 has to revise her beliefs Reasoning along these lines, the argument against backward induction in the Centipede game can be reformulated more convincingly. Suppose that, at the beginning of the game both players' beliefs are consistent with the unique SPE. Moreover, suppose that, at the beginning of the game, both players are certain that their opponent is rational (you can assume that there is common certainty of this: it is not going to matter Could it be the case that Player 1 chooses a1 at the initial history, contrary to the SPe prediction? The answer is, surprisingly, yes! For instance, this will be the unique best response of Player 1 if she is certain at the initial node that (1) Player 2 is certain, at the initial node, that Player I will play dyd2, and that Player I is rational(this much we had already assumed) (2) However, Player 2, upon observing al, will revise her beliefs and become certain at (an) that Player 1 is actually irrational and will choose a2 after(a1, A) (3)Player 2 is rational(again, this mud Ir assumptions Note that(2)+ 3)imply that Player 2, if he has to choose, will pick A and not D. But hen, of course, Player 1 will rationally choose an herself The key point is that, even if players'beliefs are concentrated on the equilibrium at the beginning of the game, it can still be the case that the way they revise their beliefs leads them to act differently, both off and on the equilibrium path To put it differently, in order to justify SPE one has to make assumptions about how players revise their beliefs. Characterizing backward induction in perfect-information games

Here (d1d2, D) is not a SPE, but it is a Nash equilibrium. It fails the test of perfection because, upon being reached (contrary to the equilibrium prescription!) Player 2 “should” really expect Player 1 to continue with a2, not d2, because “Player 2 is certain that Player 1 is rational.” But this argument is even more ambiguous! Theorists have fiercely argued over these and similar examples. I cannot do justice to either side of the debate without a full-blown model of interactive beliefs specifically developed to account for dynamically evolving beliefs. However, for the time being, let me propose a few informal but rigorous observation. First, in a dynamic game, statements such as “Player 2 is certain that Player 1 is rational” are meaningless. One must specify when, in the course of the game, that statement is assumed to be true—when, for example, Player 2 is certain that his opponent is rational, or Player 2 is certain that Player 1 expects Player 2 to choose D. The reason is that, in a dynamic game (and especially in a game with observable actions!), players acquire new information as the play unfolds. This information may lead them to update, i.e. “refine” their beliefs, or it may lead them to completely revise them. For example, Player 2 might be certain, at the initial history, that Player 1 chooses d1 at the initial history. However, if Player 1 chooses a1 instead, Player 2 observes this, so it would be impossible for her to continue to be certain after the history (a1) that she chooses d1. Player 2 has to revise her beliefs. Reasoning along these lines, the argument against backward induction in the Centipede game can be reformulated more convincingly. Suppose that, at the beginning of the game, both players’ beliefs are consistent with the unique SPE. Moreover, suppose that, at the beginning of the game, both players are certain that their opponent is rational (you can assume that there is common certainty of this: it is not going to matter). Could it be the case that Player 1 chooses a1 at the initial history, contrary to the SPE prediction? The answer is, surprisingly, yes! For instance, this will be the unique best response of Player 1 if she is certain at the initial node that: (1) Player 2 is certain, at the initial node, that Player 1 will play d1d2, and that Player 1 is rational (this much we had already assumed). (2) However, Player 2, upon observing a1, will revise her beliefs and become certain at (a1) that Player 1 is actually irrational and will choose a2 after (a1, A). (3) Player 2 is rational (again, this much was already part of our assumptions). Note that (2)+(3) imply that Player 2, if he has to choose, will pick A and not D. But then, of course, Player 1 will rationally choose a1 herself! The key point is that, even if players’ beliefs are concentrated on the equilibrium at the beginning of the game, it can still be the case that the way they revise their beliefs leads them to act differently, both off and on the equilibrium path. To put it differently, in order to justify SPE one has to make assumptions about how players revise their beliefs. Characterizing backward induction in perfect-information games 5