Proof: We argue by induction on the length of histories. Let $L = \max\{\ell(h) : h \in H \setminus Z\}$.

($L$) Pick any $h$ with $\ell(h) = L$. Then $\Gamma(h)$ is a single-person decision problem; let $b_i(h)$ denote the optimal choice of $i = P(h)$ at $h$, and define $o(h) = (h, b_i(h))$, the corresponding outcome, and $c(h) = b_i(h)$, the continuation history induced by $b_i$ at $h$.

($\ell$) Assume that the procedure has been carried out for all histories of length $\ell + 1, \ldots, L$, and consider a history $h$ with $\ell(h) = \ell$. Consider the decision problem faced by Player $i = P(h)$, defined by the action set $A_i(h)$ and by the assumption that action $a \in A_i(h)$ leads to outcome $o((h, a))$. Denote by $b_i(h)$ Player $i$'s choice at $h$, and define $o(h) = o((h, b_i(h)))$ and $c(h) = (b_i(h), c((h, b_i(h))))$.

The procedure ends in finitely many steps. It should be clear that the resulting profile $b = (b_i)_{i \in N}$ is a SPE. ■
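To make the inductive procedure concrete, the following is a minimal sketch in Python of backward induction on a finite perfect-information game tree. The representation (a `Node` carrying either a payoff profile or the mover $P(h)$ together with a map from actions to subgames) and all names (`Node`, `backward_induction`, the example `game`) are illustrative assumptions, not part of the text.

```python
# A minimal sketch of the backward-induction procedure in the proof, under
# the assumption that the finite perfect-information game is given as a tree:
# a nonterminal node carries the mover P(h) and a map a -> subtree Gamma((h, a));
# a terminal node carries a payoff profile. All names are illustrative.

from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    payoffs: Optional[Tuple[float, ...]] = None   # set at terminal histories z in Z
    player: Optional[int] = None                  # P(h) at nonterminal histories (0-based)
    actions: Dict[str, "Node"] = field(default_factory=dict)


def backward_induction(node: Node) -> Tuple[Tuple[float, ...], Dict[Tuple[str, ...], str]]:
    """Return the outcome o(h) (as a payoff profile) and the profile b,
    recorded as a map from histories (action sequences) to chosen actions."""
    if node.payoffs is not None:                  # terminal history: outcome is itself
        return node.payoffs, {}

    profile: Dict[Tuple[str, ...], str] = {}
    best_action, best_outcome = None, None
    for a, child in node.actions.items():
        outcome, sub_profile = backward_induction(child)   # o((h, a)), already computed
        for hist, choice in sub_profile.items():
            profile[(a,) + hist] = choice                  # prefix subgame histories by a
        # P(h) compares the induced outcomes by its own payoff coordinate
        if best_outcome is None or outcome[node.player] > best_outcome[node.player]:
            best_action, best_outcome = a, outcome

    profile[()] = best_action                     # b_i(h): the choice at h itself
    return best_outcome, profile


# Usage: a two-stage example; Player 0 moves first, then Player 1.
game = Node(player=0, actions={
    "L": Node(player=1, actions={
        "l": Node(payoffs=(2.0, 1.0)),
        "r": Node(payoffs=(0.0, 0.0)),
    }),
    "R": Node(payoffs=(1.0, 2.0)),
})

outcome, b = backward_induction(game)
print(outcome)   # (2.0, 1.0): Player 0 plays L anticipating that Player 1 plays l
print(b)         # {('L',): 'l', (): 'L'}
```

The recursion mirrors the proof: the terminal case corresponds to step ($L$), and the recursive case is step ($\ell$), which combines the outcomes $o((h, a))$ already computed for longer histories.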
Note that the only reason why the argument does not generalize to games with observable actions is that we insist on choosing pure actions in each subgame.

Backward induction is viewed as the natural extension of dynamic programming to interactive environments. From a purely formal standpoint, this view is of course correct. However, the interpretation of the two procedures is rather different; we return to this point below.

The analogy with dynamic programming suggests that a version of the one-deviation principle might hold. This is indeed the case. A preliminary definition: say that a game has finite horizon iff it has no infinite histories. Thus, a finite game has finite horizon, but not conversely.

Proposition 0.2 Consider an extensive game $\Gamma$ with observable actions and finite horizon. A strategy profile $s$ is a SPE of $\Gamma$ iff there exist no player $i \in N$, history $h$ with $i \in P(h)$, and action $a' \neq s_i(h)$ such that the continuation strategy $s'_i|_h$ defined by
$$s'_i|_h(\emptyset) = a' \quad \text{and} \quad s'_i|_h(h') = s_i(h, h') \;\; \forall h' \in H|_h \setminus \{\emptyset\}$$
is a profitable deviation in $\Gamma(h)$.

Proof: Clearly, if $s$ is a SPE, $s'_i|_h$ cannot be a profitable deviation from $s_i|_h$ in $\Gamma(h)$.

Thus, assume there is no profitable one-time deviation, but that $s$ is not a SPE. Then there must exist a player $i \in N$, a history $h$, and at least one continuation strategy $s''_i|_h$ such that $s''_i|_h$ is a profitable deviation from $s_i|_h$ in $\Gamma(h)$.

Note that any profitable deviation $s''_i|_h$ must differ from $s_i|_h$ at more than one history. Thus, pick a deviation $s'_i|_h$ which differs from $s_i|_h$ the least, i.e. such that no other deviation differs from $s_i|_h$ at fewer histories (if there are several such deviations, pick one arbitrarily). This is well-defined because the game has a finite horizon.

Now let $h'$ be the longest (continuation) history of $\Gamma(h)$ at which $s'_i|_h$ differs from $s_i|_h$, i.e. such that $s'_i|_h(h') \neq s_i|_h(h')$. Again, this is well-defined because the game has a finite horizon.
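Under the same hypothetical representation, the one-deviation test of Proposition 0.2 can be sketched as follows: at every history $h$, only deviations that change the mover's action at $h$ itself and conform to $s_i$ thereafter are checked. The sketch reuses `Node` and the example `game` from the code above; it illustrates the finite-horizon principle and is not a construction from the text.

```python
# A sketch of the one-deviation test of Proposition 0.2 (finite horizon),
# reusing the illustrative Node tree and the `game` example above.

def play_out(node, s, hist):
    """Follow profile s (a map from histories to actions) from history hist
    in the given subtree until a terminal history; return its payoff profile."""
    while node.payoffs is None:
        a = s[hist]
        node = node.actions[a]
        hist = hist + (a,)
    return node.payoffs


def has_profitable_one_deviation(node, s, hist=()):
    """Check every history h (every subgame Gamma(h), on-path or not) for a
    profitable one-time deviation by the mover i = P(h): play a' != s_i(h)
    at h, then conform to s_i in the rest of Gamma(h)."""
    if node.payoffs is not None:            # terminal history: nothing to check
        return False
    i = node.player
    base = play_out(node, s, hist)[i]       # i's payoff under s in Gamma(h)
    for a in node.actions:
        if a == s[hist]:
            continue
        deviated = play_out(node.actions[a], s, hist + (a,))[i]
        if deviated > base:                 # profitable one-time deviation found
            return True
    return any(has_profitable_one_deviation(child, s, hist + (a,))
               for a, child in node.actions.items())


# By Proposition 0.2, in a finite-horizon game s is a SPE iff no such
# deviation exists. Checking the backward-induction profile found above:
s = {(): "L", ("L",): "l"}
print(has_profitable_one_deviation(game, s))   # False, so s is a SPE
```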