Game Playing
吉建民 (Jianmin Ji)
USTC
jianmin@ustc.edu.cn
March 28, 2022
Used Materials
Disclaimer: These slides draw on S. Russell and P. Norvig's Artificial Intelligence: A Modern Approach slides, course slides by 徐林莉 and other online courses, open-source code from GitHub, and material from several online blogs.
Table of Contents
▶ Games
▶ Perfect play (optimal strategies)
  ▶ minimax decisions
  ▶ α-β pruning
▶ Resource limits and approximate evaluation
▶ Games of chance
▶ Games of imperfect information
Game Playing
Game playing was thought to be a good problem for AI research:
▶ game playing is non-trivial
  ▶ perfect play (optimal strategy)
  ▶ players need "human-like" intelligence
  ▶ games can be very complex (e.g., Chess, Go)
  ▶ requires decision making within limited time
▶ games usually are:
  ▶ well-defined and repeatable
  ▶ fully observable and limited environments
▶ can directly compare humans and computers
Computers Playing Chess
Computers Playing Go
[Image: Google DeepMind AlphaGo Challenge Match]
Games vs. search problems
"Unpredictable" opponent ⇒ solution is a strategy specifying a move for every possible opponent reply (a minimal minimax sketch follows after the list below)
Time limits ⇒ unlikely to find goal, must approximate
Games penalize inefficiency severely.
Plan of attack:
▶ Computer considers possible lines of play (Babbage, 1846)
▶ Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
▶ Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
▶ First chess program (Turing, 1951)
▶ Machine learning to improve evaluation accuracy (Samuel, 1952-57)
▶ Pruning to allow deeper search (McCarthy, 1956)
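As a forward pointer to the minimax material listed in the table of contents, the sketch below shows what "a strategy specifying a move for every possible opponent reply" looks like in code. The `Game` interface (`is_terminal`, `utility`, `result`, `actions`) is a hypothetical one, loosely following the AIMA formulation:

```python
# Minimal minimax sketch. The `Game` interface is an assumption, not the
# slides' API: is_terminal(s), utility(s), result(s, a), actions(s).

def minimax_value(game, state, maximizing):
    """Game-theoretic value of `state` under perfect play by both sides."""
    if game.is_terminal(state):
        return game.utility(state)  # e.g. +1 win, 0 draw, -1 loss for MAX
    values = [minimax_value(game, game.result(state, a), not maximizing)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)

def minimax_decision(game, state):
    """The 'strategy': from any state, return the move whose value is best
    against every possible opponent reply."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), False))
```

Because the value of each move is computed assuming the opponent replies optimally, the returned move is prepared for every possible opponent response, which is exactly why a solution here is a strategy rather than a fixed action sequence.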
Types of games

                          Deterministic                Stochastic (chance)
  perfect information     chess, checkers,             backgammon,
                          Go, othello                  monopoly
  imperfect information   battleships,                 bridge, poker, scrabble,
                          blind tictactoe              nuclear war
Game Theory
▶ Models strategic interactions as games
▶ In normal-form games (matrix games), all players simultaneously select an action, and their joint action determines their individual payoffs
▶ One-shot interaction
▶ Can be represented as an n-dimensional payoff matrix, for n players (a minimal sketch follows below)
▶ A player's strategy is defined as a probability distribution over their possible actions
▶ Stochastic games are an extension of normal-form games and MDPs in the sense that they deal with multiple agents in a multi-state setting
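To make the payoff-matrix representation concrete, here is a minimal sketch of a 2-player normal-form game, using the Prisoner's Dilemma with one common payoff convention (the matrices and variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

# Prisoner's Dilemma as a 2-player normal-form game (illustrative payoffs).
# Row index = player 1's action, column index = player 2's action
# (0 = cooperate, 1 = defect); R1[i, j] is player 1's payoff.
R1 = np.array([[-1, -3],
               [ 0, -2]])
R2 = R1.T  # the game is symmetric, so player 2's payoffs are the transpose

# Joint action (defect, defect): each player receives -2, even though
# (cooperate, cooperate) would pay -1 each.
print(R1[1, 1], R2[1, 1])  # -2 -2
```

With n players the same idea generalizes to one n-dimensional reward array per player, indexed by the joint action.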
Normal-Form Game
▶ A normal-form game can be defined as a tuple (n, A_{1...n}, R_{1...n}) where:
  ▶ n is the number of agents
  ▶ Ai is the action set for player i
  ▶ A = A1 × · · · × An is the joint action set
  ▶ Ri : A → ℝ is the reward function of player i
▶ Each agent i selects a policy πi : Ai → [0, 1] (πi ∈ PD(Ai)), takes action ai ∈ Ai with probability πi(ai), and receives utility Ri(a1, . . . , an)
▶ Given a policy profile ⟨π1, . . . , πn⟩, the expected utility to player i is
    Ri(π1, . . . , πn) = ∑_{a∈A} Ri(a) ∏_{j=1}^{n} πj(aj)
  (a sketch computing this appears below)
▶ Agents want to maximise their expected utilities
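Here is a minimal sketch of this expected-utility computation; the `expected_utility` helper and the matching-pennies payoffs are illustrative assumptions, not from the slides:

```python
import itertools

def expected_utility(R, policies):
    """R_i(pi_1, ..., pi_n) = sum over joint actions a of R_i(a) * prod_j pi_j(a_j).

    R        : dict mapping joint actions (tuples) to player i's reward
    policies : list of dicts; policies[j][a] = probability that player j plays a
    """
    total = 0.0
    # Enumerate the joint action set A = A1 x ... x An.
    for joint in itertools.product(*(list(pi) for pi in policies)):
        prob = 1.0
        for j, a in enumerate(joint):
            prob *= policies[j][a]  # prod_j pi_j(a_j)
        total += prob * R[joint]
    return total

# Matching pennies, player 1's rewards: +1 if the coins match, -1 otherwise.
# With both players mixing 50/50, player 1's expected utility is 0.
R1 = {('H', 'H'): 1, ('H', 'T'): -1, ('T', 'H'): -1, ('T', 'T'): 1}
pi = [{'H': 0.5, 'T': 0.5}, {'H': 0.5, 'T': 0.5}]
print(expected_utility(R1, pi))  # 0.0
```

Each joint action contributes its reward weighted by the product of the probabilities the individual policies assign to it, exactly as in the formula above.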