《电子商务 E-business》阅读文献：Reinforcement Learning An Introduction

团购合买资源类别：文库，文档格式：PDF，文档页数：369，文件大小：2.57MB

Code for Sutton Barto Book: Reinforcement Learning: An Introductio Code for. Reinforcement Learning An Introduction by richard s Sutton and andrew G. Barto Below are links to a variety of software related to examples and exercises in the book, organized by chapters(some files appear in multiple places). See particularly the Mountain Car code. Most of the rest of the code is written in Common Lisp and requires utility routines available here. For the graphics, you will need the the packages for G and in some cases my graphing tool. Even if you can not run this code, it still may clarify some of the details of the experiments. However, there is no guarantee that the examples in the book were run using exactly the software given. This code also has not been extensively tested or documented and is being made available"as is". If you have corrections, extensions, additions or improvements of any kind, please send them to me at rich @richsutton com for inclusion here Chapter 1: Introduction o Tic-Tac-Toe Example (Lisp). In C Chapter 2: Evaluative Feedback o 10-armed Testbed Example, Figure 2. 1 ( Lisp) o Testbed with Softmax Action Selection, Exercise 2.2(Lisp) o Bandits A and B, Figure 2.3(Lisp) o Testbed with Constant Alpha, cf Exercise 2.7 ( Lisp) o Optimistic Initial Values Example, Figure 2. 4(Lisp) o Code Pertaining to Reinforcement Comparison: Filel, File2, File3 Lisp) o Pursuit Methods Example, Figure 2.6 ( Lisp) Chapter 3: The Reinforcement Learning Problem o Pole-Balancing Example, Figure 3. 2(C) o Gridworld Example 3.8. Code for Figures 3.5 and 3.8 Lisp) Chapter 4: Dynamic Programming Policy evaluation, Gridworld Example 4. 1 Figure 4.2 Lisp o Policy Iteration, Jacks Car Rental Example, Figure 4.4 Lisp o Value Iteration, Gambler's Problem Example, Figure 4.6 Lisp) Chapter 5: Monte Carlo Methods o Monte Carlo Policy Evaluation, Blackjack Example 5. 1, Figure 5.2(Lisp) o Monte Carlo es, Blackjack Example 5.3, Figure 5.5 ( Lisp) Chapter 6: Temporal-Difference Learning o TD Prediction in Random Walk. Example 6.2, Figures 6.5 and 6.6 Lisp) o TD Prediction in Random Walk with Batch Training, Example 6.3, Figure 6.8(Lisp) le: /Cook/ode/code. html (1 of 2)[2808/382 03: 12: 58 1]

Code for Sutton & Barto Book: Reinforcement Learning: An Introduction Code for: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto Below are links to a variety of software related to examples and exercises in the book, organized by chapters (some files appear in multiple places). See particularly the Mountain Car code. Most of the rest of the code is written in Common Lisp and requires utility routines available here. For the graphics, you will need the the packages for G and in some cases my graphing tool. Even if you can not run this code, it still may clarify some of the details of the experiments. However, there is no guarantee that the examples in the book were run using exactly the software given. This code also has not been extensively tested or documented and is being made available "as is". If you have corrections, extensions, additions or improvements of any kind, please send them to me at rich@richsutton.com for inclusion here. ● Chapter 1: Introduction ❍ Tic-Tac-Toe Example (Lisp). In C. ● Chapter 2: Evaluative Feedback ❍ 10-armed Testbed Example, Figure 2.1 (Lisp) ❍ Testbed with Softmax Action Selection, Exercise 2.2 (Lisp) ❍ Bandits A and B, Figure 2.3 (Lisp) ❍ Testbed with Constant Alpha, cf. Exercise 2.7 (Lisp) ❍ Optimistic Initial Values Example, Figure 2.4 (Lisp) ❍ Code Pertaining to Reinforcement Comparison: File1, File2, File3 (Lisp) ❍ Pursuit Methods Example, Figure 2.6 (Lisp) ● Chapter 3: The Reinforcement Learning Problem ❍ Pole-Balancing Example, Figure 3.2 (C) ❍ Gridworld Example 3.8, Code for Figures 3.5 and 3.8 (Lisp) ● Chapter 4: Dynamic Programming ❍ Policy Evaluation, Gridworld Example 4.1, Figure 4.2 (Lisp) ❍ Policy Iteration, Jack's Car Rental Example, Figure 4.4 (Lisp) ❍ Value Iteration, Gambler's Problem Example, Figure 4.6 (Lisp) ● Chapter 5: Monte Carlo Methods ❍ Monte Carlo Policy Evaluation, Blackjack Example 5.1, Figure 5.2 (Lisp) ❍ Monte Carlo ES, Blackjack Example 5.3, Figure 5.5 (Lisp) ● Chapter 6: Temporal-Difference Learning ❍ TD Prediction in Random Walk, Example 6.2, Figures 6.5 and 6.6 (Lisp) ❍ TD Prediction in Random Walk with Batch Training, Example 6.3, Figure 6.8 (Lisp) file:///C|/book/code/code.html (1 of 2) [28/08/1382 03:12:58 ﾕﾈﾍ]

点击进入文档下载页（PDF格式）

共369页，可试读40页，点击继续阅读 ↓↓

点击下载（PDF格式）

浏览记录