





Price: 79 RMB
Impression: 1-1
ISBN: 9787302684718
Publication date: 2025.04.01
Print date: 2025.03.31
Commissioning editor: 古雪 (Gu Xue)
Category: Textbook
Chapter 1 starts from AlphaZero's remarkable performance, traces its hard-won development, and presents its underlying mathematical model. Chapter 2 introduces the mathematical models of decision problems through deterministic and stochastic dynamic programming. Chapter 3 reviews the many and varied reinforcement learning algorithms from an abstract viewpoint, highlighting the central roles of value function approximation and rollout-based improvement. Chapter 4 analyzes the lessons learned from AlphaZero's success through the classical linear quadratic optimal control problem. Chapter 5 examines robust, adaptive, and model predictive control problems, analyzing how value function approximation and rollout can improve algorithm performance. Chapter 6 revisits AlphaZero's lessons from the viewpoint of discrete optimization. Chapter 7 concludes the book. The book is suitable as a research monograph for researchers in the field, and as a reference for graduate and undergraduate students.
Dimitri P. Bertsekas (USA), tenured professor at MIT, member of the US National Academy of Engineering, and visiting professor at the Center for Complex and Networked Systems, Tsinghua University. An internationally renowned author in electrical engineering and computer science, he has written more than a dozen best-selling textbooks and monographs, including Nonlinear Programming, Network Optimization, Dynamic Programming, Convex Optimization, and Reinforcement Learning and Optimal Control.
Preface

"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."
John von Neumann

The purpose of this monograph is to propose and develop a new conceptual framework for approximate Dynamic Programming (DP) and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call these the off-line training and the on-line play algorithms; the names are borrowed from some of the major successes of RL involving games. Primary examples are the recent (2017) AlphaZero program (which plays chess...
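The division the preface describes, an off-line component that supplies a cost approximation and an on-line component that performs a lookahead minimization against it, can be sketched on a toy problem. Everything below (the graph, its costs, and the function names) is invented for illustration and does not come from the book; the rollout value plays the role of the off-line cost approximation, and the one-step lookahead plays the role of on-line play:

```python
# Minimal sketch (invented example): approximation in value space with
# truncated rollout of a base policy, on a toy deterministic shortest-path
# problem. States are nodes; an action is the next node; state 3 is terminal.
COST = {
    0: {1: 1.0, 2: 2.0},
    1: {3: 10.0},
    2: {3: 1.0},
    3: {},
}

def base_policy(s):
    """A fixed (suboptimal) heuristic: take the cheapest immediate arc."""
    return min(COST[s], key=COST[s].get)

def rollout_value(s, horizon=10):
    """Truncated rollout: simulate the base policy for up to `horizon` steps
    and accumulate the cost incurred. This plays the role of the cost
    approximation produced off-line."""
    total = 0.0
    for _ in range(horizon):
        if not COST[s]:  # terminal state reached
            break
        a = base_policy(s)
        total += COST[s][a]
        s = a
    return total

def one_step_lookahead(s):
    """On-line play: choose the action minimizing (stage cost + approximate
    cost-to-go of the next state)."""
    return min(COST[s], key=lambda a: COST[s][a] + rollout_value(a))
```

Here the base policy greedily takes the cheap arc from state 0 to state 1 and then pays a large cost (total 11), while the one-step lookahead using the rollout approximation chooses state 2 instead (total 3). This improvement of on-line play over the policy it was built from is the effect the monograph analyzes through Newton's method.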
Contents
1. AlphaZero, Off-Line Training, and On-Line Play
1.1. Off-Line Training and Policy Iteration p. 3
1.2. On-Line Play and Approximation in Value Space -
Truncated Rollout p. 6
1.3. The Lessons of AlphaZero p. 8
1.4. A New Conceptual Framework for Reinforcement Learning p. 11
1.5. Notes and Sources p. 14
2. Deterministic and Stochastic Dynamic Programming
2.1. Optimal Control Over an Infinite Horizon p. 20
2.2. Approximation in Value Space p. 25
2.3. Notes and Sources p. 30
3. An Abstract View of Reinforcement Learning
3.1. Bellman Operators p. 32
3.2. Approximation in Value Space and Newton's Method p. 39
3.3. Region of Stability p. 46
3.4. Policy...