Vol. 15 No. 5    CAAI Transactions on Intelligent Systems    Sep. 2020
DOI: 10.11992/tis.202003031

Survey of sparse reward algorithms in reinforcement learning — theory and experiment

YANG Rui1, YAN Jiangpeng1, LI Xiu1,2
(1. Department of Automation, Tsinghua University, Beijing 100084, China; 2. Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China)

Abstract: In recent years, reinforcement learning has achieved great success in a range of sequential decision-making applications such as games and robotic control. However, reward signals are very sparse in many real-world situations, which makes it difficult for agents to learn an optimal policy from interaction with the environment. This problem is called the sparse reward problem. Research on sparse rewards can advance both the theory and practical applications of reinforcement learning. We survey the current state of research on the sparse reward problem and, using external guidance information as the organizing thread, introduce the following six classes of algorithms: reward shaping, imitation learning, curriculum learning, hindsight experience replay, curiosity-driven algorithms, and hierarchical reinforcement learning. We implemented representative algorithms from all six classes in the sparse reward environment Fetch Reach and conducted a thorough comparison and analysis of the results. Algorithms that use external guidance information outperformed those without it on average, but the latter are less dependent on data; both families merit further research. Finally, we summarize current sparse reward algorithms and discuss directions for future work.

Keywords: reinforcement learning; deep reinforcement learning; machine learning; sparse reward; neural networks; artificial intelligence; deep learning

CLC number: TP181    Document code: A    Article ID: 1673-4785(2020)05-0888-12

Citation: YANG Rui, YAN Jiangpeng, LI Xiu. Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J]. CAAI Transactions on Intelligent Systems, 2020, 15(5): 888-899.

Received: 2020-03-19.
Foundation item: National Natural Science Foundation of China (41876098).
Corresponding author: LI Xiu. E-mail: li.xiu@sz.tsinghua.edu.cn.

Reinforcement learning is a class of methods in which an agent learns an optimal policy through continual trial and error in its interactions with the environment.
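To make the sparse reward problem concrete, the following is a minimal illustrative sketch (not taken from the paper) of the kind of goal-based sparse reward used in environments such as Fetch Reach: the agent receives a non-negative reward only when the achieved position is within a small threshold of the goal, and an identical penalty everywhere else. The function name and the threshold value 0.05 are assumptions for illustration.

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Goal-based sparse reward: 0 when the achieved position is within
    `threshold` of the goal, -1 otherwise (illustrative sketch)."""
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if distance < threshold else -1.0

# Far from the goal, every step returns the same -1, so the reward
# carries no information about which direction leads to the goal.
print(sparse_reward([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))   # -1.0
print(sparse_reward([1.0, 1.0, 1.0], [1.0, 1.01, 1.0]))  # 0.0
```

Because almost every transition yields the same reward, methods such as reward shaping or hindsight experience replay are needed to create a useful learning signal.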