Research progress of deep reinforcement learning applied to text generation


Chinese Journal of Engineering, Vol. 42, No. 4: 399-411, April 2020
https://doi.org/10.13374/j.issn2095-9389.2019.06.16.030; http://cje.ustb.edu.cn

XU Cong 1,2), LI Qing 1) ✉, ZHANG De-zheng 2,3), CHEN Peng 1), CUI Jia-rui 1)

1) School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2) Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
3) School of Computer & Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
✉ Corresponding author, E-mail: liqing@ies.ustb.edu.cn

Abstract (from the Chinese) The series of successes achieved in the game of Go by Google's artificial intelligence system (AlphaGo) has drawn growing attention to deep reinforcement learning. Deep reinforcement learning combines the ability of deep learning to perceive complex environments with the ability of reinforcement learning to make decisions in complex situations. Natural language processing involves a very large number of words and sentences that need to be represented, and text generation tasks such as dialogue systems, machine translation, and image captioning contain many decision problems that are difficult to model. Deep reinforcement learning can therefore play an important role in the text generation tasks of natural language processing, helping to improve existing model structures or training mechanisms, and it has already produced many notable results. This paper systematically describes the main methods by which deep reinforcement learning is applied to different text generation tasks, traces their development, and analyzes the characteristics of the algorithms. Finally, it looks ahead to the prospects and challenges of integrating deep reinforcement learning with natural language processing tasks.

Keywords: deep reinforcement learning; natural language processing; text generation; dialogue system; machine translation; image captioning
Classification number: TP183

ABSTRACT With the recent achievements of Google's artificial intelligence system in the game of Go, deep reinforcement learning (DRL) has developed considerably. DRL combines the perception ability of deep learning with the decision-making ability of reinforcement learning. Natural language processing (NLP) involves a large number of words and statements that have to be represented, and its subtasks, such as dialogue systems and machine translation, involve many decision problems that are difficult to model. For these reasons, DRL can be applied to various NLP tasks such as named entity recognition, relation extraction, dialogue systems, image captioning, and machine translation. Further, DRL helps improve the framework or the training pipeline of these tasks, and notable results have been obtained. DRL is not an algorithm or a method but a paradigm. Many researchers have fit NLP tasks into this paradigm and achieved better performance. Specifically, in text generation based on the reinforcement learning paradigm, the process of producing a predicted sequence from a given source sequence can be regarded as a Markov decision process (MDP). In an MDP, an agent interacts with the environment by receiving a sequence of observations and scaled rewards and then produces the next action, which here is the next word. This gives the text generation model a decision-making ability that can lead to future success. Thus, text generation integrated with reinforcement learning is an attractive and promising research field. This study presents a comprehensive introduction and a systematic overview. First, we present the basic methods in DRL and its variations. Then, we show the main applications of DRL during the

Received: 2019-06-16
Funding: National Key Research and Development Program of China, Cloud Computing and Big Data special project (2017YFB1002304)
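
The MDP view of text generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the vocabulary, reference sentence, reward function, and hyperparameters below are all hypothetical, the state is simplified to the current time step rather than the full generated prefix, and the update rule is a plain REINFORCE policy gradient with a moving-average baseline, chosen here only because it is a simple instance of the paradigm.

```python
import math
import random

# Hypothetical toy setup: none of these values come from the paper.
VOCAB = ["a", "cat", "sat", "<eos>"]
REFERENCE = ["a", "cat", "sat", "<eos>"]
MAX_LEN = len(REFERENCE)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(tokens):
    """Sequence-level reward scaled to [0, 1]: fraction of positions that match the reference."""
    hits = sum(1 for t, r in zip(tokens, REFERENCE) if t == r)
    return hits / len(REFERENCE)


def sample_episode(theta, rng):
    """Roll out one episode. State = time step (a simplification), action = index of the next word."""
    actions = []
    for t in range(MAX_LEN):
        probs = softmax(theta[t])
        actions.append(rng.choices(range(len(VOCAB)), weights=probs)[0])
    return actions


def train(episodes=2000, lr=0.5, seed=0):
    """REINFORCE with a moving-average baseline on a tabular softmax policy."""
    rng = random.Random(seed)
    theta = [[0.0] * len(VOCAB) for _ in range(MAX_LEN)]  # one row of logits per time step
    baseline = 0.0
    for _ in range(episodes):
        actions = sample_episode(theta, rng)
        r = reward([VOCAB[a] for a in actions])
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)  # slowly track the average reward
        for t, a in enumerate(actions):
            probs = softmax(theta[t])
            for k in range(len(VOCAB)):
                # Gradient of log softmax w.r.t. logit k: 1[k == a] - p_k
                grad = (1.0 if k == a else 0.0) - probs[k]
                theta[t][k] += lr * advantage * grad
    return theta


def greedy_decode(theta):
    """Pick the highest-scoring word at every time step."""
    return [VOCAB[max(range(len(VOCAB)), key=lambda k: theta[t][k])]
            for t in range(MAX_LEN)]
```

Calling `greedy_decode(train())` decodes a sentence from the trained policy. The reward arrives only once the whole sequence has been produced, which is the feature that distinguishes this setup from word-by-word supervised training.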

