... the forms of combination with other natural language processing tasks will become richer. At present, deep reinforcement learning is mainly used to address the non-differentiability problems that commonly arise in natural language processing, or its framework is used to improve the network training procedure and thereby the final results. Future research can proceed in the following directions:

(1) Improving the performance of deep reinforcement learning algorithms. The algorithms themselves still have many open problems: training is difficult, stability is insufficient, and reward function design relies on experience. All of these need further improvement by researchers [76]. Researchers can also focus on improving the convergence, accuracy, speed, and robustness of the algorithms, simplifying model structures, and increasing data efficiency.

(2) Combining more traditional reinforcement learning algorithms with deep learning to better solve problems in the natural language domain. Research on traditional reinforcement learning algorithms has been under way for 20 years, and many of these algorithms have their own strengths, for example inverse reinforcement learning and inheritance learning (继承学习). With the help of deep learning, they can play new roles in many natural language processing tasks. For example, Casanueva et al. [77] draw on feudal reinforcement learning (Feudal RL) [78] to decompose task-oriented dialogue management into two steps, with each sub-policy learned through deep inheritance learning.

(3) Abstracting more decision problems from natural language processing tasks. Different natural language tasks all contain steps that require decisions, for example a dialogue agent interacting with a person, a question answering system extracting knowledge from a knowledge base, or the use of human feedback to improve generated image descriptions or machine translation output. The strong decision-making ability of deep reinforcement learning can help natural language processing tasks make better choices, which supervised learning cannot do. For example, the deep path reinforcement learning model [79] uses reinforcement learning to solve the relation completion problem in knowledge graphs; Buck et al. [80] cast the question answering task in a novel reinforcement learning framework and improved the quality of the answers.

(4) Combining deep reinforcement learning with new learning algorithms. Deep reinforcement learning is a flexible framework that can be fused with many new algorithms, for example generative adversarial networks, memory networks, and attention mechanisms. This can also provide more innovative methods and ideas for solving problems in natural language processing. For example, Feng et al. [81] proposed a reinforcement learning framework for extracting relations from noisy data, which addresses the problem of distant supervision; Zhang et al. [82] used a reinforcement learning algorithm to automatically learn the optimal structured representation of a sentence and applied it to sentence classification.

References

[1] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. 2nd Ed. Massachusetts: MIT Press, 2018
[2] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529
[3] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484
[4] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436
[5] Littman M L. Reinforcement learning improves behaviour from evaluative feedback. Nature, 2015, 521(7553): 445
[6] Li Y X. Deep reinforcement learning: an overview[J/OL]. arXiv Preprint (2017-09-15) [2019-06-16]. https://arxiv.org/abs/1701.07274
[7] Baroni M, Zamparelli R. Nouns are vectors, adjectives are matrices: representing adjective-noun constructions in semantic space // Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Cambridge, 2010: 1183
[8] Lapata M, Mitchell J. Vector-based models of semantic composition // Proceedings of the Meeting of the Association for Computational Linguistics. Columbus, 2008: 236
[9] Su P H, Gašić M, Mrkšić N, et al. On-line active reward learning for policy optimisation in spoken dialogue systems // Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 2431
[10] Vinyals O, Le Q. A neural conversational model[J/OL]. arXiv Preprint (2015-07-22) [2019-06-16]. https://arxiv.org/abs/1506.05869
[11] Wen T H, Vandyke D, Mrkšić N, et al. A network-based end-to-end trainable task-oriented dialogue system[J/OL]. arXiv Preprint (2017-04-24) [2019-06-16]. https://arxiv.org/abs/1604.04562
[12] Wen T H, Gašić M, Kim D, et al. Stochastic language generation in dialogue using recurrent neural networks with convolutional sentence reranking // Proceedings of 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, 2015: 275
[13] Henderson M, Thomson B, Williams J. The second dialog state tracking challenge // Proceedings of 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Philadelphia, 2014: 263
[14] Eric M, Manning C D. Key-value retrieval networks for task-oriented dialogue[J/OL]. arXiv Preprint (2017-07-14) [2019-06-16]. https://arxiv.org/abs/1705.05414
[15] Lowe R, Pow N, Serban I V, et al. The ubuntu dialogue corpus: a large dataset for research in unstructured multi-turn dialogue systems // Proceedings of 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, 2015: 285
[16] Brown P F, Pietra V J D, Pietra S A D, et al. The mathematics of statistical machine translation: Parameter estimation. Comput Linguist, 1993, 19(2): 263
[17] Koehn P, Och F J, Marcu D. Statistical phrase-based translation // Proceedings of the 2003 Conference of the North American

· 408 · 工程科学学报 (Chinese Journal of Engineering), Vol. 42, No. 4

