Xu Cong et al.: Research Progress of Deep Reinforcement Learning in Text Generation · 409 ·

Chapter of the Association for Computational Linguistics on Human Language Technology. Edmonton, 2003: 48
[18] Zhang J J, Zong C Q. Deep neural networks in machine translation: an overview. IEEE Intell Sys, 2015, 30(5): 16
[19] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks // Proceedings of Advances in Neural Information Processing Systems. Montréal, 2014: 3104
[20] Cho K, Merriënboer van B, Bahdanau D, et al. On the properties of neural machine translation: encoder–decoder approaches. Comput Sci, 2014: 103
[21] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation // Proceedings of the Conference on Empirical Methods in Natural Language Processing. Lisbon, 2015: 1412
[22] Wu Y H, Schuster M, Chen Z F, et al. Google's neural machine translation system: bridging the gap between human and machine translation[J/OL]. arXiv Preprint (2016-10-08) [2019-06-16]. https://arxiv.org/abs/1609.08144
[23] He Z J. Baidu translate: research and products // Proceedings of the ACL 2015 Fourth Workshop on Hybrid Approaches to Translation (HyTra). Beijing, 2015: 61
[24] Cho K, Merriënboer van B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation // Proceedings of the Conference on Empirical Methods in Natural Language Processing. Doha, 2014: 1724
[25] Xu K, Ba J L, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention // Proceedings of 32nd International Conference on Machine Learning. Lille, 2015: 2048
[26] Das A, Kottur S, Gupta K, et al. Visual dialog[J/OL]. arXiv Preprint (2017-08-01) [2019-06-16]. https://arxiv.org/abs/1611.08669
[27] Hodosh M, Young P, Hockenmaier J. Framing image description as a ranking task: data, models and evaluation metrics. J Artif Intell Res, 2013, 47: 853
[28] Young P, Lai A, Hodosh M, et al. From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans Assoc Comput Linguist, 2014, 2: 67
[29] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: common objects in context // Proceedings of European Conference on Computer Vision. Zurich, 2014: 740
[30] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning // AAAI Conference on Artificial Intelligence. Phoenix, 2016: 2094
[31] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J/OL]. arXiv Preprint (2016-02-25) [2019-06-16]. https://arxiv.org/abs/1511.05952
[32] Wang Z, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning // Proceedings of 33rd International Conference on Machine Learning. New York, 2016: 1995
[33] Schulman J, Levine S, Moritz P, et al. Trust region policy optimization // Proceedings of 31st International Conference on Machine Learning. Lille, 2015: 1889
[34] Kandasamy K, Bachrach Y, Tomioka R, et al. Batch policy gradient methods for improving neural conversation models[J/OL]. arXiv Preprint (2017-02-10) [2019-06-16]. https://arxiv.org/abs/1702.03334
[35] Bhatnagar S, Sutton R S, Ghavamzadeh M, et al. Natural actor–critic algorithms. Automatica, 2009, 45(11): 2471
[36] Grondman I, Busoniu L, Lopes G A D, et al. A survey of actor–critic reinforcement learning: standard and natural policy gradients. IEEE Trans Syst Man Cybern Part C Appl Rev, 2012, 42(6): 1291
[37] Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning // Proceedings of 33rd International Conference on Machine Learning. New York, 2016: 1928
[38] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J/OL]. arXiv Preprint (2016-02-29) [2019-06-16]. https://arxiv.org/abs/1509.02971
[39] Kulkarni T D, Saeedi A, Gautam S, et al. Deep successor reinforcement learning[J/OL]. arXiv Preprint (2016-06-08) [2019-06-16]. https://arxiv.org/abs/1606.02396
[40] Xu C, Li Q, Zhang D, et al. Deep successor feature learning for text generation[J/OL]. Neurocomputing, (2019-04-25) [2019-06-16]. https://doi.org/10.1016/j.neucom.2018.11.116
[41] Zhang J W, Springenberg J T, Boedecker J, et al. Deep reinforcement learning with successor features for navigation across similar environments[J/OL]. arXiv Preprint (2017-07-23) [2019-06-16]. https://arxiv.org/abs/1612.05533
[42] Bowling M, Burch N, Johanson M, et al. Heads-up limit hold'em poker is solved. Science, 2015, 347(6218): 145
[43] Liu X, Xia T, Wang J, et al. Fully convolutional attention localization networks for fine-grained recognition[J/OL]. arXiv Preprint (2017-03-21) [2019-06-16]. https://arxiv.org/abs/1603.06765
[44] Zoph B, Le Q V. Neural architecture search with reinforcement learning[J/OL]. arXiv Preprint (2017-02-15) [2019-06-16]. https://arxiv.org/abs/1611.01578
[45] Theocharous G, Thomas P S, Ghavamzadeh M. Personalized ad recommendation systems for life-time value optimization with guarantees // International Joint Conferences on Artificial Intelligence. Buenos Aires, 2015: 1806
[46] Cuayáhuitl H. SimpleDS: a simple deep reinforcement learning dialogue system // Dialogues with Social Robots. Springer, Singapore, 2017: 109
[47] He D, Xia Y C, Qin T, et al. Dual learning for machine translation // Advances in Neural Information Processing Systems. Barcelona, 2016: 820
[48] Zhang X X, Lapata M. Sentence simplification with deep reinforcement learning // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, 2017: 584