Xu Cong et al.: Research progress of deep reinforcement learning applied to text generation · 409 ·
Chapter of the Association for Computational Linguistics on Human Language Technology. Edmonton, 2003: 48
[18] Zhang J J, Zong C Q. Deep neural networks in machine translation: an overview. IEEE Intell Sys, 2015, 30(5): 16
[19] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks // Proceedings of Advances in Neural Information Processing Systems. Montréal, 2014: 3104
[20] Cho K, Merriënboer van B, Bahdanau D, et al. On the properties of neural machine translation: encoder–decoder approaches. Comput Sci, 2014: 103
[21] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation // Proceedings of the Conference on Empirical Methods in Natural Language Processing. Lisbon, 2015: 1412
[22] Wu Y H, Schuster M, Chen Z F, et al. Google's neural machine translation system: bridging the gap between human and machine translation[J/OL]. arXiv Preprint (2016-10-08) [2019-06-16]. https://arxiv.org/abs/1609.08144
[23] He Z J. Baidu translate: research and products // Proceedings of the ACL 2015 Fourth Workshop on Hybrid Approaches to Translation (HyTra). Beijing, 2015: 61
[24] Cho K, Merriënboer van B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation // Proceedings of the Conference on Empirical Methods in Natural Language Processing. Doha, 2014: 1724
[25] Xu K, Ba J L, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention // Proceedings of 32nd International Conference on Machine Learning. Lille, 2015: 2048
[26] Das A, Kottur S, Gupta K, et al. Visual dialog[J/OL]. arXiv Preprint (2017-08-01) [2019-06-16]. https://arxiv.org/abs/1611.08669
[27] Hodosh M, Young P, Hockenmaier J. Framing image description as a ranking task: Data, models and evaluation metrics. J Artif Intell Res, 2013, 47: 853
[28] Young P, Lai A, Hodosh M, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans Assoc Comput Linguist, 2014, 2: 67
[29] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: common objects in context // Proceedings of European Conference on Computer Vision. Zurich, 2014: 740
[30] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning // AAAI Conference on Artificial Intelligence. Phoenix, 2016: 2094
[31] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J/OL]. arXiv Preprint (2016-02-25) [2019-06-16]. https://arxiv.org/abs/1511.05952
[32] Wang Z, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning // Proceedings of 33rd International Conference on Machine Learning. New York, 2016: 1995
[33] Schulman J, Levine S, Moritz P, et al. Trust region policy optimization // Proceedings of 32nd International Conference on Machine Learning. Lille, 2015: 1889
[34] Kandasamy K, Bachrach Y, Tomioka R, et al. Batch policy gradient methods for improving neural conversation models[J/OL]. arXiv Preprint (2017-02-10) [2019-06-16]. https://arxiv.org/abs/1702.03334
[35] Bhatnagar S, Sutton R S, Ghavamzadeh M, et al. Natural actor-critic algorithms. Automatica, 2009, 45(11): 2471
[36] Grondman I, Busoniu L, Lopes G A D, et al. A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans Syst Man Cybern Part C Appl Rev, 2012, 42(6): 1291
[37] Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning // Proceedings of 33rd International Conference on Machine Learning. New York, 2016: 1928
[38] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J/OL]. arXiv Preprint (2016-02-29) [2019-06-16]. https://arxiv.org/abs/1509.02971
[39] Kulkarni T D, Saeedi A, Gautam S, et al. Deep successor reinforcement learning[J/OL]. arXiv Preprint (2016-06-08) [2019-06-16]. https://arxiv.org/abs/1606.02396
[40] Xu C, Li Q, Zhang D, et al. Deep successor feature learning for text generation[J/OL]. Neurocomputing, (2019-04-25) [2019-06-16]. https://doi.org/10.1016/j.neucom.2018.11.116
[41] Zhang J W, Springenberg J T, Boedecker J, et al. Deep reinforcement learning with successor features for navigation across similar environments[J/OL]. arXiv Preprint (2017-07-23) [2019-06-16]. https://arxiv.org/abs/1612.05533
[42] Bowling M, Burch N, Johanson M, et al. Heads-up limit hold'em poker is solved. Science, 2015, 347(6218): 145
[43] Liu X, Xia T, Wang J, et al. Fully convolutional attention localization networks for fine-grained recognition[J/OL]. arXiv Preprint (2017-03-21) [2019-06-16]. https://arxiv.org/abs/1603.06765
[44] Zoph B, Le Q V. Neural architecture search with reinforcement learning[J/OL]. arXiv Preprint (2017-02-15) [2019-06-16]. https://arxiv.org/abs/1611.01578
[45] Theocharous G, Thomas P S, Ghavamzadeh M. Personalized ad recommendation systems for life-time value optimization with guarantees // International Joint Conferences on Artificial Intelligence. Buenos Aires, 2015: 1806
[46] Cuayáhuitl H. SimpleDS: A simple deep reinforcement learning dialogue system // Dialogues with Social Robots. Springer, Singapore, 2017: 109
[47] He D, Xia Y C, Qin T, et al. Dual learning for machine translation // Advances in Neural Information Processing Systems. Barcelona, 2016: 820
[48] Zhang X X, Lapata M. Sentence simplification with deep reinforcement learning // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark, 2017: 584