Chinese Journal of Engineering, Vol. 43, No. 11: 1499–1511, November 2021
https://doi.org/10.13374/j.issn2095-9389.2021.01.30.005; http://cje.ustb.edu.cn

Research progress in attention mechanism in deep learning

LIU Jian-wei, LIU Jun-wen, LUO Xiong-lin
Department of Automation, China University of Petroleum, Beijing 102249, China
Corresponding author: LIU Jian-wei, E-mail: liujw@cup.edu.cn

Abstract (translated from the Chinese): This paper presents a comprehensive and systematic overview of the mainstream models of the attention mechanism. The attention mechanism simulates the selectivity of human vision; its core purpose is to select, from cluttered information, the information that is more relevant and critical to the current task while filtering out noise. In other words, it is an efficient mechanism for information selection and focus. The paper first briefly introduces and defines the prototype of the attention mechanism, then classifies attention structures along several dimensions, then discusses the interpretability of the attention mechanism and summarizes its applications in various fields, and finally points out future directions and challenges for the attention mechanism.

Keywords: attention mechanism; global/local attention; hard/soft attention; self-attention; interpretability
Classification number: TP181

ABSTRACT: There are two challenges with the traditional encoder–decoder framework. First, the encoder must compress all the necessary information of a source sentence into a fixed-length vector. Second, it cannot model the alignment between the source and target sentences, which is an essential aspect of structured output tasks such as machine translation. To address these issues, the attention mechanism was introduced into the encoder–decoder model, allowing the model to learn to align and translate jointly in neural machine translation. The core idea of this mechanism is to induce attention weights over the source sentence that prioritize the positions where relevant information is present for generating the next output token. This mechanism has since become an essential component of neural networks and has been studied across diverse applications. The present survey provides a systematic and comprehensive overview of the developments in attention modeling. The intuition behind attention modeling is best explained as a simulation of human visual selectivity, which aims to select the information most relevant and critical to the current task from cluttered input while ignoring irrelevant information, in a manner that assists perception. The attention mechanism is an efficient information-selection method that has been widely used in deep learning in recent years and has played a pivotal role in natural language processing, speech recognition, and computer vision. This survey first briefly introduces the origin of the attention mechanism and defines a standard, uniform parametric model for encoder–decoder neural machine translation. Next, various techniques are grouped into coherent categories by type of alignment score, number of sequences, abstraction level, position, and representation. A visual explanation of the attention mechanism is then provided, and the roles of the attention mechanism in multiple application areas are summarized. Finally, the survey identifies future directions and challenges for the attention mechanism.

KEY WORDS: attention mechanism; global/local attention; hard/soft attention; self-attention; interpretability

Received: 2021-01-30
Funding: Supported by the Science Foundation of China University of Petroleum, Beijing (2462020YXZZ023)
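The core idea described in the abstract, inducing attention weights over source positions to form a context vector for the next output token, can be sketched as follows. This is a minimal NumPy illustration using dot-product alignment scores; the function and variable names are illustrative, not taken from the surveyed models:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Compute a context vector from dot-product alignment scores.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one hidden state per source position
    """
    scores = encoder_states @ decoder_state   # (T,) alignment score per source position
    weights = softmax(scores)                 # (T,) attention distribution (sums to 1)
    context = weights @ encoder_states        # (d,) weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))  # 5 source positions, hidden size 8
dec = rng.standard_normal(8)
context, weights = attend(dec, enc)
```

Other alignment-score families covered by the survey's taxonomy (e.g. additive/concat scoring) would replace only the `scores` line; the softmax normalization and weighted sum are common to global soft attention.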