Chinese Journal of Engineering (工程科学学报), Vol. 42, No. 5: 557-569, May 2020
https://doi.org/10.13374/j.issn2095-9389.2019.03.21.003; http://cje.ustb.edu.cn

A survey of multimodal machine learning (多模态学习方法综述)

CHEN Peng 1,2), LI Qing 1,2) ✉, ZHANG De-zheng 3,4), YANG Yu-hang 1), CAI Zheng 1), LU Zi-yi 1)

1) School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2) Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China
3) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
4) Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
✉ Corresponding author, E-mail: liqing@ies.ustb.edu.cn

ABSTRACT "Big data" is always collected from different sources that have different data structures. With the rapid development of information technologies, today's valuable data resources are characteristically multimodal. As a result, multi-modal learning, which builds on classical machine learning strategies, has become a valuable research topic that enables computers to process and understand "big data". Human cognition involves perception through different sense organs: signals from the eyes, ears, nose, and hands (tactile sense) together constitute a person's understanding of a specific scene or of the world as a whole. It is reasonable to believe that multi-modal methods, with their greater ability to process complex heterogeneous data, can further promote the progress of information technologies. The concept of multimodality originated in psychology and pedagogy hundreds of years ago and has become popular in computer science during the past decade. In contrast to the concept of "media", a "mode" is a more fine-grained concept that is associated with a typical data source or data form. The effective use of multi-modal data can help a computer understand a specific environment in a more holistic way. In this paper, we first introduce the definition and main tasks of multi-modal learning. On this basis, we briefly introduce the mechanism and origin of multi-modal machine learning. We then comprehensively summarize statistical learning methods and deep learning methods for multi-modal tasks. We also introduce the main styles of data fusion in multi-modal perception tasks, including feature representation, shared mapping, and co-training. Additionally, we review novel adversarial learning strategies for cross-modal matching and generation. Finally, we outline the main methods for multi-modal learning.

ABSTRACT (translated from the Chinese) Big data is multi-source and heterogeneous. With the rapid development of information technology, multimodal data has recently become the main form of data resources. Studying multimodal learning methods, and thereby giving computers the ability to understand massive multi-source heterogeneous data, is of great value. This paper summarizes the definition of multimodality and the basic tasks of multimodal learning, and introduces the cognitive mechanism and development of multimodal learning. On this basis, it focuses on reviewing multimodal statistical learning methods and deep learning methods. In addition, it systematically summarizes the relatively novel adversarial-learning-based techniques for cross-modal matching and generation from the past two years. Finally, it summarizes the main forms of multimodal learning and offers reflections and an outlook on possible future research directions.

KEY WORDS multimodal learning; statistical learning; deep learning; adversarial learning; feature representation

Classification number: TP18

Received: 2019-03-21
Funding: National Key Research and Development Program of China (Cloud Computing and Big Data Special Project) (2017YFB1002304)