CAAI Transactions on Intelligent Systems, Vol. 11 No. 3, Jun. 2016
DOI: 10.11992/tis.201603040
Online publication: http://www.cnki.net/kcms/detail/23.1538.TP.20160513.0957.032.html

A multi-modal fusion approach for measuring web video relatedness

WEN Youfu 1,2, JIA Caiyan 1, CHEN Zhineng 2
(1. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; 2. Interactive Media Research and Services Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Abstract: With the advances in Internet and multimedia technologies, the number of web videos on video-sharing platforms has grown explosively. As a result, high-precision retrieval, classification, and annotation of videos in large-scale collections have become pressing research problems. Measuring the relatedness between web videos is a fundamental technique common to all of these tasks. This paper measures web video relatedness from a multi-modal fusion perspective, drawing on multi-source heterogeneous information: the visual content of a video, its title and tag text, and three social features produced by human-video interactions (the upload time, channel, and author of the video). On this basis, a novel multi-modal fusion approach for computing web video relatedness is proposed. The resulting relatedness serves as a ranking criterion and is applied to large-scale video retrieval. Experimental results on YouTube videos show that fusing text, visual, and user social features performs best when compared with three alternative approaches. These alternatives compute relatedness based only on text features, only on visual features, or jointly on text and visual features.

Keywords: web video; large-scale video; social feature; human-video interactions; multi-source heterogeneous information; multi-modal fusion; relatedness measurement; video retrieval

CLC number: TP393  Document code: A  Article ID: 1673-4785(2016)03-0359-07

Citation: WEN Youfu, JIA Caiyan, CHEN Zhineng. A multi-modal fusion approach for measuring web video relatedness[J]. CAAI transactions on intelligent systems, 2016, 11(3): 359-365.

Received: 2016-03-19. Published online: 2016-05-13.
Foundation items: National Natural Science Foundation of China (61473030, 61303175); Research Fund of Key Universities (2014JBM031); Open Project on Digital Media Technology of the Key Laboratory.
Corresponding author: JIA Caiyan. E-mail: cyjia@bjtu.edu.cn.

Video is a multi-source information carrier that combines images, sound, and text. Its rich, intuitive form of expression closely matches the way people take in information. With the rapid development of network and multimedia technologies, online video services are growing rapidly on Internet platforms. Founded in 2005, the video-sharing site YouTube has become the world's third-largest website and second-largest search