ple,400 hours of new videos were uploaded to Youtube ev- In recent years,short videos with duration less than 60 ery minute and one billion hours of content was watched seconds have become increasingly popular on social me- on YouTube every day in February 20174.With billions of dia platforms.Users have strong incentive to copy a hot videos being available on the internet,it becomes a major short video and upload a modified version on these plat- challenge to perform near-duplicate video retrieval(NDVR) forms to gain attention.With the increasing in short video data,there appear new difficulties and challenges for detect- Ihttps://www.youtube.com 2https://www.instagram.com ing near-duplicate short videos.Some of the new difficul- 3https://www.tiktok.com ties and challenges are listed as follows.Firstly,most long 4https://en.wikipedia.org/wiki/YouTube videos are generated by professional photographers withSVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval Qing-Yuan Jiang† , Yi He‡ , Gen Li‡ , Jian Lin† , Lei Li‡ and Wu-Jun Li† †National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, China ‡ByteDance AI Lab, Beijing, China jiangqy@lamda.nju.edu.cn,{heyi,ligen.lab,lileilab}@bytedance.com, linj@lamda.nju.edu.cn,liwujun@nju.edu.cn Abstract With the explosive growth of video data in real appli￾cations, near-duplicate video retrieval (NDVR) has become indispensable and challenging, especially for short videos. However, all existing NDVR datasets are introduced for long videos. Furthermore, most of them are small-scale and lack of diversity due to the high cost of collecting and la￾beling near-duplicate videos. In this paper, we introduce a large-scale short video dataset, called SVD, for the ND￾VR task. SVD contains over 500,000 short videos and over 30,000 labeled videos of near-duplicates. We use multi￾ple video mining techniques to construct positive/negative pairs. 
Furthermore, we design temporal and spatial transformations to mimic user-attack behavior in real applications for constructing more difficult variants of SVD. Experiments show that existing state-of-the-art NDVR methods, including real-value based and hashing based methods, fail to achieve satisfactory performance on this challenging dataset. The release of the SVD dataset will foster research and system engineering in the NDVR area. The SVD dataset is available at https://svdbase.github.io.

1. Introduction

Over the past decades, we have witnessed the explosive growth of video data on a variety of video sharing websites like YouTube¹, Instagram², and TikTok³. For example, 400 hours of new videos were uploaded to YouTube every minute and one billion hours of content was watched on YouTube every day in February 2017⁴. With billions of videos available on the internet, it has become a major challenge to perform near-duplicate video retrieval (NDVR) from a large-scale video database. NDVR aims to retrieve the near-duplicate videos from a massive video database, where near-duplicate videos are defined as videos that are visually close to the original videos [32]. For example, the videos might be slightly modified by users to bypass detection, and the modified videos can be treated as near-duplicate videos of the original videos. These modifications can be caption insertion, border insertion and so on. An NDVR system has become a necessity on content platforms, with many applications including video recommendation, video search, and copyright infringement detection. Hence, NDVR has become a hot research topic, and many methods have been proposed for NDVR [32, 10, 8, 4, 33, 29, 1, 24, 16, 18, 2, 23, 13, 30, 19, 6].

¹ https://www.youtube.com
² https://www.instagram.com
³ https://www.tiktok.com
⁴ https://en.wikipedia.org/wiki/YouTube

Existing NDVR methods can be divided into video-level methods and frame-level methods.
Video-level methods, including layer-wise convolutional neural network (CNNL) [12], vector-wise convolutional neural network (CNNV) [12] and deep metric learning (DML) [13], try to represent each video as a global feature. Frame-level methods, including spatio-temporal post-filtering [4], circulant temporal encoding (CTE) [24] and temporal matching kernel (TMK) [23], extract features for each frame of the video. In the meantime, to advance research on NDVR, several video datasets have been introduced in recent years, including CCWEB [32], UQ_VIDEO [29], VCDB [9], MUSCLE_VCD [14], TRECVID [22] and so on. However, all of them are for long videos with an average duration longer than 60 seconds.

In recent years, short videos with a duration of less than 60 seconds have become increasingly popular on social media platforms. Users have a strong incentive to copy a popular short video and upload a modified version on these platforms to gain attention. With the increase in short video data, new difficulties and challenges arise in detecting near-duplicate short videos. Some of them are listed as follows. Firstly, most long videos are generated by professional photographers with