Figure 2. Example of query videos in SVD. Each block represents a video with multiple frames. (Panels: portrait, multiple screens; landscape, horizontal screen; game video, vertical screen; building, vertical screen; animation, vertical screen; pet, vertical screen; portrait, vertical screen; animation, horizontal screen.)

3.2. Labeled Set

To construct the labeled set, we first choose some videos as candidate videos for annotation. All candidate videos are divided into positive (near-duplicate) candidate videos and negative candidate videos, which respectively denote the videos we expect to be annotated (labeled) as positive and negative videos of the corresponding query videos.

To mine hard positive/negative candidate videos for annotation, we utilize multiple strategies to select candidate videos from the ambient set. The strategies include iterative retrieval, transformed retrieval, and feature based mining. The first two strategies are mainly used for mining hard positive candidate videos, and the last strategy is used for mining hard negative candidate videos.

We collect nearly 50,000 video pairs for annotation. These video pairs are labeled by human annotators, and annotation costs over 800 hours in total. After removing the videos inappropriate for public release, we obtain 1,206 queries and 34,020 labeled videos. In the rest of this subsection, we describe the details of the three strategies for selecting candidate videos.

Iterative Retrieval To mine hard positive candidate videos, we utilize an iterative retrieval method to annotate the positive candidate videos. This method can be divided into the following three steps. Firstly, for a given query video, we retrieve through the ambient set to get candidates by using a variety of methods, including LBP [21] and BSIFT [35] feature based retrieval methods. Secondly, human annotators label these candidates for each query and select the positive ones. Lastly, the selected positive videos are fed back into the first step to retrieve more positive candidates. The whole process is repeated until no more positive videos can be found for a given query. Because the iterative retrieval procedure requires low latency, we only employ LBP [21] and BSIFT [35] features during this procedure. More advanced features and similarity calculation methods are utilized for the subsequent transformed retrieval procedure.
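The retrieve-annotate-expand loop above can be summarized in a minimal sketch. The `retrieve` and `annotate` callables and the `ambient_index` object below are hypothetical placeholders: in the paper, retrieval is done with LBP/BSIFT-based systems and annotation by human labelers, neither of which is specified at the API level.

```python
def iterative_retrieval(query, ambient_index, retrieve, annotate, top_k=10):
    """Repeat retrieve -> annotate -> expand until a round finds no new positives.

    retrieve(seed, index, k) -> candidate video ids (standing in for the
    LBP/BSIFT-based retrieval systems); annotate(query, candidates) ->
    subset judged near-duplicate by human annotators. Both are assumptions.
    """
    positives = set()   # all positives confirmed so far for this query
    seeds = [query]     # videos used as retrieval queries in the next round
    while seeds:
        new_positives = set()
        for seed in seeds:
            candidates = retrieve(seed, ambient_index, top_k)
            # only send candidates that are not already labeled positive
            fresh = [c for c in candidates if c not in positives]
            new_positives.update(annotate(query, fresh))
        if not new_positives:            # termination: no new positives found
            break
        positives.update(new_positives)
        seeds = list(new_positives)      # feed positives back into step one
    return positives
```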
Transformed Retrieval We also apply various transformations, such as rotation and cropping, to the query videos to obtain transformed videos. We then use the transformed videos as queries to search over the ambient set. Specifically, we utilize LBP, BSIFT, and deep feature based retrieval methods to select the candidate videos, and we take the top-5 to top-10 results as candidate videos for further human annotation.
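A minimal sketch of this procedure is given below, assuming OpenCV is available. The rotation and cropping transforms follow the two examples named in the text; `retrieve` is the same hypothetical retrieval callable as in the previous sketch, standing in for the LBP/BSIFT/deep-feature retrieval systems.

```python
import cv2

def rotate(frame, angle=90):
    """Rotate a frame around its center by `angle` degrees."""
    h, w = frame.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(frame, m, (w, h))

def center_crop(frame, ratio=0.8):
    """Keep the central `ratio` of the frame in each dimension."""
    h, w = frame.shape[:2]
    dh, dw = int(h * (1 - ratio) / 2), int(w * (1 - ratio) / 2)
    return frame[dh:h - dh, dw:w - dw]

def transformed_retrieval(query_frames, ambient_index, retrieve, top_k=10):
    """Query the ambient set with transformed versions of the query video."""
    candidates = set()
    for transform in (rotate, center_crop):
        transformed = [transform(f) for f in query_frames]
        # the top-5 to top-10 results per transformed query become
        # candidate videos for human annotation
        candidates.update(retrieve(transformed, ambient_index, top_k))
    return candidates
```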
Figure 3. Example of hard positive candidate videos. Top row: side mirrored, color-filtered, and watermarked. Middle row: horizontal screen changed to vertical screen with large black margins. Bottom row: rotated. (Each row pairs a query video with a positive candidate.)

In Figure 3, we show some query videos and their hard positive candidate videos mined by iterative retrieval and transformed retrieval. The candidate videos are near-duplicate videos produced by various transformations, including mirror transformation, color-filtered transformation, black border insertion, and rotation transformation.

Feature based Mining To mine hard negative candidate videos, we select 30,000 videos from the ambient set as candidate videos; these videos were uploaded from June 2018 to August 2018. As the uploading dates of these candidate videos are earlier than those of the videos in our query set, we can expect that most candidate videos are not near-duplicate videos of the query videos. We extract different types of features to calculate the similarity between candidates and query videos. The features include hand-crafted features (LBP and BSIFT) and deep features. For each
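The core of this step is ranking candidates by feature similarity and keeping the most similar ones as hard negatives. A minimal sketch follows; cosine similarity and the `per_query` cutoff are assumptions, as the text only states that similarities between candidate and query features are computed.

```python
import numpy as np

def mine_hard_negatives(query_feats, cand_feats, per_query=20):
    """Rank candidates by cosine similarity to each query.

    query_feats: (num_queries, d) array of query video features
    cand_feats:  (num_candidates, d) array of candidate video features
    Returns, per query, the indices of the `per_query` most similar
    candidates -- the hardest negative candidates for annotation.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    c = cand_feats / np.linalg.norm(cand_feats, axis=1, keepdims=True)
    sim = q @ c.T                      # (num_queries, num_candidates)
    return np.argsort(-sim, axis=1)[:, :per_query]

# usage with random stand-in features
top = mine_hard_negatives(np.random.rand(4, 128), np.random.rand(1000, 128))
```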