IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 7, NOVEMBER 2010

Multi-View Video Summarization

Yanwei Fu, Yanwen Guo, Yanshu Zhu, Feng Liu, Chuanming Song, and Zhi-Hua Zhou, Senior Member, IEEE

Abstract—Previous video summarization studies focused on monocular videos, and their methods give poor results when applied directly to multi-view videos because of problems such as the redundancy across multiple views. In this paper, we present a method for summarizing multi-view videos. We construct a spatio-temporal shot graph and formulate the summarization problem as a graph labeling task. The spatio-temporal shot graph is derived from a hypergraph, which encodes the correlations with different attributes among multi-view video shots in hyperedges. We then partition the shot graph and identify clusters of event-centered shots with similar contents via random walks. The summarization result is generated by solving a multi-objective optimization problem based on shot importance evaluated using a Gaussian entropy fusion scheme. Different summarization objectives, such as minimum summary length and maximum information coverage, can be accomplished in the framework. Moreover, multi-level summarization can be achieved easily by configuring the optimization parameters. We also propose the multi-view storyboard and the event board for presenting multi-view summaries. The storyboard naturally reflects correlations among multi-view summarized shots that describe the same important event. The event board serially assembles event-centered multi-view shots in temporal order. A single video summary, which facilitates quick browsing of the summarized multi-view video, can be easily generated from the event board representation.

Index Terms—Multi-objective optimization, multi-view video, random walks, spatio-temporal graph, video summarization.
Manuscript received August 28, 2009; revised December 21, 2009 and March 26, 2010; accepted May 18, 2010. Date of publication June 07, 2010; date of current version October 15, 2010. This work was supported in part by the National Science Foundation of China under Grants 60703084, 60723003, and 60721002, the National Fundamental Research Program of China (2010CB327903), the Jiangsu Science Foundation (BK2009081), and the Jiangsu 333 High-Level Talent Cultivation Program. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Zhu Liu.

Y. Fu, Y. Zhu, C. Song, and Z.-H. Zhou are with the National Key Lab for Novel Software Technology, Nanjing University, Nanjing 210093, China (e-mail: ztwztq2006@gmail.com; yszhu@cs.hku.hk; chmsong@graphics.nju.edu.cn; zhouzh@nju.edu.cn).

Y. Guo is with the National Key Lab for Novel Software Technology, Nanjing University, Nanjing 210093, China, and also with the Jiangyin Information Technology Research Institute of Nanjing University (e-mail: ywguo@nju.edu.cn).

F. Liu is with the Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53562 USA (e-mail: fliu@cs.wisc.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2010.2052025

I. INTRODUCTION

With the rapid development of computation, communication, and storage infrastructures, multi-view video systems have become increasingly popular. Such systems simultaneously capture a group of videos and record events with considerably overlapping fields of view (FOVs) across multiple cameras. In contrast to the rapid development of video collection and storage techniques, consuming these multi-view videos remains a problem. For instance, watching a large number of videos to grasp important information quickly is a big challenge.

Video summarization, as an important video content service, produces a condensed and succinct representation of video content, which facilitates the browsing, retrieval, and storage of the original videos. There is a rich literature on summarizing a long video into a concise representation, such as a key-frame sequence [1]–[6] or a video skim [7]–[20]. These existing methods provide effective solutions to summarization, but they focus on monocular videos. Multi-view video summarization has rarely been addressed, even though multi-view videos are widely used in surveillance systems installed in offices, banks, factories, and at city crossroads for private and public security. For all-weather, day-and-night multi-view surveillance systems, the volume of recorded video data increases dramatically every day. In addition to surveillance, multi-view videos are also popular in sports broadcasting. For example, in a soccer match, the cameramen usually replay the goals as recorded by different cameras distributed around the stadium.

Multi-view video summarization refers to the problem of summarizing multi-view videos into informative video summaries, usually presented as dynamic video shots coming from multiple views, by considering content correlations within each view and among multiple views. The multi-view summaries present salient events with richer information than less salient ones. This allows the user to grasp the important information from multiple perspectives without watching the videos in full. Multi-view summarization also benefits the storage, analysis, and management of multi-view video content.

Applying existing monocular video summarization methods to each component of a multi-view video group could lead to a redundant summarization result, as each component has overlapping information with the others.
To generate a concise multi-view video summary, information correlations as well as discrepancies among multi-view videos should be taken into account. Directly applying previous methods to the video sequence formed by simply concatenating the multi-view videos does not work well either. Furthermore, since multi-view videos often suffer from different lighting conditions in different views, it is nontrivial to evaluate the importance of shots in each view and to merge the components into an integral video summary in a robust way, especially when the multi-view videos are captured nonsynchronously. It is thus important to have effective multi-view summarization techniques.

In this paper, we present a method for the summarization of multi-view videos. We first parse the video from each view into shots. Content correlations among multi-view shots are important for producing an informative and compact summary. We use a hypergraph to model such correlations, in which each kind of hyperedge characterizes one kind of correlation among shots. By converting the hypergraph into a spatio-temporal shot graph,