724 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 7, NOVEMBER 2010

TABLE I
DETAILS OF MULTI-VIEW VIDEOS AND SUMMARIES

Multi-view Videos | No. of Views | Video Length (Mins.) | Levels of Summary | Level | Summary Length (Mins.) | λ2: Info. Reserved (%)
office1 | 4 | 11:16/8:43/11:22/14:58 | 1 | Level 1 | 1:53 | 70
campus | 4 | 15:19/13:51/12:30/15:03 | 1 | Level 1 | 4:02 | 60
office lobby | 3 | 08:14/08:14/08:14 | 2 | Level 1 | 2:56 | 60
 | | | | Level 2 | 5:14 | 70
road | 3 | 5:11/8:49/8:46 | 2 | Level 1 | 2:21 | 60
 | | | | Level 2 | 4:28 | 70
badminton | 3 | 5:07/5:00/5:00 | 3 | Level 1 | 0:50 | 60
 | | | | Level 2 | 1:08 | 65
 | | | | Level 3 | 2:08 | 70
with each selection variable restricted to {0, 1}; the resulting binary selection vector is the final optimization result to be solved. This 0-1 integer program is a typical knapsack problem in combinatorial optimization. We solve it with a pseudo-polynomial time dynamic programming algorithm [52], which runs fast in all of our experiments.

VI. EXPERIMENTS

We conducted experiments on several multi-view videos, covering typical indoor and outdoor environments. The office1, campus, office lobby, and road videos are typical surveillance videos, as surveillance video is one of the most important types of multi-view video. Some of the multi-view videos are semi-synchronous or nonsynchronous. Most of them are captured by three or four ordinary cameras with an overall 360-degree coverage of the scene. To further verify our method, we also deliberately shot an outdoor scene with four cameras covering only 180 degrees. Note that all of the videos were captured by nonspecialists using web cameras or ordinary handheld video cameras, which makes some of them unstable and obscure. Moreover, some videos differ considerably in brightness across views. These issues pose great challenges to multi-view video summarization.

Table I shows the information on the experimental data. All experimental results were collected on a PC equipped with a P4 3.0-GHz CPU and 1 GB of memory. The multi-view videos as well as the summaries can be found on the demo page http://cs.nju.edu.cn/ywguo/summarization.html.
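The shot-selection knapsack mentioned just before Section VI can be solved with the textbook pseudo-polynomial dynamic program. The sketch below is a generic version, not the authors' code: it assumes each candidate shot already has a scalar importance and an integer length (for example, in seconds), and that there is a single total-length budget. The helper name `select_shots` and the toy numbers are hypothetical. The O(n·B) running time, for n shots and budget B, is what makes the method pseudo-polynomial: it is polynomial in the numeric value of B rather than in its bit length.

```python
from typing import List, Tuple


def select_shots(importance: List[float], lengths: List[int],
                 budget: int) -> Tuple[float, List[int]]:
    """Solve a 0-1 knapsack over shots by dynamic programming (hypothetical helper).

    importance[i]: value gained by keeping shot i (assumed precomputed)
    lengths[i]:    integer length of shot i, e.g. in seconds (assumption)
    budget:        maximum total summary length, in the same units
    Returns (best total importance, sorted indices of the selected shots).
    """
    n = len(importance)
    # best[i][c] = max importance using only the first i shots within capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], importance[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # option 1: drop shot i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: keep shot i-1
    # Backtrack: a shot was kept exactly where the table value changed
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return best[n][budget], sorted(chosen)
```

For example, `select_shots([9, 4, 7, 2], [30, 20, 25, 10], 60)` returns `(16.0, [0, 2])`: keeping shots 0 and 2 uses 55 of the 60 units and beats every other feasible combination. A one-dimensional table indexed by capacity would save memory, but the full table shown here makes the backtracking step straightforward.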
Note that we sacrificed the visual quality of the original multi-view videos, compressing them with high compression ratios to meet the space limitation of online storage.

Display of multi-view summary. Here we employ the multi-view storyboard to represent the multi-view video summary, as illustrated in Fig. 5. The storyboard naturally reflects the spatial and temporal information of the resulting shots as well as their correlations, allowing the user to walk through and analyze the summarized shots in a natural and intuitive way. In the storyboard, each shot in the summary is represented by its middle frame. By clicking on the yellow block highlighted with the corresponding shot number, the user can browse the summarized video shot. Dashed lines connect the shots of the same scene-event, as derived from random walks clustering and multi-objective optimization. Building on the multi-view storyboard, we further introduce an event-board to display the multi-view summary, as illustrated in Fig. 6. The summarized shots are assembled along the timeline across multiple views. Each shot is represented by a box, and the number in the box indicates the view to which the shot belongs. Dashed blue boxes represent events that are recorded by more than one shot or in different views. Clicking on a box displays the corresponding shot. Through the event-board, we can easily generate a single video summary that includes all the summarized shots; some examples are shown on our demo page. One advantage of this single summary over the storyboard is that it allows rapid browsing of the summarized result. If the user needs to browse the summary within limited time, the single summary is a good choice.

A distinct characteristic of multi-view videos is that events are captured with overlap across multiple views.
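Because events overlap across views, the event-board has to group summarized shots that show the same event before laying them out on one timeline. The sketch below illustrates that assembly under simplifying assumptions; it is not the authors' implementation. It assumes the views share a synchronized timeline and treats shots whose time spans overlap as one event, whereas the paper derives event correspondences from random walks clustering and multi-objective optimization. The `Shot` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    shot_id: int   # shot number as shown on the board (hypothetical field)
    view: int      # index of the view (camera) the shot belongs to
    start: float   # start time on a shared timeline, seconds (assumed synchronized)
    end: float     # end time on the same timeline, seconds


def build_event_board(shots: List[Shot]) -> List[List[Shot]]:
    """Group shots into events ordered along the timeline.

    Simplifying assumption: shots with overlapping time spans belong to the
    same event; shots that merely touch (start == end) start a new event.
    """
    events: List[List[Shot]] = []
    event_end = float("-inf")
    for shot in sorted(shots, key=lambda s: (s.start, s.view)):
        if events and shot.start < event_end:
            events[-1].append(shot)  # overlaps the current event
            event_end = max(event_end, shot.end)
        else:
            events.append([shot])  # opens a new event box
            event_end = shot.end
    return events


def single_summary_playlist(events: List[List[Shot]]) -> List[int]:
    """Flatten the events into one timeline-ordered playlist of shot ids."""
    return [shot.shot_id for event in events for shot in event]
```

In this sketch, events holding more than one shot play the role of the dashed blue boxes. Playing the flattened playlist in order gives a single video summary that contains every summarized shot.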
To generate a compact yet highly informative summary, it is usually important to summarize a given event only in its most informative view and to avoid repetitive summaries. This is especially true if the user only wants a short video summary. Our method achieves this. One example is the summary of the multi-view office1 videos. In the 24th shot, the girl who opened the door and went to her cubicle is retained only in the second view, although she appeared in all four views simultaneously. The man who opened the door in the 24th shot and left the room in the 25th shot is likewise retained only in the second view. In this sense, our method can be applied to the selection of optimal views. In addition, the method supports summarizing the same event using temporally successive multi-view shots: the event is represented by the shots that capture it from the best views over its duration.

On the other hand, it is also reasonable to produce a multi-view summary for the same event. For example, in a traffic accident, the videos from all views are often crucial for identifying and verifying responsibility. Our method handles this case successfully. In the multi-view office1 videos, three people entered the views and left the room. This action is retained simultaneously in the 22nd shot of the second view and the 40th shot of the fourth view. Other typical examples are the 28th and 35th shots, the 30th and 46th shots, and the 38th and 49th shots. Such summaries can be attributed to two factors. First, the shot importance computation algorithm fairly computes the importance of multi-view shots, even in the presence of brightness differences and noise. Second, the summarization method makes full use of the correlations among multi-view shots.

Multi-level summarization can be conveniently achieved by our method. We only need to configure the two parameters λ1 and λ2 in the multi-objective optimization. As aforementioned