Fig. 9. Example results for blob segmentation.

head, torso, or bottom and use them to capture initial samples $S = \{S_H, S_T, S_B\}$. Fig. 8(b) shows the initial bands used for initialization, the segmentation result is shown in Fig. 8(c), and the detected separators are shown in Fig. 8(d). Fig. 9 illustrates some blob segmentation examples for various people. The segmentation and separator detection are robust even under partial occlusion of the target, as in the rightmost result. Also, in some of these examples, the clothes are not of a uniform color.

C. Segmentation of Multiple People

Visual surveillance systems are required to keep track of targets as they move through the scene even when they are occluded by or interacting with other people in the scene. It is highly undesirable to lose track of the targets when they are in a group. It is even more important to track the targets when they are interacting than when they are isolated. This problem is important not only for visual surveillance but also for other video analysis applications such as video indexing and video archival and retrieval.

In this section, we show how to segment foreground regions corresponding to a group of people into individuals given the representation for isolated people presented in Section IV-B. One drawback of this representation is its inability to model highly articulated parts such as hands. However, since our main objective is to segment people under occlusion, we are principally concerned with the mass of the body. Correctly locating the major blobs of the body will provide constraints on the location of the hands, which could then be used to locate and segment them. The assumption we make about the scenario is that the targets are visually isolated before occlusion so that we can initialize their models.

Given a foreground region corresponding to a group of people, we search for the arrangement that maximizes the likelihood of the appearance of this region given the models that we have built for the individuals. As a result, we obtain a segmentation of the region. The segmentation result is then used to determine the relative depth of each individual by evaluating different hypotheses about the arrangement of the people. This allows us to construct a model for occlusion.

The problem of tracking groups of people has been addressed recently in the literature. The Hydra system [36] tracks people in groups by tracking their heads based on the silhouette of the foreground regions corresponding to the group. It is able to count the number of people in the groups as long as their heads appear as part of the outer silhouette of the group; it fails otherwise. The Hydra system was not intended to accurately segment the group into individuals, nor does it recover depth information. In [32], groups of people were segmented based on the individuals' color distribution, where the color distribution of the whole person was represented by a histogram. The color features are represented globally and are not spatially localized; therefore, this approach loses spatial information about the color distributions, which is an essential discriminant.

1) Segmentation Using Likelihood Maximization: For simplicity and without loss of generality, we focus on the two-person case. Given a person model $M = \{A_i\}$ where $i = 1:n$, the probability of observing color $c$ at location $(x, y)$ given blob $A_i$ is

$$P_{A_i}(x, y, c) = f_{A_i}(x)\, g_{A_i}(y)\, h_{A_i}(c).$$

Since our blobs are aligned vertically, we can assume that all the blobs share the same horizontal density function $f(x)$. Therefore, given a person model $M = \{A_i\}_{i=1:n}$, the probability of $(x, y, c)$ is

$$P(x, y, c \mid M) = \frac{f(x) \sum_{i=1}^{n} g_{A_i}(y)\, h_{A_i}(c)}{C(y)} \qquad (9)$$

where $C(y)$ is a normalization factor such that $C(y) = \sum_{i=1}^{n} g_{A_i}(y)$. The location $(x, y)$ and the spatial densities $g_{A_i}(y)$, $f(x)$ are defined relative to an origin $o$. If the origin moves to $(x_o, y_o)$, we can shift the previous probability as

$$P(x, y, c \mid M(x_o, y_o)) = \frac{f(x - x_o) \sum_{i=1}^{n} g_{A_i}(y - y_o)\, h_{A_i}(c)}{C(y - y_o)}.$$

This defines the conditional density as a function of the model origin $(x_o, y_o)$, i.e., $(x_o, y_o)$ is a parameter for the density, and it is the only degree of freedom allowed.

Given two people occluding each other with models $M_1(x_1, y_1)$ and $M_2(x_2, y_2)$, $h = (x_1, y_1, x_2, y_2)$ is a four-dimensional (4-D) hypothesis for their origins. We will call $h$ an arrangement hypothesis. For a foreground region $X = (X_1, \ldots, X_m)$ representing those two people, each foreground pixel $X_i = (x_i, y_i, c_i)$ can be classified into one of the two classes using maximum-likelihood classification (assuming the same prior probability for each person). This defines a segmentation $\omega_h(X) = (\omega_h(X_1), \ldots, \omega_h(X_m))$ that minimizes Bayes error, where

$$\omega_h(X_i) = k \quad \text{s.t.} \quad k = \arg\max_{k} P(X_i \mid M_k(x_k, y_k)), \quad k = 1, 2.$$

Notice that the segmentation $\omega_h(X)$ is a function of the origin hypothesis $h$ for the two models, i.e., each choice for the targets' origins defines a different segmentation of the foreground region. The best choice for the targets' origins is the one that maximizes the likelihood of the data over the

ELGAMMAL et al.: MODELING USING NONPARAMETRIC KERNEL DENSITY ESTIMATION FOR VISUAL SURVEILLANCE 1159
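The maximum-likelihood classification step described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the paper: the densities $f$ and $g_{A_i}$ are replaced by simple Gaussians rather than kernel density estimates, $h_{A_i}$ is a histogram over quantized color indices, and the names `gaussian`, `PersonModel`, and `segment` are hypothetical. Labels are 0 and 1 here, standing for $k = 1, 2$ in the text.

```python
import numpy as np

def gaussian(u, mean, std):
    """1-D Gaussian density; an assumed stand-in for the estimated densities."""
    return np.exp(-0.5 * ((u - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

class PersonModel:
    """Hypothetical container for a person model M = {A_i}.

    blobs: list of (y_mean, y_std, color_hist); color_hist is a 1-D array
           over quantized color indices that sums to 1 (plays the role of h).
    x_std: spread of the horizontal density f shared by all blobs.
    """
    def __init__(self, blobs, x_std):
        self.blobs = blobs
        self.x_std = x_std

    def likelihood(self, x, y, c, origin):
        """P(x, y, c | M(x_o, y_o)), following the shifted form of (9)."""
        xo, yo = origin
        f = gaussian(x - xo, 0.0, self.x_std)
        g = np.array([gaussian(y - yo, m, s) for m, s, _ in self.blobs])
        h = np.array([hist[c] for _, _, hist in self.blobs])
        norm = g.sum()  # C(y - y_o)
        if norm == 0.0:  # pixel is too far from every blob to score
            return 0.0
        return float(f * (g @ h) / norm)

def segment(pixels, models, hypothesis):
    """Assign each pixel (x, y, c) to the model with the larger likelihood.

    hypothesis is the arrangement h = (x1, y1, x2, y2); equal priors assumed.
    """
    origins = [hypothesis[0:2], hypothesis[2:4]]
    labels = []
    for x, y, c in pixels:
        scores = [m.likelihood(x, y, c, o) for m, o in zip(models, origins)]
        labels.append(int(np.argmax(scores)))
    return labels

# Toy data: both people wear the same bottom color (index 1) but different
# torso colors (index 0 versus index 2).
bottom = np.array([0.05, 0.90, 0.05])
model_a = PersonModel([(10.0, 5.0, np.array([0.90, 0.05, 0.05])),
                       (30.0, 5.0, bottom)], x_std=5.0)
model_b = PersonModel([(10.0, 5.0, np.array([0.05, 0.05, 0.90])),
                       (30.0, 5.0, bottom)], x_std=5.0)
pixels = [(0, 10, 0), (8, 10, 2), (2, 30, 1), (7, 30, 1)]

print(segment(pixels, [model_a, model_b], (0, 0, 8, 0)))
print(segment(pixels, [model_a, model_b], (8, 0, 0, 0)))
```

In this toy setup the torso pixels are separated by color under either hypothesis, while the bottom pixels, whose colors are ambiguous, are assigned by horizontal position alone, so their labels change when the two origins are swapped. This is the sense in which each arrangement hypothesis defines a different segmentation.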