
MOBILE VISUAL CLOTHING SEARCH (Nanjing University: Cao Chun)


networking dataset indicate that many people just take upper body fashion photos. The segmented upper body clothing image is divided into non-overlapping patches, and dominant colour and HoG features are extracted. These sets of descriptors are quantized using vocabulary codebooks and concatenated to generate a histogram of visual words (HoVW). The HoVW defines the ultimate query, which is compared to a database of HoVWs for clothing products from retailers. Finally, a similarity measure is applied to determine the most similar matches, and these are re-ranked based on the GPS location of the user (obtained from the smartphone) and the locations of the retailers stored in the database.

It is not practical to store databases of a large number of clothing products from various retailers on the client. Thus, a client-server architecture is adopted for our mobile visual clothing search.

Our system is designed to be efficient, with short response times, and to offer an interactive graphical user experience. The client communicates with the server using compressed feature information rather than a large query image. This allows for fast transmission on typical 3G mobile networks and has the additional benefit of distributing processing between client and server, so that the server may handle more simultaneous search requests. Our contributions are described in the following sections.

3. CLOTHING SEGMENTATION

Clothing segmentation is a challenging field of research which can benefit numerous fields, including human detection [10], recognition for re-identification [2], pose estimation [11], and image retrieval. Although the fast segmentation of a person's clothing in a photo appears effortless for a human to perform, it remains challenging for a machine due to the wide diversity and ever-changing nature of fashion, uncontrolled scene lighting, dynamic backgrounds, variation in human pose, and self- and third-party occlusions. Additionally, difficult sub-problems such as face detection are usually involved in initialising the segmentation procedure.

The main objectives of this stage of the system are to automatically crop the image to the region of interest (the regions of the body below the head where clothes are typically located) and to eliminate both the background and the skin from the image, constraining the regions from which clothing features are extracted.

We propose converting the query image $I_q$ to the more perceptually relevant YCrCb colour space and normalising the corresponding illumination channel to help alleviate, to some extent, the non-uniform effects of uncontrolled lighting.

The Viola-Jones face detector is used to estimate the face size and location, which are fed as parameters to initialise a human detector based on [12]. The detector yields a ROI for the full body pose excluding the head, $ROI_p$, and a smaller upper-body-only region, $ROI_u$.

We attempt to segment the person from the background within the bounding box $ROI_p$ by using the popular GrabCut algorithm. GrabCut is based on graph cuts, which have been shown to be reasonably efficient and to perform well at segmenting humans [13].

We attempt to eliminate the skin from the segmented person by employing an efficient thresholding method.
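The colour-space conversion used in the pre-processing step can be sketched as follows. The text does not say which YCrCb variant is used, so this sketch assumes the full-range ITU-R BT.601 definition used by JFIF for 8-bit RGB. The function names are hypothetical, and the illumination normalisation is not shown because its method is not specified.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb).

    Assumption: full-range BT.601 (JFIF) coefficients. Results are left as
    unrounded, unclamped floats.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return y, cr, cb


def to_ycrcb_planes(rgb_image):
    """Split an image given as rows of (R, G, B) tuples into Y, Cr, Cb planes."""
    converted = [[rgb_to_ycrcb(*pixel) for pixel in row] for row in rgb_image]
    y_plane = [[p[0] for p in row] for row in converted]
    cr_plane = [[p[1] for p in row] for row in converted]
    cb_plane = [[p[2] for p in row] for row in converted]
    return y_plane, cr_plane, cb_plane
```

Keeping the three planes separate makes it straightforward to normalise only the Y plane while leaving the Cr and Cb planes untouched for later chrominance tests.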
Chai and Ngan [14] reported that skin pixels on the face can be identified by the presence of a certain set of chrominance values in the YCrCb colour space, and used this for face detection. Based on this work, we propose a thresholding method for clothing segmentation that also takes into account skin pixels elsewhere on the body. This can be more challenging, as we find that illumination on the face tends to be more uniform. Let $R_r$ and $R_b$ be the ranges of Cr and Cb values, respectively, that correspond to the colour of skin pixels. For a random sample of our social networking dataset, we found the ranges $R_r = [140, 165]$ and $R_b = [105, 135]$ to be optimal. In our experiments, these ranges provide a good compromise between robustness to different types of skin colour and preserving clothing pixels whose chrominance is similar to that of skin. Thus, we have the following equation:

\[
\mathrm{skin}(x, y) =
\begin{cases}
1 & \text{if } \mathrm{Cr}(x, y) \in R_r \,\wedge\, \mathrm{Cb}(x, y) \in R_b \\
0 & \text{otherwise}
\end{cases}
\tag{1}
\]

where $(x, y)$ are the coordinates of a pixel in $ROI_p$. Morphological opening is then performed on the binary mask $\mathrm{skin}(x, y)$ to reduce noise.

Finally, the segmented full body clothing is cropped to the upper body region $ROI_u$ and normalised in size. The area of segmented clothing is compared to the area of $ROI_u$. If the percentage of clothing pixels is less than an empirically defined threshold $\tau_a$, we perform the next stage (feature extraction) on the pre-processed image rather than the segmented image. This final step can increase the overall robustness of the system in the special case where the clothing and either the skin or the background are of a very similar colour. The resulting upper body clothing image is denoted $I_c$.

4. CLOTHING FEATURE EXTRACTION

Colour is one of the most distinguishing visual features of clothing. We propose an efficient method to describe dominant colours in the segmented clothing image based on the MPEG-7 descriptor [15], and integrate this feature with the HoG texture/shape descriptor.
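Looking back at the skin-removal step of Section 3, Equation (1), the morphological opening and the area-based fallback test can be sketched in plain Python. This is only an illustration under stated assumptions: the range bounds are treated as inclusive, the opening uses a 3 x 3 square structuring element (the text does not specify one), the masks are assumed to be already cropped to the upper body region, and all function names are hypothetical. The threshold value is not given in the text, so it is left as a parameter.

```python
# Skin ranges for Cr and Cb as reported in Section 3 (inclusive bounds assumed).
R_R = (140, 165)
R_B = (105, 135)


def skin_mask(cr, cb, r_r=R_R, r_b=R_B):
    """Equation (1): 1 where Cr is in R_r and Cb is in R_b, else 0."""
    return [
        [int(r_r[0] <= v_cr <= r_r[1] and r_b[0] <= v_cb <= r_b[1])
         for v_cr, v_cb in zip(cr_row, cb_row)]
        for cr_row, cb_row in zip(cr, cb)
    ]


def _apply_3x3(mask, rule):
    """Apply `rule` (all or any) to each 3x3 window; outside pixels count as 0."""
    height, width = len(mask), len(mask[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                mask[y + dy][x + dx]
                if 0 <= y + dy < height and 0 <= x + dx < width else 0
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ]
            row.append(int(rule(window)))
        out.append(row)
    return out


def opening(mask):
    """Morphological opening: erosion, then dilation, with a 3x3 square."""
    return _apply_3x3(_apply_3x3(mask, all), any)


def clothing_fraction(person, skin):
    """Fraction of ROI pixels that belong to the person and are not skin."""
    pairs = [(p, s) for p_row, s_row in zip(person, skin)
             for p, s in zip(p_row, s_row)]
    return sum(1 for p, s in pairs if p and not s) / len(pairs)


def use_segmented_image(person, skin, tau_a):
    """Fallback test: keep the segmented image only if enough clothing remains."""
    return clothing_fraction(person, skin) >= tau_a
```

Here `person` stands for the binary foreground mask produced by the person segmentation, and `skin` for the opened output of `skin_mask`. When `use_segmented_image` returns `False`, feature extraction would run on the pre-processed image instead.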
The upper body clothing image $I_c$ is divided into a regular grid of 5 × 5 cells (depicted in the third column of Figure 3). We denote each column of the grid as $ROI_c^k$, where $k = 1 \ldots 5$, and we propose providing robustness to layered clothing (e.g. jacket and top) by computing the dominant colours for each column and concatenating them. A 3D histogram is computed on $I_c \in ROI_c^k$ in HSV colour
