Mesh-Guided Optimized Retexturing for Image and Video

Yanwen Guo, Hanqiu Sun, Member, IEEE, Qunsheng Peng, and Zhongding Jiang, Member, IEEE

Abstract—This paper presents a novel approach for replacing the textures of specified regions in an input image or video using stretch-based mesh optimization. The retexturing results exhibit distortion and shading effects consistent with the unknown underlying geometry and lighting conditions. For replacing textures in a single image, two important steps are developed: a stretch-based mesh parameterization incorporating the recovered normal information is deduced to imitate the perspective distortion of the region of interest, and a Poisson-based refinement process is exploited to account for texture distortion at fine scale. The luminance of the input image is preserved through color transfer in YCbCr color space. Our approach is independent of the replaced textures: once the input image is processed, any new texture can be applied to efficiently generate a retexturing result. For video retexturing, we propose key-frame-based texture replacement extended and generalized from the image retexturing. Our approach repeatedly propagates the replacement results of key frames to the remaining frames. We develop a local motion optimization scheme to deal with the inaccuracies and errors of robust optical flow when tracking moving objects. Visibility shifting and texture drifting are effectively alleviated using a graphcut segmentation algorithm and a global optimization that smooths the trajectories of the tracked points over the temporal domain. Our experimental results show that the proposed approach generates visually pleasing results for retextured images and video.

Index Terms—Texture replacement, parameterization, Poisson equation, graphcut segmentation.

. Y. Guo is with the National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, P.R. China. E-mail: ywguo@nju.edu.cn.
. H. Sun is with the Department of Computer Science and Engineering, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong. E-mail: hanqiu@cse.cuhk.edu.hk.
. Q. Peng is with the State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, P.R. China. E-mail: peng@cad.zju.edu.cn.
. Z. Jiang is with the Software School, Fudan University, Shanghai 201203, P.R. China. E-mail: zdjiang@fudan.edu.cn.

Manuscript received 26 Dec. 2006; revised 11 July 2007; accepted 27 Aug. 2007; published online 17 Sept. 2007. Recommended for acceptance by J. Dorsey. For information on obtaining reprints of this article, please send e-mail to: tvcg@computer.org, and reference IEEECS Log Number TVCG-0231-1206. Digital Object Identifier no. 10.1109/TVCG.2007.70438.

1 INTRODUCTION

Editing the contents of photos and footage by changing the appearance of some regions with new textures is a common task for creating visual effects. This process is commonly referred to as retexturing or texture replacement. The key issue of texture replacement is how to preserve the original shading effect and texture distortion without knowing the underlying geometry and lighting conditions. Retexturing objects in images and video clips has wide applications in digital entertainment, virtual exhibition, art, and industrial design.

For retexturing an image, two fundamental issues must be addressed: how to deform the new texture to conform to the scene geometry, and how to keep the shading effect encoded in the original image for consistent lighting conditions. One possible solution to the first issue is recovering the 3D surface geometry using shape-from-shading techniques and then establishing a parameterization between the surface and the texture. Unfortunately, shape-from-shading techniques for a single image cannot accurately recover the 3D geometry with high efficiency. Even with multiple images, full recovery of 3D geometry is still an open problem in the computer vision community. For the second issue, relighting techniques can be adopted to change the intensities of pixels of the new texture when the properties of light sources and surface appearances are known beforehand. However, accurate recovery of these properties from a real-world image is more difficult than geometry recovery. Hence, relighting techniques are impractical for texture replacement.

For generating plausible visual effects, full recovery of the 3D geometry and lighting conditions can be relaxed in practice.
Fang and Hart proposed a normal-guided texture synthesis method that produces compelling replacement effects [1]. This method works well when a 3D surface is untextured, nearly diffuse, and illuminated by a single directional light source. One limitation of this texture synthesis approach is that the synthesis process must be repeated whenever a new texture is applied. For regular/near-regular textures, which are popular in the real world, Liu et al. suggested an interactive scheme to extract the deformation field of a texture image with respect to the original sample [2]. The extraction of lighting information can also benefit from this restriction through a Markov process [3]. Nevertheless, this approach usually needs tedious user interaction with high accuracy.

A video clip is an image sequence in the time domain, which usually contains dynamic objects and lighting changes. Retexturing video is more challenging than retexturing an image due to these dynamic phenomena. In particular, keeping the texture coherent over time is difficult: temporal coherence demands that the new texture appear perceptually fixed on the 3D surface when an object or the camera moves. For achieving temporal coherence, the key-frame-based methods [4], [5] consist of two steps. First, a few video frames are selected as key frames on which
texture replacements are conducted [4], [5]. Second, the generated results are propagated to the remaining frames. These methods either need cumbersome interaction to locate the region of interest (ROI) frame by frame [4] or utilize a special color-coded pattern as input [5].

In this paper, we propose a novel approach for optimized retexturing of image and video. For image texture replacement, we formulate texture distortion as a stretch-based parameterization. The ROI is represented as a feature mesh coupled with a normal field. The corresponding mesh of the feature mesh is computed in texture space during parameterization. For simulating the effect of texture deformation at fine scale, a Poisson-based refinement process is developed. Based on our image retexturing scheme, we design a key-frame-based video retexturing approach similar to RotoTexture [4]. Once the textures of the specified regions of the key frames are replaced, the generated effects are iteratively transferred to the remaining frames. For achieving temporal coherence, the mesh points of a key frame serve as features that are tracked and further optimized using motion analysis over the whole image sequence through temporal smoothing. Graphcut segmentation is adopted for handling object occlusions by extracting newly appearing parts in each frame.

Our optimized retexturing approach has the following new features:

. A two-step mesh-guided process for image texture replacement. Coupled with the recovered normal field, a visually pleasing deformation effect of the replaced texture is produced by performing stretch-based mesh parameterization. Furthermore, a Poisson-based refinement process is used to improve the effect and enhance the efficiency.
. Creation of special retexturing effects. Based on mesh parameterization, we can easily generate a replacement effect with progressively variant texton scales. In addition, texture discontinuities can be realistically simulated in self-occlusion regions, which are usually difficult to produce for most previous approaches.
. An optimized framework of video retexturing. We extend and generalize our image retexturing approach to video. Rather than presegmenting the ROI throughout the whole image sequence, our approach only needs a few selected key frames. The generated results are optimally propagated to the remaining frames. Texture drifting and visibility shifting are also tackled effectively.

The rest of the paper is organized as follows: The related work is described in Section 2. Our optimized retexturing approach for images is presented in Section 3. In Section 4, the image retexturing approach is extended and generalized to video. The experimental results are presented in Section 5. Finally, we draw conclusions and point out future work.

2 RELATED WORK

This paper is made possible by many inspirations from previous work on image and video texture replacement. Since tracking objects throughout a video sequence is one important step of our approach, it is briefly reviewed as well.

2.1 Image Texture Replacement

The pioneering work on texture replacement dealt with extracting a lighting map from a given image [3]. Based on certain lighting distribution models, Tsin et al. introduced a Bayesian framework for near-regular textures, which relies on the color observation at each pixel [3]. Oh et al. assumed that large-scale luminance variations are due to geometry and lighting [6] and presented an algorithm for decoupling texture luminance from an image by applying an improved bilateral filter.
Currently, accurate recovery of lighting from a natural image is still a challenging problem.

Assuming that the object appearance satisfies the Lambertian reflectance model, Textureshop [1] recovered the normal field of a specified region using a simplified shape-from-shading algorithm [7]. A propagation rule for adjacent texture coordinates is deduced to guide a normal-related synthesis process. The limitation of employing texture synthesis is that the synthesis process must be re-executed when a new texture is applied. The work developed by Zelinka et al. [8] is an improvement over Textureshop. It reduces the user interaction of object specification by employing an efficient object cutout algorithm [9]. In addition, jump-map-based synthesis [10] is adopted to speed up the computation process. Instead of texture synthesis, our method belongs to the texture mapping approach. Texture replacement can be efficiently carried out after the mesh parameterization is completed.

The method presented in [4] warps an entire texture onto a photographed surface. It minimizes an energy function of a spring network with a known, evenly distributed rectilinear grid in texture space. In most cases, however, the specified region of the image is irregular rather than a regular grid. Hence, it is difficult for this approach to accurately control the mapping position of the replaced texture.

For extracting the deformation fields of textures in natural images, Liu et al. introduced a user-assisted adjustment scheme on the regular lattice of a real texture [2]. A bijective mapping between the regular lattice and its deformed shape in the surface image is obtained. Any new texture can thus be mapped onto the source image by applying the corresponding mapping. Since this method often requires elaborate user interaction, it is more suitable for regular/near-regular textures.

Besides image texture replacement, recent research has demonstrated that the material properties of objects can be changed in image space [11]. Exploiting the fact that human vision is surprisingly tolerant of certain physical inaccuracies, Khan et al. reconstructed the depth map of the concerned object together with other environment parameters [11] and realized compelling material editing effects using complex relighting techniques.

2.2 Video Texture Replacement

RotoTexture [4] generalized the method of Textureshop [1] to video. It provides two means of texturing a raw video sequence, namely, texture mapping and texture synthesis. The texture mapping method uses a nonlinear optimization of a spring model to control the behavior of the texture
image, which is deformed to match the evolution of the normal field throughout the video. For the synthesis method, a minimum advection tree is constructed to deal with the visibility issues caused by the dynamic motions of moving objects. The tree determines the initial frame for each image cluster and the advection of clusters among frames. The main challenge of video texture replacement is how to stably track the moving objects and their interior regions. At present, accurately tracking moving objects in dynamic video is an open problem. The replaced textures drift in the experimental results of [4].

For stably tracking moving objects and their interior parts, Scholz and Magnor presented a video texture replacement system [5] using color-coded patterns. The deformation of the texture throughout the video clip can be accurately extracted. Since the deformation is accurate, compelling results can be achieved. However, since videos are usually captured by off-the-shelf cameras without color-coded patterns, the system is not applicable to them. Our approach is designed for videos in which such special patterns are unavailable.

Recently, White and Forsyth proposed another video retexturing method [12]. At coarse scale, the old texture is replaced with a new one by tracking the deforming surface in 2D. At fine scale, local irradiance is estimated to preserve the structure information of the real lighting environment. Since local irradiance estimation is difficult and unreliable, the approach is limited to screen printing with a finite number of colors. Our method can be applied to video sequences with rich color details.

2.3 Object Tracking

Object tracking is the process of locating a moving object throughout the whole image sequence taken by a video camera. For general object motion, nonparametric algorithms such as optical flow [13] can be applied. When the motion can be described using simple models, methods based on feature points and parametric models are preferable [14]. For instance, Jin et al. presented a combined model of geometry and photometry to track features and detect outliers in video [15]. Contour tracking can be more effective for nonrigid objects than isolated point tracking. Agarwala et al. [16] and Wang et al. [17] introduced frameworks for tracking the contours of moving objects in video sequences, based on spatiotemporal optimization and user assistance. Chuang et al. described a method for accurately tracking a specified trimap for video matting [18]. A trimap is a labeling image in which 0 stands for background, 1 stands for foreground, and the rest is the unknown region to be labeled. Stable tracking of the trimap is carried out based on a robust optical flow algorithm [19].

3 IMAGE RETEXTURING

The key issue of image texture replacement is how to preserve the distortion effect of the texture, as well as the shading effect encoded in the original image. Texture distortion is mainly caused by the undulation of the underlying surface of the object in the given image. We assume that the surface on which texture replacement is performed is nearly developable. Otherwise, the surface can be divided into several nearly developable parts, each of which is handled separately. Based on this assumption, the basic idea is to convert reconstruction of the underlying 3D surface of the ROI into computation of its corresponding mesh in texture space. Using projective geometry, we further formulate the retexturing task as a stretch-based mesh parameterization problem.
After the parameterization is completed, the result is further refined with a Poisson-based refinement process.

3.1 Mesh Generation

We first generate an initial mesh on the concerned region and make its shape consistent with the underlying geometry of this region. Mesh generation for images was addressed in motion compensation for video compression [20], where a content-based mesh was computed by extracting a set of feature points followed by Delaunay triangulation. Our algorithm shares the same idea as [20]. First, the concerned region is specified interactively by outlining the boundary of the ROI using snakes. For reducing user intervention, our approach also supports extracting the ROI using up-to-date segmentation techniques [21], [9]. Second, we employ an edge detection operator, for example, the Canny operator, to extract feature points inside the ROI. For keeping the points uniform, some auxiliary points are usually added. Finally, the constrained Delaunay triangulation algorithm is adopted to generate a feature-consistent mesh. Fig. 1 shows an example of mesh generation.

Fig. 1. Mesh generation. (a) The input image. (b) The generated mesh. The yellow dots are detected by the Canny operator, whereas the green ones are added automatically with a distance threshold to maintain mesh uniformity.
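For illustration, the following is a minimal Python sketch of this mesh-generation step, assuming OpenCV and SciPy are available; the function name and parameters are ours. Note that SciPy's Delaunay triangulation is unconstrained, whereas the paper uses the constrained variant so that triangles respect the ROI boundary.

import cv2
import numpy as np
from scipy.spatial import Delaunay

def generate_mesh(image, roi_mask, spacing=20):
    # Feature points from a Canny edge map, restricted to the ROI.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    edges[roi_mask == 0] = 0
    ys, xs = np.nonzero(edges)
    feats = np.column_stack([xs, ys]).astype(np.float64)[::15]  # subsample edge pixels

    # Auxiliary grid points, kept only when no feature point is nearby
    # (the distance threshold maintains mesh uniformity, cf. Fig. 1).
    h, w = gray.shape
    aux = []
    for y in range(0, h, spacing):
        for x in range(0, w, spacing):
            if not roi_mask[y, x]:
                continue
            if feats.size and np.hypot(feats[:, 0] - x, feats[:, 1] - y).min() < spacing / 2:
                continue
            aux.append((x, y))
    pts = np.vstack([feats, np.asarray(aux, dtype=np.float64)]) if aux else feats

    # The paper uses *constrained* Delaunay triangulation; scipy's is
    # unconstrained, so triangles outside a concave ROI would need culling.
    tri = Delaunay(pts)
    return pts, tri.simplices

The subsampling stride and grid spacing trade mesh density against solver cost in the later parameterization stage.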
3.2 Mesh Parameterization

Let M denote the generated mesh of the input image. M is a 2D mesh that represents the 2D projection of the 3D surface of the ROI. If the normal vector of every mesh point of M is recovered, the normal field of M encodes the geometric shape of the underlying surface. For obtaining the distortion effect of the new texture, it is feasible to first parameterize M and then map the new texture onto the ROI. Since M is a 2D mesh, parameterizing M onto the texture space is a 2D-to-2D problem, which can be computed using the geometric information of M.

Let M' be the parameterized mesh in texture space. Theoretically, M' can be completely determined by the lengths of all edges and the topology of M. For avoiding artifacts, the topology of M' should be the same as that of M. The length of each edge in M' should ideally equal the 3D length of its corresponding edge in M. The 3D length reflects the edge length on the underlying surface. In the following sections, we first introduce our method for computing the 3D length of each edge in M and then present the stretch-based parameterization scheme for computing M' in texture space.

3.2.1 Computing Length of 3D Edge

Since the normal field encodes the 3D geometric information of M, we first recover a rough normal field of M using the approach in [1]. The shape-from-shading algorithm in [1] has linear complexity when recovering the normal field of a diffuse surface. It is easy to implement and quite effective. For more details, please refer to [1] and [8].

With the recovered normal vectors, the 3D length of each edge in M can be calculated, as illustrated in Fig. 2.

Fig. 2. Calculation of the 3D length $\|FD\|$ for the observed edge e(AB). V: view direction; N: the projection of edge e(FD)'s normal vector onto the plane OAB; $CD \parallel AB$ and $CD \perp CE$.

Suppose the observed length of edge e(AB) in the input image is d. Its corresponding 3D length is the distance between F and D on the underlying surface, denoted as $\|FD\|$. Let N be the projection of the normal vector of edge e(FD) onto the plane OAB. It is calculated by first averaging the recovered normal vectors of A and B and then projecting the averaged vector onto the plane OAB. From the geometric relationship in Fig. 2, we derive the length of ED:

$\|ED\| = \|CD\| / (V \cdot N) = d' / (V \cdot N),$  (1)

where V is the view direction of the camera; both V and N have been normalized. With the approximation $\|FD\| \approx \|ED\|$, the 3D length of edge e(AB) can be expressed as

$l = \|FD\| \approx d' / (V \cdot N).$  (2)

In the above equation, d' is determined by the observed length d of e(AB) and the scene depth of e(FD). In most cases, the object is relatively far from the camera, so it is reasonable to assume that the scene depths of all edges in M are close, with small variation. Consequently, the homogeneous scale factor of every edge of M can be eliminated, and the surface length of edge e(AB) can be approximated as

$l = \|FD\| \approx \|AB\| / (V \cdot N) = d / (V \cdot N).$  (3)

The 3D length of each edge in M can be figured out using the above equation. Although some approximations are made, our experimental results show that the method generates visually pleasing effects.
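A direct transcription of (3) might look as follows. This is a sketch under stated assumptions: unit normals, a view direction of roughly (0, 0, 1), and N approximated by renormalizing the averaged endpoint normals rather than projecting onto the plane OAB as the paper does. The function name is hypothetical.

import numpy as np

def edge_3d_length(pa, pb, na, nb, view=(0.0, 0.0, 1.0)):
    # l = d / (V . N), eq. (3): d is the observed 2D edge length and N is
    # the (normalized) average of the endpoint normals. The paper projects
    # N onto the plane OAB; renormalizing the average is our simplification.
    d = np.linalg.norm(np.asarray(pb, float) - np.asarray(pa, float))
    n = np.asarray(na, float) + np.asarray(nb, float)
    n /= np.linalg.norm(n)
    v = np.asarray(view, float)
    return d / max(float(np.dot(v, n)), 1e-6)  # clamp near-grazing normals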
3.2.2 The Stretch-Based Parameterization

With the calculated edge lengths, M' can be obtained by a stretch-based parameterization method.

Mesh parameterization has been extensively studied in computer graphics [22], [23], [24], [25], [26], [27], [28], [29]. These methods take different metric criteria as the objective of an energy minimization process. Among them, stretch-based methods work well at reducing global mesh distortion [22], [25], [29]. Our concern is computing a 2D-to-2D parameterization, which is different from the aforementioned 3D-to-2D process (Fig. 3).

Fig. 3. By virtue of the normal field, simulating texture distortion is converted into a 2D-to-2D parameterization, after which texture mapping is applied.

We first describe some related notation. Let $\{P_i = (x_i, y_i) \mid i = 1, \ldots, n\}$ denote the nodes of M, and let $\{Q_i = (u_i, v_i) \mid i = 1, \ldots, n\}$ represent the corresponding nodes of M' to be solved in texture space. M' has the same topology as M, $Q_i$ corresponds to $P_i$, and edge $e(Q_iQ_j)$ corresponds to $e(P_iP_j)$. The 3D length $l_{ij}$ of $e(Q_iQ_j)$ is obtained using the aforementioned method.

M' is completely characterized by the lengths of its edges. As each edge length of M' has been figured out, M' can be computed by minimizing the following energy function:

$E_l = \sum_{(i,j) \in \mathrm{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right)^2 / l_{ij}^2.$  (4)

Exploiting the symmetry of the above equation, the energy gradients at point $Q_i$ are

$\partial E_l / \partial u_i = 8 \sum_{(i,j) \in \mathrm{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right) (u_i - u_j) / l_{ij}^2,$  (5)

$\partial E_l / \partial v_i = 8 \sum_{(i,j) \in \mathrm{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right) (v_i - v_j) / l_{ij}^2.$  (6)

When M is dense, directly solving the above equations may cause adjacent triangles of M' to flip over, leading to an invalid topology. This is caused by inverting the orientation of the three points of a triangle. To tackle this issue, we revise the energy function by penalizing the orientations of triangles using the sgn function [30].

Assume that the adjacent triangles incident upon edge $e(Q_iQ_j)$ are $T_{Q1} = T(Q_iQ_{k_1}Q_j)$ and $T_{Q2} = T(Q_iQ_jQ_{k_2})$. Their corresponding triangles in the input image are $T_{P1} = T(P_iP_{k_1}P_j)$ and $T_{P2} = T(P_iP_jP_{k_2})$ (Fig. 4).
For each pair of corresponding triangles, the orientations of the points should agree. To achieve this, we define

$w_{ij} = \operatorname{sgn}\Big( \min\big( \det(\overrightarrow{Q_iQ_{k_1}}, \overrightarrow{Q_jQ_{k_1}}) \cdot \det(\overrightarrow{P_iP_{k_1}}, \overrightarrow{P_jP_{k_1}}),\; \det(\overrightarrow{Q_iQ_{k_2}}, \overrightarrow{Q_jQ_{k_2}}) \cdot \det(\overrightarrow{P_iP_{k_2}}, \overrightarrow{P_jP_{k_2}}) \big) \Big).$  (7)

The energy function is then transformed into

$E_l = \sum_{(i,j) \in \mathrm{edges}} w_{ij} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right)^2 / l_{ij}^2,$  (8)

where the coefficient $w_{ij}$ penalizes a triangle in M' whose point orientation flips over with respect to its corresponding triangle in M: if so, $w_{ij}$ is chosen as -1; otherwise, +1. With this energy function, a valid mesh in texture space is obtained.

Fig. 4. The orientation of the triangles (a) in image space should keep consistent with that of their corresponding ones (b) in texture space.

The minimal value of (8) is computed by the multidimensional Newton's method. For each iteration of Newton's method, a multigrid solver is adopted for solving the sparse linear equations. In practice, it converges to the final solution within several seconds.

Once M' is obtained with the parameterization process, a new texture can be mapped onto the ROI of the input image. Since the parameterization takes into account the underlying geometry of the ROI, the new texture deforms naturally with respect to the underlying surface. Our experimental results demonstrate that the distortion effects of the new textures are visually pleasing.
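As a rough illustration of this optimization, the sketch below minimizes the stretch energy (4) by plain gradient descent using the gradients (5)-(6). The paper instead applies the multidimensional Newton's method with a multigrid solver to the sgn-weighted energy (8); this simplified version omits the orientation penalty $w_{ij}$, and the step size and iteration count are illustrative.

import numpy as np

def parameterize(P, edges, l, iters=2000, step=1e-4):
    # Minimize E_l = sum_(i,j) (||Qi - Qj||^2 - l_ij^2)^2 / l_ij^2  (eq. (4))
    # by gradient descent, starting from the image-space nodes P.
    # P: (n, 2) float array; edges: (m, 2) int array; l: (m,) target lengths.
    Q = np.array(P, dtype=np.float64)
    i, j = edges[:, 0], edges[:, 1]
    l2 = np.asarray(l, dtype=np.float64) ** 2
    for _ in range(iters):
        d = Q[i] - Q[j]                  # per-edge difference vectors
        r = (d ** 2).sum(axis=1) - l2    # ||Qi - Qj||^2 - l_ij^2
        g = (8.0 * r / l2)[:, None] * d  # per-edge gradient, cf. (5)-(6)
        grad = np.zeros_like(Q)
        np.add.at(grad, i, g)            # accumulate onto endpoint i...
        np.add.at(grad, j, -g)           # ...and with the opposite sign onto j
        Q -= step * grad                 # small fixed step; the paper's
    return Q                             # Newton/multigrid converges faster

Starting from the image-space positions P gives a reasonable initial guess, since the parameterization is a 2D-to-2D deformation of the original mesh.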
3.3 Poisson-Based Refinement

After the stretch-based parameterization, the texture coordinates of the nodes of M have been obtained. The texture coordinates of the interior pixels of the triangles of M could be computed by interpolating the obtained ones using barycentric coordinates or Radial Basis Functions (RBFs). However, such interpolation techniques cannot reflect the distortion effect of the new texture in the interior of each triangle. For obtaining natural and smoother distortion, we design a Poisson-based refinement process instead of using interpolation with barycentric coordinates or RBFs.

The Poisson equation originates from Isaac Newton's law of gravitation [31]. It has been widely used in computer graphics, including seamless image editing [32], digital photomontage [33], gradient-field mesh manipulation [34], and mesh metamorphosis [35]. The main principle of the Poisson equation lies in computing the interior values of a function from a known Dirichlet boundary condition and a guidance vector field. Due to its sparse linear property, the Poisson equation can be solved efficiently using the conjugate gradient method or the multigrid method [34].

Our Poisson-based algorithm adopts the normal-guided propagation rules for texture synthesis developed in Textureshop [1]. These rules describe the offsets of texture coordinates among adjacent pixels. We rewrite them as follows:

$u(x+1, y) - u(x, y) = f_{ux}(N(x, y)),$  (9)

$v(x+1, y) - v(x, y) = f_{uv}(N(x, y)),$  (10)

$u(x, y+1) - u(x, y) = f_{uv}(N(x, y)),$  (11)

$v(x, y+1) - v(x, y) = f_{vy}(N(x, y)),$  (12)

where $(u(x,y), v(x,y))$ is the texture coordinate of pixel $(x,y)$ in the concerned region, $N(x,y) = (N_x, N_y, N_z)$ is the normal vector at pixel $(x,y)$, and

$f_{ux}(N(x,y)) = (1 + N_z - N_y^2) / ((1 + N_z) N_z),$  (13)

$f_{uv}(N(x,y)) = N_x N_y / ((1 + N_z) N_z),$  (14)

$f_{vy}(N(x,y)) = (1 + N_z - N_x^2) / ((1 + N_z) N_z).$  (15)

Since the texture coordinates of the nodes of M are available, (9)-(12) could be used directly to calculate the texture coordinates of the interior pixels of the triangles of M. In practice, this may result in a weird mapping. To avoid it, the texture coordinates are obtained by solving the energy minimization problem below with respect to the u component; the v component can be computed in a similar way:

$\min_{u(x,y)} \int_M \| \nabla u(x,y) - D_u(x,y) \|^2,$  (16)

where $\nabla u(x,y) = (u(x+1,y) - u(x,y),\; u(x,y+1) - u(x,y))$ and $D_u(x,y) = (f_{ux}(N(x,y)), f_{uv}(N(x,y)))$.

Minimizing (16) yields a set of Poisson equations:

$\Delta u(x,y) = \operatorname{div} D_u(x,y),$  (17)

where $\Delta$ and $\operatorname{div}$ represent the Laplacian and divergence operators, respectively. We adopt a multigrid solver to obtain the solution with high efficiency.

Unlike the widely used Dirichlet boundary conditions [32], [34] of the generic Poisson process, the external force in our Poisson equations is imposed by the discrete texture coordinates of the nodes of M.
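The sketch below assembles this constrained Poisson system for the u component over the ROI, assuming per-pixel normals and the node texture coordinates from the parameterization are given. It replaces the paper's multigrid solver with a direct sparse solve and treats pixels just outside the ROI with zero-Neumann conditions; these discretization choices are ours, not the paper's.

import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import spsolve

def refine_u(mask, normals, node_px, node_u):
    # Solve Delta u = div D_u over the ROI (eqs. (16)-(17)), with the
    # texture coordinates of the mesh nodes as hard constraints.
    # mask: (h, w) bool ROI; normals: (h, w, 3) unit normals (Nx, Ny, Nz);
    # node_px: iterable of (x, y) node pixels; node_u: their u coordinates.
    h, w = mask.shape
    idx = -np.ones((h, w), dtype=int)
    ys, xs = np.nonzero(mask)
    idx[ys, xs] = np.arange(len(ys))

    Nx, Ny, Nz = normals[..., 0], normals[..., 1], normals[..., 2]
    denom = (1.0 + Nz) * np.maximum(Nz, 1e-6)
    fux = (1.0 + Nz - Ny ** 2) / denom  # du/dx guidance, eq. (13)
    fuv = (Nx * Ny) / denom             # du/dy guidance, eq. (14)

    n = len(ys)
    A = lil_matrix((n, n))
    b = np.zeros(n)
    fixed = {(int(x), int(y)): float(u) for (x, y), u in zip(node_px, node_u)}

    for k in range(n):
        y, x = ys[k], xs[k]
        if (x, y) in fixed:             # mesh node: pin its texture coordinate
            A[k, k] = 1.0
            b[k] = fixed[(x, y)]
            continue
        nbrs = [(y, x + 1), (y, x - 1), (y + 1, x), (y - 1, x)]
        nbrs = [(yy, xx) for yy, xx in nbrs
                if 0 <= yy < h and 0 <= xx < w and idx[yy, xx] >= 0]
        A[k, k] = float(len(nbrs))      # zero-Neumann at the ROI border
        for yy, xx in nbrs:
            A[k, idx[yy, xx]] = -1.0
        # deg*u - sum(neighbors) = -div D_u, with div by backward differences
        b[k] = -((fux[y, x] - fux[y, max(x - 1, 0)]) +
                 (fuv[y, x] - fuv[max(y - 1, 0), x]))

    u = spsolve(csr_matrix(A), b)
    out = np.zeros((h, w))
    out[ys, xs] = u
    return out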
Accurate recovery of the lighting conditions of a real-world image is an open problem in computer vision. Researchers have developed methods [3], [6] that are tailored to planar patterns exhibited in an image. Although luminance can be extracted from curved surfaces [2], the method needs a regular texture as reference. A general solution for decoupling illuminance from an image demands further investigation.

Based on the above analysis, an approximate method is designed to transfer the lighting effect for texture replacement. In color theory, the intensity of each pixel can be regarded as a combination of chromaticity and luminance. YCbCr is a color space described by luminance and chromaticity: the Cb and Cr components represent the chromaticity, and the Y component encodes the luminance. This color space is widely employed in video systems.

For texture replacement, we transform the color of each pixel into YCbCr space. The Y component of each pixel inside the ROI of the input image is fused with the Y component of the corresponding pixel in the new texture. This scheme preserves the luminance of the lighting effect. For transferring chromaticity, the Cb, Cr components of each pixel inside the ROI are overwritten by the Cb, Cr components of the corresponding pixel of the new texture.

Let $Y_t, Cb_t, Cr_t$; $Y_i, Cb_i, Cr_i$; and $Y_r, Cb_r, Cr_r$ represent the Y, Cb, Cr components of the pixel color in the new texture, of its corresponding pixel in the input image, and of the pixel in the replacement result, respectively. The above rules for transferring the lighting effect can be expressed as follows:

$$Cb_r = Cb_t, \qquad (18)$$

$$Cr_r = Cr_t, \qquad (19)$$

$$Y_r = (1 - m_t) \cdot Y_t + m_t \cdot Y_i, \qquad (20)$$

where $m_t$ is a weight factor balancing the luminance between the new texture and the input image. Increasing $m_t$ makes the luminance component of the replacement result more similar to that of the input image. Through extensive experiments, it is set to 0.9 for the results in Section 5.

Our lighting preservation method is unsuitable for input images with strong textures. For these images, the luminance components are dominated by contributions from the textures instead of from the lighting effects. Fortunately, our approach separates image replacement into two independent tasks: simulating texture distortion and preserving the lighting effect. This allows other lighting preservation methods [3] to be incorporated for enhancing the results.
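As a minimal numpy sketch of (18)-(20): the code below assumes full-range BT.601 conversion constants (the paper does not specify which RGB/YCbCr variant it uses) and assumes the new texture has already been warped into image space by the mesh parameterization; the function names are ours, not the paper's.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr (float arrays, values in [0, 255])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycc):
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128.0, ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 255.0)

def transfer_lighting(input_rgb, texture_rgb, roi_mask, m_t=0.9):
    """Apply (18)-(20): keep the texture's chromaticity, blend luminance.

    input_rgb, texture_rgb: HxWx3 float arrays, aligned per pixel
    (texture_rgb is the new texture already warped onto the ROI).
    roi_mask: HxW boolean mask of the region of interest.
    """
    src = rgb_to_ycbcr(input_rgb)
    tex = rgb_to_ycbcr(texture_rgb)
    out = tex.copy()                      # Cb_r = Cb_t, Cr_r = Cr_t: (18)-(19)
    out[..., 0] = (1.0 - m_t) * tex[..., 0] + m_t * src[..., 0]   # (20)
    result = input_rgb.copy()
    result[roi_mask] = ycbcr_to_rgb(out)[roi_mask]
    return result
```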
4 VIDEO RETEXTURING

Texture replacement for video is more complicated than retexturing a single image. Applying the image retexturing approach frame by frame causes a drifting phenomenon, mainly because the generated mesh in each frame may be quite different. After the stretch-based parameterization and Poisson-based refinement process, the texture coordinate computed at each pixel of the ROI usually changes over frames. Hence, the region with the new texture flickers when the retextured video is played.

We propose a key-frame-based video retexturing approach. Fig. 5 outlines the processing pipeline. Some key frames are first selected from the input video with user interaction; in these frames, the concerned objects usually have maximal visibility. Second, the correspondences between key frames are established and triangulated into a coarse mesh. The coarse mesh serves as the initial mesh of the ROI of the key frame, which is then refined using the aforementioned mesh generation technique. For each key frame, the image retexturing approach is applied.

The replaced results are propagated to the other frames repeatedly. For preventing texture drifting, the texture coordinates of mesh points must be tracked accurately across consecutive frames. The generated mesh of the ROI of the key frame serves as an object that is tracked backward/forward throughout the video using the robust optical flow algorithm [19].

Fig. 5. Video retexturing pipeline. Retexturing is first applied to each key frame selected from the input sequence. The mesh generated for the key frame is tracked backward/forward throughout the video. After a temporal smoothing optimization, the replaced texture of the key frame is transferred onto the remaining frames. In the above process, the lighting effect is transferred independently for each frame.
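As a structural illustration of the propagation step, the sketch below assigns each non-key frame the key frame whose mesh and texture are propagated to it. The nearest-key assignment is our assumption (the paper only states that tracking runs backward/forward from each key frame, not how frames between two key frames are divided), and the function name is hypothetical.

```python
def propagation_plan(n_frames, key_ids):
    """Map each non-key frame to the key frame that serves it.

    Assumes each frame is served by its nearest key frame, so a key
    frame's mesh is tracked backward/forward roughly halfway toward
    its neighboring key frames.
    """
    key_ids = sorted(key_ids)
    return {t: min(key_ids, key=lambda k: abs(k - t))
            for t in range(n_frames) if t not in key_ids}
```

For a 50-frame clip with key frames {0, 25, 49}, for example, frames 1-12 are served by key frame 0, frames 13-37 by key frame 25, and frames 38-48 by key frame 49.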
Although the robust optical flow algorithm is proven to perform well [18], [36], the estimated motion field still contains artifacts and errors. Hence, the trajectories of mesh points are further optimized by a temporal smoothing operation. After these steps, the replaced results are transferred from the key frames to the remaining frames, taking the texture in each triangle of the tracked mesh as the transfer unit.

For a video clip containing dynamic objects, visibility shifting often occurs at or near object boundaries. It is generally caused by moving objects, changing viewpoint, and occlusion/disocclusion events. Since it usually leads to variations of the replaced textures, visibility shifting must be dealt with carefully. We design a graphcut-based algorithm for handling it.

4.1 Local and Global Optimization for Object Tracking

In our approach, the robust optical flow algorithm [19] is adopted to track the positions of mesh points. Although the algorithm can handle large-scale motion, it cannot handle image changes caused by phenomena outside the optical flow model. Hence, the tracked points may deviate from their correct positions to a certain extent, and optimizing the tracking results is indispensable for avoiding texture drifting. We propose to locally analyze the motion information of each mesh point and optimize its position in each frame. Once the whole sequence is processed, the trajectories of mesh points are further refined globally throughout the video.

4.1.1 Local Optimization

The optical flow algorithm uses a brightness matching criterion when computing the motion field between two frames. The computed motion of each mesh point can be stabilized by considering the geometry and topology of the mesh. For a mesh point, three cases are usually considered as bad tracking:

- Color inconsistency. The color difference of a mesh point between the current frame and the reference frame is greater than a given threshold.
- Motion inconsistency. The motion vector of a mesh point differs too much from its neighbors'.
- Orientation inconsistency. Any of the triangles incident upon a mesh point is flipped over with respect to its corresponding triangle in the reference frame.

A mesh point is considered unstable when any of the three bad-tracking cases occurs; otherwise, it is stable. The position of an unstable point needs to be recalculated to remedy the inconsistency. We employ inverse distance interpolation [37] to recompute the motion vector of an unstable point from its stable neighbors.

Assume $P_0$ is an unstable point with neighbors $P_{0j}, j = 1, \ldots, k$. Let $(t_{x_{0j}}, t_{y_{0j}})$ represent their motion vectors and $d_{0j}$ their distances from $P_0$. The new motion vector of $P_0$ is computed as follows:

$$t_{x_0} = \frac{\sum_{j=1}^{k} \alpha(j)\, t_{x_{0j}} / d_{0j}}{\sum_{j=1}^{k} \alpha(j) / d_{0j}}, \qquad (21)$$

$$t_{y_0} = \frac{\sum_{j=1}^{k} \alpha(j)\, t_{y_{0j}} / d_{0j}}{\sum_{j=1}^{k} \alpha(j) / d_{0j}}, \qquad (22)$$

with

$$\alpha(j) = \begin{cases} 1 & \text{if } P_{0j} \text{ is a stable point,} \\ 0 & \text{otherwise.} \end{cases}$$

If all neighbors of $P_0$ are unstable, mesh points that are farther away are treated as new neighbors for the interpolation. In addition, the new position of $P_0$, calculated with the interpolated motion vector, should not destroy the local topology around $P_0$ with respect to the reference frame; otherwise, a random search around the new position is performed to find a suitable one.
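A minimal numpy sketch of the local fix follows, assuming the per-point inputs have already been gathered from the mesh; the two thresholds are illustrative placeholders (the paper does not publish its values), and the function names are ours.

```python
import numpy as np

def is_unstable(color_diff, motion, neighbor_motions, flipped,
                color_thresh=30.0, motion_thresh=2.0):
    """Flag the three bad-tracking cases of Section 4.1.1."""
    color_bad = color_diff > color_thresh                     # color inconsistency
    motion_bad = (np.linalg.norm(motion - neighbor_motions.mean(axis=0))
                  > motion_thresh)                            # motion inconsistency
    return bool(color_bad or motion_bad or flipped)           # orientation flip

def interpolate_motion(p0, neighbors, motions, stable):
    """Inverse distance interpolation (21)-(22) for an unstable point P_0.

    p0:        (2,) position of P_0
    neighbors: (k, 2) neighbor positions P_0j
    motions:   (k, 2) tracked motion vectors (t_x0j, t_y0j)
    stable:    (k,) boolean flags, i.e. alpha(j)
    Returns the recomputed motion vector (t_x0, t_y0).
    """
    d = np.linalg.norm(neighbors - p0, axis=1)                # distances d_0j
    w = stable.astype(float) / np.maximum(d, 1e-9)            # alpha(j) / d_0j
    if w.sum() == 0.0:
        # No stable neighbor: the paper instead widens the neighbor set.
        raise ValueError("no stable neighbors; enlarge the neighborhood")
    return (w[:, None] * motions).sum(axis=0) / w.sum()
```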
4.1.2 Global Optimization

The local optimization mainly accounts for the shifted points in each frame. The global optimization aims to optimize the mesh points throughout the video sequence to ensure temporal coherence.

Taking the video sequence as a 3D volume, the motion trajectory of each mesh point throughout the video can be regarded as a piecewise motion curve. An unstable point may jump suddenly or frequently along its motion curve, so its trajectory can be refined using a curve smoothing technique.

We generalize the bilateral denoising algorithm [38], [39] to smooth the trajectories of all mesh points. The bilateral filter has been successfully applied to remove noise from images and meshes while retaining details [38], [39], and it is straightforward to extend it from meshes to curves. Before bilateral smoothing is performed, mesh points that deviate significantly from their ideal positions are manually dragged near the ideal ones.

Combining the trajectories generated through bilateral smoothing, we compute the final trajectories of all mesh points by solving an energy minimization problem. Suppose the number of video frames between two key frames is T. Let $P_i(t), i = 1, \ldots, n, t = 1, \ldots, T$, denote the positions of mesh points tracked by robust optical flow with local optimization, and let $P_{g_i}(t)$ represent the positions refined by bilateral smoothing. The ideal positions $\tilde{P}_i(t)$ are solved by minimizing the following objective function:

$$E_t = \sum_{i=1}^{n} \sum_{t=1}^{T} \left[ \lambda_i(t) \left\| \tilde{P}_i(t) - P_i(t) \right\|^2 + \left\| \tilde{P}_i(t) - P_{g_i}(t) \right\|^2 \right], \qquad (23)$$

with

$$\lambda_i(t) = \exp\left( - \left\| P_i(t) - P_{g_i}(t) \right\|^2 \big/ \left( 2\sigma_i^2 \right) \right), \qquad (24)$$

where $\sigma_i$ is the standard deviation of the offsets between $P_i(t)$ and $P_{g_i}(t)$ over $t = 1, \ldots, T$. The weight factor $\lambda_i(t)$ is small for an unstable point, whose offset is large, so the bilateral-smoothed term takes greater effect on such points when optimizing (23).
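The trajectory refinement can be sketched as follows: a 1D bilateral filter over each point's motion curve stands in for the generalized bilateral denoising, and the blend implements (23)-(24). Because (23) as written decouples per frame, a closed-form per-frame minimizer exists; the paper instead perturbs points iteratively toward the minimum, which leaves room for extra constraints. The filter parameters and function names are our assumptions.

```python
import numpy as np

def bilateral_smooth_trajectory(P, sigma_s=2.0, sigma_r=3.0, radius=4):
    """1D bilateral filter over a point's motion curve P of shape (T, 2):
    average temporally near samples, down-weighting jumps much larger
    than sigma_r so genuine motion is kept while outliers are damped."""
    T = len(P)
    G = np.zeros_like(P, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        dt = np.arange(lo, hi) - t                         # temporal distance
        dr = np.linalg.norm(P[lo:hi] - P[t], axis=1)       # spatial offset
        w = np.exp(-dt**2 / (2 * sigma_s**2)) * np.exp(-dr**2 / (2 * sigma_r**2))
        G[t] = (w[:, None] * P[lo:hi]).sum(axis=0) / w.sum()
    return G

def optimize_trajectory(P):
    """Blend the tracked curve P and its smoothed version G per (23)-(24).
    Minimizing lam*(x - p)^2 + (x - g)^2 per frame gives the closed form
    x = (lam*p + g) / (lam + 1)."""
    G = bilateral_smooth_trajectory(P)
    offsets = np.linalg.norm(P - G, axis=1)
    sigma_i = max(offsets.std(), 1e-9)                     # sigma_i in (24)
    lam = np.exp(-offsets**2 / (2 * sigma_i**2))           # lambda_i(t), (24)
    return (lam[:, None] * P + G) / (lam[:, None] + 1.0)
```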
Instead of trying to find the global minimum of (23), every mesh point is repeatedly perturbed toward its ideal position with a small incremental step until a local minimum is reached. Once the final positions of the mesh points in each frame are obtained, the replaced texture adhering to the key frame is iteratively transferred to the other frames. During this process, the retextured result of each frame incorporates the lighting effect encoded by that frame, using the algorithm described in Section 3.4.

4.2 Graphcut Segmentation for Visibility Shifting

Visibility shifting is a common phenomenon in dynamic video and may lead to variations of the replaced textures. For example, if the concerned object moves behind a shelter, its boundary triangles shrink because parts of the texture change from visible to invisible. Conversely, when the object moves out, some previously occluded parts shift from invisible to visible. The shifting parts should be replaced accordingly with patches of the new texture.

In RotoTexture [4], an advection tree is constructed to deal with the dynamic visibility of texture. It is mainly designed for the cluster-based synthesis method. Since our approach is mesh-guided, the tree scheme [4] is unsuitable for our framework.

We propose a graphcut-segmentation-based algorithm to address the visibility shifting issue of video retexturing, keeping the replaced texture consistent with the object's movements. The graphcut algorithm has proven very effective for handling partial occlusions in multiview stereo and volumetric rendering. In our scenario, graphcut [40] is adopted to precisely locate the new boundary of the concerned region in the current frame.

We first build a trimap near the boundary of the concerned region (Fig. 6). The interior region of the trimap belongs to the concerned region $T_F$. The exterior region is grouped into the background $T_B$. The remaining region is called the unknown region $T_U$, which includes the boundary triangles together with their mirrored ones. Graphcut is applied to the trimap to extract the textured part in the unknown region. To enhance efficiency, the color distributions of the foreground and background used in graphcut are learned once from the key frame and then reused when processing the other frames. Since the textured part is extracted inside the trimap, mesh points are deleted or appended according to whether there exists a triangle whose variation of texture occupancy exceeds 90 percent; accordingly, a remeshing operation is performed. For a newly appearing region, the texture is replaced by fetching a new texture patch from the boundary of the concerned region in texture space.

Fig. 6. Trimap constructed near the object boundary, which is partly occluded by the shelter in the reference frame (a). In the current frame (b), the newly appearing textured part in the trimap (surrounded by the red rectangle) is extracted by graphcut. This region should be replaced with a new patch of texture.
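A minimal sketch of the trimap-driven cut is shown below, using OpenCV's GrabCut as a stand-in for the graphcut of [40]. Note one divergence: the paper learns the foreground/background color distributions once on the key frame and reuses them, whereas cv2.grabCut re-estimates its color models on every call, so this is only an approximation of the described scheme.

```python
import cv2
import numpy as np

def extract_visible_region(frame_bgr, trimap):
    """Resolve the unknown band of a trimap with GrabCut (iterated graphcut).

    frame_bgr: HxWx3 uint8 frame.
    trimap:    HxW uint8, 0 = background T_B, 1 = foreground T_F,
               2 = unknown band T_U around the object boundary.
    Returns an HxW boolean mask of the visible concerned region.
    """
    mask = np.full(trimap.shape, cv2.GC_PR_BGD, np.uint8)
    mask[trimap == 0] = cv2.GC_BGD           # exterior: definite background
    mask[trimap == 1] = cv2.GC_FGD           # interior: definite foreground
    mask[trimap == 2] = cv2.GC_PR_FGD        # let graphcut decide the unknown band
    bgd = np.zeros((1, 65), np.float64)      # model buffers required by OpenCV
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame_bgr, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```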
5 EXPERIMENTAL RESULTS

Extensive experiments have been carried out to verify our mesh-guided retexturing approach. The results are generated on an Intel Pentium IV 2.4-GHz PC with 512 Mbytes of main memory.

5.1 Results on Image

One texture replacement result for a single image is shown in Fig. 7. The input image is Fig. 7a. Fig. 7b is the computed mesh in texture space corresponding to the generated mesh in Fig. 1b, which has 385 points. It takes 9 seconds to solve the nonlinear equation (8) using the multidimensional Newton's method. The brick and checkerboard textures show regular/near-regular patterns; they are used as new textures for replacing the input image. Figs. 7c and 7d are the replacement results with the brick and checkerboard textures, respectively.

Fig. 7. Retexturing results for the image in (a). (b) is the computed mesh in texture space corresponding to the mesh in Fig. 1b. (c) and (d) are the replacement results.

Fig. 8 shows another image retexturing result. Fig. 8a is a screenshot of a 3D cube rendered by Deep Exploration [41]. Figs. 8b, 8c, and 8d are the retexturing results with three new textures. The results show that the stretch-based parameterization can produce realistic distortion effects incurred by perspective projection. For this example, the Poisson-based refinement process is not applied because the foreshortening effect can be obtained with RBF interpolation. Figs. 9 and 10 present image retexturing results for curved surfaces using our mesh-guided approach.

Fig. 8. (b), (c), and (d) The replacement results for a screenshot image (a) of a 3D cube. This example illustrates that our approach retains the perspective distortion convincingly.

Fig. 9. Image retexturing result of clothes.

Fig. 10. Image retexturing result of curtains.

Results with variant texton scales. Progressively variant textures can be produced by texture synthesis techniques [42]. However, generating progressively variant textures poses challenges for previous methods of texture replacement [1], [2], [3], [8]. The main difficulty lies in how to effectively combine the algorithm for creating the variant
replaced texture with the means of simulating texture distortions. With the stretch-based parameterization, our retexturing approach can create such effects with ease (Figs. 11c and 11d).

In our approach, we only need to specify a few key vectors over the input image (Fig. 11a) to describe the texton scales. The scale of each mesh edge, computed by inverse distance interpolation, is integrated into the objective function:

$$E_l = \sum_{(i,j) \in edges} w_{ij} \left( \|Q_i - Q_j\|^2 - (1/s_{ij}^2)\, l_{ij}^2 \right)^2 \big/ \, l_{ij}^2, \qquad (25)$$

where $1/s_{ij}$ represents the ideal scale interpolated for edge $e(Q_i Q_j)$. The smaller the edge scale, the bigger the texton scale. Minimizing the function yields the parameterized mesh in texture space (Fig. 11b).

Fig. 11. Results with variant texton scales. The key vectors specified are shown in (a). (b) shows the computed mesh in texture space. From top left to bottom right, texton scales in (c) and (d) vary progressively from small to big along the horizontal and vertical directions.
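A direct transcription of (25) follows, for evaluating a candidate texture-space parameterization; in the paper this term is minimized (alongside the stretch energy) to produce the mesh of Fig. 11b, whereas the sketch only computes the energy. The array layouts and names are our assumptions.

```python
import numpy as np

def scale_energy(Q, edges, weights, inv_scales, rest_lengths):
    """Evaluate the texton-scale objective (25).

    Q:            (n, 2) texture-space mesh points Q_i
    edges:        (m, 2) integer index pairs (i, j)
    weights:      (m,) per-edge weights w_ij
    inv_scales:   (m,) interpolated 1/s_ij (smaller value => larger textons)
    rest_lengths: (m,) image-space edge lengths l_ij
    """
    i, j = edges[:, 0], edges[:, 1]
    sq = ((Q[i] - Q[j]) ** 2).sum(axis=1)              # ||Q_i - Q_j||^2
    target = (inv_scales ** 2) * rest_lengths ** 2     # (1/s_ij^2) * l_ij^2
    return (weights * (sq - target) ** 2 / rest_lengths ** 2).sum()
```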
Results with self-occlusion. In a self-occluded region, texture patches may be covered by their neighboring ones, so the replaced textures may appear discontinuous in the region. This is mainly caused by discontinuities in the visual appearance of the underlying 3D surface. Unlike previous methods, which cannot create such effects easily, our mesh-guided retexturing approach can generate them with convenient user interaction.

In our approach, visual discontinuities are handled in the process of stretch-based parameterization. We interactively specify relevant points around the boundary of the occluded region (for example, the red points in Fig. 12a). Each of these points is accompanied by a virtual point; the original point and its virtual reflection represent the adjacent invisible/visible triangles, respectively. In the process of mesh parameterization, a virtual distance is assigned to each original specified point and its virtual counterpart. Fig. 12b is the generated mesh in texture space. The holes in the mesh generate discontinuities of the replaced result in the self-occlusion region. Figs. 12c and 12d show two replacement results.

Fig. 12. Results with self-occlusion. (a) The relevant points specified over the mesh. (b) The computed mesh in texture space. (c) and (d) show the retexturing results. Texture discontinuities in the occluded regions are obvious.

Fig. 13 compares our method with Textureshop [1]. The input image is the inset in Fig. 13b. Fig. 13a is our retexturing result, and Fig. 13b is directly copied from the result generated by Textureshop [1, Fig. 7]. Texture discontinuities are created in the drapes of the sculpture in both results. The result generated by our mesh-guided approach is visually comparable to that
of the cluster-based texture synthesis method [1]. Both methods can generate good texture discontinuities in the drape regions. Figs. 13c and 13d show more replacement results generated by our approach using checkerboard and text images. Fig. 14 gives another result generated by our mesh-guided approach, in which the brick and checkerboard textures follow the surfaces in the photo.

Fig. 13. Comparison with Textureshop. The input image is the inset in (b). (a) is our result and (b) is the result of Textureshop. The result generated by our mesh-guided approach is visually comparable to Textureshop's cluster-based texture synthesis method. Both methods generate good texture discontinuities in the drape regions.

Fig. 14. Image retexturing result for a sculpture.

5.2 Results on Video

Figs. 15, 16, and 17 show frames selected from three video retexturing results. The dynamic effects of these results can be found in the accompanying video. In the experiments, processing one video clip takes from several minutes to tens of minutes, depending on the length and resolution of the clip. The computation-intensive steps of our approach are tracking the mesh points of moving objects, local and global smoothing of the tracked points, and texture sampling when propagating the replaced results between adjacent frames.

Fig. 15 shows frames of one retextured video that verify the effectiveness of our visibility shifting method. The waving arm of the announcer leads to visibility changes in front of the body (please refer to the accompanying video). The video contains 50 sequential frames, among which the first frame is selected as the key frame to initialize the retexturing process. The graphcut segmentation is adopted to tackle the visibility issue. Even so, the newly appearing regions in some frames cannot be extracted accurately, and artifacts may appear in the retextured video. One way to reduce such artifacts is to improve the accuracy of the boundary segmentation of the moving object; with more precise models, such as a Gaussian Mixture Model (GMM) [21], for computing the foreground/background distributions, the result can be improved.

Fig. 16 shows six frames from one video retexturing result. The input sequence contains 118 frames, in which a model walks across the stage wearing a silk dress that deforms largely. In this example, eight key frames are selected in the time domain. The odd rows show some input frames; the even rows show the corresponding frames of the retextured video with the checkerboard texture. Even with the robust optical flow algorithm and local optimization, perfect tracking is impossible, and a few mesh points jump suddenly along their motion curves. Before the global optimization of motion trajectories, these points are dragged to user-specified positions with a few editing operations. With the above scheme, our approach can generate a visually pleasing result with slight texture