Computers & Graphics 38 (2014) 174–182
journal homepage: www.elsevier.com/locate/cag
0097-8493/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.cag.2013.10.038
* Corresponding author. Tel.: +86 1391 3028 596; fax: +86 25 896 86596. E-mail addresses: ywguo.nju@gmail.com, ywguo@nju.edu.cn (Y. Guo).
CAD/Graphics 2013

Efficient view manipulation for cuboid-structured images

Yanwen Guo a,*, Guiping Zhang a, Zili Lan a, Wenping Wang b
a State Key Lab for Novel Software Technology, Nanjing University, PR China
b Department of Computer Science, The University of Hong Kong, Hong Kong

Article history: Received 31 July 2013; received in revised form 27 October 2013; accepted 28 October 2013; available online 13 November 2013.

Keywords: View manipulation; Mesh deformation; Optimization

Abstract

We present an efficient algorithm for manipulating the viewpoints of cuboid-structured images with moderate user interaction. Such images are very common. We first recover an approximate geometric model using prior knowledge of the latent cuboid. While this approximated cuboid structure does not provide an accurate scene reconstruction, we demonstrate that it is sufficient to re-render the images realistically under new viewpoints in a nearly geometrically accurate manner. The new image is generated with high visual quality by deforming the remaining image region in accordance with the re-projected cuboid structure, via a triangular mesh deformation scheme. The energy function is designed to be quadratic so that it can be minimized efficiently by solving a sparse linear system. We verify the effectiveness of our technique on images with standard and non-standard cuboid structures, and demonstrate an application to upright adjustment of photographs and a user interface that lets the user view the scene interactively under new viewpoints on a viewing sphere.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Advances in imaging technology and hardware improvements in digital cameras have led to continuous improvements in image quality.
People can now take high-quality, high-resolution photos more easily than before, without always suffering from the noise, low contrast, and blur that degrade photo quality. However, photos taken by amateur photographers often have poor viewpoints, for instance slanted man-made structures and unbalanced compositions, which make scenes look dull and less vivid. On the other hand, when looking at a photo shared by friends or downloaded from Flickr or Photobucket, people may naturally imagine what the scene would look like if it were taken from a new viewpoint. Automatic optimization of the viewpoint of a given photograph is thus desirable.

To render the image from a novel viewpoint, direct re-projection of the latent 3D scene remains elusive, since accurate reconstruction of the whole scene is still challenging. Recent efforts have been made to optimize perspective or to imitate re-projection by means of image transformation. Manipulation of photographic perspective is enabled in [1] by combining recent image warping techniques with constraints from projective geometry. Heavy user assistance, based on an understanding of the basic principles of perspective construction, is often required to accurately mark the image with a number of image space constraints. Stereoscopic devices and content relying on stereopsis are now widely available, and the problem of manipulating perspective in stereoscopic pairs is addressed in [2]. Assuming that depth variations of the scene are small relative to its distance from the camera, slanted man-made structures can be straightened up by an improved homography model [3].

We do not intend to study the aesthetics of whether or not a photograph looks visually pleasing under the current viewpoint. Instead, our goal is to enable the generation of novel images with new viewpoints given only a single image as input, with moderate user assistance.
To this end, our primary observation is that many images of man-made scenes exhibit cuboid-dominated three-dimensional structures, in which the projections of two perpendicular planes dominate the latent three-dimensional geometry and occupy the major part of the image. A pair of projected parallel lines can be found in the image for each plane. Such an image either has a cuboid structure itself or depicts a scene dominated by a cuboid-like object. As shown in Fig. 1, cuboid-structured images are very common, for example photos of buildings (upper left and lower left), indoor scenes (upper right), apartments, and buses. Some photos exhibit only latent cuboid structures. For example, the lower right photo of Fig. 1 is such an image, since we can easily construct two perpendicular planes by specifying auxiliary lines, shown as dotted lines in the photo, even though one of the two planes containing the auxiliary lines does not physically exist. Such a cuboid structure is the major visual cue for depicting a three-dimensional scene and conveying perspective. By manipulating the cuboid structure reconstructed by acceptable
optimization, we are able to simulate novel viewpoints and to render the new images with high visual quality by letting the remaining image region deform in accordance with the transformation of the cuboid structure.

Although high-quality 3D reconstruction from a single image remains difficult, we can easily recover an approximation of the three-dimensional cuboid structure from only a few user-specified auxiliary lines on the image. It should be noted that, although a simplistic model of geometry can be recovered by leveraging small amounts of annotation [4] or by user-annotation-assisted scene analysis [5,6] for augmented reality applications, fully automatic recovery of the geometry without any user intervention is still challenging, especially for images with what we call non-standard cuboid structures, where some cuboid edges do not exist. We thus allow the users to specify some auxiliary lines, which takes little effort. More importantly, we show that the re-projection of this approximated cuboid structure is sufficient to meet the accuracy requirement of viewpoint changes. Given a new viewpoint, the image is rendered by optimizing a quadratic image warping energy. The energy function incorporates the hard constraint of the cuboid transformation, as well as constraints on shape and straight lines. Without significant manual effort, the newly rendered perspective image is nearly geometrically accurate and visually pleasing.

Applications: First, our technique can be used to correct slanted structures in photos taken by casual photographers. Second, unlike previous image-deformation-driven methods, we can generate novel images under key viewpoints around the viewpoint of the input image, with given viewing angles. This enables us to design an interface through which the user can watch the scene by changing viewpoints smoothly on a viewing sphere, mimicking a 3D browsing experience.
The images under key viewpoints are interpolated to produce intermediate results.

To summarize, our main contributions are as follows:

• We present an algorithm for manipulating views of cuboid-structured images with very little user effort.
• We show that re-projection of the approximated cuboid structure is sufficient to meet the requirement of viewpoint change.
• We provide an interface that allows the users to watch the scene under new viewpoints on a viewing sphere interactively.

2. Related work

Manipulation of the perspective in a photograph for the task of touring into the picture was made possible by Horry et al. [7]. A spidery mesh is employed to obtain a simple scene model from the central perspective image using a graphical interface. Animators use this incomplete scene information to make animations from the input pictures. Instead of attempting to recover precise geometry, a rough 3D environment is constructed from a single image by applying a statistical framework [8]. The model is constructed directly from the geometric labels learned on the image: ground, vertical, and sky. None of the above methods aims to generate a new image with high visual quality as if it were captured from a novel viewpoint. In contrast, we only need to partially recover a cuboid-dominated 3D representation of the image with moderate user interaction; the whole image is then re-rendered by deforming the remaining image region in accordance with the re-projection of the cuboid structure.

Recently, advances in shape deformation [9,10] and retargeting techniques [11–15] have made it possible to manipulate perspective by means of image deformation [1]. A 2D image warp is computed by optimizing an energy function such that the entire warp is as shape-preserving as possible while satisfying constraints originating from projective geometry. The user first annotates an image by marking a number of image space constraints, with pixel accuracy.
User assistance is required to accurately mark the image and manipulate its perspective with a number of image space constraints. Overall, eight different types of constraints, which may directly oppose each other, are incorporated into the energy function. Handling these constraints carefully for efficient optimization poses a challenge for amateur users. The problem of manipulating perspective in stereoscopic pairs is addressed in [2]. Given a new perspective, correspondence constraints between the stereoscopic image pairs are determined, and a warp is computed for each image that preserves salient image features and guarantees proper stereopsis relative to the new camera.

Perspective projections are limited to fairly narrow view angles. Correction of the image distortions incurred by projecting wide fields of view onto a flat 2D display surface is addressed in [16,17].

Our work is also inspired by recent efforts on photo composition assessment and enhancement [18–20]. Most methods build their measures of visual aesthetics on the rule of thirds, which states that an image should be imagined as divided into nine equal parts by two equally spaced horizontal lines and two equally spaced vertical lines, and that important compositional elements should be placed along these lines or at their intersections. Bhattacharya et al. [18] learn a support vector regression model for capturing aesthetics. Image quality is improved by recomposing the salient object onto the inpainted background or by using a visual weight balancing technique. Liu et al. [20] modify image composition by using a compound crop-and-retarget operator and seek the solution by particle swarm optimization.

3. View manipulation of the cuboid structure

Our view manipulation method is specifically designed to optimize the viewpoints of images that show cuboid-dominated three-dimensional structures. Extracting a 3D representation from a single-view image depicting a 3D object has been a longstanding goal of computer vision.
It has been shown recently that 3D cuboids in single-view images can be automatically localized by using a discriminative parts-based detector [21]. We allow the users to interactively specify the projected lines of the latent cuboid structure on the image, from which we estimate an approximation of the cuboid geometry. The Hough transform and the Canny edge detector are used to assist users and to reduce interaction errors in this process.

Fig. 1. Images with cuboid structures.

We show that, given a new viewpoint, the re-projection of this
approximated cuboid structure is sufficient to meet the accuracy requirement of viewpoint change.

3.1. A standard case

3.1.1. Cuboid reconstruction

For ease of exposition, we initially assume that we can find, on the image plane, the projections of a vertical edge and of two pairs of parallel edges that meet the vertical edge at two intersection points of a standard cuboid structure, as shown in Fig. 2. Without loss of generality, we assume that the imaging plane coincides with the XY plane of the world coordinate system, with its center at the world origin. The camera is stationed on the Z-axis with center of projection $F(0, 0, f)$, where $f$ is the focal length of the camera. Let $\{P_0, \ldots, P_5\}$ and $\{p_0, \ldots, p_5\}$ denote six key vertices on the latent cuboid facing the camera and their projections on the image, respectively. The points $\{p_0, \ldots, p_5\}$ are specified by the user on the image. $P_0$ and $P_1$ are two corners of the cuboid shared by the two perpendicular planes $\mathcal{P}_1(P_1, P_0, P_4, P_5)$ and $\mathcal{P}_2(P_0, P_1, P_2, P_3)$ of the three-dimensional cuboid. Since $P_0P_3 \parallel P_1P_2$ in 3D space, the extensions of $p_0p_3$ and $p_1p_2$ meet at a vanishing point $c_1$, except in the special case $p_0p_3 \parallel p_1p_2$. Similarly, the extensions of $p_1p_5$ and $p_0p_4$ meet at another vanishing point $c_2$. Imagine that $P_0P_3$ and $P_1P_2$ meet at a point at infinity whose projection on the imaging plane is $c_1$. The extension of $Fc_1$ meets $P_0P_3$ and $P_1P_2$ at this point as well. We thus have $Fc_1 \parallel P_0P_3$ and $Fc_1 \parallel P_1P_2$, so $Fc_1$ is parallel to the plane $\mathcal{P}_2$, which contains these two edges. Similarly, $Fc_2$ is parallel to $\mathcal{P}_1$, and $Fc_1$ is perpendicular to $Fc_2$, from which $f$ can be obtained easily.

It is still impossible to recover the accurate geometry and position of the 3D cuboid structure without any other prior knowledge. By making reasonable assumptions, we wish to generate an approximation of the structure that is identical to the accurate one up to a scale difference.
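The focal-length step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the sample coordinates are hypothetical, and image coordinates are assumed to be measured from the image center, in line with the convention that the imaging plane is centered at the world origin. With $F = (0, 0, f)$ and the vanishing points lying in the plane $z = 0$, the condition $Fc_1 \perp Fc_2$ reads $c_1 \cdot c_2 + f^2 = 0$.

```python
import math


def line_through(a, b):
    """Homogeneous line (A, B, C), with A*x + B*y + C = 0, through 2D points a and b."""
    (x1, y1), (x2, y2) = a, b
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def vanishing_point(a, b, c, d):
    """Intersection of line ab with line cd, or None if the image lines are parallel."""
    a1, b1, c1 = line_through(a, b)
    a2, b2, c2 = line_through(c, d)
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None  # special case: the projected edges are parallel in the image
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def focal_length(c1, c2):
    """Solve (c1 - F) . (c2 - F) = 0 for f, with F = (0, 0, f) and c1, c2 in the plane z = 0."""
    d = -(c1[0] * c2[0] + c1[1] * c2[1])
    if d <= 0:
        raise ValueError("vanishing points do not correspond to perpendicular directions")
    return math.sqrt(d)


# Hypothetical user clicks, measured from the image center.
p0, p1 = (0.0, 1.0), (0.0, -1.0)
p3, p2 = (1.0, 0.5), (1.0, -0.5)
p4, p5 = (-1.0, 0.5), (-1.0, -0.5)

c1 = vanishing_point(p0, p3, p1, p2)  # vanishing point of P0P3 and P1P2
c2 = vanishing_point(p1, p5, p0, p4)  # vanishing point of P1P5 and P0P4
f = focal_length(c1, c2)
```

When one pair of projected edges is parallel in the image, `vanishing_point` returns `None` and this closed form does not apply; that case needs separate handling, which the sketch leaves out.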
Given any new viewpoint, we show that the re-projection of this approximated cuboid structure is sufficient to meet the accuracy requirement of viewpoint changes.

Since $|Fp_0|$ is proportional to $|FP_0|$, the coordinates of $P_0$ can be obtained by setting the ratio of $|Fp_0|$ to $|FP_0|$ to a constant. We set it to 1 in our experiments. Each $P_i$ can be represented parametrically in terms of $F$ and $p_i$. We then compute $\{P_1, \ldots, P_5\}$ by exploiting the geometric relationships

$$P_0P_1 \perp \{P_0P_3,\ P_0P_4,\ P_1P_2,\ P_1P_5\}, \quad P_0P_4 \parallel P_5P_1, \quad P_0P_3 \parallel P_1P_2. \qquad (1)$$

3.1.2. Analysis of accuracy

Given a new viewpoint, we assume that the focal length $f$ of the camera remains fixed, since zooming in and out can easily be imitated by upsampling and downsampling the image. Without loss of generality, the center of the scene is placed at $(0, 0, z_{P_0})$, with $z_{P_0}$ the z-coordinate of $P_0$. Recall that we set the ratio of $|Fp_0|$ to $|FP_0|$ to 1. Therefore, the center of the scene is $(0, 0, 0)$, which coincides with the world center $O$. Let us denote the new viewpoint by $O'(f \sin\varphi \cos\theta,\ f \sin\varphi \sin\theta,\ f \cos\varphi)$, where $\theta$ is the azimuthal angle and $\varphi$ is the polar angle in the spherical coordinate system. The new imaging plane passing through the world center $O$ is $I'$, whose normal vector is $\overrightarrow{OO'}$. Let $\{Q_0, \ldots, Q_5\}$ denote the real coordinates of the 3D cuboid, corresponding to the projected points $\{p_0, \ldots, p_5\}$ on the input image. We prove that the 2D projections of $\{P_0, \ldots, P_5\}$ on the new imaging plane $I'$ are identical to the projections of $\{Q_0, \ldots, Q_5\}$ once translation is eliminated. To achieve this, we only need to prove

$$\overrightarrow{p'_0p'_1} = \overrightarrow{q'_0q'_1}, \qquad (2)$$

where $\overrightarrow{p'_0p'_1}$ is the projection of $\overrightarrow{P_0P_1}$ on $I'$, and $\overrightarrow{q'_0q'_1}$ is the projection of $\overrightarrow{Q_0Q_1}$ on $I'_q$, the imaging plane with focal length $f$ that would be used if $\{Q_0, \ldots, Q_5\}$ were given (see Fig. 3).

If $\{Q_0, \ldots, Q_5\}$ were known, we would rotate the camera around $O_Q(0, 0, z_{Q_0})$. In this case, the new viewpoint is $O'_Q$, and we have $|O_QF| = |O_QO'_Q|$.
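As a numerical sanity check of the claim in Eq. (2), the following Python sketch projects one edge of the approximated cuboid and the same edge of a "real" cuboid under the rotated cameras, and returns both projected vectors for comparison. It rests on stated assumptions and is not the authors' code: the real cuboid is modeled as the approximated one scaled about $F$ by an unknown factor `s`, $P_0$ is assumed to lie on the plane $z = 0$ (ratio set to 1), and all function names and sample values are hypothetical.

```python
import math


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def mul(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(x, eye, plane_point, normal):
    """Central projection of x from eye onto the plane through plane_point with the given normal."""
    d = sub(x, eye)
    t = dot(normal, sub(plane_point, eye)) / dot(normal, d)
    return add(eye, mul(d, t))


def projected_edge(a, b, pivot, radius, f, theta, phi):
    """Place the camera at distance radius from pivot in direction (theta, phi),
    then project edge ab onto the image plane lying f in front of the camera."""
    u = (math.sin(phi) * math.cos(theta),
         math.sin(phi) * math.sin(theta),
         math.cos(phi))
    eye = add(pivot, mul(u, radius))
    plane_point = sub(eye, mul(u, f))  # image plane at focal length f, normal u
    return sub(project(b, eye, plane_point, u), project(a, eye, plane_point, u))


def compare_edges(P0, P1, f, s, theta, phi):
    """Return the projected edge of the approximated cuboid and of the scaled 'real' cuboid."""
    F = (0.0, 0.0, f)
    Q0 = add(F, mul(sub(P0, F), s))   # assumed real vertex: P0 scaled about F
    Q1 = add(F, mul(sub(P1, F), s))
    O = (0.0, 0.0, 0.0)               # rotation center of the approximated scene
    OQ = (0.0, 0.0, Q0[2])            # rotation center of the real scene
    edge_p = projected_edge(P0, P1, O, f, f, theta, phi)
    edge_q = projected_edge(Q0, Q1, OQ, F[2] - OQ[2], f, theta, phi)
    return edge_p, edge_q


edge_p, edge_q = compare_edges((0.2, -0.5, 0.0), (0.3, 0.6, -0.4),
                               f=2.0, s=3.7, theta=0.4, phi=0.3)
```

Under these assumptions the two vectors agree up to floating-point error, because the whole real configuration is a scaled copy of the approximated one about $F$ and the final projection at focal length $f$ undoes the scale.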
Let $I'_Q$, passing through $O_Q$, be an auxiliary plane whose normal is $\overrightarrow{O_QO'_Q}$. Let $\overrightarrow{p'_{Q0}p'_{Q1}}$ be the projected vector of $\overrightarrow{Q_0Q_1}$ on $I'_Q$. We get

\[ \overrightarrow{q'_0q'_1} = \frac{f}{|O_QF|} \, \overrightarrow{p'_{Q0}p'_{Q1}}. \tag{3} \]

It is obvious that $\triangle(OO'p'_1)$ and $\triangle(O_QO'_Qp'_{Q1})$ are similar triangles, as are $\triangle(p'_0O'p'_1)$ and $\triangle(p'_{Q0}O'_Qp'_{Q1})$. We get

\[ \frac{\overrightarrow{p'_0p'_1}}{\overrightarrow{p'_{Q0}p'_{Q1}}} = \frac{\overrightarrow{O'p'_1}}{\overrightarrow{O'_Qp'_{Q1}}} = \frac{\overrightarrow{OO'}}{\overrightarrow{O_QO'_Q}} = \frac{f}{|O_QF|}. \tag{4} \]

Combining (3) and (4), we get $\overrightarrow{p'_0p'_1} = \overrightarrow{q'_0q'_1}$.

Fig. 2. Left: a standard case of the projection of a cuboid structure. Note that $|p_0p_4| = |p_1p_5|$ and $|p_0p_3| = |p_1p_2|$ are not required. $p_4p_5$ and $p_2p_3$ are not necessarily the projections of two vertical edges of the cuboid. Right: the 3D cuboid structure.

Fig. 3. Projection of $P_0P_1$ on the new imaging plane $I'$ is identical to the projected $Q_0Q_1$ on $I'_q$ up to a translation.

3.2. A more complex case

In some images we cannot find, on the projected vertical edge, two intersection points of the projected parallel edges. Fig. 4 shows such an example. To tackle this issue, we first compute the vanishing points of the projected parallel line segments. Two auxiliary lines, shown as dotted segments in Fig. 4, which pass through the corresponding vanishing points, can then be drawn. This case is thereby converted to the standard case described previously. We then compute equations of the two perpendicular
planes with which spatial coordinates of the eight endpoints of the originally specified line segments can be easily obtained.

4. Cuboid-guided image warp

Given a new viewpoint, the cuboid structure is projected onto the new imaging plane. The new image is rendered by making the rest of the image deform in accordance with the transformation of the cuboid structure. We use a mesh representation to realize image deformation, as shown in Fig. 5. Generating a mesh for an image for the tasks of image resizing and manipulation has been discussed in [13,14,1,22,23]. Unlike the quad mesh employed by most previous methods, we use a triangular mesh to represent the input image. An advantage of the triangular mesh over the quad mesh is that a cuboid-structured region-of-interest (ROI), which may have slanted and irregular borders, can be represented compactly by a mesh of moderate density. Furthermore, slanted edge structures can be approximated accurately by triangle edges, which makes them easy to preserve during image warp.

We use constrained Delaunay triangulation to create a content-aware mesh representation. Points are first evenly sampled from image borders, the cuboid structure, and strong edges detected using the Hough transform, and their connectivity is used as constraints for triangulation. To keep the point density uniform, we detect some corners and, if necessary, add some auxiliary points, since nearly uniform point density normally facilitates mesh processing. We represent the triangular mesh as $M = (V, E, T)$ with vertices $V$, edges $E$, and triangles $T$. $V = [v_0, v_1, \dots, v_n]$ with $v_i = v(x_i, y_i) \in \mathbb{R}^2$ denotes the initial vertex positions.

For a new viewpoint, the cuboid structure is re-projected using the approximated geometry. With this re-projected cuboid structure, we render the new image with high visual quality by making the important image structures consistent with the projection.
Since the geometry of the whole image is inaccessible, we seek to reduce possible visual artifacts caused by inconsistency between the re-projected cuboid structure and other important visual features. This is formulated as a mesh deformation problem which tries to find a target mesh $M' = (V', E', T')$ that has the same topology as $M$. $M'$ is solved for by optimizing an energy function integrating mesh deformation and other constraints.

Cuboid constraint: Given a novel viewpoint, target positions of mesh vertices $\{v_{c_1}, \dots, v_{c_k}\}$ on the cuboid structure are determined by perspective projection of this structure on the new imaging plane. This is treated as the cuboid constraint $F_C(v'_{c_1}, \dots, v'_{c_k}) = 0$. It is a hard constraint in our system. That is to say, when constructing the mesh deformation function, the coordinates of those mesh vertices on the cuboid structure are computed by the projection in advance. They remain unchanged during optimization.

Shape constraint: To ensure a globally smooth warp, we formulate the shape deformation energy of the mesh in terms of conformality. Producing conformal maps in the least squares sense for automatic texture atlas generation has been discussed in [24]. A similar shape preservation term for warping a quad mesh is used in [1]. We consider the map $M : (x, y) \mapsto (x', y')$. $M$ is conformal if it satisfies the Cauchy-Riemann equations,

\[ \frac{\partial M}{\partial x} + i \frac{\partial M}{\partial y} = 0, \tag{5} \]

where $M$ is rewritten using complex numbers, i.e. $M = x' + iy'$. Separating real and imaginary parts gives $\partial x'/\partial x = \partial y'/\partial y$ and $\partial x'/\partial y = -\partial y'/\partial x$, so the Jacobian matrix should be of the form

\[ J = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}. \tag{6} \]

We now consider the restriction of $M$ to a source triangle $T(v_i, v_j, v_k)$ in the input image. Its counterpart to be solved is denoted by $T'(v'_i, v'_j, v'_k)$. In general, the affine transformation from $T$ to $T'$ can be expressed as

\[ A = T'T^{-1}. \tag{7} \]

We rewrite the above equation using homogeneous coordinates, with the vertex coordinates of $T$ and $T'$ stored as columns, and let $T^{-1}$ be

\[ T^{-1} = \begin{bmatrix} a_1 & b_1 & d_1 \\ a_2 & b_2 & d_2 \\ a_3 & b_3 & d_3 \end{bmatrix}. \tag{8} \]

From (6) and (8), the map is conformal if the following equations hold:

\[ E_{TJ_1} = a_1x'_i + a_2x'_j + a_3x'_k - (b_1y'_i + b_2y'_j + b_3y'_k) = 0, \]
\[ E_{TJ_2} = b_1x'_i + b_2x'_j + b_3x'_k + (a_1y'_i + a_2y'_j + a_3y'_k) = 0. \tag{9} \]

We define the total conformality energy by summing up the individual energy terms on each $T$,

\[ E_S = \sum_T \left( E_{TJ_1}^2 + E_{TJ_2}^2 \right). \tag{10} \]

Fig. 4. A more complex case. User-specified solid line segments, which represent the projections of two pairs of parallel edges, do not intersect on the projected vertical edge. To handle this case, we draw auxiliary lines and convert it to the standard case.

Fig. 5. Workflow of cuboid-guided image warp. The triangular mesh $M$ is shown in the 2nd image, where the blue and green line segments denote the cuboid structure and lines detected or specified by the user. The $M'$ shown in the 3rd image is computed by solving a cuboid-guided mesh deformation problem. The final result with regular borders is generated by cropping the result with non-regular boundaries. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)
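The per-triangle conformality residuals can be sketched as follows. This is an illustrative implementation that applies the Cauchy-Riemann condition of Eq. (5) to the piecewise-affine map, under the assumption that the triangle's homogeneous vertex coordinates form the columns of $T$. The function names and the two-triangle test mesh are hypothetical.

```python
import math


def inverse_homogeneous(tri):
    """Return (a, b): the first two columns of T^{-1} for a source triangle.

    a[m] and b[m] are the x- and y-derivatives of the barycentric coordinate
    of vertex m, so sum(a[m] * x'_m) is d x'/d x, and so on.
    """
    (xi, yi), (xj, yj), (xk, yk) = tri
    det = xi * (yj - yk) + xj * (yk - yi) + xk * (yi - yj)
    if abs(det) < 1e-12:
        raise ValueError("degenerate source triangle")
    a = ((yj - yk) / det, (yk - yi) / det, (yi - yj) / det)
    b = ((xk - xj) / det, (xi - xk) / det, (xj - xi) / det)
    return a, b


def conformal_residuals(src_tri, dst_tri):
    """Two residuals that vanish exactly when the affine map is a similarity."""
    a, b = inverse_homogeneous(src_tri)
    xs = [p[0] for p in dst_tri]
    ys = [p[1] for p in dst_tri]
    dxdx = sum(am * x for am, x in zip(a, xs))
    dxdy = sum(bm * x for bm, x in zip(b, xs))
    dydx = sum(am * y for am, y in zip(a, ys))
    dydy = sum(bm * y for bm, y in zip(b, ys))
    return dxdx - dydy, dxdy + dydx


def conformality_energy(src_vertices, dst_vertices, triangles):
    """Sum of squared residuals over all triangles (index triples)."""
    total = 0.0
    for tri in triangles:
        e1, e2 = conformal_residuals([src_vertices[m] for m in tri],
                                     [dst_vertices[m] for m in tri])
        total += e1 * e1 + e2 * e2
    return total


# Hypothetical two-triangle mesh on the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
faces = [(0, 1, 3), (1, 2, 3)]

# A similarity (rotation by 30 degrees, scale 2, translation) versus a stretch.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
similar = [(2 * (c * x - s * y) + 3.0, 2 * (s * x + c * y) - 1.0) for x, y in square]
stretched = [(2.0 * x, y) for x, y in square]

energy_similar = conformality_energy(square, similar, faces)
energy_stretched = conformality_energy(square, stretched, faces)
```

Each residual is linear in the unknown target coordinates, so the summed energy is quadratic in $V'$. This is why it can be stacked with other linear least-squares terms and minimized through a single linear solve.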
Line constraint: Strong edges such as straight lines are important visual features. They are vital clues for understanding image content, and should be kept as rigid as possible. We detect those line segments using the Hough transform. Users can also specify some additional curved edges. Points sampled from the edges, together with the connectivity induced by the corresponding edges, are fed into the triangulation process beforehand. Let $\langle v_i, v_j, v_k \rangle$ denote a triplet of consecutive vertices on such an edge. To preserve the shape of strong edges, we preserve the length ratio $r_j$ of $v_jv_k$ to $v_iv_j$, and the angle $\theta_j$ formed by $v_iv_j$ and $v_jv_k$, in each triplet locally [15]. We express the energy term regarding the line constraint as

\[ E_L = \sum_{\langle v_i, v_j, v_k \rangle} \left\| (v'_k - v'_j) - r_j R_j (v'_j - v'_i) \right\|^2, \tag{11} \]

with

\[ R_j = \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}. \tag{12} \]

Besides the lines detected by the Hough transform or specified by the user, we use the line constraint to preserve the shapes of salient objects that lie across two different faces of the latent cuboid, or across the cuboid and the rest of the image. A line segment is specified on each such object and is fed into $E_L$ to avoid heavy distortion.

Vertical and horizontal line constraint: Photos taken by amateur photographers often contain slanted vertical or horizontal lines due to improper camera rotations, for instance slanted buildings, windows, and picture frames. This may cause visual discomfort when we look at such photos. Our system supports automatic correction of slanted line segments when the viewpoint is changed and the image is warped. For the slanted cuboid structure, the idea is to re-project it properly. The new viewpoint is computed automatically by requiring the projected edges to be horizontal or vertical. The remaining slanted line segments are handled by the vertical and horizontal line constraint. Let $l$ denote a slanted line segment, which can be detected automatically or specified by the user, and let $\{v_{l_1}, \dots, v_{l_m}\}$ represent the vertices on $l$.
The vertical line constraint is expressed as

\[ E_V = \sum_l \sum_{i=1}^{m} \left( x'_{l_i} - x'_{l_1} \right)^2. \tag{13} \]

The horizontal line constraint is defined similarly.

Border constraint: Physically, each side of the image border should be constrained to remain straight. The energy term $E_B$ of this constraint is defined similarly to the line constraint.

Total energy: In summary, combining all the above energy terms, we wish to solve

\[ \arg\min_{V'} \; \lambda_S E_S + \lambda_L E_L + \lambda_V E_V + \lambda_B E_B, \quad \text{s.t. } F_C(v'_{c_1}, \dots, v'_{c_k}) = 0, \tag{14} \]

where $\lambda_S$, $\lambda_L$, $\lambda_V$, and $\lambda_B$ are the coefficients weighting the different energy terms. Straight lines and important curved lines, as visually prominent features, should be kept. To mimic a hard constraint, $\lambda_L$, the weight of the straight line constraint, is usually set to a larger value than the weight of the shape constraint. $\lambda_V$, the weight of the vertical and horizontal line constraint, can be set by the user with respect to image content. $E_V$ can be enforced as a hard constraint with a larger $\lambda_V$, which is useful for straightening up slanted man-made structures in an input image to improve its perceptual quality. In practice, to deal with possible conflicts between the border constraint and the constraint on the cuboid structure near the image border, $E_B$ often acts as a soft constraint, obtained by setting $\lambda_B$ to a small value. For all the results in this paper we use weights of $\lambda_S = 1$ and $\lambda_L = 100$. $\lambda_V$ and $\lambda_B$ are set to 100 as well if $E_V$ and $E_B$ are taken as hard constraints, and to 10 if they are taken as soft constraints.

It is noted that we do not impose a constraint for avoiding mesh flip-over in the above energy function. In all our experiments, mesh flipping is seldom encountered. We check for it after the deformed mesh is obtained, and once flipping is detected, we correct it locally.

The energy function is a quadratic function of $V'$. The solution can be obtained efficiently by solving a sparse linear system.

5.
Experiments

We have implemented our view manipulation algorithm on a PC with an Intel Core i3-2100 CPU at 3.1 GHz, and experimented with our technique on a variety of images. Some representative results are shown in Figs. 6–10 and 12.

Fig. 6. Viewpoint adjustment for the images with standard cuboid structures. The 1st row is the inputs with cuboid structures marked by blue segments and line constraints by green ones. The 2nd row shows corresponding results with constrained lines, and the results without constraints are given in the 3rd row. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Figs. 6 and 7 demonstrate the results on several images of man-made buildings. The first and third rows are the input images.
The standard cuboid structures of the objects of interest, shown as blue line segments specified by the user on the images, can be found in these images. We specify them manually in our current implementation. Note that, for each pair of parallel edges of the cuboid structure, the lengths of the two parallel edges are not necessarily identical, as shown by these examples. The second and fourth rows are the corresponding results with changed viewpoints. The results of viewpoint shift on two indoor images are shown in Fig. 8.

Fig. 7. Viewpoint adjustment for the images with standard cuboid structures. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Fig. 8. Viewpoint adjustment for two indoor scenes.

Fig. 9. Viewpoint adjustment for an image with a non-standard cuboid structure. Panels: input; constraints; results of a new viewpoint with non-regular and regular borders; results of another viewpoint with non-regular and regular borders. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

In Fig. 9 we demonstrate an example where the user-specified line segments in blue do not form a standard cuboid structure. We add auxiliary lines, shown as dotted blue segments, under the vanishing point constraint, allowing for better control over image deformation. The green segments are detected by using
Hough transform and they are subject to the line constraint and vanishing point constraint in our optimization. Two new images rendered under new viewpoints are shown.

Fig. 10 shows the results of viewpoint adjustment for a photograph taken in downtown Toronto. The center of this photograph is a building with cuboid structure. Many tall buildings and a viaduct in the background make viewpoint adjustment of this photo a challenging task. Two new images with changed viewpoints are given.

5.1. Comparison with previous work

We first compare our results against the results produced by [1] (Fig. 11). Our results are comparable to those by [1]. In [1], the user needs to annotate an image by marking a number of image space constraints which are fed into the optimization framework. The user needs to understand the basic principles of perspective construction, such as vanishing points and lines, in order to use their tool properly. This somewhat professional work may make image manipulation a frustrating task for users, and improper interaction may cause failure or even improper perspective. In contrast, to use our framework the user only specifies the cuboid structure, and knowledge of perspective projection is not needed. Furthermore, we can generate novel images under key viewpoints around the viewpoint of the input image, with given viewing angles, thanks to the reconstructed approximate cuboid structure. This is demonstrated by the viewing sphere application presented later. By contrast, the algorithm in [1] cannot manipulate the image with a controlled viewpoint in as intuitive a manner as ours.

Table 1 shows the number of line segments for constraints used in each image. We use six line segments, which include five on the cuboid structure and one for the extra line constraint, for the middle image shown in Fig. 6. For the input image of Fig. 10, 18 line segments are used.
Normally, around 15 line segments are used, the number varying according to image complexity. It should be noted that we do not need to impose a line constraint for each straight line in the image; evenly distributed sparse line segments are enough to keep the resulting image free of obvious line distortions. By comparison, it is reported by [1] that a total of 22 line segments and polygon edges are specified per image when using their method.

The algorithm of [1] needs to minimize a nonlinear least squares function. The optimization time is dominated by solving the linear system at each iteration of the optimization. As reported, it takes on average 3.37 s for optimization. In contrast, the computation of our approach is mainly spent on optimizing Eq. (14), which requires solving a sparse linear system only once owing to the quadratic formulation of our algorithm. Normally, a source image is divided into a sparse mesh with several hundred vertices. It takes around 100 ms to generate a resulting image, excluding user interaction. Our algorithm is thus more efficient than that of [1].

Fig. 10. Viewpoint adjustment for a photograph (middle) of many tall buildings in downtown Toronto. The arrows indicate the directions of viewpoint changing.

Fig. 11. Comparison of our results (right) with the method of [1] (middle). Our results are comparable to the results by [1].

Table 1
The number of line segments for constraints used in each image.

Image                          Line segments
Fig. 6 (left/middle/right)     13/6/17
Fig. 7 (left/middle/right)     16/13/16
Fig. 8 (left/right)            13/13
Fig. 9                         12
Fig. 10                        18
Fig. 11 (up/bottom)            15/17

Our approach focuses on manipulating views of cuboid-structured images. A relevant algorithm on image editing by means of cuboid proxies for representing cuboid-like objects in man-made environments is given in [6]. We would like to clarify the difference between our approach and that of [6]. First, we have different goals. Our approach aims to change the viewpoints of images with cuboid dominated structures. The algorithm in [6], however, aims at editing cuboid-like objects in a way that mimics real-world behavior, such as replacing the objects with new ones and deforming the objects in images. Viewpoint
changing of the environment is not supported. Second, and more importantly, to meet these different goals, our approach requires only a partial and inaccurate recovery of the standard and non-standard cuboid structures. We do not need to accurately reconstruct the geometry of the cuboid structure. Only five edges of the cuboid dominated structure in an input image are enough: a so-called vertical edge and two pairs of parallel edges (not necessarily of identical length). This is validated by most of our experimental results. We believe it is challenging for the algorithm in [6] to automatically detect the cuboid structures and reconstruct accurate geometry for most of our test images, such as most objects of interest in Figs. 6 and 7 and the indoor images in Fig. 8. Furthermore, our approach can handle viewpoint manipulation for images with a non-standard cuboid dominated structure, like the input photo shown in Fig. 9.

5.2. Upright adjustment

Man-made structures often appear slanted in photos taken by casual photographers. An example is shown in the 1st column of Fig. 12. This is partly due to improper placement of the camera. The human visual system, however, always expects tall man-made structures to be straight up. In [3], slanted structures are dealt with by using an improved homography model. Our algorithm can also be used to straighten up slanted cuboid structures in images. The idea is to regenerate the image by properly modifying the viewpoint of the input image. The new viewpoint is computed automatically by requiring the projected structures to be vertical. A software implementation of [3] is Adobe Lightroom Upright. We thus compare our results to those produced by Lightroom. Fig. 12 shows the results produced by Lightroom (2nd row) and those by our algorithm (3rd row). Our results are generally comparable to those by Lightroom Upright.
The results of Lightroom show apparent visual artifacts for the Taj Mahal and church photos (see the red ellipses). In [3], the authors assume that depth variations of the scene relative to its distance from the camera are small. The artifacts may arise because this assumption does not hold exactly for the two images. Transforming the input image with a homography is not always sufficient, since it is oblivious to the depth variations of the latent scene.

Fig. 12. Upright adjustment for a photo of the Taj Mahal (left), an indoor image of a temple (middle), and a church photo (right). The 1st row shows the inputs, the 2nd row shows the results by Adobe Lightroom Upright, and the 3rd row shows our results. The results of Lightroom show apparent visual artifacts (see the red ellipses in the 1st and 3rd results). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Fig. 13. Images under new viewpoints are interpolated from the original photo and four new images under key viewpoints highlighted with red rectangles. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)
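The remark in Section 5.2 that a homography is oblivious to depth variations can be illustrated with a minimal sketch. It is not the method of [3] or our implementation; the pinhole camera with unit focal length, the helper names, and the two sample points are all assumptions made for illustration. Two scene points on the same viewing ray share one pixel. A pure camera rotation keeps them on one pixel, so a single image-to-image mapping can reproduce it, whereas a camera translation separates them by an amount that depends on depth.

```python
import math

def project(point, focal=1.0):
    """Pinhole projection of a 3D point (camera coordinates) to the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def rotate_about_x(point, angle):
    """Express the point in a camera tilted by `angle` radians about the x axis."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def translate_camera(point, offset):
    """Express the point in a camera whose center has moved by `offset`."""
    return tuple(p - o for p, o in zip(point, offset))

def pixel_gap(p, q):
    """Euclidean distance between two image points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical scene points on the same viewing ray, at depths 2 and 6.
near = (0.5, 0.2, 2.0)
far = (1.5, 0.6, 6.0)

# Rotation only: both points still land on a common pixel.
tilt = math.radians(10.0)
gap_rotation = pixel_gap(project(rotate_about_x(near, tilt)),
                         project(rotate_about_x(far, tilt)))

# Translation: the two points separate (parallax), which no single
# pixel-to-pixel mapping of the input image can reproduce.
offset = (0.3, 0.0, 0.0)
gap_translation = pixel_gap(project(translate_camera(near, offset)),
                            project(translate_camera(far, offset)))

print("gap after rotation:   ", gap_rotation)
print("gap after translation:", gap_translation)
```

When the depth variation is small relative to the camera distance, the translation gap shrinks, which is consistent with the assumption stated for [3].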
Viewing sphere: We can generate novel images under key viewpoints around the viewpoint of the input image, with given viewing angles. This enables us to design an interface through which the user can watch the scene by changing viewpoints smoothly on a viewing sphere, mimicking a 3D browsing experience. As shown in Fig. 13, the images on the viewing sphere under new viewpoints are interpolated from the original input and the newly rendered images under four key viewpoints. Please refer to our accompanying files for the live demo.

Limitations: Our system creates a partial reconstruction from a single image. Although it generates visually pleasing results for a variety of images with standard or non-standard cuboid structures, its applicability is limited. The new viewpoints are restricted to a certain range around the viewpoint of the input image. Changing the viewpoint dramatically may lead to dis-occlusions or holes that need to be filled. Some previously occluded parts may shift from invisible to visible, which ultimately leads to visual artifacts since a single image provides only incomplete scene information. Fig. 14 shows such an example, where the roof of the house is not visible in the input and its corresponding part looks hollow in the right result. A possible solution is to incorporate a second input image with a different viewpoint, which would be interesting future work. This example also reveals another limitation of our approach. The result is initially generated with non-regular borders. Cropping it to regular borders inevitably discards some image content near the image boundaries, sacrificing image resolution. This is especially true for images whose cuboid structure occupies nearly the whole image. In addition, we use line constraints to preserve the shape of objects that span two faces of the cuboid structure. Our method might not produce optimal results in such cases.
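The resolution cost of cropping mentioned above can be estimated with a rough sketch. It assumes, purely for illustration, that the non-regular border of the result is approximated by a convex, roughly upright quadrilateral given by its four warped corners; the real border of a deformed mesh is generally not a quadrilateral, and the function names and sample coordinates are hypothetical. The crop is a conservative axis-aligned rectangle that stays inside such a quadrilateral, not necessarily the largest one.

```python
def inner_crop(quad):
    """Conservative axis-aligned crop inside a convex, roughly upright quad.

    quad: corners ordered (top-left, top-right, bottom-right, bottom-left),
    in pixel coordinates with y pointing down.
    Returns (left, top, right, bottom), or None if the crop is empty.
    """
    (tl_x, tl_y), (tr_x, tr_y), (br_x, br_y), (bl_x, bl_y) = quad
    left = max(tl_x, bl_x)      # innermost x of the two left corners
    right = min(tr_x, br_x)     # innermost x of the two right corners
    top = max(tl_y, tr_y)       # innermost y of the two top corners
    bottom = min(bl_y, br_y)    # innermost y of the two bottom corners
    if right <= left or bottom <= top:
        return None
    return (left, top, right, bottom)

def retained_fraction(quad, width, height):
    """Crop area divided by the area of the original width x height image."""
    crop = inner_crop(quad)
    if crop is None:
        return 0.0
    left, top, right, bottom = crop
    return (right - left) * (bottom - top) / (width * height)

# Hypothetical warped corners of a 400 x 300 input image.
quad = [(0, 20), (400, 0), (380, 300), (30, 280)]
crop = inner_crop(quad)
fraction = retained_fraction(quad, 400, 300)
print("crop rectangle:", crop)
print("retained fraction of original area:", round(fraction, 3))
```

The more the cuboid structure fills the frame, the more the corners move under a viewpoint change and the smaller this fraction becomes.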
Moreover, the soft constraints in our optimization do not carry any 3D information. Perspective may be violated in image regions that are not reached by the cuboid structure. This limitation is exposed in Fig. 10 (right), where the tall building in the top-left is parallel to the main editing target in the input, but this relation is destroyed in the result.

Fig. 14. A failure case. The roof of the house is not visible in the left input and its corresponding part looks hollow in the right result due to visibility shifting caused by viewpoint change.

6. Conclusion

We have presented an algorithm for manipulating the viewpoints of cuboid-structured images and generating new images realistically. Our framework creates a partial scene reconstruction with minimal user interaction, and we show that such an approximate reconstruction is sufficient to re-render the image under a new viewpoint via a triangular mesh deformation scheme. The mesh deformation energy is optimized efficiently by solving a sparse linear system. In addition to generating images with novel viewpoints, we provide a user interface that allows users to watch the scene under new viewpoints on a viewing sphere interactively.

In the current implementation, the user needs to manually specify lines of the latent cuboid structure on the input image, even though the Hough transform and the Canny detector can be used to assist in this operation. Recent research efforts on localizing 3D cuboids in single-view images may facilitate automation of this process [21,6]. In the future we plan to explore integrating automatic detection and analysis of the cuboid structures into our framework.

Acknowledgments

We greatly thank the anonymous reviewers for their positive and constructive comments. This work was supported in part by the National Science Foundation of China under Grants 61073098, 61021062, and 61373059, and the National Basic Research Program of China (2010CB327903).

References

[1] Carroll R, Agarwala A, Agrawala M. Image warps for artistic perspective manipulation. ACM Trans Graph (Siggraph) 2010;29(4):127:1–9.
[2] Du S, Hu S, Martin R. Changing perspective in stereoscopic images. IEEE Trans Visualization Comput Graph 2013;19(8):1288–97.
[3] Lee H, Shechtman E, Wang J, Lee S. Automatic upright adjustment of photographs. In: IEEE conference on computer vision and pattern recognition (CVPR); 2012. p. 877–84.
[4] Karsch K, Hedau V, Forsyth D, Hoiem D. Rendering synthetic objects into legacy photographs. ACM Trans Graph 2011;30(6):157:1–12.
[5] Jiang N, Tan P, Cheong L-F. Symmetric architecture modeling with a single image. ACM Trans Graph (Siggraph Asia) 2009;28(5):113:1–8.
[6] Zheng Y, Chen X, Cheng M-M, Zhou K, Hu S-M, Mitra NJ. Interactive images: cuboid proxies for smart image manipulation. ACM Trans Graph (Siggraph) 2012;31(4):99:1–99:11.
[7] Horry Y, Anjyo KI, Arai K. Tour into the picture: using a spidery mesh interface to make animation from a single image. In: Siggraph; 1997. p. 225–32.
[8] Hoiem D, Efros AA, Hebert M. Automatic photo pop-up. ACM Trans Graph (Siggraph) 2005;24(3):577–84.
[9] Igarashi T, Moscovich T, Hughes JF. As-rigid-as-possible shape manipulation. ACM Trans Graph 2005;24(3):1134–41.
[10] Schaefer S, McPhail T, Warren J. Image deformation using moving least squares. ACM Trans Graph 2006;25(3):533–40.
[11] Liu F, Gleicher M. Automatic image re-targeting with fisheye-view warping. In: ACM UIST; 2005. p. 153–62.
[12] Gal R, Sorkine O, Cohen-Or D. Feature-aware texturing. In: 17th eurographics workshop on rendering; 2006. p. 297–304.
[13] Wang Y-S, Tai C-L, Sorkine O, Lee T-Y. Optimized scale-and-stretch for image resizing. ACM Trans Graph (Siggraph Asia) 2008;27(5):118:1–8.
[14] Guo Y, Liu F, Shi J, Zhou Z-H, Gleicher M. Image retargeting using mesh parametrization. IEEE Trans Multimedia 2009;11(4):856–67.
[15] Jin Y, Liu L, Wu Q. Nonhomogeneous scaling optimization for realtime image resizing. Vis Comput (Proc CGI) 2010;26(6–8):769–78.
[16] Carroll R, Agrawala M, Agarwala A. Optimizing content-preserving projections for wide-angle images. ACM Trans Graph 2009;28(3):1–9.
[17] Kopf J, Lischinski D, Deussen O, Cohen-Or D, Cohen MF. Locally adapted projections to reduce panorama distortions. Comput Graph Forum 2009;28(4):1083–9.
[18] Bhattacharya S, Sukthankar R, Shah M. A framework for photo-quality assessment and enhancement based on visual aesthetics. In: ACM multimedia; 2010. p. 271–80.
[19] Chen T, Cheng M-M, Tan P, Shamir A, Hu S-M. Sketch2photo: internet image montage. ACM Trans Graph (Siggraph Asia) 2009;28(5):124:1–10.
[20] Liu L, Chen R, Wolf L, Cohen-Or D. Optimizing photo composition. Comput Graph Forum (Eurographics) 2010;29(2):469–78.
[21] Xiao J, Russell B, Torralba A. Localizing 3d cuboids in single-view images. In: Neural information processing systems (NIPS); 2012. p. 755–63.
[22] Zhang G-X, Cheng M-M, Hu S-M, Martin RR. A shape-preserving approach to image resizing. Comput Graph Forum 2009;28(7):1897–906.
[23] Huang Q-X, Mech R, Carr N. Optimizing structure preserving embedded deformation for resizing images and vector art. Comput Graph Forum 2009;28(7):1887–96.
[24] Lévy B, Petitjean S, Ray N, Maillot J. Least squares conformal maps for automatic texture atlas generation. ACM Trans Graph (Siggraph) 2002;21(3):362–71.