are encoded from 0 for black to 255 for white). We do not apply any spatial filtering on the images; that would generate undesired blending in the final depth estimates, especially noticeable at depth discontinuities (at occlusions for example). However, it would be acceptable to low-pass filter the brightness profiles of the top and bottom rows (there is no depth discontinuity on the tabletop) and low-pass filter the temporal brightness profiles at every pixel. These operations would preserve sharp spatial discontinuities, and might decrease the effect of local processing noise by accounting for smoothness in the motion of the stick.

Experimentally, we found that this thresholding approach for shadow edge detection allows for some internal reflections in the scene [9, 8, 14]. However, if the light source is not close to an ideal point source, the mean value between maximum and minimum brightness may not always constitute the optimal value for the threshold image I_shadow. Indeed, the shadow edge profile becomes shallower as the distance between the stick and the surface increases. In addition, it deforms asymmetrically as the surface normal changes. These effects could make the task of detecting the shadow boundary points challenging. In the future, we intend to develop a geometrical model of extended light sources and incorporate it in the system.
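As an illustrative sketch of this adaptive thresholding, the following Python function (our own names and default contrast threshold, not from the paper) estimates the shadow time at one pixel from its temporal brightness profile, thresholding at I_shadow = (I_max + I_min)/2 and interpolating linearly for sub-frame accuracy. It detects the downward crossing of the profile; a streaming variant may instead track the upward crossing (the left shadow edge), but the principle is the same.

```python
import numpy as np

def shadow_time(profile, i_thresh=30.0):
    """Estimate the shadow crossing time for one pixel from its
    temporal brightness profile, at sub-frame accuracy.

    profile  : 1-D sequence of brightness values I(t) for this pixel.
    i_thresh : minimum contrast (Imax - Imin) for the pixel to be
               processed at all (hypothetical default value).
    Returns the fractional frame index of the first downward crossing
    of I_shadow = (Imax + Imin) / 2, or None if no shadow is seen.
    """
    profile = np.asarray(profile, dtype=float)
    i_max, i_min = profile.max(), profile.min()
    if i_max - i_min < i_thresh:        # not enough contrast: skip pixel
        return None
    i_shadow = 0.5 * (i_max + i_min)    # adaptive per-pixel threshold
    for t in range(1, len(profile)):
        a, b = profile[t - 1], profile[t]
        if a >= i_shadow > b:           # first downward crossing
            # linear interpolation between frames t-1 and t
            return (t - 1) + (a - i_shadow) / (a - b)
    return None
```

For example, a profile [200, 198, 60, 40, 45] has threshold 120 and crosses it between frames 1 and 2, giving a fractional shadow time of about 1.57.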
Although I_min and I_max are needed to compute I_shadow, there exists an implementation of that algorithm that does not require storage of the complete image sequence in memory and therefore lends itself to real-time implementations. All that one needs to do is update, at each frame, five different arrays, I_max(x,y), I_min(x,y), I_contrast(x,y), I_shadow(x,y) and the shadow time t_s(x,y), as the images I(x,y,t) are acquired. For a given pixel (x,y), the maximum brightness I_max(x,y) is collected at the very beginning of the sequence (the first frame), and then, as time goes on, the incoming images are used to update the minimum brightness I_min(x,y) and the contrast I_contrast(x,y). Once I_contrast(x,y) crosses I_thresh, the adaptive threshold I_shadow(x,y) starts being computed and updated at every frame (and activated). This process goes on until the pixel brightness I(x,y,t) crosses I_shadow(x,y) for the first time (in the upward direction). That time instant is registered as the shadow time t_s(x,y). In that form of implementation, the left edge of the shadow is tracked instead of the right one; however, the principle remains the same.

2.3 Triangulation

Once the shadow time t_s(x_c) is estimated at a given pixel x_c, one can identify the corresponding shadow plane Π(t_s(x_c)). Then, the 3D point P associated to x_c is retrieved by intersecting Π(t_s(x_c)) with the optical ray (O_c, x_c) (see figure 2). Notice that the shadow time t_s(x_c) acts as an index to the shadow plane list Π(t). Since t_s(x_c) is estimated at sub-frame accuracy, the final plane Π(t_s(x_c)) actually results from linear interpolation between the two planes Π(t_0 - 1) and Π(t_0), if t_0 - 1 < t_s(x_c) < t_0 with t_0 integer. Once the range data are recovered, a mesh may be generated by connecting neighboring points in triangles. Rendered views of three reconstructed surface structures can be seen in figures 6, 7 and 8.

3 Noise Sensitivity

The overall scheme is based on first extracting from every frame (i.e.
every time instant t) the x coordinates of the two reference points x_top(t) and x_bot(t), and second estimating the shadow time t_s(x_c) at every pixel x_c. Those input data are used to estimate the depth Z_c at every pixel. The purpose of the noise sensitivity analysis is to quantify the effect of the noise in the measurement data {x_top(t), x_bot(t), t_s(x_c)} on the final reconstructed scene depth map. One key step in the analysis is to transfer the noise affecting the shadow time t_s(x_c) into a scalar noise affecting the x coordinate of x_c, after scaling by the local shadow speed on the image at that pixel. Let V be the volume of the parallelepiped formed by the three vectors O_cA, O_cB and O_cS, originating at O_c (see figure 2):

V = X_S · {(X_B - X_S) × (X_A - X_S)}

where X_S = [X_S Y_S Z_S]^T, X_A = [X_A Y_A Z_A]^T and X_B = [X_B Y_B Z_B]^T are the coordinate vectors of S, A and B in the camera reference frame (× is the standard outer product operator). Notice that V is computed at the triangulation stage, and therefore is always available (see [3]). Define X_c = [X_c Y_c Z_c]^T as the coordinate vector in the camera reference frame of the point in space corresponding to x_c. Assume that the x coordinates of the top and bottom reference points (after normalization) are affected by additive white Gaussian noise with zero mean and variances σ_t² and σ_b² respectively. Assume in addition that the variance on the x coordinate of x_c is σ_xc² (different at every pixel). The following expression for the variance σ_Zc² of the induced noise on the depth estimate Z_c was derived by taking first order derivatives of Z_c with respect to the 'new' noisy input variables x_top, x_bot and x_c (notice that the time variable does not appear any longer in the analysis):

σ_Zc² = (Z_c⁴ / V²) { W² h² σ_xc² + (α₁ + β₁ Y_c + γ₁ Z_c)² σ_t² + (α₂ + β₂ Y_c + γ₂ Z_c)² σ_b² }    (1)
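The first-order derivation behind the variance expression can be mimicked numerically: propagate independent input variances through the depth function using finite-difference partial derivatives. A minimal sketch follows; the depth function used here is a hypothetical stand-in chosen only for illustration, not the paper's actual triangulation geometry.

```python
import numpy as np

def first_order_variance(f, inputs, variances, eps=1e-6):
    """Propagate independent zero-mean Gaussian input noise through f
    to first order: sigma_out^2 = sum_i (df/du_i)^2 * sigma_i^2.
    Partial derivatives are taken by central finite differences."""
    inputs = np.asarray(inputs, dtype=float)
    var_out = 0.0
    for i, s2 in enumerate(variances):
        up, dn = inputs.copy(), inputs.copy()
        up[i] += eps
        dn[i] -= eps
        dfdu = (f(*up) - f(*dn)) / (2 * eps)   # central difference
        var_out += dfdu**2 * s2
    return var_out

# Hypothetical stand-in for the depth estimate Z_c(x_top, x_bot, x_c);
# any smooth function of the three noisy inputs works for illustration.
def depth(x_top, x_bot, x_c):
    return 100.0 / (x_top - x_bot + 0.5 * x_c)

# Variance of the depth estimate induced by the three input variances.
sigma2 = first_order_variance(depth, [4.0, 1.0, 2.0],
                              [0.01, 0.01, 0.04])
```

For a linear function the result is exact; for the nonlinear stand-in above it is the first-order approximation, in the same spirit as equation (1).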