This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2018.2857812, IEEE Transactions on Mobile Computing.
1536-1233 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
[Fig. 3. Experiment results of depth value. (a) Depth histogram of multiple objects (x-axis: depth (cm); y-axis: number of pixels; labeled regions: objects A, B, C and background). (b) Depth of objects in different horizontal lines (x-axis: horizontal coordinate x (cm); y-axis: depth (cm)). (c) Depth histogram of the same object at different distances (x-axis: depth (cm); y-axis: number of pixels, ×10^5).]

Therefore, the depth camera can effectively estimate the distance to a specified object from its depth value, because the depth increases linearly with the distance. If multiple objects are placed at different positions in the scene, they are usually at different distances from the depth camera, so it is possible to distinguish among them according to their depth values. To understand the characteristics of the depth information collected from the depth camera, we conduct real experiments to obtain more observations.

We first conduct an experiment to evaluate the characteristics of the depth. Without loss of generality, each reported observation summarizes the statistical properties of 100 repeated measurements. We arbitrarily place three objects A, B, and C in front of the depth camera, i.e., a Microsoft Kinect: object A is a box at a distance of 68 cm, object B is a can at 95 cm, and object C is a tripod at 150 cm. We then collect the depth histogram from the depth sensor. As shown in Fig. 3(a), the x-axis denotes the depth value and the y-axis denotes the number of pixels at that depth. Because A and B are regularly shaped objects, each produces its own peak in the depth histogram, meaning that many pixels are detected at its distance. Therefore, A and B can be easily distinguished according to the distance. However, there are two peaks around the distance of object C: because C is an irregularly shaped object (the tripod has a concave shape), its pixels fall at several different distances. This implies that, for an object with a continuous surface, the depth sensor usually detects one peak in the vicinity of its distance, whereas for an irregularly shaped object it detects multiple peaks at intermittent depths. Nevertheless, we find that these peaks are usually very close in distance. If multiple objects are placed in close proximity, it may therefore become more difficult to distinguish them.

To further validate the relationship between depth and distance, we set multiple horizontal lines at different distances from the Kinect (from 500 mm to 2500 mm). For each horizontal line, we move a certain object along the line and record the depth value reported by the Kinect. We show the experiment results in Fig. 3(b). For each horizontal line, the depth values of the object stay nearly constant, with rather small deviations; across different horizontal lines, the depth values differ clearly. Because of the Kinect's limited field of view, it covers a narrower horizontal range at closer distances. This observation implies that the depth value collected from the depth camera depicts the vertical (perpendicular) distance rather than the absolute distance between the object and the depth camera.
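To make the difference between the two distances concrete, the following minimal sketch compares the reported depth with the straight-line distance for an object moved along one horizontal line. It assumes an idealized camera at the origin and ignores any height offset; the function name `euclidean_distance` and the sample values are illustrative and do not come from the original experiments.

```python
import math


def euclidean_distance(x_cm, depth_cm):
    """Straight-line distance from the camera to a point.

    x_cm:     lateral offset of the object along the horizontal line
    depth_cm: perpendicular distance of that line from the camera
    Assumes the camera sits at the origin and ignores height differences.
    """
    return math.hypot(x_cm, depth_cm)


# An object sliding along the line at a perpendicular depth of 150 cm.
for x in (-100, -50, 0, 50, 100):
    print(x, 150, round(euclidean_distance(x, 150), 1))
```

Under these assumptions the perpendicular depth stays at 150 cm for every lateral position, while the straight-line distance grows as the object moves away from the optical axis, which matches the nearly flat curves described for Fig. 3(b).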
To extract the depth of specified objects from the depth histogram of multiple objects, we set a threshold $t$ on the number of pixels to detect peaks. We iterate from the minimum depth to the maximum depth in the histogram; if the number of pixels at a certain depth is larger than $t$, we identify it as a peak $p(d_i, n_i)$ with depth $d_i$ and number of pixels $n_i$. We find that, for an irregularly shaped object, the depth sensor usually detects multiple peaks at intermittent depths. To address this multiple-peaks problem, we set another threshold $\Delta d$: if the depth values of these peaks differ by less than $\Delta d$, we combine them into one peak. Both $t$ and $\Delta d$ are selected empirically from a number of experimental studies ($t = 200$ and $\Delta d = 10$ cm in our implementation). Each resulting peak then represents a specified object. For each peak, we find the leftmost depth $d_l$ and the rightmost depth $d_r$ around it whose numbers of pixels are greater than zero. We then compute the average depth of the specified object as follows:

$$d = \sum_{i=l}^{r} \left( d_i \times \frac{n_i}{\sum_{j=l}^{r} n_j} \right).$$

That is, the average depth is a weighted average in which each depth around the peak is weighted by its number of pixels.

Moreover, in Fig. 3(a), we also find some background noise beyond a distance of 175 cm, which is produced by background objects such as the wall and floor. We note that such background noise always produces a continuous range of depth values with very similar numbers of pixels in the depth histogram. Therefore, we can use this pattern to detect and eliminate the corresponding range of depth values. Specifically, we set a threshold $t_l$ for the length of the continuous range and a threshold $t_p$ for the number of pixels at each depth ($t_l = 50$ cm and $t_p = 500$ in our implementation).
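The peak detection, peak merging, and weighted averaging steps for object depths can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the histogram is assumed to be a dictionary with integer depths in 1 cm bins, peaks are merged by chaining consecutive peaks that are less than `delta_d` apart, and the function name `extract_object_depths` and the sample histogram are hypothetical. Background removal is not included.

```python
def extract_object_depths(hist, t=200, delta_d=10):
    """Estimate one average depth per object from a depth histogram.

    hist:    dict mapping integer depth (cm, assumed 1 cm bins) -> pixel count
    t:       pixel-count threshold for a bin to count as a peak
    delta_d: peaks closer than this (cm) are merged into one object
    """
    # Step 1: scan from minimum to maximum depth and keep bins above t.
    peaks = [d for d in sorted(hist) if hist[d] > t]

    # Step 2: merge peaks whose depths differ by less than delta_d
    # (assumption: compare each peak with the previous one in the group).
    groups = []
    for d in peaks:
        if groups and d - groups[-1][-1] < delta_d:
            groups[-1].append(d)
        else:
            groups.append([d])

    # Step 3: extend each group to the leftmost and rightmost depths that
    # still have a nonzero pixel count, then take the weighted average.
    averages = []
    for group in groups:
        left, right = group[0], group[-1]
        while hist.get(left - 1, 0) > 0:
            left -= 1
        while hist.get(right + 1, 0) > 0:
            right += 1
        span = range(left, right + 1)
        total = sum(hist.get(d, 0) for d in span)
        averages.append(sum(d * hist.get(d, 0) for d in span) / total)
    return averages


# Hypothetical histogram: a box near 68 cm, a can at 95 cm, and a tripod
# that produces two nearby peaks at 148 cm and 153 cm.
example_hist = {67: 100, 68: 400, 69: 100, 95: 300, 148: 250, 149: 50, 153: 250}
print(extract_object_depths(example_hist))
```

With the default thresholds, the two tripod peaks are 5 cm apart and are therefore merged into a single object, so the sketch returns three depths rather than four.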
Then, for a continuous range of depth values in the depth histogram, if the length of the range is greater than $t_l$ and the number of pixels at each depth value in it is greater than $t_p$, we identify the range as background noise.

The effective scanning distance of the depth camera is very important for AR applications, since a short scanning distance would severely limit the potential application scenarios. In fact, the effective scanning distance of a depth camera such as the Kinect can be as far as 475 cm. To validate this, we perform a set of experiments on the effective scanning distance of the Kinect. We mount a piece of cardboard of size 20 cm × 20 cm × 5 cm on top of a tripod and evaluate the depth histogram when the cardboard is placed 50 cm, 150 cm, 300 cm, and 450 cm away from the depth camera, respectively. We plot the experiment results in Fig. 3(c). Note that, when the object is deployed at different