Computer Science and Technology (Reference Literature): Tell Me What I See - Recognize RFID Tagged Objects in Augmented Reality Systems

Tell Me What I See: Recognize RFID Tagged Objects in Augmented Reality Systems

Lei Xie†, Jianqiang Sun†, Qingliang Cai†, Chuyu Wang†, Jie Wu‡, and Sanglu Lu†
†State Key Laboratory of Novel Software Technology, Nanjing University, China
‡Department of Computer Information and Sciences, Temple University, USA
{lxie,sanglu}@nju.edu.cn, {SunJQ,caiqingliang,wangcyu217}@dislab.nju.edu.cn, jiewu@temple.edu

ABSTRACT
Nowadays, people usually depend on augmented reality (AR) systems to obtain an augmented view in a real-world environment. With the help of advanced AR technology (e.g. object recognition), users can effectively distinguish multiple objects of different types. However, these techniques offer only a limited degree of distinction among different objects and cannot provide more inherent information about these objects. In this paper, we leverage RFID technology to further label different objects with RFID tags. We deploy additional RFID antennas to the COTS depth camera and propose a continuous scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. In this way, by pairing the tags with the objects according to the correlations between the depth of field and RF-signals, we can accurately identify and distinguish multiple tagged objects to realize the vision of “tell me what I see” from the augmented reality system. For example, in front of multiple unknown people wearing RFID tagged badges in public events, our system can identify these people and further show their inherent information from the RFID tags, such as their names, jobs, titles, etc. We have implemented a prototype system to evaluate the actual performance. The experiment results show that our solution achieves an average match ratio of 91% in distinguishing up to dozens of tagged objects with a high deployment density.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

Author Keywords
RFID; Augmented Reality System; Prototype Design

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
UbiComp ’16, September 12-16, 2016, Heidelberg, Germany
©2016 ACM. ISBN 978-1-4503-4461-6/16/09...$15.00
DOI: http://dx.doi.org/10.1145/2971648.2971661

Figure 1. Tell me what I see from the augmented reality system

INTRODUCTION
With the proliferation of augmented reality technology, people nowadays start to leverage augmented reality (AR) systems (e.g. Microsoft Kinect, Google Glass) to obtain an augmented view in a real-world environment. For example, devices like the Microsoft Kinect [13], i.e., a depth camera, can effectively conduct object recognition based on pattern recognition technology. Therefore, users can effectively distinguish multiple objects of different categories, e.g., a specified object in the camera can be recognized as a vase, a laptop, or a pillow based on its natural features. However, these techniques offer only a limited degree of distinction among different objects, since multiple objects of the same type may have very similar features, e.g., the system cannot effectively distinguish between two laptops of the same brand, even if they are owned by different people. Moreover, they cannot provide more inherent information about these objects, e.g., the specific configurations, the manufacturers, and the production date of the laptop.
Therefore, it is rather difficult to provide these functions by purely leveraging the AR technology.

Fortunately, the rise of RFID technology has brought new opportunities to meet the new demands [27, 31, 15]. RFID tags can be used to label different objects and to store inherent information about these objects in their onboard memory. Moreover, in comparison to optical markers such as the QR code, the COTS RFID tag has an onboard memory of up to 4K or 8K bytes, and it can be effectively identified even if it is hidden in or under the object. This provides us with an opportunity to effectively distinguish these objects, even if they belong to the same brand and have the same appearance. Figure 1 shows a typical application scenario of the above vision. In this scenario, multiple people are standing or sitting together in a cafe while wearing RFID tagged badges. From the camera’s view, the depth camera can recognize multiple objects, or rather human subjects, as well as the depth from its embedded depth sensor, which is associated with the distance to the camera. The RFID reader can identify multiple tags within the scanning range; moreover, it is able to extract signal features like the received signal strength (RSSI)
and phase from the RFID tags. By effectively pairing this information, the system can realize the vision of “tell me what I see from the augmented reality system”. For example, as shown in Figure 1, the inherent information extracted from the RFID tags, such as names, jobs, and titles, can be directly associated with the corresponding human subjects in the camera’s view. This provides us with more opportunities to communicate with unknown people by leveraging this novel RFID assisted augmented reality.

Although many schemes for RFID-based localization [32, 28, 34] have been proposed, they mainly focus on absolute object localization, and usually require anchor nodes like reference tags for accurate localization. They are not suitable for distinguishing multiple tagged objects for two reasons. First, we only need to distinguish the relative location instead of the absolute location of multiple tagged objects, by pairing the tags to the objects based on the correlation between the depth of field and RF-signals. Second, the depth camera cannot effectively use the anchor nodes, and it is impractical to deploy multiple anchor nodes in conventional AR applications.

In this paper, we leverage the RFID technology [33, 16] to further label different objects with RFID tags. We deploy additional RFID antennas to the COTS depth camera and propose a continuous scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. In this way, we can accurately identify and distinguish multiple tagged objects by sufficiently exploring the inherent correlations between the depth of field and the received RF-signal. Specifically, we extract the RSSI and the phase value from the RF-signal, and pair the tags with the objects according to the correlation between the depth value and the RSSI/phase value.

However, there are several challenges in distinguishing multiple tagged objects in AR systems. The first challenge is to conduct accurate pairing between the objects and the tags. In real applications, the tagged objects are usually placed in very close proximity, and the number of objects is usually in the order of dozens. In this situation, it is difficult to realize accurate pairing due to the large cardinality and mutual interference. The second challenge is to mitigate the interference from issues like the multi-path effect and object occlusion in realistic settings. These issues can cause non-negligible interference when pairing the tags with the objects, such as missing tags or objects that fail to be identified. The third challenge is to devise an efficient solution without any additional assistance, like anchor nodes. It is impractical to intentionally deploy anchor nodes in real applications due to the intensive deployment costs in manpower and time.

This paper represents the first study of using RFID technology to precisely distinguish multiple objects in augmented reality systems. Specifically, we make three key contributions in this paper. 1) To the best of our knowledge, we are the first to consider identifying and distinguishing multiple tagged objects with RFID systems, which provides a key supporting technology for augmented reality systems to realize the vision of “tell me what I see from the AR system”. 2) We conduct an extensive experimental study to explore the inherent correlations between the depth of field and RF-signals from the tagged objects. We thus propose continuous scanning-based solutions that leverage the RSSI and the phase value from RF-signals, respectively, to accurately distinguish the multiple tagged objects. 3) We implemented a prototype system and evaluated the actual performance with case studies. Our solution achieves an average match ratio of 91% in distinguishing up to dozens of RFID tagged objects with a high deployment density.

RELATED WORK
Depth camera-based pattern recognition: Depth camera-based pattern recognition aims at using the depth and RGB data captured from the camera to recognize objects more accurately. Based on depth processing [11, 18], a number of technologies have been proposed for object recognition [23] and gesture recognition [5, 21, 8, 30, 22]. Nirjon et al. solve the problem of localizing and tracking household objects using depth-camera sensors [20]. A Kinect-based pose estimation method [21] is proposed in the context of physical exercise, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions.

RFID in Ubiquitous Applications: RFID has been investigated in various ubiquitous applications, including indoor localization [34, 24], activity sensing [2], tabletop interaction [9], physical object search [19], etc. Prior work on RFID-based localization primarily relied on Received Signal Strength [34, 24] or Angle of Arrival [1] to acquire the absolute location of an object. The state-of-the-art systems use the phase value to estimate the absolute or relative location of an object with higher accuracy [33, 27, 17, 25]. RF-IDraw uses a 2-dimensional array of RFID antennas to track the movement trajectory of a finger with an RFID tag attached, so that it can reconstruct the trajectory shape of that finger [29]. Tagoram exploits tag mobility to build a virtual antenna array, and uses a differential augmented hologram to facilitate the instant tracking of a mobile RFID tag [32]. Find My Stuff (FiMS) provides search support for physical objects inside furniture, on room level, and in multiple locations [19].

Combined use in augmented reality environment: Recent works further consider using both a depth camera and RFID for indoor localization and object recognition in augmented reality environments [26, 14, 6, 3]. Wang et al. propose an indoor real-time location system that combines active RFID and Kinect by leveraging the positioning feature of identified RFID and the object extraction ability of Kinect. Klompmaker et al. use RFID and depth-sensing cameras to enable personalized authenticated tangible interactions on a tabletop [14]. Galatas et al. propose a multimodal context-aware localization system that uses RFID and 3-D audio-visual information from 2 Kinect sensors deployed at various locations [6]. Cerrada et al. present a method to improve object recognition by combining vision-based techniques applied to range-sensor captured 3D data with object identification obtained from RFID tags [3].

SYSTEM OVERVIEW

Design Goals
We aim to implement a supporting technology for the AR systems to realize the vision of “tell me what I see from the
augmented system”, by leveraging RFID tags to label different objects. In order to achieve this goal, we need to collect the responses from multiple tags and objects, and then pair the RFID tags to the corresponding objects according to the correlations between the depth of field and RF-signals. Therefore, we need to consider the following metrics in regard to system performance: 1) Accuracy: Since the objects are usually placed in very close proximity, there is a high accuracy requirement in distinguishing these objects, i.e., the average match ratio should be greater than a certain value, e.g., 85%. 2) Time-efficiency: Since AR applications are usually executed in real time, it is essential to reduce the time delay in identifying and distinguishing the multiple objects. 3) Robustness: Environmental factors, like the multi-path effect and partial occlusion, may cause the responses from the tagged objects to be missing or distorted. Besides, the tagged objects could be partially hidden behind each other due to the randomness in the deployment. The solution should be robust to these noises and distractions.

System Framework
We design a prototype system as shown in Figure 2(a). We deploy one or two additional RFID antennas to the COTS depth camera. The RFID antenna(s) and the depth camera are fixed to a rotating shaft so that they can rotate simultaneously. For the RFID system, we use the COTS Impinj R420 reader [10], one or two Laird S9028 antennas, and multiple Alien 9640 general purpose tags; for the depth camera, we use the Microsoft Kinect for Windows. They are both connected to a laptop placed on the mobile robot. The mobile robot can perform a 360 degree rotation around the rotation axis. In the following sections, without loss of generality, we evaluate the performance using the above configurations.

By attaching the RFID tags to the specified objects, we propose a continuous scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. In this way, we can obtain the depth of the specified objects from the depth sensor inside the depth camera, and we can also extract signal features such as the RSSI and phase values from the RF-signals of the RFID tags. By accurately pairing this information, the tags and the objects can be effectively bound together.

Figure 2(b) further shows the software framework. The system is mainly composed of three layers: the sensor data collection layer, the middleware layer, and the application layer. For the sensor data collection layer, the depth camera recognizes multiple objects and collects the corresponding depth distribution, while the RFID system collects multiple tag IDs and extracts the corresponding RSSIs or phases from the RF-signals of the RFID tags. For the middleware layer, we aim to sample and extract some features from the raw sensor data, and conduct an accurate matching between the objects and the RFID tags. For the application layer, the AR applications can use the matching results directly to realize various objectives.

Figure 2. System Framework: (a) Prototype System; (b) Software framework

FEATURE SAMPLING AND EXTRACTION
In this section, we investigate the feature sampling and extraction based on the observations from empirical studies. Without loss of generality, in the following each experiment observation is summarized from the statistical properties of 100 repeatable experiments. We set a typical indoor environment, i.e., a 10 m × 8 m lobby, as the testing environment.
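
As a rough illustration of the matching step in the middleware layer of Figure 2(b), the following Python sketch pairs tags with objects by rank: the nearest object (smallest depth) is paired with the tag that has the strongest RSSI, and so on. This is a simplified stand-in and not the matching algorithm of this paper; it assumes that RSSI falls off monotonically with distance, which the multi-path effect can violate. The names `DetectedObject`, `TagRead`, `pair_by_rank`, and `match_ratio`, the tag IDs, and the sample values are all hypothetical, and `match_ratio` assumes that the metric is the fraction of tags paired with their true object.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str        # hypothetical label assigned by the depth camera
    depth_mm: float   # representative depth of the object


@dataclass
class TagRead:
    tag_id: str       # hypothetical tag identifier
    rssi_dbm: float   # received signal strength of the tag


def pair_by_rank(objects, tags):
    """Pair the i-th nearest object with the i-th strongest tag.

    Assumption (not from the paper): a stronger RSSI means a closer tag.
    Extra objects or tags beyond the shorter list are left unpaired.
    """
    by_depth = sorted(objects, key=lambda o: o.depth_mm)
    by_rssi = sorted(tags, key=lambda t: t.rssi_dbm, reverse=True)
    return {t.tag_id: o.label for o, t in zip(by_depth, by_rssi)}


def match_ratio(pairs, truth):
    """Fraction of ground-truth tags that were paired with the right object."""
    if not truth:
        return 0.0
    correct = sum(1 for tag_id, label in truth.items()
                  if pairs.get(tag_id) == label)
    return correct / len(truth)


# Made-up sample data for illustration only.
objects = [
    DetectedObject("tripod", 1500.0),
    DetectedObject("box", 680.0),
    DetectedObject("can", 950.0),
]
tags = [TagRead("T2", -55.0), TagRead("T3", -61.0), TagRead("T1", -48.0)]

pairs = pair_by_rank(objects, tags)
print(pairs)
print(match_ratio(pairs, {"T1": "box", "T2": "can", "T3": "tripod"}))
```

A rank-based rule needs no anchor nodes, but it breaks down as soon as two objects are at similar distances, which is the dense-deployment case the design goals call out.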
Extract the Depth of Field from Depth-Camera

Depth cameras, such as the Microsoft Kinect, are a kind of range camera, which produces a 2D image showing the distance to points in a scene from a specific point, normally associated with a depth sensor. Therefore, the depth camera can effectively estimate the distance to a specified object according to the depth, because the depth value increases linearly with the distance. If multiple objects are placed in different positions in the scene, they are usually at different distances from the depth camera. Therefore, it is possible to distinguish among different objects according to the depth values from the depth camera.

Experiment Observations

We first conduct an experiment to evaluate the characteristics of the depth. We arbitrarily place three objects A, B, and C in front of the depth camera, i.e., the Microsoft Kinect: object A is a box at a distance of 68 cm, object B is a can at a distance of 95 cm, and object C is a tripod at a distance of 150 cm. We then collect the depth histogram from the depth sensor. As shown in Figure 3(a), the X-axis denotes the depth value, and the Y-axis denotes the number of pixels at the specified depth. We find that, as A and B are regular-shaped objects, there is one peak in the depth histogram for each of objects A and B, meaning that many pixels are detected at this distance. Therefore, A and B can be easily distinguished according to the distance. However, there are two peaks around the distance of object C: because object C is an irregularly-shaped object (the concave shape of the tripod), its pixels are spread over a number of different distances. Moreover, we can also find some background noise beyond the distance of 175 cm, which can be produced by background objects, such as the wall and the floor.
This implies that, for an object with a continuous surface, the depth sensor usually detects a single peak in the vicinity of its distance; for an irregularly-shaped object, the depth sensor detects multiple peaks at intermittent depths. Nevertheless, we find that these peaks are usually very close in distance.

In order to further validate the relationship between the depth and the distance, we set multiple horizontal lines at different distances from the Kinect (from 500 mm to 2500 mm). For each horizontal line, we then move a certain object along the line and obtain the depth value from the Kinect at each position. We show the experiment results in Figure 3(b). Here we find


that, for each horizontal line, the depth values of the object remain nearly constant, with rather small deviation; across different horizontal lines, the depth values vary markedly. Due to the limitation of the Kinect’s field of view, the Kinect has a smaller view angle at closer distances. This observation implies that the depth value collected from the depth camera depicts the vertical distance rather than the absolute distance between the object and the depth camera.

Figure 3. Experiment results of depth value: (a) The depth histogram of multiple objects (X-axis: depth in cm; Y-axis: number of pixels; peaks labeled A, B, and C, plus background noise); (b) The depth value of objects in different horizontal lines (X-axis: horizontal coordinate x in cm; Y-axis: depth in cm).

Depth Feature Extraction

To extract the depth of specified objects from the depth histogram of multiple objects, we set a threshold $t$ on the number of pixels to detect the peaks. We iterate from the minimum depth to the maximum depth in the histogram; if the number of pixels for a certain depth is larger than $t$, we identify it as a peak $p(d_i, n_i)$ with the depth $d_i$ and the number of pixels $n_i$. In order to address the multiple-peaks problem of irregularly-shaped objects, we set another threshold $\Delta d$. If the differences between the depth values of these peaks are smaller than $\Delta d$, we combine them into one peak. The values of both $t$ and $\Delta d$ are selected empirically from a number of experimental studies ($t = 200$ and $\Delta d = 10$ cm in our implementation). Each peak then represents a specified object. For each peak, we find the leftmost depth $d_l$ and the rightmost depth $d_r$ whose number of pixels is greater than 0. We then compute the average depth for the specified object as follows:

$$d = \sum_{i=l}^{r} \left( d_i \times \frac{n_i}{\sum_{j=l}^{r} n_j} \right).$$
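The peak detection, peak merging, and weighted averaging steps can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the prototype: the function name `extract_object_depths`, the dictionary layout of the histogram (integer depth bins in cm mapped to pixel counts), and the 1 cm bin width are assumptions made for the example.

```python
from typing import Dict, List


def extract_object_depths(hist: Dict[int, int], t: int = 200, delta_d: int = 10) -> List[float]:
    """Return one weighted-average depth (cm) per detected object.

    hist maps an integer depth bin (cm) to its pixel count (assumed layout).
    """
    # Step 1: scan from the minimum to the maximum depth; bins above t are peaks.
    peaks = [d for d in sorted(hist) if hist[d] > t]

    # Step 2: merge neighbouring peaks whose depths differ by less than delta_d.
    groups: List[List[int]] = []
    for d in peaks:
        if groups and d - groups[-1][-1] < delta_d:
            groups[-1].append(d)
        else:
            groups.append([d])

    depths = []
    for group in groups:
        # Step 3: widen to the leftmost / rightmost bins that still hold pixels.
        left, right = group[0], group[-1]
        while hist.get(left - 1, 0) > 0:
            left -= 1
        while hist.get(right + 1, 0) > 0:
            right += 1
        # Step 4: pixel-count-weighted average over [left, right].
        span = range(left, right + 1)
        total = sum(hist.get(d, 0) for d in span)
        depths.append(sum(d * hist.get(d, 0) for d in span) / total)
    return depths
```

With a made-up histogram such as `{67: 50, 68: 300, 69: 50, 95: 400, 148: 250, 153: 250}`, the two bins at 148 cm and 153 cm are merged into a single object because they are less than `delta_d` apart, which mirrors the two-peak tripod case.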
The average depth is calculated as a weighted average, with each depth around the peak weighted by its number of pixels.

Extract the Received Signal Strength from RF-signals

The received signal strength (RSSI) measures the power of the received radio signal, which is inversely related to the distance between the tag and the reader. However, according to the previous study [34], the RSSI is also affected by factors such as the multi-path effect and path loss. This indicates that the RSSI does not always have a monotonic relationship with the distance. Therefore, with the RSSI from a specified tag, the RFID system can only roughly estimate the distance between the reader and the tag.

Experiment Observations

We find that, inside the RFID antenna’s effective scanning range, the RSSI from the tag is also affected by the offset of its position from the center of the antenna beam. In order to validate this claim, we separate the RFID reader and the tag by a distance $d$, and then we evaluate the average RSSI value while gradually rotating the antenna from an offset of −40° to +40°. Figure 4 shows the experiment results. We find that, as the distance between the tag and the reader increases from 50 cm to 150 cm, the RSSI decreases rapidly; when the distance increases further, the RSSI decreases slowly. Moreover, for a given distance, the RSSI from the tag always reaches its maximum value when the antenna directly faces the tag. As we further increase the offset degree in rotation, the RSSI gradually decreases. This is because the antenna outputs the maximum transmitting power in the central area of the beam, and thus the RSSI of the backscattered RF-signals reaches its maximum value when the tag is in the center. As the tag’s position deviates from the center of the antenna beam, the RSSI of the backscattered RF-signals decreases.
We call the position at which the RSSI reaches its peak value the perpendicular point, since the perpendicular bisector of the RFID antenna crosses this point.

Figure 4. The variation of RSSI via rotating the RFID antenna (X-axis: rotation angle, from −40 to 40; Y-axis: RSSI value in dBm; one curve each for Distance = 50 cm, 100 cm, 150 cm, and 200 cm).

Although the RSSI can only measure the vertical distance between the tag and the antenna at a coarse granularity, with different offset degrees from the tag to the center of the antenna beam, the RSSI follows a convex curve with the peak value at the perpendicular point. We can further leverage this property to differentiate the positions of various objects in the horizontal dimension.

Extract the Phase Value from RF-Signals

Background

Phase is a basic attribute of a signal, along with amplitude and frequency. The phase value of an RF signal describes the degree to which the received signal is offset from the sent signal, ranging from 0 to 360 degrees. Let $d$ be the distance between the RFID antenna and the tag; the signal traverses a round trip of distance $2d$ in each backscatter communication. Therefore, the phase value $\theta$ output by the RFID reader can be expressed as [25, 4]:

$$\theta = \left( \frac{2\pi}{\lambda} \times 2d + \mu \right) \bmod 2\pi, \qquad (1)$$

where $\lambda$ is the wavelength. Besides the RF phase rotation over distance, the reader’s transmitter, the tag’s reflection characteristic, and the reader’s receiver also introduce some additional phase rotation, denoted as $\theta_T$, $\theta_{TAG}$, and $\theta_R$ respectively. We use $\mu = \theta_T + \theta_R + \theta_{TAG}$ to denote this diversity term in Eq. (1). Since $\mu$ is rather stable according to the previous results [32], and it is only related to the physical properties of the specified tag-antenna pair, we can record $\mu$ for different tags in advance. Then, for each tag’s response, we can calibrate the phase value by offsetting the diversity term.
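To make Eq. (1) and the calibration step concrete, the following Python sketch evaluates the phase model and removes a pre-recorded diversity term. The function names and the wavelength of 0.32 m used in the usage note are illustrative assumptions, not values taken from the prototype.

```python
import math


def expected_phase(d: float, wavelength: float, mu: float = 0.0) -> float:
    """Eq. (1): theta = (2*pi/lambda * 2d + mu) mod 2*pi, in radians."""
    return (2 * math.pi / wavelength * 2 * d + mu) % (2 * math.pi)


def calibrate_phase(theta: float, mu: float) -> float:
    """Offset the diversity term mu that was recorded in advance for this tag-antenna pair."""
    return (theta - mu) % (2 * math.pi)
```

For example, with an assumed wavelength of 0.32 m, `expected_phase(0.08, 0.32)` and `expected_phase(0.24, 0.32)` both evaluate to approximately π, since the two distances differ by exactly half a wavelength. This is the periodicity that makes a single phase reading ambiguous.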
Thus, the phase value can be used as an accurate and stable metric to measure distance.

Estimate the Vertical Distance from Phase Value

According to the definition in Eq. (1), the phase value is a periodic function of the distance. Hence, given a specified


phase value from the RF-signal, there can be multiple solutions for the distance between the tag and the antenna. Therefore, we can deploy an RFID antenna array to scan the tags from slightly different positions, so as to determine the unique solution of the distance. Without loss of generality, in this paper, we separate two RFID antennas by a distance of $d$, and use them to scan the RFID tags and obtain their respective phase values from the RF-signals, as shown in Figure 5. Since the depth value from depth cameras like the Kinect measures the vertical distance, instead of the absolute distance between the objects and the depth camera, in order to achieve a perfect match between the collected RF-signals and the depth of field, it is essential to measure the vertical distance between the tags and the RFID antennas. However, it is rather difficult to directly measure the vertical distance via the phase value. Figure 5 shows the relationship between the vertical distance and the absolute distance. For a specified RFID tag, suppose its absolute distances to Antenna 1 and Antenna 2 are $d_1$ and $d_2$ respectively; we then need to derive its vertical distance $h$ to the antenna pair.

Figure 5. Compute the (x, y) coordinate of the tag (shows Antenna 1 at $A_1$ and Antenna 2 at $A_2$ separated by $d$ along the X-axis, their midpoint $O$, the tag $T$ and its projection $T'$, the median $m$, and the vertical distance $h$ along the Y-axis).

If we use $A_1$ and $A_2$ to denote the midpoints of Antenna 1 and Antenna 2 respectively, and use $T$ to denote the position of the tag, the three sides $\langle T, A_1 \rangle$, $\langle T, A_2 \rangle$, and $\langle A_1, A_2 \rangle$ form a triangle. Since Antenna 1 and Antenna 2 are separated by a fixed distance $d$, according to Heron’s formula [12], the area of this triangle is $A = \sqrt{s(s-d_1)(s-d_2)(s-d)}$, where $s$ is the semiperimeter of the triangle, i.e., $s = \frac{d_1 + d_2 + d}{2}$. Moreover, since the area of this triangle can also be computed as $A = \frac{1}{2} h \times d$, we can compute the vertical distance as

$$h = \frac{2\sqrt{s(s-d_1)(s-d_2)(s-d)}}{d}.$$
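As a quick check of the Heron-based expression for $h$, the Python sketch below computes the vertical distance from the three side lengths. The function name and the rejection of side lengths that cannot form a triangle are assumptions added for the example.

```python
import math


def vertical_distance(d1: float, d2: float, d: float) -> float:
    """Vertical distance h from the tag to the antenna baseline, via Heron's formula."""
    s = (d1 + d2 + d) / 2  # semiperimeter
    area_sq = s * (s - d1) * (s - d2) * (s - d)
    if area_sq < 0:
        # Assumed guard: the three lengths violate the triangle inequality.
        raise ValueError("d1, d2 and d do not form a triangle")
    return 2 * math.sqrt(area_sq) / d
```

For instance, with $d_1 = 0.5$ m, $d_2 = 0.4$ m, and $d = 0.3$ m (a 3-4-5 right triangle with the right angle at Antenna 2), we get $s = 0.6$, $A = \sqrt{0.0036} = 0.06$, and $h = 0.4$ m, which equals $d_2$ as expected when the tag sits directly in front of Antenna 2.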
Then, according to Apollonius’ theorem [7], for the triangle composed of points $A_1$, $A_2$, and $T$, the length of the median $TO$ bisecting the side $A_1A_2$ is $m = \frac{1}{2}\sqrt{2d_1^2 + 2d_2^2 - d^2}$. Hence, the horizontal distance between the tag and the midpoint of the two antennas, i.e., $T'O$, should be $\sqrt{m^2 - h^2}$. Therefore, if we build a local coordinate system with the origin set to the midpoint of the two antennas, the coordinate $(x', y')$ is computed as follows:

$$x' = \begin{cases} \sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2}, & d_1 \ge d_2 \\ -\sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2}, & d_1 < d_2 \end{cases} \qquad (2)$$

$$y' = h. \qquad (3)$$

Therefore, the next problem we need to address is to estimate the absolute distance between the tag and each antenna according to the phase value extracted from the RF-signals. Suppose the RFID system obtains two phase values $\theta_1$ and $\theta_2$ from the two separated RFID antennas; then, according to the definition in Eq. (1), the possible distances from the tag to the two antennas are

$$d_1 = \frac{1}{2} \cdot \left( \frac{\theta_1}{2\pi} + k_1 \right) \cdot \lambda, \qquad d_2 = \frac{1}{2} \cdot \left( \frac{\theta_2}{2\pi} + k_2 \right) \cdot \lambda.$$

Here, $k_1$ and $k_2$ are integers ranging from 0 to $+\infty$. Due to the multiple solutions of $k_1$ and $k_2$, there could be multiple candidate positions for the tag. However, since the difference between the lengths of two sides of a triangle is smaller than the length of the third side, i.e., $|d_1 - d_2| < d$, we can leverage this constraint to eliminate many infeasible solutions of $k_1$ and $k_2$. Besides, due to the limited scanning range of the RFID system (the maximum scanning range $l$ is usually smaller than 10 m), the values of $k_1$ and $k_2$ are upper bounded by a certain threshold, i.e., $\frac{2l}{\lambda}$. Figure 6 shows an example of feasible positions of the target tag according to the obtained phase values $\theta_1$ and $\theta_2$. The feasible solutions include multiple positions like A ∼ D, which respectively belong to two hyperbolas $H_1$ and $H_2$.
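The enumeration of feasible $(k_1, k_2)$ pairs and the resulting coordinates from Eqs. (2) and (3) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name, the tuple layout of the result, and the extra check $d_1 + d_2 > d$ (needed so that the three lengths form a proper triangle) are not part of the original description, and the phases are assumed to be already calibrated.

```python
import math
from typing import List, Tuple


def candidate_positions(theta1: float, theta2: float, wavelength: float,
                        d: float, max_range: float) -> List[Tuple[int, int, float, float]]:
    """Enumerate feasible (k1, k2, x', y') for two calibrated phases in radians.

    d is the antenna separation and max_range the scanning range l (both in metres).
    """
    k_max = int(2 * max_range / wavelength)  # upper bound 2l / lambda
    candidates = []
    for k1 in range(k_max + 1):
        d1 = 0.5 * (theta1 / (2 * math.pi) + k1) * wavelength
        for k2 in range(k_max + 1):
            d2 = 0.5 * (theta2 / (2 * math.pi) + k2) * wavelength
            # Triangle constraint |d1 - d2| < d; d1 + d2 > d is an added guard.
            if abs(d1 - d2) >= d or d1 + d2 <= d:
                continue
            s = (d1 + d2 + d) / 2
            h = 2 * math.sqrt(max(s * (s - d1) * (s - d2) * (s - d), 0.0)) / d
            m_sq = 0.5 * d1 ** 2 + 0.5 * d2 ** 2 - 0.25 * d ** 2  # squared median
            x = math.sqrt(max(m_sq - h * h, 0.0))
            if d1 < d2:
                x = -x  # tag is closer to Antenna 1, Eq. (2)
            candidates.append((k1, k2, x, h))
    return candidates
```

With the assumed values $\lambda = 0.32$ m, $d = 0.3$ m, $\theta_1 = \pi/4$, and $\theta_2 = \pi$, the pair $(k_1, k_2) = (3, 2)$ gives $d_1 = 0.5$ m and $d_2 = 0.4$ m, so the corresponding candidate is located at approximately $(0.15, 0.4)$. Other pairs survive the constraints as well, which is why a further step is needed to resolve the ambiguity.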
Due to the existence of multiple solutions, we can use these hyperbolas to denote a superset of these feasible positions in a straightforward approach.

Figure 6. Estimate the distance from phase value of RF signals (shows Antenna 1 and Antenna 2 separated by $d < \lambda/2$, the candidate positions A, B, C, and D of the target tag, and the hyperbolas $H_1$ and $H_2$).

MATCHING ALGORITHM VIA CONTINUOUS SCANNING

Motivation

To identify and distinguish the multiple tagged objects, a straightforward solution is to scan the tags in a static approach, where both the depth camera and the RFID antenna(s) are deployed in a fixed position without moving. The system scans the objects and tags simultaneously and collects the depth values and the RF-signals from these tagged objects. We can then pair the tags with the objects accordingly. However, when multiple tagged objects are placed at similar vertical distances from the system, this solution cannot effectively distinguish tagged objects that differ only in horizontal distance.

To address this problem, we propose a continuous scanning-based solution as follows: we continuously rotate the scanning system (including the depth camera and the RFID antennas), and simultaneously sample the depth of field and the RF-signals from multiple tagged objects. Hence, we are able to collect a continuous series of features such as depth, RSSI, and phase values during continuous scanning. While the scanning system is rotating, the vertical distances between the objects and the scanning system change continuously, from which we can further derive the differences among tagged objects at different horizontal distances. In this way, we are able to distinguish multiple tagged objects that have similar vertical distances but different positions.


Figure 7. The experiment results of continuous scanning: (a) The deployment of multiple tagged objects, with coordinates (x, y) in m: Object1 (0, 0.85), Object2 (−0.23, 1.2), Object3 (0.35, 1.3), Object4 (−0.6, 1.8), Object5 (0.6, 1.8); the 3D camera and RFID antenna rotate over [−θ, +θ]; (b) Variation of the depth value (X-axis: rotation angle; Y-axis: depth value in mm); (c) Variation of the RSSI value (X-axis: rotation angle; Y-axis: RSSI value in dBm).

Extract Depth via Continuous Scanning

In this section, we present our approach to extract the depth series via continuous scanning, so as to derive both the vertical distance and the horizontal distance of the tagged objects.

During the continuous scanning, we continuously rotate the depth camera from the angle of −θ to +θ and use it to scan the multiple tagged objects. During this process, as the vertical distance between the specified objects and the depth camera changes continuously, the depth values collected from these objects also change continuously. We conduct experiments to validate this claim. As shown in Figure 7(a), we arbitrarily deploy multiple tagged objects within the effective scanning range; the coordinates of these objects are also labeled. We continuously rotate the depth camera from the angle of −40° to +40° and collect the depth values from the multiple tagged objects every 5 to 6 degrees. Figure 7(b) shows the experiment results. We use quadratic curve fitting to connect the depth values of each object into a curve. We find that the series of depth values for each object actually forms a convex curve with a peak value. This peak value denotes the snapshot when the vertical distance reaches the maximum value.
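The quadratic curve fitting and peak extraction can be sketched in plain Python as below. This is a minimal least-squares sketch, not the prototype’s code: the function names and the choice to solve the normal equations with Cramer’s rule are assumptions made to keep the example self-contained. The same routine could be applied to an RSSI series.

```python
from typing import Sequence, Tuple


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_peak(angles: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Fit values ~ a*x^2 + b*x + c by least squares and return (peak angle, peak value)."""
    sx = [sum(x ** k for x in angles) for k in range(5)]                      # sums of x^k
    sy = [sum(y * x ** k for x, y in zip(angles, values)) for k in range(3)]  # sums of y*x^k
    # Normal equations M @ [c, b, a] = sy, with M[i][j] = sum of x^(i+j).
    m = [[sx[i + j] for j in range(3)] for i in range(3)]
    det = _det3(m)
    if det == 0:
        raise ValueError("need at least three distinct angles")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = sy[r]
        coeffs.append(_det3(mc) / det)  # Cramer's rule
    c, b, a = coeffs
    if a >= 0:
        raise ValueError("fitted curve has no maximum")
    return -b / (2 * a), c - b * b / (4 * a)
```

The returned pair corresponds to the label of an object, namely the rotation angle at which the fitted curve peaks and the depth value at that peak.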
It appears only when the perpendicular bisector of the depth camera crosses the specified object, since the vertical distance then equals the absolute distance between the object and the depth camera, which is the theoretical upper bound it can reach. In other words, the peak value appears when the depth camera directly faces the object; we call this the perpendicular point. In this way, according to the peak value of depth, we are able to further distinguish multiple objects with the same vertical distance but different positions. The solution is as follows: after the system finishes continuous scanning, it extracts the peak value from the curve of each object's depth values. Then, we label each object with the coordinate of its peak value, i.e., $\langle\theta, d\rangle$, where $\theta$ represents the rotation angle and $d$ represents the depth value. As the depth $d$ denotes the vertical distance of an object, we can use it to distinguish the objects in the vertical dimension; as the rotation angle $\theta$ denotes the angle at which the camera meets the perpendicular point, we can use it to distinguish the objects in the horizontal dimension. The objects can thus be easily distinguished in the horizontal dimension.

Pair the Tags with Objects according to Depth and RSSI

It is known that the RSSI is not a very reliable metric for accurately measuring the distance between the tags and the antennas, as it is easily affected by environmental factors like multi-path fading and path loss. However, since most mid- and low-end COTS RFID systems can only extract the RSSI from RF-signals, we need a solution based on RSSI. In this section, we present our approach to pair the tags with objects according to the correlations between the depth and RSSI in continuous scanning. According to the observations from Figure 4, with different offset degrees from the tag to the center of the antenna beam, the RSSI changes with a fixed variation pattern.
This implies that, if we conduct continuous scanning of the tagged objects, the RSSI from a tag always reaches its maximum value when the antenna directly faces the tag. This variation pattern of RSSI is quite similar to that of the depth value, since both reach their peak value when the tagged object is at the perpendicular point of the depth camera/RFID antenna. We further conduct experiments to validate this observation. Using the deployment in Figure 7(a), we continuously rotate the RFID antenna from −40° to +40° and collect the RSSI from multiple tags. As shown in Figure 7(c), the variation of RSSI for each tag has features very similar to those of depth: during the continuous scanning, the RSSI first increases to a maximum value and then decreases. The only difference is that the RSSI varies inversely with the depth value for any specified object, i.e., the larger the RSSI, the smaller the depth. Therefore, we can also label each tag with the coordinate of its peak value, i.e., $\langle\theta, r\rangle$, where $\theta$ represents the rotation angle and $r$ represents the RSSI. We can use the RSSI $r$ and the rotation angle $\theta$ to distinguish the tags in the vertical and horizontal dimensions, respectively.

Therefore, in order to pair multiple tags with multiple objects, we propose a matching solution in Algorithm 1. Our goal is to find a matching between two disjoint sets $O$ and $T$ according to the correlation of their measurements. After we extract the vectors from the measured data, for any object $O_i$ with vector $\langle\theta_i, d_i\rangle$, we first select the candidate tags for pairing according to the angle $\theta_i$. We set all tags with their angles in the range $[\theta_i - \delta, \theta_i + \delta]$ as pairing candidates ($\delta = 5°$ in our implementation). Then we further compare their values in RSSI and depth.
As the RSSI and the depth are measured in different dimensions (the depth value is linearly correlated with the distance, while the RSSI is nonlinearly correlated with it), it is not reasonable to compare them directly. We thus match each object to a candidate tag based on their relative rank in RSSI and depth. After that, since multiple objects may be matched to one tag, we make the tag select the object with the closest rank as the final pair. This process then iterates until all the objects and tags are paired.
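The peak extraction and rank-based pairing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`quadratic_peak`, `match_objects_to_tags`), the pure-Python least-squares fit, the clamping of the rank when fewer candidate tags than objects fall inside the angular window, and the tie-breaking by angular distance are all choices made here for illustration. Depth is ranked in ascending order and RSSI in descending order, following the observation that a larger RSSI corresponds to a smaller depth.

```python
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (peak angle in degrees, peak depth in mm or peak RSSI in dBm)


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_peak(angles: Sequence[float], values: Sequence[float]) -> Peak:
    """Least-squares fit v = c0 + c1*t + c2*t^2 and return the vertex (t, v)."""
    s = [sum(t ** p for t in angles) for p in range(5)]  # sums of t^p
    b = [sum(v * t ** p for t, v in zip(angles, values)) for p in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]  # normal equations
    d = det3(normal)
    coef = []
    for col in range(3):  # Cramer's rule, one coefficient per column
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][col] = b[i]
        coef.append(det3(m) / d)
    c0, c1, c2 = coef
    t_peak = -c1 / (2 * c2)
    return t_peak, c0 + c1 * t_peak + c2 * t_peak ** 2


def match_objects_to_tags(objects: Dict[str, Peak], tags: Dict[str, Peak],
                          delta: float = 5.0) -> Dict[str, str]:
    """Pair objects (angle, depth) with tags (angle, RSSI) by relative rank."""
    objs, tgs = dict(objects), dict(tags)
    pairs: Dict[str, str] = {}
    while objs and tgs:
        proposals: Dict[str, List[Tuple[int, float, str]]] = {}
        for oid, (theta, depth) in objs.items():
            # Candidates inside the angular window [theta - delta, theta + delta].
            near_objs = sorted((d, o) for o, (t, d) in objs.items()
                               if abs(t - theta) <= delta)   # nearest object first
            near_tags = sorted((-r, tid) for tid, (t, r) in tgs.items()
                               if abs(t - theta) <= delta)   # strongest RSSI first
            if not near_tags:
                continue
            k = near_objs.index((depth, oid))                # depth rank of this object
            j = min(k, len(near_tags) - 1)                   # assumption: clamp the rank
            tid = near_tags[j][1]
            proposals.setdefault(tid, []).append(
                (k - j, abs(tgs[tid][0] - theta), oid))
        if not proposals:
            break  # no remaining object has a candidate tag
        for tid, candidates in proposals.items():
            # Calibration: keep the object with the closest rank, then closest angle.
            _, _, oid = min(candidates)
            pairs[oid] = tid
            del objs[oid]
            del tgs[tid]
    return pairs
```

With hypothetical peak vectors such as objects `{"O1": (-20, 1800), "O2": (-18, 1200), "O3": (15, 1300)}` and tags `{"T1": (-19, -48), "T2": (-21, -60), "T3": (16, -52)}`, the farther of the two objects near −20° is paired with the weaker of the two tags in the same window, and the isolated object is paired with the only tag near it.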


Algorithm 1 Match multiple objects to multiple tags
1: Extract the vector: After continuous scanning, we respectively identify the peak value from the quadratic fitting curve of depth and RSSI. For each object $O_i$, we label it with a vector $\langle\theta_i, d_i\rangle$ and add the vector to a set $O$; for each tag $T_j$, we label it with a vector $\langle\theta_j, r_j\rangle$ and add the vector to a set $T$.
2: while $O \neq \emptyset$ and $T \neq \emptyset$ do
3:   Match the objects and tags: For each object $O_i \in O$ with vector $\langle\theta_i, d_i\rangle$, add those objects $O_j \in O$ and those tags $T_j \in T$ with $\theta_j \in [\theta_i - \delta, \theta_i + \delta]$ into the sets $O_c$ and $T_c$, respectively. In regard to the depth value $d_i$, compute the rank of $O_i$ in the set $O_c$ as $k$. Select the tag $T_j^* \in T_c$ with the rank of $k$ in regard to the RSSI $r_j$, and pair the object $O_i$ with the tag $T_j^*$.
4:   Calibrate the matching results: For any tag $T_j \in T$ paired with multiple objects, select the object $O_i$ from these objects with the closest rank similarly, and pair the object $O_i$ with the tag $T_j$. Remove the object $O_i$ and the tag $T_j$ from the sets $O$ and $T$, respectively.
5: end while
6: Output the matched pairs of objects and tags.

Pair the Tags with Objects according to Depth and Phase

Since newer COTS RFID systems, such as those from ImpinJ, are able to extract the phase value from the RF-signals of tags, they give us a new opportunity to differentiate the positions of the tagged objects more accurately. In this section, we present our approach to pair the tags with objects according to the correlations between the depth and phase in continuous scanning.

According to the analysis shown in Figure 6, given the two phase values of RF-signals extracted from two antennas separated by a distance $d$ ($d = 25$ cm in our implementation), there could be multiple solutions for the tag's position, which can be represented by multiple hyperbolas in the two-dimensional space. In fact, we can leverage continuous scanning to figure out a unique solution by filtering out the unqualified solutions.
The idea is as follows: for each snapshot $t_i$ ($i = 1, \ldots, m$) of the continuous scanning and a specified tag $T$, we extract the phase values $(\theta_1, \theta_2)$ from the two antennas, then compute the feasible distances $(d_1, d_2)$ between the tag and the two antennas. We further compute the set of feasible positions in a global coordinate system as $S_i$. Then, by computing the intersection of the sets $S_i$ over all snapshots, we are able to figure out a unique solution for the tag's position as follows: $S = \bigcap_{i=1}^{m} S_i$.

As a matter of fact, as long as two pairs of phase values are obtained, we are able to derive the unique solution of the tag's position by computing the intersection of multiple feasible solutions. Figure 8 shows an example of deriving the unique solution. Suppose a target tag is deployed at the coordinate (−60, 180). We first obtain the phase values (2.58, 5.81) from the two antennas when they are at the positions $A_1$ and $A_2$, respectively. After the antenna pair is rotated by 40°, we obtain the phase values (5.56, 2.49) from the two antennas when they are at the positions $A_1'$ and $A_2'$, respectively. In this way, we obtain three pairs of phase values (2.58, 5.81), (2.58, 5.56) and (5.81, 2.49), which are collected from the antenna pairs $\langle A_1, A_2\rangle$, $\langle A_1, A_1'\rangle$, and $\langle A_2, A_2'\rangle$, respectively. We use each of them to compute the feasible solutions in a unified coordinate system. As shown in Figure 8, we use different colors to label the hyperbolas of the feasible solutions for the different pairs of phase values. We find that the hyperbolas of the different feasible solutions all intersect in a small area which is very close to the target tag's real position. We thus set the central point of the intersection region as the estimate of the tag's position.
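The intersection of feasible sets can be illustrated with a small grid search. The sketch below rests on several assumptions that are not taken from the text: an idealized backscatter phase model $\phi = (4\pi d / \lambda) \bmod 2\pi$ with no device offset, a wavelength of about 32.6 cm (a carrier near 920 MHz), an antenna pair that rotates about its own midpoint at the origin, a 2 cm search grid, and a phase tolerance of 0.3 rad. Each pair of readings constrains the tag to a family of hyperbolas (points whose predicted phase difference matches the measured one), and the candidate region is the intersection over all pairs. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Optional, Set, Tuple

Point = Tuple[float, float]

WAVELENGTH_CM = 32.6    # assumption: carrier near 920 MHz
HALF_SPACING_CM = 12.5  # the two antennas are 25 cm apart


def antenna_pair(rotation_deg: float) -> Tuple[Point, Point]:
    """Antenna positions after rotating the pair about its midpoint (the origin)."""
    a = math.radians(rotation_deg)
    dx, dy = HALF_SPACING_CM * math.cos(a), HALF_SPACING_CM * math.sin(a)
    return (-dx, -dy), (dx, dy)


def phase(antenna: Point, tag: Point) -> float:
    """Idealized phase of the round trip antenna -> tag -> antenna, in [0, 2*pi)."""
    d = math.hypot(tag[0] - antenna[0], tag[1] - antenna[1])
    return (4 * math.pi * d / WAVELENGTH_CM) % (2 * math.pi)


def wrap(x: float) -> float:
    """Wrap an angle difference into [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def feasible_set(a: Point, b: Point, phi_a: float, phi_b: float,
                 grid: List[Point], tol: float = 0.3) -> Set[Point]:
    """Grid points whose predicted phase difference matches the measured one."""
    measured = phi_a - phi_b
    return {p for p in grid
            if abs(wrap(phase(a, p) - phase(b, p) - measured)) < tol}


def locate(readings: List[Tuple[Point, float]],
           grid: List[Point]) -> Tuple[Optional[Point], Set[Point]]:
    """Intersect the feasible sets of all reading pairs; return centroid and region."""
    region = set(grid)
    for (a, phi_a), (b, phi_b) in combinations(readings, 2):
        region &= feasible_set(a, b, phi_a, phi_b, grid)
    if not region:
        return None, region
    cx = sum(x for x, _ in region) / len(region)
    cy = sum(y for _, y in region) / len(region)
    return (cx, cy), region


# Synthetic example: simulate readings for a tag at (-60, 180) cm at 0 and 40 degrees.
grid = [(x, y) for x in range(-100, 101, 2) for y in range(50, 301, 2)]
true_tag = (-60, 180)
readings = [(ant, phase(ant, true_tag))
            for rotation in (0, 40) for ant in antenna_pair(rotation)]
estimate, region = locate(readings, grid)
```

Because phase is periodic, the surviving region may still contain more than one cluster of grid points when only a few snapshots are used; the centroid is a meaningful estimate only when the region is compact, so adding snapshots or tightening the tolerance may be needed in practice.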
Figure 8. Multiple solutions of the tag's position (horizontal coordinate $x$ (cm) versus vertical coordinate $y$ (cm); candidate-solution hyperbolas for the antenna pairs $(A_1, A_2)$, $(A_1, A_1')$ and $(A_2, A_2')$, the candidate region, and the real position (−60, 180)).

After deriving the target tag's position, we can further derive the angle at which the tag is at the perpendicular point of the RFID antennas, that is, the moment when the perpendicular bisector of the antenna pair crosses the tag. We use the pair $\langle\theta, \delta\rangle$ to denote this situation, where $\theta$ denotes the offset angle of the antenna and $\delta$ denotes the vertical distance. The pair $\langle\theta, \delta\rangle$ is computed as follows: $\theta = \arctan\left|\frac{x}{y}\right|$ and $\delta = \sqrt{x^2 + y^2}$. Therefore, we can leverage an algorithm like Algorithm 1 to match multiple tags to multiple objects. The only difference is that we can directly pair the objects $O_i \in O$ with the tags $T_j \in T$ according to the distance between the vector $\langle\theta_i, d_i\rangle$ and the vector $\langle\theta_j, \delta_j\rangle$, since they can accurately estimate the positions of the objects/tags.

Discussion

Robustness

Due to environmental issues like multi-path fading and object occlusion, the system may fail to identify some of the objects and the tags. Moreover, in some situations, it is essential to isolate the recognizable objects from the non-recognizable ones. Hence, it is possible that the cardinality of objects identified by the depth camera is not equal to the cardinality of tags identified by the RFID antenna. This leads to imperfect matching between the objects and tags. Our solution is able to tackle this problem by using the two-dimensional matching method with regression analysis.
By means of continuous scanning via rotation, we can derive the vertical distances and the horizontal distances of both the objects and the tags, from the depth camera and the RFID antenna respectively. Then we perform regression analysis on the vertical distances and horizontal distances between the tags and objects, and filter out the outliers according to the regression model. After that, we pair the tags with objects according to their two-dimensional positions. This approach effectively mitigates the interference from those tags and objects which fail to be identified, and isolates the recognizable objects from the non-recognizable ones.

Scalability

Our technical solution is primarily based on the distinction of depth (vertical distance) among the tagged objects. However, even if multiple objects are at the same depth, our solution is still able to distinguish these objects via continuous


scanning. By leveraging continuous scanning, our solution is able to effectively distinguish the tagged objects in terms of both vertical distance and horizontal distance. Moreover, in real applications, since the tagged objects are deployed in 3-dimensional space, two tagged objects can be at the same coordinate $(x, y)$ but at different heights in $z$. Our solution can scale to this situation by conducting continuous scanning in a 2-dimensional space, i.e., continuously scanning the tagged objects by rotating up and down as well as from left to right. In this way, the system is able to distinguish multiple objects at different positions in 3-dimensional space.

Time Delay

Since the number of tagged objects cannot be too large in real applications, the computation complexity of our algorithm is fairly low; hence the time delay of our solution mainly lies in the process of continuous scanning. We can therefore reduce the time delay via the following two approaches: 1) Increase the rotation speed of the continuous scanning system, such that the time delay in rotation is reduced. 2) Appropriately decrease the number of samples during the continuous scanning, without too much loss in the accuracy of distinguishing multiple tagged objects, such that the time delay in sampling is reduced. In fact, our depth-phase-based pairing approach only needs to sample twice during the continuous scanning, which greatly reduces the time delay in scanning.

PERFORMANCE EVALUATION

Experiment Settings

We evaluated our system using one Microsoft Kinect for Windows, one ImpinJ R420 reader, two Laird S9028 RFID antennas, and multiple ImpinJ E41-B general purpose tags. We deploy multiple objects in an area of about 3 m × 3 m, and attach each tag to an object. We use the Kinect as the depth camera and use the RFID reader to scan the tags. The average distance between the tagged objects and the system is 2 m.
We implement four schemes for performance comparison:
1) Static Scanning via Depth-RSSI Pairing (SS-RSSI): The system scans the tagged objects once at a fixed position, and pairs the tags with the objects according to their partial orders in the collected depth and RSSI, respectively.
2) Hybrid Scanning via Depth-Phase Pairing (HS-Phase): The depth camera continuously rotates and scans the tagged objects, while the RFID antennas scan the tagged objects once at a fixed position; the system pairs the tags with the objects according to the extracted depth and phase.
3) Continuous Scanning via Depth-RSSI Pairing (CS-RSSI): The system continuously scans the tagged objects while it is rotating, and pairs the tags with the objects according to the extracted series of depth and RSSI.
4) Continuous Scanning via Depth-Phase Pairing (CS-Phase): The system continuously scans the tagged objects while it is rotating, and pairs the tags with the objects according to the extracted series of depth and phase.

Evaluate the Accuracy in Pairing the Tags with Objects

We run experiments to evaluate the accuracy in pairing the tags with the objects. Without loss of generality, by default we deploy 10 tagged objects in the scanning area. We vary the settings of the average horizontal/vertical distance and the cardinality of tagged objects. For each setting, we randomly generate 10 types of deployments for the tagged objects, and evaluate the average match ratio for successful pairing in the above four schemes.

Accuracy for different cardinalities of tagged objects

Our solution achieves good accuracy when the cardinality of tagged objects is varied from 3 to 15. We first deploy 10 tagged objects in the scanning area, and set the average horizontal/vertical distance among the objects to 30 cm; the average density is thus 11 objects/m², which is a fairly large density for conventional applications.
As shown in Figure 9(a), we find that both CS-RSSI and CS-Phase achieve much better performance than SS-RSSI and HS-Phase: the match ratios of CS-RSSI and CS-Phase are 75% and 91%, respectively, while the match ratios of SS-RSSI and HS-Phase are only 25% and 40%. We further evaluate the match ratio for pairing different cardinalities of tagged objects, by varying the cardinality of tagged objects from 3 to 15. As shown in Figure 9(b), as the cardinality increases from 3 to 15, the match ratios of SS-RSSI and HS-Phase decrease rapidly, whereas the match ratios of CS-RSSI and CS-Phase decrease slowly. CS-RSSI and CS-Phase still achieve match ratios of 60% and 77%, respectively, when the cardinality of tagged objects is 15.

Accuracy for different vertical/horizontal distances

Our solution achieves good accuracy when the vertical/horizontal distances are varied from 10 cm to 50 cm. We vary the average vertical distance and the average horizontal distance among the tagged objects in turn, so as to further evaluate the accuracy. We fix the average horizontal (vertical) distance among the objects to 30 cm, and vary the average vertical (horizontal) distance from 10 cm to 50 cm. Figure 9(c) and Figure 9(d) show the match ratios with different vertical distances and horizontal distances, respectively. We find that as the average vertical/horizontal distance decreases, the match ratios of all schemes gradually decrease. Besides, for equal vertical and horizontal distances, the match ratio in the former case is noticeably lower than in the latter case, since the vertical distance is more difficult to estimate than the horizontal distance. Nevertheless, CS-Phase achieves match ratios of 68% and 72%, respectively, when the average vertical/horizontal distance is 10 cm, which corresponds to a rather high density for the tagged objects, i.e., 33 objects/m².
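The two densities quoted in this evaluation are consistent with a simple calculation, under the assumption (made here for illustration, not stated in the text) that each object occupies one cell of a regular grid whose sides are the average horizontal spacing $\Delta_h$ and the average vertical spacing $\Delta_v$:

```latex
\rho = \frac{1}{\Delta_h \, \Delta_v}, \qquad
\rho_{30 \times 30} = \frac{1}{0.3\,\text{m} \times 0.3\,\text{m}} \approx 11\ \text{objects/m}^2, \qquad
\rho_{10 \times 30} = \frac{1}{0.1\,\text{m} \times 0.3\,\text{m}} \approx 33\ \text{objects/m}^2 .
```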
Evaluate the Robustness in Pairing the Tags with Objects

Robustness to missing tags/objects

Our solution achieves good robustness with different ratios of missing objects/tags ranging from 10% to 50%. We run experiments to evaluate the robustness to missing tags/objects, i.e., when several objects or tags fail to be identified. Here we measure the match ratio for the remaining objects or tags. Figure 9(e) and Figure 9(f) show the experiment results for different ratios of missing objects and tags, respectively. As the ratio of missing objects/tags increases from 10% to 50%, the match ratios of all schemes decrease in most cases, except that in some cases the match ratios of SS-RSSI and HS-Phase slightly increase, since the number of objects/tags for pairing is reduced. Nevertheless, CS-RSSI and CS-Phase respectively achieve a match ratio of


near 60% and 72% when the ratio of missing objects/tags is as high as 50%.

Figure 9. The experiment results: (a) the match ratio for pairing 10 tagged objects; (b) the match ratio for pairing different numbers of tagged objects (3, 5, 10, 15); (c) the match ratio with different vertical distances (10cm, 30cm, 50cm); (d) the match ratio with different horizontal distances (10cm, 30cm, 50cm); (e) the match ratio with different ratios of missing objects (10%, 30%, 50%); (f) the match ratio with different ratios of missing tags (10%, 30%, 50%); (g) the match ratio with different numbers of samplings (1, 3, 5, 7, 10, 15) during continuous scanning; (h) the time delay, given as the required number of samplings, for different schemes. Panels (a)-(f) and (h) compare SS-RSSI, HS-Phase, CS-RSSI and CS-Phase; panel (g) compares CS-RSSI and CS-Phase.

Robustness to different numbers of samplings
Our solution is robust to different numbers of samplings, ranging from 3 to 15, during continuous scanning. Figure 9(g) shows the experiment results. As the number of samplings increases from 1 to 15, we find that the match ratio of CS-RSSI rapidly increases from 30% to 75%, while the match ratio of CS-Phase first rapidly increases to 91% when the number of samplings is 3, then slowly increases to 96% when the number of samplings is 15.
This implies that CS-Phase is more robust to low sampling than CS-RSSI, since CS-Phase requires only a few phase-pair samples to determine the position from the intersections of multiple hyperbolas.

Evaluate the Time Efficiency
The time efficiency mainly depends on the number of samplings and the rotation speed in continuous scanning. Since the rotation speed is device-dependent, we evaluate the time efficiency via the number of samplings. As shown in Figure 9(h), SS-RSSI has the lowest time delay, as it needs to scan only once, whereas CS-RSSI has the highest time delay, as it needs to scan multiple times to find the peak point via continuous scanning. HS-Phase and CS-Phase have a medium time delay, as 3∼4 samplings are basically enough for them to estimate the position of tagged objects.

CASE STUDY: RECOGNIZE MULTIPLE TAGGED HUMAN SUBJECTS IN THE CAFE
In order to further evaluate the real performance of our system under more practical conditions (e.g., indoor multi-path and energy absorption), we conduct more thorough experiments in a more realistic setting.

User Interface: In this case study, a major task of our system is to recognize multiple tagged human subjects in the cafe and further show their inherent information in the camera's view, as shown in Figure 10. We have therefore implemented an application, which was executed on a SAMSUNG PC equipped with an Intel(R) Core(TM) i5 1.4GHz CPU and 4GB RAM. The PC is remotely connected to the system via WiFi. Figure 11 shows an example user interface of our application. The left window shows the camera's view from the Kinect, while the right window shows the detailed description of the specified object. Once the "Scan" button is pressed, the system runs our continuous scanning-based algorithm to match the objects with the tags, and then draws multiple bounding boxes on the camera's view based on the scanning results.
Each bounding box (in blue) is a rectangle that distinguishes the object from the background based on the color/depth gradient between the object and the background. When a specified bounding box is clicked, detailed information such as the ID photo, name, age, job and interests is displayed in the right window.

Figure 10. Example deployment of multiple human subjects wearing RFID badges in the cafe

Figure 11. Example application interface (labeled elements: Scan button, camera view, match algorithm, experiment results, object's image, detailed description; example entry: Name: Liang, Job: Engineer, Age: 25, Interest: Football, Music)
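
The robustness discussion above notes that CS-Phase needs only a few phase-pair samples, because each pair constrains the tag to a hyperbola. The following is a minimal sketch of that constraint, not the implementation evaluated in this paper. It assumes the standard idealised backscatter model, in which the reported phase equals 4πd/λ modulo 2π, and that any hardware phase offset is identical at both measurement points and therefore cancels. The 920 MHz carrier, the antenna layout, the tag position and all function names are hypothetical values chosen for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def wavelength(freq_hz):
    """Carrier wavelength in metres for a given frequency in hertz."""
    return SPEED_OF_LIGHT / freq_hz


def backscatter_phase(distance, lam):
    """Idealised phase of a backscattered signal (assumption: no hardware offset).

    The signal travels to the tag and back, so the path length is 2 * distance.
    """
    return (4 * math.pi * distance / lam) % (2 * math.pi)


def distance_difference_candidates(phase_a, phase_b, lam, spacing):
    """Feasible values of d_a - d_b (metres) for one phase pair.

    The phase difference fixes d_a - d_b only modulo lam / 2, and the triangle
    inequality bounds |d_a - d_b| by the antenna spacing, so only a few
    candidates remain. Each candidate defines one hyperbola whose foci are
    the two measurement points.
    """
    half = lam / 2
    base = ((phase_a - phase_b) % (2 * math.pi)) * lam / (4 * math.pi)
    k_max = math.ceil(spacing / half) + 1
    return [
        base + k * half
        for k in range(-k_max, k_max + 1)
        if abs(base + k * half) <= spacing + 1e-12
    ]


if __name__ == "__main__":
    lam = wavelength(920e6)  # assumed UHF carrier frequency
    antenna_a, antenna_b = (-0.1, 0.0), (0.1, 0.0)  # hypothetical layout, metres
    tag = (0.4, 1.5)  # hypothetical tag position, metres
    d_a, d_b = math.dist(antenna_a, tag), math.dist(antenna_b, tag)
    candidates = distance_difference_candidates(
        backscatter_phase(d_a, lam), backscatter_phase(d_b, lam), lam, spacing=0.2
    )
    print("true difference:", round(d_a - d_b, 4))
    print("candidates:", [round(c, 4) for c in candidates])
```

Under these assumptions a single phase pair leaves a handful of candidate hyperbolas. Additional pairs taken at other rotation angles would be intersected to remove the ambiguity, which is consistent with the observation that roughly three samplings already give a high match ratio.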

Experiment Settings: As shown in Figure 10, we let multiple human subjects (4∼8 people) stand or sit freely in the cafe while wearing the RFID tagged badges. These "tagged" human subjects thus differ in height, horizontal distance and vertical distance. Besides, they can move or turn slightly, with a limited speed or angle. This setting raises more challenges than free-space testing, since the human body may cause interference such as the multi-path effect and energy absorption. We conducted experiments to evaluate the match ratios by varying factors such as the number of human subjects, the spacing between human subjects, and the moving state. We deploy our system in front of the human subjects at a distance of 1.5∼3m. The default number of human subjects and the default average spacing are 6 and 60cm, respectively.

Performance Evaluation: Our solution achieves fairly good matching accuracy in recognizing multiple tagged human subjects under different factors such as height, spacing and moving state. Figure 12(a)-(d) shows the match ratios with different configurations. Without loss of generality, we show the matching results of 5 randomly generated deployments with different spacing and heights of the human subjects. In the first experiment, we let the human subjects remain stationary, i.e., standing or sitting still, and evaluate the match ratios. As shown in Figure 12(a), our solution achieves a match ratio of 50% and 80% with CS-RSSI and CS-Phase, respectively. In the second experiment, we let the human subjects remain in a slightly moving state, i.e., they may move or turn with a limited speed (<40cm/s) or angle (<30°/s). As shown in Figure 12(b), our solution achieves a match ratio of 60% and 74% with CS-RSSI and CS-Phase, respectively. In the third experiment, we vary the average spacing between the human subjects from 60cm to 90cm.
As shown in Figure 12(c), our solution achieves an average match ratio of over 50% and 75% with CS-RSSI and CS-Phase, respectively. In the fourth experiment, we vary the number of human subjects from 4 to 8. As shown in Figure 12(d), our solution achieves an average match ratio of over 45% and 70% with CS-RSSI and CS-Phase, respectively. The performance reduction of CS-RSSI in the above experiments is mainly due to the energy absorption of human bodies, which distorts the conventional distribution of RSSI in RF-signals. Nevertheless, CS-Phase always achieves fairly good performance, since the phase in RF-signals is not affected by energy absorption.

User Experience Evaluation: We invited a total of 44 people (28 males and 16 females with different technical backgrounds, aged from 20 to 58) to use our system in augmented reality applications, and evaluated their user experience via questionnaire surveys covering 1) application meaning, 2) technical complexity, 3) accuracy and 4) friendliness of user interface. Figure 13 shows the evaluation results. For the application meaning, most people gave a positive/very positive evaluation; they believe it is a promising approach for future augmented reality applications.
For the technical complexity, several people gave a somewhat negative evaluation, mainly because the current prototype system is fairly large, and the RSSI/phase-based continuous scanning method may not be so intuitive for users with various technical backgrounds. For the accuracy, most people gave a positive/very positive evaluation, since in most cases our solution achieves very good accuracy. For the friendliness of user interface, most people gave a positive/very positive evaluation, due to the interesting yet simple design.

Figure 12. Evaluate the match ratios: (a) stationary situation; (b) slightly moving situation; (c) different average spacing (0.6m, 0.9m); (d) different number of people (4, 6, 8). Each panel plots the match ratio of CS-RSSI and CS-Phase; panels (a) and (b) cover Deployments 1 to 5.

Figure 13. Evaluation of the user experience: 1) application meaning, 2) technical complexity, 3) accuracy, 4) friendliness of user interface. The bars give the number of users choosing Negative, Neutral, Positive and Very Positive.

CONCLUSION AND FUTURE WORK
In this paper, we design an RFID-based system to identify and distinguish multiple RFID tagged objects in an augmented reality system. We deploy additional RFID antennas to the COTS depth camera, and propose a continuous scanning-based scheme to distinguish multiple tagged objects. The current implementation is a proof-of-concept prototype for the "tell me what I see" vision. The system is too large for wearable use, and its battery consumption is high for conventional applications. In future designs, we plan to miniaturize the technical solution and integrate it into wearable devices.
For example, we can miniaturize the RFID antennas and the 3D camera, and integrate them into wearable helmets/glasses for augmented reality applications. In this way, to perform the continuous scanning, the user only needs to turn her head continuously from one side to the other through a certain angle. All the inherent information of the detected objects can then be shown on the screen of the glasses.

Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61472185, 61373129, 61321491, 91218302, 61502224; the JiangSu Natural Science Foundation, No. BK20151390; the EU FP7 IRSES MobileCloud Project under Grant No. 612212; and the CCF-Tencent Open Fund. This work is partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization. The work of Jie Wu was supported in part by NSF grants CNS 1449860, CNS 1461932, CNS 1460971, CNS 1439672, CNS 1301774, and ECCS 1231461.
