

TaggedAR: An RFID-based Approach for Recognition of Multiple Tagged Objects in Augmented Reality Systems

Lei Xie, Member, IEEE, Chuyu Wang, Student Member, IEEE, Yanling Bu, Student Member, IEEE, Jianqiang Sun, Qingliang Cai, Jie Wu, Fellow, IEEE, and Sanglu Lu, Member, IEEE

IEEE Transactions on Mobile Computing, DOI 10.1109/TMC.2018.2857812

Abstract—With computer vision-based technologies, current augmented reality (AR) systems can effectively recognize multiple objects with different visual characteristics. However, only limited degrees of distinction can be offered among different objects with similar natural features, and inherent information about these objects cannot be effectively extracted. In this paper, we propose TaggedAR, an RFID-based approach to assist the recognition of multiple tagged objects in AR systems, which deploys additional RFID antennas on a COTS depth camera. By sufficiently exploring the correlations between the depth of field and the received RF-signal, we propose a rotate scanning-based scheme to distinguish multiple tagged objects in the stationary situation, and a continuous scanning-based scheme to distinguish multiple tagged human subjects in the mobile situation. By pairing the tags with the objects according to the correlations between the depth of field and RF-signals, we can accurately identify and distinguish multiple tagged objects, realizing the vision of "tell me what I see" in the AR system. We have implemented a prototype system to evaluate the actual performance with case studies in real-world environments. The experiment results show that our solution achieves an average match ratio of 91% in distinguishing up to dozens of tagged objects with a high deployment density.

Index Terms—Passive RFID; Augmented Reality System; Object Recognition; Prototype Design

1 INTRODUCTION

Augmented reality (AR) systems (e.g., Microsoft Kinect, Google Glass) are nowadays increasingly used to obtain an augmented view of a real-world environment. For example, by leveraging computer vision and pattern recognition, depth camera-based devices like the Microsoft Kinect [1] can effectively perform object recognition. Hence, users can distinguish multiple objects of different categories, e.g., a specified object in the camera view can be recognized as a vase, a laptop, or a pillow based on its visual characteristics. However, these techniques can only offer a limited degree of distinction, since multiple objects of the same type may have very similar physical features; e.g., the system cannot effectively distinguish between two laptops of the same brand, even if they belong to different product models. Moreover, they cannot indicate more inherent information about these objects, e.g., the specific configuration, manufacturer, and production date of a laptop. It is rather difficult to provide these functions by purely leveraging computer vision-based technology.

• Lei Xie, Chuyu Wang, Yanling Bu, Jianqiang Sun, Qingliang Cai and Sanglu Lu are with the State Key Laboratory for Novel Software Technology, Nanjing University, China.
E-mail: lxie@nju.edu.cn, wangcyu217@dislab.nju.edu.cn, yanling@smail.nju.edu.cn, {SunJQ,caiqingliang}@dislab.nju.edu.cn, sanglu@nju.edu.cn.
• Jie Wu is with the Department of Computer and Information Sciences, Temple University, USA. E-mail: jiewu@temple.edu.
• Lei Xie and Sanglu Lu are the co-corresponding authors.

[Fig. 1. Typical scenarios of "Tell me what I see" from the AR system: (a) Scenario 1: recognize different human subjects in the cafe; (b) Scenario 2: recognize different cultural relics in the museum.]

Nevertheless, the RFID technology has brought new opportunities to meet these new demands [2, 3]. RFID tags can be used to label different objects, and to store inherent information about these objects in their onboard memory. In comparison to optical markers such as the QR code, a COTS RFID tag has an onboard memory of up to 4K or 8K bytes, and it can be effectively identified even if it is hidden in/under the object. This provides us with an opportunity to effectively distinguish these objects, even if they have very similar natural features from the visual sense. Fig. 1 shows two typical application scenarios. The first scenario is to recognize different human subjects in a cafe, as shown in Fig. 1(a). In this scenario, multiple people are standing or sitting together in the cafe while wearing RFID-tagged badges. From the camera's view, a depth camera such as the Kinect can recognize multiple human subjects and capture the depth from its embedded depth sensor, which is associated with the distance to the camera. The RFID reader can identify multiple tags within the scanning range; moreover, it is able to extract signal features such as the phase and RSSI from the RFID tags.


By pairing these pieces of information together, the vision of "tell me what I see" can be effectively realized in the AR system. In comparison to the pure AR system, which can only show some basic information like gender and race according to vision-based pattern recognition, by leveraging this novel RFID-assisted AR technology, inherent information such as people's names, jobs and titles can be directly extracted from the RFID tags and associated with the corresponding human subjects in the camera's view. For example, when we are meeting multiple unknown people wearing RFID badges at public events, the system can effectively help us recognize these people by illustrating the detailed information on the camera's view in a smart glass. The second scenario is to recognize different cultural relics in the museum, as shown in Fig. 1(b). In this scenario, multiple cultural relics like ancient potteries are placed on display racks. Due to the same craftsmanship, they might have very similar natural features like color and shape from the visual sense. This prohibits the pure AR system from distinguishing different objects when they have very similar physical features. In contrast, using our RFID-assisted AR technology, these objects can be easily distinguished according to the differences in the labeling tags. In summary, the advantages of RFID-assisted AR systems over pure AR systems lie in the essential capability of identification and localization in RFID.

Although many schemes for RFID-based localization [4, 5] have been proposed, they mainly focus on absolute object localization, and usually require anchor nodes like reference tags for accurate localization. They are not suitable for distinguishing multiple tagged objects for two reasons. First, we only require distinguishing the relative location instead of the absolute location of multiple tagged objects, by pairing the tags to the objects based on the correlation between the depth of field and RF-signals. Second, the depth camera cannot effectively use the anchor nodes, and it is impractical to deploy multiple anchor nodes in most AR applications.

In this paper, we leverage the RFID technology [6, 7] to further label different objects with RFID tags. We deploy additional RFID antennas to the COTS depth camera. To recognize the stationary tagged objects, we propose a rotate scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. We extract the phase value from the RF-signal, and pair the tags with the objects according to the correlation between the depth value and the phase value. Similarly, to recognize the mobile tagged human subjects, we propose a continuous scanning-based scheme to scan the human subjects, i.e., the system continuously samples the depth of field and RF-signals from these tagged human subjects.
In this way, we can accurately identify and distinguish multiple tagged objects, by sufficiently exploring the correlations between the depth of field and the RF-signal.

However, there are several challenges in distinguishing multiple tagged objects in AR systems. The first challenge is conducting accurate pairing between the objects and the tags. In real applications, the tagged objects are usually placed in very close proximity, and the number of objects is usually on the order of dozens. It is difficult to realize accurate pairing due to the large cardinality and mutual interference. The second challenge is mitigating the interference from the multi-path effect and object occlusion in real settings. These issues lead to nonnegligible interference in pairing the tags with the objects, such as missing tags/objects which fail to be identified, as well as extra objects which are untagged. The third challenge is designing an efficient solution without any additional assistance, like anchor nodes. It is impractical to intentionally deploy anchor nodes in real AR applications due to the intensive deployment costs in manpower and time.

This paper presents the first study of using RFID to assist in recognizing multiple objects in AR systems (a preliminary version of this work appeared in [8]). Specifically, we make three key contributions: 1) We propose TaggedAR to realize the vision "tell me what I see" from AR systems. By sufficiently exploring the correlations between the depth of field and the RF-signal, we propose a rotate scanning-based scheme to distinguish multiple tagged objects in the stationary situation, and a continuous scanning-based scheme to distinguish multiple tagged human subjects in the mobile situation. 2) We efficiently tackle the interference from the multi-path effect and object occlusion in real settings, by reducing this problem to a stable marriage problem, and propose a stable-matching-based solution to mitigate the interference from outliers. 3) We implemented a prototype system and evaluated the performance with case studies in real-world environments. Our solution achieves an average match ratio of 91% in distinguishing up to dozens of RFID-tagged objects with a high deployment density.

2 RELATED WORK

Pattern recognition via depth camera: Pattern recognition via depth camera mainly leverages the depth and RGB information captured by the camera to recognize objects in a computer vision-based approach. Based on depth processing [9], a number of technologies have been proposed for object recognition [10] and gesture recognition [11, 12]. Nirjon et al. solve the problem of localizing and tracking household objects using depth-camera sensors [13]. A Kinect-based pose estimation method [11] is proposed in the context of physical exercise, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions.

Batteryless sensing via RFID: RFID has recently been investigated as a new scheme of batteryless sensing, including indoor localization [14], activity sensing [15], physical object search [16], etc. Prior work on RFID-based localization primarily relied on Received Signal Strength [14] or Angle of Arrival [17] to acquire the absolute location of an object. The state-of-the-art systems use the phase value to estimate the absolute or relative location of an object with higher accuracy [6, 18-20].
RF-IDraw uses a 2-dimensional array of RFID antennas to track the movement trajectory of one finger attached with an RFID tag, so that it can reconstruct the trajectory shape of the specified finger [21]. Tagoram exploits tag mobility to build a virtual antenna array, and uses a differential augmented hologram to facilitate the instant tracking of a mobile RFID tag [4].

Combined use in augmented reality environments: Recent works further consider using both the depth camera and RFID for indoor localization and object recognition in augmented reality environments [22-26].


Wang et al. propose an indoor real-time location system that combines active RFID and Kinect, by leveraging the positioning capability of identified RFID tags and the object extraction ability of the Kinect [22]. Klompmaker et al. use RFID and depth-sensing cameras to enable personalized, authenticated tangible interactions on a tabletop [23]. Galatas et al. propose a multimodal context-aware localization system, using RFID and 3D audio-visual information from two Kinect sensors deployed at various locations [24]. Cerrada et al. present a method to improve object recognition by combining vision-based techniques applied to range-sensor captured 3D data with object identification obtained from RFID tags [25]. Li et al. present ID-Match, a hybrid computer vision and RFID system; it uses a novel reverse synthetic aperture technique to recover the relative motion paths of RFID tags worn by people, and correlates them with the physical motion paths of individuals as measured with a 3D depth camera [26]. Duan et al. present TagVision, a hybrid RFID and computer vision system for fine-grained localization and tracking of tagged objects [27]. Instead of simply performing indoor localization or object recognition, in this paper, we aim to identify and distinguish multiple tagged objects with a depth camera and RFID antennas. Our solution does not require any anchor nodes for assistance, and only leverages at most two RFID antennas for rotate/continuous scanning, which greatly relieves the intensive deployment cost and makes our solution more practical in various scenarios.

3 SYSTEM OVERVIEW

3.1 Design Goals

To realize the vision of "tell me what I see" from the augmented system, we aim to propose an RFID-based approach that uses RFID tags to label different objects. Therefore, we need to collect the responses from multiple tags and objects, and then pair the RFID tags to the corresponding objects, according to the correlations between the depth of field and RF-signals, such that the information stored in the RFID tag can be used to illustrate the specified objects in a detailed approach. Hence, we need to consider the following metrics in regard to system performance: 1) Accuracy: Since the objects are usually placed in very close proximity, there is a high accuracy requirement in distinguishing these objects, i.e., the average match ratio should be greater than a certain value, e.g., 85%. 2) Robustness: Environmental factors, like the multi-path effect and partial occlusion, may cause the responses from the tagged objects to be missing or distorted. Besides, the tagged objects could be partially hidden behind each other due to the randomness of the deployment. The solution should be robust to these noises and distractions.

3.2 System Framework

3.2.1 System Prototype

We design a system prototype as shown in Fig. 2(a). We deploy one or two additional RFID antennas to the COTS depth camera. The RFID antenna(s) and the depth camera are fixed to a rotating shaft so that they can rotate simultaneously.
For the RFID system, we use the COTS ImpinJ R420 reader [28], one or two Laird S9028 antennas, and multiple Alien 9640 general-purpose tags; for the depth camera, we use the Microsoft Kinect for Windows. They are both connected to a laptop placed on a mobile robot. The mobile robot can perform a 360-degree rotation around the rotation axis. RFID tags are attached to the specified objects. To recognize the stationary tagged objects, we propose a rotate scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. In this way, we can obtain the depth of the specified objects from the depth sensor inside the depth camera; we can also extract signal features such as the RSSI and phase values from the RF-signals of the RFID tags. Similarly, to recognize the mobile tagged human subjects, we propose a continuous scanning-based scheme to scan the human subjects, i.e., the system continuously samples the depth of field and RF-signals from these tagged human subjects. By accurately pairing these pieces of information, the tags and the objects can be effectively bound together.

3.2.2 Software Framework

The software framework is mainly composed of three layers, i.e., the sensor data collection layer, the middleware layer, and the application layer, as shown in Fig. 2(b). In the sensor data collection layer, the depth camera recognizes multiple objects and collects the corresponding depth distribution, while the RFID system collects multiple tag IDs and extracts the corresponding RSSIs or phases from the RF-signals of the RFID tags. In the middleware layer, we aim to sample and extract features from the raw sensor data, and conduct an accurate matching between the objects and the RFID tags. In the application layer, the AR applications can use the matching results directly to realize various objectives. In the following sections, without loss of generality, we evaluate the performance using the Microsoft Kinect for Windows, the ImpinJ R420 reader, two Laird S9028 RFID antennas, and multiple Alien 9640 general-purpose tags. We attach each tag to one object, use the Kinect as the depth camera, and use the RFID reader to scan the tags.

[Fig. 2. System Framework: (a) prototype system (3D camera, RFID antennas, rotating module, RFID reader, laptop, rotation axis); (b) software framework (sensor data collection with depth/RSSI/phase, middleware with feature sampling/extraction and matching algorithm, applications).]
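To make the three-layer flow concrete, here is a minimal sketch of the middleware's data path in Python. All type and function names are our own illustrative assumptions, and the naive sort-and-zip pairing at the end merely stands in for the correlation-based, stable-matching algorithm developed later in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TagSample:
    """One reading collected by the RFID system (sensor data layer)."""
    tag_id: str
    phase: float       # radians, assumed already calibrated
    rssi: float        # dBm
    range_est: float   # distance estimated from the RF-signal (cm)

def middleware_match(tag_samples: List[TagSample],
                     object_depths: List[float]) -> Dict[str, float]:
    """Middleware layer: pair each tag with an object depth extracted
    from the depth camera. Sorting both sides by range and zipping is a
    deliberately naive placeholder for the paper's matching algorithm."""
    tags = sorted(tag_samples, key=lambda s: s.range_est)
    depths = sorted(object_depths)
    return {t.tag_id: depth for t, depth in zip(tags, depths)}
```

An application-layer consumer can then look up the matched depth (and hence the on-screen object) for each tag ID it reads.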


4 FEATURE SAMPLING AND EXTRACTION

4.1 Extract the Depth of Field from the Depth Camera

Depth cameras, such as the Microsoft Kinect, are a kind of range camera: they produce a 2D image showing the distance to points in a scene from a specific viewpoint, normally by means of a depth sensor. The depth sensor usually consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures the depth.

[Fig. 3. Experiment results of depth value: (a) depth histogram of multiple objects; (b) depth of objects along different horizontal lines; (c) depth histograms of the same object at different distances.]

Therefore, the depth camera can effectively estimate the distance to a specified object according to the depth, because the depth increases linearly with the distance. If multiple objects are placed at different positions in the scene, they are usually at different distances from the depth camera. Therefore, it is possible to distinguish among different objects according to the depth values from the depth camera. In order to understand the characteristics of the depth information collected from the depth camera, we conduct real experiments to obtain more observations. We first conduct an experiment to evaluate the characteristics of the depth. Without loss of generality, each experiment observation is summarized from the statistical properties of 100 repeated observations. We arbitrarily place three objects A, B, and C in front of the depth camera (a Microsoft Kinect): object A is a box at a distance of 68cm, object B is a can at a distance of 95cm, and object C is a tripod at a distance of 150cm. We then collect the depth histogram from the depth sensor. As shown in Fig. 3(a), the X-axis denotes the depth value, and the Y-axis denotes the number of pixels at the specified depth. We find that, as A and B are regular-shaped objects, there are respective peaks in the depth histogram for objects A and B, meaning that many pixels are detected at those distances. Therefore, A and B can be easily distinguished according to the distance. However, there exist two peaks around the corresponding distance of object C: because object C is an irregularly-shaped object (the concave shape of the tripod), there might be a number of pixels at different distances. This implies that, for an object with a continuous surface, the depth sensor usually detects a peak in the vicinity of its distance, whereas for an irregularly-shaped object, the depth sensor detects multiple peaks with intermittent depths. Nevertheless, we find that these peaks are usually very close in distance. If multiple objects are placed in rather close proximity, this may increase the difficulty of distinguishing these objects.

In order to further validate the relationship between the depth and the distance, we set multiple horizontal lines at different distances to the Kinect (from 500 mm to 2500 mm). For each horizontal line, we then move a certain object along the line and respectively obtain the depth value from the Kinect. We show the experiment results in Fig. 3(b). Here we find that, for each horizontal line, the depth values of the object remain nearly constant, with rather small deviations; across different horizontal lines, the depth values show obvious variations. Due to the limitation of the Kinect's view, the Kinect has a smaller view angle at a closer distance. This observation implies that the depth value collected from the depth camera depicts the vertical distance rather than the absolute distance between the objects and the depth camera.
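Since the Kinect reports the perpendicular (vertical) distance rather than the straight-line range, an application that needs the absolute distance to an object can recover it from the depth and the pixel's lateral offset. The sketch below is our own extrapolation of this observation; the field-of-view and resolution constants are typical Kinect v1 values, not figures from the paper.

```python
import math

H_FOV_DEG = 57.0   # assumed horizontal field of view of the Kinect v1
WIDTH = 640        # assumed depth-image width in pixels

def euclidean_range(depth_cm: float, pixel_x: int) -> float:
    """Convert a vertical-distance depth reading into an absolute range."""
    # Horizontal angle of this pixel away from the optical axis.
    angle = math.radians((pixel_x - WIDTH / 2) / WIDTH * H_FOV_DEG)
    lateral = depth_cm * math.tan(angle)   # offset along the x-axis
    return math.hypot(depth_cm, lateral)   # straight-line distance
```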
To extract the depths of specified objects from the depth histogram of multiple objects, we set a threshold $t$ to detect the peaks with regard to the number of pixels. We iterate from the minimum depth to the maximum depth in the histogram; if the number of pixels at a certain depth is larger than $t$, we identify it as a peak $p(d_i, n_i)$ with depth $d_i$ and number of pixels $n_i$. It is found that for an irregularly-shaped object, the depth sensor usually detects multiple peaks with intermittent depths. To address this multiple-peaks problem, we set another threshold $\Delta d$: if the differences between the depth values of these peaks are smaller than $\Delta d$, we combine them into one peak. The values of both $t$ and $\Delta d$ are selected based on empirical values from a number of experimental studies ($t$=200 and $\Delta d$=10cm in our implementation). Then, each peak actually represents a specified object. For each peak, we respectively find the leftmost depth $d_l$ and the rightmost depth $d_r$ whose pixel counts are greater than zero. We then compute the average depth for the specified object as $\bar{d} = \sum_{i=l}^{r} d_i \times \frac{n_i}{\sum_{j=l}^{r} n_j}$, i.e., the average depth is calculated in a weighted-average manner according to the number of pixels at each depth around the peak.

Moreover, in Fig. 3(a), we also find some background noise past the distance of 175cm, which is produced by background objects such as the wall and floor. To address the background noise problem, we note that such background noise always leads to a continuous range of depth values with a very close number of pixels in the depth histogram. Therefore, we can use a specified pattern to detect and eliminate this range of depth values. Specifically, we respectively set a threshold $t_l$ for the length of the continuous range and a threshold $t_p$ for the number of pixels corresponding to each depth ($t_l$=50cm and $t_p$=500 in our implementation). Then, for a certain range of depth values in the depth histogram, if the range is longer than $t_l$ and the number of pixels for each depth value is greater than $t_p$, we determine this range to be background noise.
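The following Python sketch implements the histogram processing described above, assuming one histogram bin per centimeter of depth; the identifier names are ours, and the thresholds are the empirical values reported above ($t$=200, $\Delta d$=10cm, $t_l$=50cm, $t_p$=500).

```python
import numpy as np

T_PEAK = 200    # pixels: minimum count for a depth bin to be a peak (t)
DELTA_D = 10    # cm: peaks closer than this are merged (delta d)
T_LEN = 50      # cm: minimum run length of a background plateau (t_l)
T_PIX = 500     # pixels: minimum per-bin count of a plateau (t_p)

def suppress_background(hist: np.ndarray) -> np.ndarray:
    """Zero out runs longer than T_LEN bins in which every bin exceeds
    T_PIX pixels -- the wall/floor pattern observed in Fig. 3(a)."""
    out = hist.copy()
    i, n = 0, len(out)
    while i < n:
        if out[i] > T_PIX:
            j = i
            while j < n and out[j] > T_PIX:
                j += 1
            if j - i > T_LEN:
                out[i:j] = 0
            i = j
        else:
            i += 1
    return out

def extract_object_depths(hist: np.ndarray) -> list:
    """Return one weighted-average depth (cm) per detected object."""
    hist = suppress_background(hist)
    peaks = [d for d in range(len(hist)) if hist[d] > T_PEAK]
    groups = []                      # merge nearby peaks into one object
    for d in peaks:
        if groups and d - groups[-1][-1] < DELTA_D:
            groups[-1].append(d)
        else:
            groups.append([d])
    depths = []
    for g in groups:
        l, r = g[0], g[-1]           # expand to surrounding non-empty bins
        while l > 0 and hist[l - 1] > 0:
            l -= 1
        while r < len(hist) - 1 and hist[r + 1] > 0:
            r += 1
        weights = hist[l:r + 1].astype(float)
        depths.append(float(np.dot(np.arange(l, r + 1), weights) / weights.sum()))
    return depths
```

For the histogram in Fig. 3(a), this procedure would report one depth each for objects A and B, merge the tripod's two nearby peaks into a single depth for C, and discard the wall/floor plateau as background.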
The effective scanning distance of the depth camera is very important to the potential range of AR applications; otherwise, the potential application scenarios would be very limited. In fact, the effective scanning distance of a depth camera such as the Kinect can be as far as 475cm. To validate this, we perform a set of experiments on the effective scanning distance of the depth camera, i.e., the Kinect. We deploy a cardboard box of size 20cm×20cm×5cm on top of a tripod, and evaluate the corresponding depth histogram when the cardboard is separated from the depth camera (i.e., the Kinect) by distances of 50cm, 150cm, 300cm and 450cm, respectively. We plot the experiment results in Fig. 3(c). Note that, when the object is deployed at different distances, the profiles of the corresponding depth histograms are very similar to each other in most cases. In particular, when the object is deployed at a distance very close to the depth camera, e.g., 50cm, the profile may be distorted to a certain degree. When the object is deployed at a distance of 450cm, depths over 475cm are no longer illustrated, since they are beyond the effective scanning distance. Therefore, the experiment results show that the depth camera is able to extract the depth information of objects at distances as far as 475cm.

4.2 Extract the Phase Value from RF-Signals

Phase is a basic attribute of a signal, along with amplitude and frequency. The phase value of an RF signal describes the degree to which the received signal offsets from the sent signal, ranging from 0 to 360 degrees. Let $d$ be the distance between the RFID antenna and the tag; the signal traverses a round trip with a distance of $2d$ in each backscatter communication. Therefore, the phase value $\theta$ output by the RFID reader can be expressed as [20, 29]:

$$\theta = \left(\frac{2\pi}{\lambda} \times 2d + \mu\right) \bmod 2\pi, \qquad (1)$$

where $\lambda$ is the wavelength and $\mu$ is a diversity term related to the additional phase rotation introduced by the reader's transmitter/receiver and the tag's reflection characteristic. According to a previous study [4], as $\mu$ is rather stable, we can record $\mu$ for different tags in advance. Then, according to each tag's response, we can calibrate the phase by offsetting the diversity term. Thus, the phase value can be used as an accurate and stable metric to measure distance.

According to the definition in Eq. (1), the phase is a periodic function of the distance. Hence, given a specified phase value from the RF-signal, there can be multiple solutions for estimating the distance between the tag and the antenna. Therefore, we can deploy an RFID antenna array to scan the tags from slightly different positions, so as to figure out the unique solution of the distance. Without loss of generality, in this paper, we separate two RFID antennas by a distance $d$, use them to scan the RFID tags, and respectively obtain the phase values from their RF-signals, as shown in Fig. 4.

[Fig. 4. Compute the $(x', y')$ coordinate of the tag: the tag $T$ and the antenna midpoints $A_1$, $A_2$ (separated by $d$) form a triangle; $h$ is the vertical distance from $T$, $m$ the median $TO$, and $T'$ the projection of $T$ onto the X-axis.]

If we respectively use $A_1$ and $A_2$ to denote the midpoints of Antenna 1 and Antenna 2, and use $T$ to denote the position of the tag, then the three sides $\langle T, A_1 \rangle$, $\langle T, A_2 \rangle$, and $\langle A_1, A_2 \rangle$ form a triangle. Since Antenna $A_1$ and Antenna $A_2$ are separated by a fixed distance $d$, according to Heron's formula [30], the area of this triangle is $A = \sqrt{s(s-d_1)(s-d_2)(s-d)}$, where $s$ is the semiperimeter of the triangle, i.e., $s = \frac{d_1+d_2+d}{2}$. Moreover, since the area of this triangle can also be computed as $A = \frac{1}{2} h \times d$, we can compute the vertical distance $h = \frac{2\sqrt{s(s-d_1)(s-d_2)(s-d)}}{d}$. Then, according to Apollonius' theorem [31], for the triangle composed of points $A_1$, $A_2$, and $T$, the length of the median $TO$ bisecting the side $A_1A_2$ is equal to $m = \frac{1}{2}\sqrt{2d_1^2 + 2d_2^2 - d^2}$. Hence, the horizontal distance between the tag and the midpoint of the two antennas, i.e., $T'O$, should be $\sqrt{m^2 - h^2}$. Therefore, if we build a local coordinate system with the origin set to the midpoint of the two antennas, the coordinate $(x', y')$ is computed as follows:

$$x' = \begin{cases} \sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2} & d_1 \ge d_2 \\ -\sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2} & d_1 < d_2 \end{cases} \qquad (2)$$

$$y' = h. \qquad (3)$$

Therefore, the next problem we need to address is to estimate the absolute distance between the tag and each antenna according to the phase values extracted from the RF-signals. Suppose the RFID system respectively obtains two phase values $\theta_1$ and $\theta_2$ from the two separated RFID antennas; then, according to the definition in Eq. (1), the possible distances from the tag to the two antennas are $d_1 = \frac{1}{2} \cdot (\frac{\theta_1}{2\pi} + k_1) \cdot \lambda$ and $d_2 = \frac{1}{2} \cdot (\frac{\theta_2}{2\pi} + k_2) \cdot \lambda$. Here, $k_1$ and $k_2$ are non-negative integers. Due to the multiple solutions of $k_1$ and $k_2$, there could be multiple candidate positions for the tag. However, since the difference of the lengths of two sides of a triangle is smaller than the length of the third side, i.e., $|d_1 - d_2| < d$, we can leverage this constraint to effectively eliminate many infeasible solutions of $k_1$ and $k_2$. Besides, due to the limited scanning range of the RFID system (the maximum scanning range $l$ is usually smaller than 10 m), the values of $k_1$ and $k_2$ should be upper bounded by a certain threshold, i.e., $\frac{2l}{\lambda}$.
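The sketch below puts Eq. (1)-(3) together: it enumerates the candidate distances permitted by the phase ambiguity, prunes infeasible $(k_1, k_2)$ pairs with the triangle inequality and the scanning-range bound, and converts each surviving $(d_1, d_2)$ pair into a local $(x', y')$ coordinate. The wavelength constant and all names are our own assumptions, and the phases are assumed to be already calibrated (diversity term $\mu$ removed).

```python
import math
from itertools import product

LAM = 0.326    # wavelength in meters (~920 MHz UHF band; an assumption)
L_MAX = 10.0   # maximum scanning range l of the RFID system (m)

def candidate_distances(theta: float) -> list:
    """All d = ((theta / (2 pi)) + k) * lam / 2 allowed by Eq. (1),
    with k upper bounded by 2l / lambda."""
    k_max = int(2 * L_MAX / LAM)
    return [0.5 * (theta / (2 * math.pi) + k) * LAM for k in range(k_max + 1)]

def feasible_positions(theta1: float, theta2: float, d: float) -> list:
    """Enumerate (x', y') candidates in the local frame centered at the
    midpoint of two antennas separated by d, following Eq. (2) and (3)."""
    positions = []
    for d1, d2 in product(candidate_distances(theta1),
                          candidate_distances(theta2)):
        # The triangle inequality |d1 - d2| < d prunes infeasible pairs.
        if abs(d1 - d2) >= d or d1 + d2 <= d:
            continue
        s = (d1 + d2 + d) / 2.0                             # semiperimeter
        area = math.sqrt(max(s*(s - d1)*(s - d2)*(s - d), 0.0))  # Heron
        h = 2.0 * area / d                                  # vertical distance
        m = 0.5 * math.sqrt(2*d1**2 + 2*d2**2 - d**2)       # median (Apollonius)
        x = math.sqrt(max(m**2 - h**2, 0.0))
        positions.append((x if d1 >= d2 else -x, h))        # sign per Eq. (2)
    return positions
```

Each returned position corresponds to one of the feasible intersections (such as A-D in Fig. 5) of the two hyperbolas determined by the antenna pair.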


distances, the profiles of the corresponding depth histograms are very similar to each other in most cases. In particular, when the object is deployed at a distance very close to the depth camera, e.g., 50cm, the profile may be distorted to a certain degree. When the object is deployed at a distance of 450cm, the depths over 475cm are no longer illustrated, since they are out of the effective scanning distance. Therefore, the experiment results show that the depth camera is able to extract the depth information of objects at distances as far as 475cm.

4.2 Extract the Phase Value from RF-Signals

Phase is a basic attribute of a signal, along with amplitude and frequency. The phase value of an RF signal describes the degree by which the received signal offsets from the sent signal, ranging from 0 to 360 degrees. Let d be the distance between the RFID antenna and the tag; the signal then traverses a round trip of distance 2d in each backscatter communication. Therefore, the phase value $\theta$ output by the RFID reader can be expressed as [20, 29]:

$$\theta = \left(\frac{2\pi}{\lambda} \times 2d + \mu\right) \bmod 2\pi, \qquad (1)$$

where $\lambda$ is the wavelength and $\mu$ is a diversity term related to the additional phase rotation introduced by the reader's transmitter/receiver and the tag's reflection characteristic. According to the previous study [4], since $\mu$ is rather stable, we can record $\mu$ for different tags in advance. Then, according to each tag's response, we can calibrate the phase by offsetting the diversity term. Thus, the phase value can be used as an accurate and stable metric to measure distance.

According to the definition in Eq. (1), the phase is a periodic function of the distance. Hence, given a specified phase value from the RF-signal, there can be multiple solutions for estimating the distance between the tag and the antenna. Therefore, we can deploy an RFID antenna array to scan the tags from slightly different positions, so as to figure out the unique solution of the distance. Without loss of generality, in this paper, we separate two RFID antennas by a distance of d, use them to scan the RFID tags, and respectively obtain their phase values from the RF-signals, as shown in Fig. 4.

Fig. 4. Compute the (x, y) coordinate of the tag

If we respectively use A1 and A2 to denote the midpoints of Antenna 1 and Antenna 2, and use T to denote the position of the tag, the three sides $\langle T, A_1\rangle$, $\langle T, A_2\rangle$, and $\langle A_1, A_2\rangle$ form a triangle. Since Antenna A1 and Antenna A2 are separated by a fixed distance d, according to Heron's formula [30], the area of this triangle is $A = \sqrt{s(s-d_1)(s-d_2)(s-d)}$, where $s$ is the semiperimeter of the triangle, i.e., $s = \frac{d_1+d_2+d}{2}$. Moreover, since the area of this triangle can also be computed as $A = \frac{1}{2}h \times d$, we can thus compute the vertical distance $h = \frac{2\sqrt{s(s-d_1)(s-d_2)(s-d)}}{d}$.
Then, according to Apollonius' theorem [31], for the triangle composed of points A1, A2, and T, the length of the median TO bisecting the side A1A2 is $m = \frac{1}{2}\sqrt{2d_1^2 + 2d_2^2 - d^2}$. Hence, the horizontal distance between the tag and the midpoint of the two antennas, i.e., T′O, should be $\sqrt{m^2 - h^2}$. Therefore, if we build a local coordinate system with the origin set to the midpoint of the two antennas, the coordinate $(x', y')$ is computed as follows:

$$x' = \begin{cases} \sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2} & d_1 \ge d_2 \\ -\sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2} & d_1 < d_2 \end{cases} \qquad (2)$$

$$y' = h. \qquad (3)$$

Therefore, the next problem we need to address is to estimate the absolute distance between the tag and each antenna according to the phase values extracted from the RF-signals. Suppose the RFID system respectively obtains two phase values θ1 and θ2 from the two separated RFID antennas; then, according to the definition in Eq. (1), the possible distances from the tag to the two antennas are $d_1 = \frac{1}{2}\cdot(\frac{\theta_1}{2\pi} + k_1)\cdot\lambda$ and $d_2 = \frac{1}{2}\cdot(\frac{\theta_2}{2\pi} + k_2)\cdot\lambda$. Here, k1 and k2 are integers ranging from 0 to $+\infty$. Due to the multiple solutions of k1 and k2, there could be multiple candidate positions for the tag. However, since the difference of the lengths of any two sides of a triangle is smaller than the length of the third side, i.e., $|d_1 - d_2| < d$, we can leverage this constraint to effectively eliminate many infeasible solutions of k1 and k2. Besides, due to the limited scanning range of the RFID system (the maximum scanning range l is usually smaller than 10 m), the values of k1 and k2 should be upper bounded by a certain threshold, i.e., $\frac{2l}{\lambda}$.

Fig. 5 shows an example of the feasible positions of the target tag according to the obtained phase values θ1 and θ2. The feasible solutions include multiple positions like A ∼ D, which respectively belong to two hyperbolas H1 and H2. Due to the existence of multiple solutions, we can use these hyperbolas to denote a superset of the feasible positions in a straightforward approach.

Fig. 5. Estimate the distance from phase values
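To make the geometry above concrete, the following Python sketch enumerates the candidate tag-antenna distance pairs from two phase readings (Eq. (1), constrained by $|d_1 - d_2| < d$ and $k \le 2l/\lambda$) and converts each surviving pair into the local coordinates of Eqs. (2)-(3) via Heron's formula and Apollonius' theorem. The function names and the wavelength constant are our own illustrative choices, not part of the original system.

```python
import math

WAVELENGTH = 0.326  # meters; an assumed UHF carrier (~920 MHz)

def candidate_distances(theta, max_range=10.0):
    """Enumerate feasible tag-antenna distances for one phase value (Eq. 1),
    with k upper bounded by 2l/lambda for a maximum scanning range l."""
    k_max = int(2 * max_range / WAVELENGTH)
    return [0.5 * (theta / (2 * math.pi) + k) * WAVELENGTH
            for k in range(k_max + 1)]

def tag_local_coords(d1, d2, d):
    """Compute (x', y') from distances d1, d2 and antenna spacing d (Eqs. 2-3)."""
    s = (d1 + d2 + d) / 2                              # semiperimeter (Heron)
    area_sq = s * (s - d1) * (s - d2) * (s - d)
    if area_sq <= 0:                                   # degenerate triangle
        return None
    h = 2 * math.sqrt(area_sq) / d                     # vertical distance
    m_sq = 0.5 * d1**2 + 0.5 * d2**2 - 0.25 * d**2     # squared median (Apollonius)
    x_sq = m_sq - h * h
    if x_sq < 0:
        return None
    x = math.sqrt(x_sq) if d1 >= d2 else -math.sqrt(x_sq)
    return (x, h)

def feasible_positions(theta1, theta2, d):
    """All tag positions consistent with both phases and |d1 - d2| < d."""
    positions = []
    for d1 in candidate_distances(theta1):
        for d2 in candidate_distances(theta2):
            if abs(d1 - d2) < d:                       # triangle inequality
                p = tag_local_coords(d1, d2, d)
                if p is not None:
                    positions.append(p)
    return positions
```

The returned list is exactly the superset of feasible positions that the hyperbolas in Fig. 5 represent; the rotate scanning described in the next section narrows it down to a single solution.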

5 MATCH THE STATIONARY TAGGED OBJECTS VIA ROTATE SCANNING

5.1 Motivation

To identify and distinguish the multiple tagged objects, a straightforward solution is to scan the tags in a static approach, where both the depth camera and the RFID antenna(s) are deployed in a fixed position without moving. The system scans the objects and tags simultaneously, and respectively collects the depth values and RF-signals from these tagged objects. We can then pair the tags with the objects accordingly. However, when multiple tagged objects are placed at a close vertical distance to the system, this solution cannot effectively distinguish multiple tagged objects at different horizontal distances.

To address this problem, we propose a rotate scanning-based solution as follows: we continuously rotate the scanning system (including the depth camera and RFID antennas), and simultaneously sample the depth of field and RF-signals from the multiple tagged objects. Hence, we are able to collect a continuous series of features like depth, RSSI, and phase values during rotate scanning. While the scanning system is rotating, the vertical distances between the multiple objects and the scanning system are continuously changing, from which we can further derive the differences among multiple tagged objects at different horizontal distances. In this way, we are able to further distinguish multiple tagged objects with a close vertical distance but in different positions.

5.2 Pair the Tags with Objects via Rotate Scanning

5.2.1 Extract Depth via Rotate Scanning

During the rotate scanning, we continuously rotate the depth camera from the angle of −θ to +θ and use it to scan the multiple tagged objects. During this process, as the vertical distance between a specified object and the depth camera is continuously changing, the depth values collected from the object are also continuously changing. We conduct experiments to validate this judgment. As shown in Fig. 6(a), we arbitrarily deploy multiple tagged objects within the effective scanning range; the coordinates of these objects are also labeled. We continuously rotate the depth camera from the angle of −40° to +40° and collect the depth values from the multiple tagged objects every 5∼6 degrees. Fig. 6(b) shows the experiment results. Note that the series of depth values for each object actually forms a convex curve with a peak value. For each depth value obtained at a certain rotation angle, we can use k-Nearest Neighbor (kNN) to classify it into a corresponding curve according to the distance between this depth value and the other depth values in the curve, and then use quadratic curve fitting to connect the corresponding depth values into a curve (see the sketch below). In this way, we are able to continuously identify and track the depth values of a specified object. The peak value of the convex curve denotes the snapshot when the vertical distance reaches its maximum. It appears only when the perpendicular bisector of the depth camera crosses the specified object, since the vertical distance then reaches the absolute distance between the object and the depth camera, which is the theoretical upper bound it can achieve. In other words, the peak value appears when the depth camera directly faces the object; we call this the perpendicular point.
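The following is a minimal sketch of the curve tracking and peak extraction step, assuming numpy is available. For brevity, it replaces the kNN classification described above with a greedy nearest-neighbor assignment, and the max_gap threshold is an illustrative value rather than a parameter of the original system.

```python
import numpy as np

def assign_to_curves(curves, angle, depth, max_gap=150.0):
    """Assign a new (angle, depth) sample to the per-object curve whose last
    depth value is closest, or start a new curve if none is within max_gap mm."""
    best, best_gap = None, max_gap
    for curve in curves:
        gap = abs(curve[-1][1] - depth)
        if gap < best_gap:
            best, best_gap = curve, gap
    if best is None:
        curves.append([(angle, depth)])
    else:
        best.append((angle, depth))

def fit_peak(curve):
    """Fit depth = a*angle^2 + b*angle + c over one object's samples and
    return the <angle, depth> coordinate of the vertex, i.e., the peak
    reached at the perpendicular point (assumes a concave fit, a < 0)."""
    angles = np.array([a for a, _ in curve], dtype=float)
    depths = np.array([d for _, d in curve], dtype=float)
    a, b, c = np.polyfit(angles, depths, 2)
    theta_peak = -b / (2 * a)                  # vertex of the parabola
    d_peak = np.polyval([a, b, c], theta_peak)
    return theta_peak, d_peak
```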
In this way, according to the peak value of depth, we are able to further distinguish multiple objects with the same vertical distance but different positions. The solution is as follows: after the system finishes rotate scanning, it extracts the peak value from the curve of each object's depth values. Then, we label each object with the coordinate of its peak value, i.e., ⟨θ, d⟩, where θ represents the rotation angle and d represents the depth value. Since the depth d denotes the vertical distance of the object, we can use the depth to distinguish the objects in the vertical dimension; since the rotation angle θ denotes the angle at which the camera meets the perpendicular point, we can use the angle to distinguish the objects in the horizontal dimension. For example, in Fig. 6(a), we deploy object 4 and object 5 at the same vertical distance to the depth camera; according to the results in Fig. 6(b), these two objects can still be distinguished, since the peak values of their depths occur at different angles, i.e., −17° and +22°, respectively. They can thus be easily distinguished in the horizontal dimension.

Fig. 6. The experiment results of rotate scanning: (a) the deployment of multiple tagged objects; (b) variation of the depth value

5.2.2 Estimate the tag's position with hyperbolas

According to the analysis shown in Fig. 5, given the two phase values of RF-signals extracted from two antennas separated by a distance d (d = 25cm in our implementation), there could be multiple solutions for the tag's position, which can be represented with multiple hyperbolas in the two-dimensional space. In fact, we can leverage rotate scanning to figure out a unique solution by filtering out the unqualified solutions. The idea is as follows: for each snapshot $t_i$ ($i = 1 \sim m$) of the rotate scanning, for a specified tag T, we respectively extract the phase values (θ1, θ2) from the two antennas, then compute the feasible distances (d1, d2) between the tag and the two antennas. We further compute the set of feasible positions in a global coordinate system as $S_i$. Then, by computing the intersection of the different sets $S_i$ over all snapshots, we are able to figure out a unique solution for the tag's position as follows: $S = \cap_{i=1}^{m} S_i$.
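A minimal sketch of this snapshot fusion is given below. It assumes the caller has already rotated each snapshot's candidate positions (e.g., the output of feasible_positions above) into the global coordinate frame, and it intersects the sets with a distance tolerance, since noisy phase measurements never yield exactly coincident candidates; the tolerance value is an illustrative assumption.

```python
import math

def fuse_snapshots(snapshots, tol=0.05):
    """Approximate S = S_1 ∩ S_2 ∩ ... ∩ S_m for per-snapshot candidate
    position lists in the global frame, keeping a candidate only if every
    snapshot has a candidate within tol meters of it."""
    survivors = list(snapshots[0])
    for s_i in snapshots[1:]:
        survivors = [p for p in survivors
                     if any(math.hypot(p[0] - q[0], p[1] - q[1]) < tol
                            for q in s_i)]
    return survivors
```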

5.2.3 Estimate the tag's position with angle of arrival

In some situations, it could be difficult to directly derive the tag's candidate position using the intersections of multiple hyperbolas, since the hyperbolas must be exactly plotted in the two-dimensional space, which might be computationally expensive for some mobile devices. Nevertheless, it is found that, as long as the tagged objects are relatively far from the antenna pair, we can use the method of angle of arrival at the antenna pair [21] to simplify the solution. Specifically, suppose the distance between the antennas A1 and A2 of the pair is d, and the distances between the tag and A1/A2 are respectively d1 and d2. As Fig. 7 shows, when the distance between the tag and the antenna pair is significantly larger than the spacing of the pair, i.e., $d_1 \gg d$ and $d_2 \gg d$, and the angle of arrival of the tagged object is α, then

$$\Delta d = d_1 - d_2 = d\cos\alpha. \qquad (4)$$

Furthermore, when the spacing of the antenna pair is less than half of the wavelength, i.e., $d \le \frac{\lambda}{2}$, we can figure out a pair of symmetric solutions for the angle of arrival of the tagged object. In this regard, we can further use the phase difference between the two antennas to depict the value of $\Delta d$: according to Eq. (1), $\Delta d = |d_1 - d_2| = \frac{\lambda}{4\pi}\Delta\theta$, where $\Delta\theta = |\theta_1 - \theta_2|$. Therefore, we can figure out the angle of arrival of the tagged object using the equation:

$$\alpha = \arccos\Big(\frac{\lambda\,\Delta\theta}{4\pi d}\Big). \qquad (5)$$

As a matter of fact, by leveraging the method of angle of arrival at the antenna pair, we are able to use the asymptotic lines of the hyperbolas to approximate the candidate position of the tagged object, as long as the tagged object is relatively far from the antenna pair.

Fig. 7. Angle of arrival at antenna pair

Fig. 8 shows an example of deriving the unique solution of the tag's position from the intersections. Suppose a target tag is deployed at the coordinate (−60, 180). We first obtain the phase values (2.58, 5.81) from the two antennas when they are respectively at the positions A1 and A2. After the antenna pair is rotated by 40°, we then obtain the phase values (5.56, 2.49) from the two antennas when they are respectively at the positions A′1 and A′2. In this way, we obtain three pairs of phase values, (2.58, 5.81), (2.58, 5.56), and (5.81, 2.49), which are respectively collected from the antenna pairs ⟨A1, A2⟩, ⟨A1, A′1⟩, and ⟨A2, A′2⟩. We can respectively use them to compute the feasible solutions in a unified coordinate system. We use different colors to label the hyperbolas of the multiple feasible solutions according to the different pairs of phase values. By using the method of angle of arrival, we use the asymptotic lines to approximate the corresponding hyperbolas. For example, as the distance between A1 and A2 is greater than half the wavelength, two pairs of symmetric directions of the tagged object are derived, marked in red; as the distance between A1 and A′1 is less than half the wavelength, one pair of symmetric directions is derived, marked in blue; similarly, as the distance between A2 and A′2 is less than half the wavelength, one pair of symmetric directions is derived, marked in black. Moreover, the multiple hyperbolas of the different feasible solutions all intersect in a small area which is very close to the target tag's real position. We thus set the central point of the intersection region as the estimated value of the tag's position.

Fig. 8. Figure out the unique solution of the tag's position
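As a concrete illustration, the sketch below computes the symmetric angle-of-arrival solutions from one pair of phase readings, following Eqs. (4)-(5). It ignores phase wrapping across the pair (valid when the spacing is at most half a wavelength, as assumed above), and the wavelength constant is again our own assumption.

```python
import math

WAVELENGTH = 0.326  # meters; an assumed UHF carrier (~920 MHz)

def angle_of_arrival(theta1, theta2, d):
    """Far-field AoA of a tag from the round-trip phase difference of an
    antenna pair spaced d apart (Eqs. 4-5).  Returns the symmetric pair of
    solutions (alpha, -alpha) about the array axis, or None if noise pushes
    the implied path difference beyond the spacing."""
    delta_theta = abs(theta1 - theta2)
    delta_d = WAVELENGTH * delta_theta / (4 * math.pi)  # path difference
    cos_alpha = delta_d / d
    if cos_alpha > 1.0:
        return None
    alpha = math.acos(cos_alpha)
    return alpha, -alpha
```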
5.2.4 Deriving the angle-distance pair

After deriving the target tag's position, we can further derive the angle at which the tag is at the perpendicular point of the RFID antennas, that is, the moment when the perpendicular bisector of the midpoint of the antenna pair crosses the tag. We use the pair ⟨θ, δ⟩ to denote this situation; here θ denotes the offset angle of the antenna, and δ denotes the vertical distance. The pair ⟨θ, δ⟩ is computed as follows: $\theta = \arctan\frac{x}{y}$ and $\delta = \sqrt{x^2 + y^2}$. Therefore, we can further leverage an algorithm like Algorithm 1 to match multiple tags to multiple objects.

Algorithm 1 Match multiple objects to multiple tags
1: Extract the vector: After continuous scanning, identify the peak value from the depth curve and the crossing point of the multiple hyperbolas derived from the phase pairs. For each object Oi, label it with a vector ⟨θi, di⟩, respectively normalize the angle and depth into the interval [0, 1] by dividing by the maximum value of angle and depth, and add the vector to a set O; for each tag Tj, label it with a vector ⟨θj, δj⟩, normalize it, and add the vector to a set T.
2: while O ≠ ∅ or T ≠ ∅ do
3:   Match the objects and tags: For each object Oi ∈ O with vector ⟨θi, di⟩, compute the distance to each tag Tj ∈ T with vector ⟨θj, δj⟩ as follows: $\Delta_{i,j} = \sqrt{(\theta_i - \theta_j)^2 + (d_i - \delta_j)^2}$. Select the tag Tj* with the minimum distance and pair the object Oi with the tag Tj*.
4:   Calibrate the matching results: For any tag Tj ∈ T paired with multiple objects, select among these objects the object Oi with the minimum distance Δi,j, and pair the object Oi with the tag Tj. Respectively remove the object Oi and the tag Tj from the sets O and T.
5: end while
6: Output the matched pairs of objects and tags.
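The greedy matching loop of Algorithm 1 can be sketched in a few lines of Python, as below. The input dictionaries map object/tag identifiers to their normalized ⟨angle, distance⟩ vectors; the `and` loop condition (rather than the literal "or" in the pseudocode) is used so the loop terminates once either side is exhausted.

```python
import math

def greedy_match(objects, tags):
    """Greedy object-tag matching per Algorithm 1.  objects and tags map
    id -> (angle, dist) vectors normalized into [0, 1]."""
    pairs, objs, tgs = {}, dict(objects), dict(tags)
    while objs and tgs:
        # each remaining object proposes to its nearest tag
        proposals = {}
        for oid, (ao, do) in objs.items():
            tid = min(tgs, key=lambda t: math.hypot(ao - tgs[t][0],
                                                    do - tgs[t][1]))
            proposals.setdefault(tid, []).append(oid)
        # calibration: a tag claimed by several objects keeps the closest one
        for tid, oids in proposals.items():
            at, dt = tgs[tid]
            best = min(oids, key=lambda o: math.hypot(objs[o][0] - at,
                                                      objs[o][1] - dt))
            pairs[best] = tid
            del objs[best], tgs[tid]
    return pairs
```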

5.3 Tackle the Issues of Interferences

5.3.1 Impact of Interferences

Due to environmental issues like multi-path fading and object occlusion, the system may fail to identify some of the objects and tags. For example, multi-path fading may cause the line-of-sight RF-signal and the reflected RF-signals to offset each other at the tag's position, such that the tag cannot be effectively activated due to the reduced incident power from the RF-antennas. Besides, object occlusion may cause one specific object to be blocked by another object placed in front of it, such that this object cannot be effectively identified from its depth histograms. This leads to the issue of missing tags or objects. Moreover, in some situations, it is essential to isolate the recognizable objects from the non-recognizable ones; e.g., when a number of tagged objects are placed on an untagged table, the tagged objects are expected to be recognized instead of the table. However, the table might affect the depth-camera reading, but not the RFID-based scanning. This leads to the issue of extra objects. These two issues further lead to imperfect matching between the objects and tags.

5.3.2 Tackle the Outliers in Bipartite Graph Matching

Since we need to find a matching between the set of tags and the set of objects according to their estimated positions, this is similar to finding a matching in a weighted bipartite graph, where the weight refers to the distance between a tag-object pair. However, due to the existence of the above interferences, outliers actually appear in addition to the regular points of the tag set and object set. Specifically, these outliers are not essentially far from the regular points in regard to their relative distance; e.g., the extra objects can be fairly close to the regular tagged objects. Therefore, traditional solutions for weighted bipartite graph matching such as the Hungarian algorithm [32] cannot effectively tackle the outlier issues in matching, since they seek a matching that minimizes the overall weight (i.e., the sum of distances between points). They pursue the overall benefit of all members while sacrificing the benefits of individuals. In this regard, to avoid a huge value in the overall weight, some regular points can be mismatched to the outliers as a trade-off, and then a cascade of mismatches among the regular points can appear frequently.

Hence, in order to tackle the outliers, we reduce this problem to the stable marriage problem [33]. Specifically, we aim to find a stable matching between the set of tags and the set of objects, given an ordering of preferences for each element. The ordering of preferences can be computed according to the distance between each object-tag pair. We aim to achieve the stable property for the matching, i.e., there does not exist any pair (A, B) by which both A and B would be individually better off than they are with the element to which they are currently matched. The basic intuition for using stable matching is that, for any tagged object, the distance between the positions of the tag and the object is usually much smaller than the distance to any outlier. So we can give priority to matching a specific pair of tag and object according to their best preferences in terms of distance.
By considering the individual benefits rather than the overall benefits of the object-tag pairs, we can mitigate the impact of the outliers.

We use the Gale-Shapley algorithm [33] to solve this problem, as shown in Algorithm 2. It involves a number of iterations. Initially, all objects and tags are set to free. In the first round, each free object proposes to the tag it prefers most, and each tag then replies "maybe" to the object it most prefers and gets temporarily engaged to that object if the tag is free. In each subsequent round, each free object proposes to the most-preferred tag to which it has not yet proposed, and each tag replies "maybe" if it is currently free or if it prefers this object over its current partner object. This scheme preserves the right of an already-engaged tag to trade up for a better choice. The process is repeated until all objects/tags are engaged or have no candidate partner left to propose to.

Algorithm 2 Stable Matching-based Solution
1: Initialize all Oi ∈ O and Tj ∈ T to free.
2: Set the weight wi,j as the distance between each pair of object Oi and tag Tj. Compute the ordering of preferences for each object/tag according to wi,j. If the weight is greater than a threshold t, remove the corresponding tag/object from the object/tag's preference list.
3: while ∃ a free object o which still has a candidate tag t to propose to do
4:   t = the first tag on o's list to which o has not yet proposed.
5:   if t is free then
6:     (o, t) become engaged.
7:   else
8:     some pair (o′, t) already exists.
9:     if t prefers o to o′ then
10:      o′ becomes free, and (o, t) become engaged.
11:    else
12:      (o′, t) remain engaged.
13:    end if
14:  end if
15: end while

We further illustrate the above idea with the example shown in Fig. 9, where 5 tagged objects are randomly placed in the two-dimensional space. Due to the impact of interferences, there exist some outliers, such as missing tags/objects and extra interference objects. In this case, the Hungarian Matching (HM)-based solution can mismatch the tag T2 to the extra object rather than to the object O2, since it considers the overall benefit of letting the tag T4 be paired with its only adjacent object O2. Nevertheless, our Stable Marriage Matching (SMM)-based solution is able to effectively tackle the outliers by giving priority to the tag-object pairs with the best preference in distance; e.g., it matches the tag T2 to the object O2 rather than to the extra object, since O2 is higher in T2's preference order than the extra object, and T2 is higher in O2's preference order than the tag T4.

Fig. 9. Tackle the outliers with stable marriage matching
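A minimal runnable sketch of Algorithm 2 is given below. The distance function and the cut-off threshold are supplied by the caller; both the function names and the dictionary-based bookkeeping are our own illustrative choices.

```python
def stable_match(objects, tags, dist, threshold):
    """Gale-Shapley matching per Algorithm 2: objects propose to tags in
    order of increasing distance; pairs farther apart than threshold are
    removed from the preference lists up front."""
    prefs = {o: sorted((t for t in tags if dist(o, t) <= threshold),
                       key=lambda t: dist(o, t))
             for o in objects}
    next_idx = {o: 0 for o in objects}   # next tag each object will propose to
    engaged = {}                         # tag -> currently engaged object
    free = list(objects)
    while free:
        o = free.pop()
        if next_idx[o] >= len(prefs[o]):
            continue                     # o exhausted its list; stays unmatched
        t = prefs[o][next_idx[o]]
        next_idx[o] += 1
        if t not in engaged:
            engaged[t] = o               # t was free: engage
        elif dist(o, t) < dist(engaged[t], t):
            free.append(engaged[t])      # t trades up to the closer object
            engaged[t] = o
        else:
            free.append(o)               # rejected: o will propose again
    return {o: t for t, o in engaged.items()}
```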

We further compare the performance of the different solutions under different settings, i.e., the Greedy Matching of Algorithm 1 (GM), the Hungarian Matching (HM), and the Stable Marriage Matching (SMM), as shown in Fig. 10. By default, the average cardinality and spacing of the tagged objects are set to 10 and 50cm, respectively; the average missing ratio of tags/objects is set to 10%; and the average cardinality and distance of the extra interference objects are set to 2 and 50cm, respectively. In Fig. 10(a), we evaluate the match ratios by varying the cardinality of the tagged objects. It is found that SMM always achieves better performance than the other two solutions. In Fig. 10(b), we evaluate the match ratio by varying the missing ratio of tags or objects. As the missing ratio increases from 0% to 40%, the matching ratio of HM gradually decreases from 98% to 73%. This implies that HM cannot effectively tackle the outliers caused by the missing tags/objects. Nevertheless, SMM always achieves match ratios greater than 92%; it effectively tackles the outliers of missing tags/objects. In Fig. 10(c) and Fig. 10(d), we evaluate the match ratios by varying the cardinality of the extra interference objects, and the average distance between the interference objects and the tagged objects, respectively. In all situations, SMM achieves the best performance over the other solutions.

Fig. 10. Performance evaluation: (a) different cardinalities of tagged objects; (b) different missing ratios of tagged objects; (c) different cardinalities of interference objects; (d) different distances between interference objects and tagged objects

6 MATCH THE MOBILE TAGGED HUMAN SUBJECTS VIA CONTINUOUS SCANNING

6.1 Motivation

In most cases, AR systems are designed for a mobile scenario, e.g., multiple human subjects wearing RFID badges continuously moving around. For this mobile situation, the rotate scanning-based solution for recognizing multiple stationary tagged objects is no longer suitable. Since the locations of the tagged human subjects are continuously changing, the scanning frequency of the rotate scanning-based solution cannot be high enough to locate the positions of the tags and human subjects in a real-time manner. Nevertheless, we observe that, when multiple tagged human subjects are continuously moving, their moving traces in the two-dimensional space can be distinguished from each other. Hence, according to the depth information and the phase information extracted from the multiple tagged human subjects, we are able to derive a metric to depict the moving traces of the tags and human subjects, respectively.
In this way, by matching the moving traces of the tags to the corresponding human subjects, we are able to match the mobile tagged human subjects. Therefore, to recognize multiple tagged human subjects in the mobile situation, in this section we propose a continuous scanning-based solution to pair the mobile tags with the moving human subjects via trace matching.

6.2 Pair the Tags with Mobile Human Subjects via Trace Matching

When deploying our system in front of multiple human subjects, where the human subjects wearing RFID badges are moving around, it is known that a state-of-the-art depth camera such as the Kinect is able to extract the skeleton models of the human subjects. Based on the skeleton model, we can further extract the spinemid point [1] from the skeleton to represent the human subject, which is also very close to the place of the RFID badge worn by the human subject, as shown in Fig. 11(a). According to the two-dimensional coordinate of the spinemid point in the horizontal plane, we can figure out the moving traces of different human subjects from the depth camera, as shown in Fig. 11(b). Moreover, supposing the reader/depth camera is deployed at the origin O, for any spinemid point P we can use the angle profile to denote the angle between the vector OP and the X-axis OX, as shown in Fig. 11(b).

As aforementioned, using the RFID antenna pair, our system can estimate the Angle of Arrival (AoA) of an RFID tag in the horizontal plane. Then, we can similarly use the angle profile to denote the angle between the AoA direction of the tag and the X-axis. Recall that, according to the phase values collected from the RFID antenna pair, there can be multiple solutions for the angle of arrival of an RFID tag. Hence, there can be multiple angle profiles corresponding to a specified tag. Therefore, while the tagged human subjects are moving from time to time, we can plot the angle profiles of both the human subjects and the tags over time. Fig. 11(c) shows the corresponding angle profiles of the human subjects and RFID tags over time, where Tag i is worn on Body i. Note that, for a specified tag, there are multiple solutions for its angle profile; we use the same color to label them. We can observe that the angle profile of a specified body has a variation trend very close to one of the angle profiles of the corresponding RFID tag, as they share very similar moving traces in the horizontal plane. Therefore, in order to evaluate the correlation of the angle profiles between the bodies and tags, we use the difference score to denote this correlation. Specifically, in a specified sliding window W with length L, for the lth snapshot (1 ≤ l ≤ L), suppose the angle profiles of the body Oi and the tag Tj are $\alpha_i(l)$ and $\{\alpha'_j(l)\}$, respectively. Then, the difference score $s_{i,j}$ between $\alpha_i$ and $\alpha'_j$ in W is as follows:

$$s_{i,j} = \min_{\alpha'_j \in \{\alpha'_j\}} \frac{1}{L} \sum_{l=1}^{L} \big(\alpha_i(l) - \alpha'_j(l)\big)^2. \qquad (6)$$

Here we enumerate all feasible angle profiles $\alpha'_j$ of the tag Tj to compare against the angle profile $\alpha_i$ of the body Oi, and take the minimum value as the difference score $s_{i,j}$. Fig. 11(d) shows the difference scores in angle profiles between various pairs of tags and bodies; it is found that the least difference score is achieved only for the correct tag-body pair.
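The difference score of Eq. (6) reduces to a few lines of numpy, as sketched below; the array shapes are our own convention, with the K feasible AoA profiles of a tag stacked row-wise.

```python
import numpy as np

def difference_score(body_profile, tag_profiles):
    """Eq. (6): minimum mean squared difference between a body's angle
    profile (shape (L,)) and each of the K feasible angle profiles of a
    tag (shape (K, L)) over a sliding window of L snapshots."""
    body = np.asarray(body_profile, dtype=float)
    candidates = np.atleast_2d(np.asarray(tag_profiles, dtype=float))
    return float(np.min(np.mean((candidates - body) ** 2, axis=1)))
```

Pairing then simply selects, for each body, the tag with the smallest score; the calibration step of Algorithm 3 below resolves the conflicts where one tag wins multiple bodies.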

[Fig. 11. An example to illustrate the idea of matching the mobile tagged human subjects via continuous scanning: (a) extracting the spinemid point from the Kinect skeleton; (b) the moving traces extracted from the depth camera; (c) the angle profiles extracted from the depth camera and the RFID system; (d) the difference scores of the angle profiles between the objects and tags.]

Based on the above analysis, we further propose Algorithm 3 to pair the tags with mobile human subjects via trace matching.

Algorithm 3 Pair the Tags with Mobile Human Subjects via Trace Matching
1: Perform continuous scanning on the human subjects and the tags, respectively, with the depth camera and the RFID antenna pair. Add the human subjects into set O and the tags into set T within a sliding window W.
2: for each tag T_j ∈ T do
3:   For each snapshot in W, extract the phases of T_j from the antenna pair, and figure out the angle of arrival of T_j. Compute the feasible angle profiles {α'_j(l)} corresponding to T_j.
4: end for
5: for each human subject O_i ∈ O do
6:   For each snapshot in W, capture the spinemid point from the skeleton of O_i, and calculate the angle profile α_i(l) of O_i.
7: end for
8: while O ≠ ∅ or T ≠ ∅ do
9:   Match the objects and tags:
10:  for each human subject O_i ∈ O do
11:    for each tag T_j ∈ T do
12:      Calculate the difference score s_{i,j} between α_i and {α'_j}.
13:    end for
14:    Select the tag T_{j*} with the minimum difference score and pair the object O_i with the tag T_{j*}.
15:  end for
16:  Calibrate the matching results: for any tag T_j ∈ T paired with multiple objects, select the object O_i with the minimum difference score s_{i,j} among these objects, and pair the object O_i with the tag T_j. Remove the object O_i and the tag T_j from sets O and T, respectively.
17: end while

Note that we use the angle profiles to depict the human movement in this paper, whereas previous works such as TagVision [27] and ID-Match [26] mainly use the metric of radial distance, i.e., the Euclidean distance between the reader and the tagged human subject, to depict the human movement. When multiple tagged human subjects are moving around in a large range, e.g., greater than 100 cm, which we call large movement, the angle profiles depict the human movement in a more sensitive manner than the radial distance, especially when the human subjects are moving within the scanning range close to the RFID reader/depth camera, since the angle profiles change more rapidly than the radial distance under large movement.
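As a companion to Algorithm 3, the following sketch implements its matching loop (steps 8-17) on top of the difference score above. The data layout (dictionaries of precomputed angle-profile series keyed by body and tag IDs) and all identifiers are our assumptions; we also loop while both sets are non-empty, rather than the "or" condition in the pseudocode, so that the procedure terminates cleanly when the numbers of bodies and tags differ.

```python
import numpy as np

def pair_tags_with_bodies(body_profiles, tag_profiles):
    """Greedy trace matching with calibration (Algorithm 3, steps 8-17).

    body_profiles: {body_id: np.ndarray of shape (L,)}.
    tag_profiles:  {tag_id: [np.ndarray of shape (L,), ...]} -- one series
                   per feasible AoA solution of that tag.
    Returns a {body_id: tag_id} pairing.
    """
    bodies, tags = set(body_profiles), set(tag_profiles)
    pairing = {}
    while bodies and tags:
        # Steps 10-15: every remaining body claims its best-matching tag.
        claims = {}                          # tag_id -> [(score, body_id)]
        for b in bodies:
            scores = {t: min(np.mean((body_profiles[b] - p) ** 2)
                             for p in tag_profiles[t])
                      for t in tags}
            best = min(scores, key=scores.get)
            claims.setdefault(best, []).append((scores[best], b))
        # Step 16 (calibration): a tag claimed by several bodies keeps only
        # the claimant with the minimum difference score; each resolved
        # pair is removed before the next round of matching.
        for t, claimants in claims.items():
            _, winner = min(claimants)
            pairing[winner] = t
            bodies.discard(winner)
            tags.discard(t)
    return pairing
```

Each pass resolves at least one tag-body pair, so the loop always makes progress even when several bodies initially claim the same tag.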
However, when multiple tagged human subjects are close to each other in position, e.g., when the distances between adjacent human subjects are less than 20~30 cm, and they only have slight movements, e.g., shaking the body or turning around, our trace-matching-based solution cannot further distinguish these tagged human subjects purely based on the angle profiles, since the changes of the angle profiles of the tagged human subjects are rather small, possibly smaller than the inherent errors of the trace-matching-based solution in a usual multi-path environment. In this situation of slight movement, we can use the radial distance to distinguish the multiple tagged human subjects, by referring to the previous solutions [27][26], since the radial distance still retains some sensitivity to the human movement.

7 DISCUSSION

7.1 Robustness to Environmental Variances

In the real-world environment, besides environmental interferences such as the multi-path effect and path-loss fading, environmental variances such as material variances and deployment variances can also impact the system performance. For example, when RFID tags are attached to objects of different materials, such as a beverage can, the human body, or plastic toys, RF-signal features like RSSI can be totally different. This can greatly impact the performance of the Depth-RSSI pairing-based solution [8]. Nevertheless, RF-signal features like phase are insensitive to such factors: the phase describes the degree to which the received signal offsets from the sent signal, which is only correlated with the relative distance and orientation between the antenna and the tag. Moreover, the RSSI variation is very sensitive to the orientation change of the tag, whereas the phase variation is relatively insensitive to it. In other words, as the tag orientation changes, the RSSI may change sharply, whereas the phase changes relatively gently. Thus, based on the stability of the phase, our Depth-Phase pairing-based solution is able to effectively address the variability of these environmental factors.

Hence, we further evaluate the RSSI and phase values with different orientations of the tag and different materials of the tagged objects. First, we continuously rotate the tag to measure the RSSI and phase values with different tag orientations. Fig. 12 shows the diagram of tag rotation. We rotate the tag on two different axes, i.e., the X-axis and the Y-axis. While the tag is rotating on the Y-axis, we use
