phase and RSSI from the RFID tags. By pairing this information together, the vision of “tell me what I see” can be effectively realized in the AR system. In comparison to a pure AR system, which can only show basic information such as gender and race according to vision-based pattern recognition, this novel RFID-assisted AR technology allows inherent information such as names, jobs and titles to be directly extracted from the RFID tags and associated with the corresponding human subjects in the camera’s view. For example, when we meet multiple unknown people wearing RFID badges at public events, the system can effectively help us recognize these people by overlaying their detailed information on the camera’s view in a smart glass. The second scenario is recognizing different cultural relics in a museum, as shown in Fig. 1(b). In this scenario, multiple cultural relics such as ancient potteries are placed on display racks. Due to the same craftsmanship, they may share very similar natural features such as color and shape from a visual perspective. This prevents a pure AR system from distinguishing different objects when they have very similar physical features. In contrast, using our RFID-assisted AR technology, these objects can be easily distinguished according to the differences in the labeling tags. In summary, the advantages of RFID-assisted AR systems over pure AR systems lie in RFID’s essential capability of identification and localization.

Although many schemes for RFID-based localization [4, 5] have been proposed, they mainly focus on absolute object localization and usually require anchor nodes such as reference tags for accurate localization. They are not suitable for distinguishing multiple tagged objects for two reasons. First, we only need to distinguish the relative locations rather than the absolute locations of multiple tagged objects, by pairing the tags with the objects based on the correlation between the depth of field and the RF-signals. Second, the depth camera cannot effectively make use of anchor nodes, and it is impractical to deploy multiple anchor nodes in most AR applications.

In this paper, we leverage RFID technology [6, 7] to further label different objects with RFID tags, and we deploy additional RFID antennas alongside the COTS depth camera. To recognize stationary tagged objects, we propose a rotate scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and the RF-signals from these tagged objects. We extract the phase value from the RF-signal and pair the tags with the objects according to the correlation between the depth value and the phase value. Similarly, to recognize mobile tagged human subjects, we propose a continuous scanning-based scheme to scan the human subjects, i.e., the system continuously samples the depth of field and the RF-signals from these tagged human subjects. In this way, we can accurately identify and distinguish multiple tagged objects by fully exploiting the correlations between the depth of field and the RF-signal.
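To make the pairing idea concrete: the phase reported by an RFID reader varies with the tag-antenna distance d approximately as θ = (4πd/λ) mod 2π, so as the setup rotates, a tag’s unwrapped phase profile tends to rise and fall together with the depth profile of the object it is attached to. The following minimal sketch is only an illustration under assumed inputs, not the system’s actual implementation; the function names, the NumPy dependency, and the Pearson-style affinity score are our own choices.

# Minimal sketch: pair tags with objects by correlating per-frame depth
# readings with unwrapped RF phase readings sampled during the rotation scan.
# Assumptions (not from the paper): depth and phase are already resampled
# onto common timestamps, and a Pearson-style correlation is a usable affinity.
import numpy as np

def affinity(depth_seq, phase_seq):
    """Correlation between an object's depth profile and a tag's phase profile."""
    phase = np.unwrap(np.asarray(phase_seq, dtype=float))  # undo the 2*pi wraps
    depth = np.asarray(depth_seq, dtype=float)
    # Phase grows with distance (theta ~ 4*pi*d/lambda mod 2*pi), so both
    # profiles should rise and fall together for a correct tag-object pair.
    d = depth - depth.mean()
    p = phase - phase.mean()
    return float(np.dot(d, p) / (np.linalg.norm(d) * np.linalg.norm(p) + 1e-9))

def pair_greedy(objects, tags):
    """objects/tags: dicts mapping id -> 1-D sequence sampled during the scan."""
    scores = {(o, t): affinity(od, td)
              for o, od in objects.items() for t, td in tags.items()}
    pairs, used_o, used_t = [], set(), set()
    # Greedily take the highest-affinity pairs first.
    for (o, t), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if o not in used_o and t not in used_t:
            pairs.append((o, t, s))
            used_o.add(o); used_t.add(t)
    return pairs

A greedy assignment like this works when every tag and every object is observed cleanly; the stable-matching formulation summarized in the contributions below is what copes with missing and extra entities.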
However, there are several challenges in distinguishing multiple tagged objects in AR systems. The first challenge is conducting accurate pairing between the objects and the tags. In real applications, the tagged objects are usually placed in very close proximity, and the number of objects is usually on the order of dozens. It is difficult to realize accurate pairing due to the large cardinality and mutual interference. The second challenge is mitigating the interference from the multi-path effect and object occlusion in real settings. These issues introduce non-negligible interference when pairing the tags with the objects, such as missing tags/objects that fail to be identified, as well as extra objects that are untagged. The third challenge is designing an efficient solution without any additional assistance such as anchor nodes. It is impractical to intentionally deploy anchor nodes in real AR applications due to the intensive deployment cost in manpower and time.

This paper presents the first study of using RFID to assist in recognizing multiple objects in AR systems (a preliminary version of this work appeared in [8]). Specifically, we make three key contributions: 1) We propose TaggedAR to realize the vision “tell me what I see” in AR systems. By fully exploiting the correlations between the depth of field and the RF-signal, we propose a rotate scanning-based scheme to distinguish multiple tagged objects in the stationary situation, and a continuous scanning-based scheme to distinguish multiple tagged human subjects in the mobile situation. 2) We efficiently tackle the interference from the multi-path effect and object occlusion in real settings by reducing this problem to a stable marriage problem, and propose a stable-matching-based solution to mitigate the interference from outliers. 3) We implemented a prototype system and evaluated its performance with case studies in real-world environments. Our solution achieves an average match ratio of 91% in distinguishing up to dozens of RFID-tagged objects with a high deployment density.
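As a rough illustration of the stable-marriage reduction mentioned in contribution 2 (a sketch under assumed inputs, not the paper’s exact algorithm), the Gale-Shapley-style procedure below lets tags “propose” to objects in decreasing order of affinity while objects keep the best proposal seen so far. The function name, the affinity_scores input, and the outlier handling are illustrative assumptions.

# Illustrative Gale-Shapley matching between tags and objects.
# Assumption: affinity_scores covers every (object_id, tag_id) pair.
def stable_match(affinity_scores, tag_ids, object_ids):
    tag_prefs = {t: sorted(object_ids, key=lambda o: -affinity_scores[(o, t)])
                 for t in tag_ids}
    next_choice = {t: 0 for t in tag_ids}    # index of the next object to propose to
    engaged_to = {}                          # object_id -> tag_id
    free_tags = list(tag_ids)
    while free_tags:
        t = free_tags.pop()
        if next_choice[t] >= len(object_ids):
            continue                         # t exhausted its list; leave it unmatched
        o = tag_prefs[t][next_choice[t]]
        next_choice[t] += 1
        current = engaged_to.get(o)
        if current is None:
            engaged_to[o] = t
        elif affinity_scores[(o, t)] > affinity_scores[(o, current)]:
            engaged_to[o] = t                # o prefers the new proposer
            free_tags.append(current)        # the displaced tag proposes again later
        else:
            free_tags.append(t)              # rejected; t tries its next choice later
    matched = set(engaged_to.values())
    outlier_tags = [t for t in tag_ids if t not in matched]
    outlier_objects = [o for o in object_ids if o not in engaged_to]
    return engaged_to, outlier_tags, outlier_objects

Entities left unmatched by such a procedure naturally fall into the outlier roles described above, such as untagged objects or tags whose objects are occluded; the actual mitigation in TaggedAR is more involved, and this sketch only conveys why a stable matching is more robust than an independent per-tag best guess.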
2 RELATED WORK

Pattern recognition via depth camera: Pattern recognition via depth camera mainly leverages the depth and RGB data captured by the camera to recognize objects in a computer-vision-based approach. Based on depth processing [9], a number of techniques have been proposed for object recognition [10] and gesture recognition [11, 12]. Nirjon et al. solve the problem of localizing and tracking household objects using depth-camera sensors [13]. A Kinect-based pose estimation method [11] is proposed in the context of physical exercise, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions.

Batteryless sensing via RFID: RFID has recently been investigated as a new scheme of batteryless sensing, including indoor localization [14], activity sensing [15], physical object search [16], etc. Prior work on RFID-based localization primarily relied on Received Signal Strength [14] or Angle of Arrival [17] to acquire the absolute location of an object. State-of-the-art systems use the phase value to estimate the absolute or relative location of an object with higher accuracy [6, 18–20]. RF-IDraw uses a two-dimensional array of RFID antennas to track the movement trajectory of a finger attached with an RFID tag, so that it can reconstruct the trajectory shape of the specified finger [21]. Tagoram exploits tag mobility to build a virtual antenna array, and uses a differential augmented hologram to facilitate instant tracking of a mobile RFID tag [4].

Combined use in augmented reality environment: Recent works further consider using both depth camera and RFID for indoor localization and object recognition in