计算机科学与技术（参考文献）Managing RFID Data - Challenges, Opportunities and Solutions

团购合买资源类别：文库，文档格式：PDF，文档页数：18，文件大小：1.4MB

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. IEEE COMMUNICATIONS SURVEYS TUTORIALS.ACCEPTED FOR PUBLICATION Managing RFID Data: Challenges,Opportunities and Solutions Lei Xie,Member.IEEE,Yafeng Yin,Student Member,IEEE,Athanasios V.Vasilakos,Senior Member,IEEE,and Sanglu Lu,Member IEEE Abstract-The advances of Radio-Frequency Identification In conventional RFID applications,the items are attached (RFID)technology have significantly enhanced the capability of with RFID tags and densely placed in a large scale.The RFID capturing data from pervasive space.It becomes a great challenge tag is a small mircochip attached with an antenna in a package in the information era to effectively understand human behavior, mobility and activity through the perceived RFID data.Focusing During the scanning process,the RFID reader powers up and on RFID data management,this article provides an overview of transmits a continuous wave to energize the tags.The tag then current challenges,emerging opportunities and recent progresses responds to the reader with tag-carried information by mod- in RFID.In particular,this article has described and analyzed ulating the backscattered signals.The reader further decodes the research work on three aspects:algorithm,protocol and performance evaluation.We investigate the research progress in the signal and obtains the corresponding information 678 RFID with anti-collision algorithms,authentication and privacy In this way,the RFID system can not only "identify"but protection protocols,localization and activity sensing,as well as also "locate"the labeled item,by providing the information performance tuning in realistic settings.We emphasize the basic including the "identity"as well as the "location".Here,the principles of RFID data management to understand the state- "identity"is precise information extracted from the tag,while of-the-art and to address directions of future research in RFID. the "location"is usually inaccurate information estimated according to environmental parameters.Furthermore,these Index Terms-RFID,data management,anti-collision algo- RFID applications often involve lots of human activities,in rithms,authentication and privacy protection,localization and order to precisely understand human behavior,mobility and activity sensing,performance evaluation. activity through the perceived RFID data,it is essential to ef- fectively manage RFID data to extract the useful information, I.INTRODUCTION while ensuring the overall performance.How to effectively manage RFID data?As a matter of fact,several properties are S A TECHNOLOGY for automated identification of ob- jects,RFID(Radio-Frequency IDentification)has gained essentially required for effective RFID data management.We elaborate on these properties as follows: significant momentum in the past few years.Recently RFID has been more and more frequently introduced in applications Efficiency:while interrogating a number of tags,two met- like retail and distribution,target monitoring and tracking,etc, rics are highly pertinent to efficiency,i.e.,time efficiency as elaborated in literatures [1,[2],and [3].The most impor- to reduce the total scanning time,and energy efficiency tant reason of such phenomenon is that,by leveraging the to reduce the total power consumption. RFID technology,we are able to embed the intelligence into Trustworthy:the communication between the reader each physical object in a low-cost approach.In this way,the and tags should be both privacy-preserving and anti- "passive intelligence"is effectively realized in the ubiquitous counterfeiting. space,which can be regarded as a right candidate for green Localizability:the objects or people should be accurately communications [45.As such a novel technology,RFID located within the specified range of time delay. has provided us with 1)the capability to uniquely identify Reliability:in realistic settings,the RFID system should the objects in the physical world and 2)a passive,battery- be able to tackle with issues like signal interference, free interconnection mechanism.Furthermore,the dropping multi-path effect and energy absorption. tag costs and vigorous RFID standardization have accelerated These properties basically belong to the nonfunctional the wide-spread usage of the RFID technology. requirements for RFID systems.According to the above descriptions,it is essential to provide some mechanisms Manuscript received May 8.2013:revised November 15,2013.L.Xie,Y. to effectively satisfy the above nonfunctional requirements Yin,and S.Lu are supported in part by National Natural Science Foundation of China under Grant No.61100196,61321491.91218302:JiangSu Natural Therefore,several new research topics are necessarily to Science Foundation under Grant No.BK2011559:Key Project of Jiangsu be addressed for RFID data management.Specifically,they Research Program under Grant No.BE2013116:EU FP7 IRSES MobileCloud include anti-collision algorithms,authentication and privacy Project under Grant No.612212.Lei Xie is the corresponding author. L.Xie,Y.Yin,and S.Lu are with the State Key Laboratory for Novel protection protocols,localization and activity sensing,as well Software Technology.Nanjing University,China (e-mail:Ixie@nju.edu.cn as performance tuning in realistic settings.Fig.I illustrates yyf@dislab.nju.edu.cn,sanglu@nju.edu.cn). A.Vasilakos is with University of Western Macedonia,Greece (e-mail: these research topics corresponding to the above properties.In vasilako@ath.forthnet.gr). fact,these research topics are not independent of each other, Digital Object Identifier 10.1109/SURV.2014.022614.00143 but are correlated with each other.Besides,since these topics 1553-877X/14/$31.00©20141EEE

IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION 1 Managing RFID Data: Challenges, Opportunities and Solutions Lei Xie, Member, IEEE, Yafeng Yin, Student Member, IEEE, Athanasios V. Vasilakos, Senior Member, IEEE, and Sanglu Lu, Member, IEEE Abstract—The advances of Radio-Frequency Identification (RFID) technology have significantly enhanced the capability of capturing data from pervasive space. It becomes a great challenge in the information era to effectively understand human behavior, mobility and activity through the perceived RFID data. Focusing on RFID data management, this article provides an overview of current challenges, emerging opportunities and recent progresses in RFID. In particular, this article has described and analyzed the research work on three aspects: algorithm, protocol and performance evaluation. We investigate the research progress in RFID with anti-collision algorithms, authentication and privacy protection protocols, localization and activity sensing, as well as performance tuning in realistic settings. We emphasize the basic principles of RFID data management to understand the stateof-the-art and to address directions of future research in RFID. Index Terms—RFID, data management, anti-collision algorithms, authentication and privacy protection, localization and activity sensing, performance evaluation. I. INTRODUCTION AS A TECHNOLOGY for automated identification of objects, RFID (Radio-Frequency IDentification) has gained significant momentum in the past few years. Recently RFID has been more and more frequently introduced in applications like retail and distribution, target monitoring and tracking, etc, as elaborated in literatures [1], [2], and [3]. The most important reason of such phenomenon is that, by leveraging the RFID technology, we are able to embed the intelligence into each physical object in a low-cost approach. In this way, the “passive intelligence” is effectively realized in the ubiquitous space, which can be regarded as a right candidate for green communications [4][5]. As such a novel technology, RFID has provided us with 1) the capability to uniquely identify the objects in the physical world and 2) a passive, batteryfree interconnection mechanism. Furthermore, the dropping tag costs and vigorous RFID standardization have accelerated the wide-spread usage of the RFID technology. Manuscript received May 8, 2013; revised November 15, 2013. L. Xie, Y. Yin, and S. Lu are supported in part by National Natural Science Foundation of China under Grant No. 61100196, 61321491, 91218302; JiangSu Natural Science Foundation under Grant No. BK2011559; Key Project of Jiangsu Research Program under Grant No. BE2013116; EU FP7 IRSES MobileCloud Project under Grant No. 612212. Lei Xie is the corresponding author. L. Xie, Y. Yin, and S. Lu are with the State Key Laboratory for Novel Software Technology, Nanjing University, China (e-mail: lxie@nju.edu.cn, yyf@dislab.nju.edu.cn, sanglu@nju.edu.cn). A. Vasilakos is with University of Western Macedonia, Greece (e-mail: vasilako@ath.forthnet.gr). Digital Object Identifier 10.1109/SURV.2014.022614.00143 In conventional RFID applications, the items are attached with RFID tags and densely placed in a large scale. The RFID tag is a small mircochip attached with an antenna in a package. During the scanning process, the RFID reader powers up and transmits a continuous wave to energize the tags. The tag then responds to the reader with tag-carried information by modulating the backscattered signals. The reader further decodes the signal and obtains the corresponding information [6][7][8]. In this way, the RFID system can not only “identify” but also “locate” the labeled item, by providing the information including the “identity” as well as the “location”. Here, the “identity” is precise information extracted from the tag, while the “location” is usually inaccurate information estimated according to environmental parameters. Furthermore, these RFID applications often involve lots of human activities, in order to precisely understand human behavior, mobility and activity through the perceived RFID data, it is essential to effectively manage RFID data to extract the useful information, while ensuring the overall performance. How to effectively manage RFID data? As a matter of fact, several properties are essentially required for effective RFID data management. We elaborate on these properties as follows: • Efficiency: while interrogating a number of tags, two metrics are highly pertinent to efficiency, i.e., time efficiency to reduce the total scanning time, and energy efficiency to reduce the total power consumption. • Trustworthy: the communication between the reader and tags should be both privacy-preserving and anticounterfeiting. • Localizability: the objects or people should be accurately located within the specified range of time delay. • Reliability: in realistic settings, the RFID system should be able to tackle with issues like signal interference, multi-path effect and energy absorption. These properties basically belong to the nonfunctional requirements for RFID systems. According to the above descriptions, it is essential to provide some mechanisms to effectively satisfy the above nonfunctional requirements. Therefore, several new research topics are necessarily to be addressed for RFID data management. Specifically, they include anti-collision algorithms, authentication and privacy protection protocols, localization and activity sensing, as well as performance tuning in realistic settings. Fig. 1 illustrates these research topics corresponding to the above properties. In fact, these research topics are not independent of each other, but are correlated with each other. Besides, since these topics 1553-877X/14/$31.00 c 2014 IEEE This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. 2 IEEE COMMUNICATIONS SURVEYS TUTORIALS.ACCEPTED FOR PUBLICATION mainly focus on different layers of the protocol stack,the TABLE I relationship between these topics and the protocol stack is also CHALLENGES AND OPPORTUNITIES FOR RFID DATA MANAGEMENT shown in the figure.This figure provides us a clear structured (C DENOTES CHALLENGES,O DENOTES OPPORTUNITIES) framework of these research topics. PHY Layer MAC Layer APP Layer The main contributions of this article are summarized as Anti-collision Al- C2),(3) C(2) follows: gorithm 0:(2) 0:(2) Authentication C:2).3) We investigate the current challenges and emerging op- Privacy Protection 0:1).(4) 0:2).(3) portunities in managing RFID data,and provide some Localization Ac- C(1) guidance and substantial lessons learned in the research tivity Sensing 0:(4) 0:(1),5 Performance C:(1) C(3) of RFID data management. Tuning in Realistic We emphasize the basic principles of RFID data manage- Settings ment to understand the state-of-the-art research achieve- ments.Furthermore,we present comparative studies over unstable communication is really a challenging problem existing approaches to analyze their features specifically for passive RFID systems. and concretely. 2) Due to the scarce resource in the tag side,the logical We provide a long term vision for future research di- processing in the tag is required to be simple enough, rections,with more emphasis on insights into future challenges and opportunities in RFID research. the storage requirement in the tag should be limited to several kilobytes.Therefore,it is actually very difficult The rest of this article is organized as follows.We first to implement comprehensive and powerful functions in present the challenges and opportunities for RFID data man- agement in Section II.We then respectively introduce the the RFID systems while satisfying the scarce resource constraint. research progresses of anti-collision algorithms,authentication 3)There already exists an industry standard ISO 18000- and privacy protection protocols,as well as localization and 6C (also known as EPC Class 1 Generation 2)which activity sensing in Section III,IV,and V.After that,in specifies how RFID readers identify and read RFID Section VI we further provide a summative discussion from a tags,the design of protocols and algorithms over RFID more realistic perspective,i.e.,performance tuning in realistic systems should conform to the standard,or at least be settings.We envision the future research directions in Section subject to minor changes.In contrast with the other VⅡand conclude in Section VⅢ wireless devices like sensor nodes,smart phones,etc, since the chips in the tags conventionally cannot be II.CHALLENGES AND OPPORTUNITIES FOR RFID DATA reprogrammed,the only part we can freely reprogram is MANAGEMENT the RFID reader,still the standard should be conformed. Compared to the other intelligent systems like two- Any solutions which requires to substantially change the dimension code,sensor network,etc.,the current design of tag's execution logics are considered to be impractical RFID system has provided us the following opportunities for for RFID systems. effective RFID data management [9: Therefore,in order to effectively investigate into the tech- 1)RFID tags can be automatically identified in a non- niques of managing RFID data,one intuitive thinking is to contact approach. sufficiently leverage the potential opportunities to overcome 2)RFID tag contains a certain number of logical gates, the emerging challenges.In order to distinguish these opportu- which support simple logical processing in the tag side. nities and challenges more closely,Table I has categorized the 3)RFID tag is able to store some data permanently in its above challenges and opportunities according to the research tiny on-board memory. topics and involved layers in the protocol stack.In this way, 4)The backscattered signal from RFID tag has properties we can get a better understanding of how to deal with these like any other wireless devices,e.g.,the received signal challenges and opportunities. strength (RSS). 5)RFID tag is very cheap,which makes the large scale- III.ANTI-COLLISION ALGORITHMS IN RFID SYSTEMS deployment become possible. In conventional RFID applications,a large number of RFID Nevertheless,there exist some significant challenges that tags are widely deployed in the specified regions.In order to must be overcome before the benefits are realized.The major identify these tags in a fast approach,it is essential for the challenges of managing RFID data are summarized as follows: RFID reader to utilize an anti-collision algorithm to effectively 1)The communication link between the RFID reader and read these tags one by one.In wireless environment,conven- the tag is not stable,which is susceptible to issues tional wireless devices mainly leverage CSMA/CA to realize like signal interference,multi-path effect and energy the communication among multiple devices,e.g.,the 802.11 absorption.Unlike those battery-powered systems,the protocol series.Being different from the conventional wireless passive RFID system is especially vulnerable to the devices,the RFID tag is a rather simple wireless device with ambient noises and interferences due to the backscatter scarce resource.The tags are unable to self-regulate their radio property.For example,the missed reading phenomenon transmissions to avoid collisions.Specifically,the tag does is very common for RFID systems even in a relatively not have enough processing capacity and energy to realize ideal situation close to free space.How to deal with the the above competition mechanisms to avoid transmission

2 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION mainly focus on different layers of the protocol stack, the relationship between these topics and the protocol stack is also shown in the figure. This figure provides us a clear structured framework of these research topics. The main contributions of this article are summarized as follows: • We investigate the current challenges and emerging opportunities in managing RFID data, and provide some guidance and substantial lessons learned in the research of RFID data management. • We emphasize the basic principles of RFID data management to understand the state-of-the-art research achievements. Furthermore, we present comparative studies over existing approaches to analyze their features specifically and concretely. • We provide a long term vision for future research directions, with more emphasis on insights into future challenges and opportunities in RFID research. The rest of this article is organized as follows. We first present the challenges and opportunities for RFID data management in Section II. We then respectively introduce the research progresses of anti-collision algorithms, authentication and privacy protection protocols, as well as localization and activity sensing in Section III, IV, and V. After that, in Section VI we further provide a summative discussion from a more realistic perspective, i.e., performance tuning in realistic settings. We envision the future research directions in Section VII and conclude in Section VIII. II. CHALLENGES AND OPPORTUNITIES FOR RFID DATA MANAGEMENT Compared to the other intelligent systems like twodimension code, sensor network, etc., the current design of RFID system has provided us the following opportunities for effective RFID data management [9]: 1) RFID tags can be automatically identified in a noncontact approach. 2) RFID tag contains a certain number of logical gates, which support simple logical processing in the tag side. 3) RFID tag is able to store some data permanently in its tiny on-board memory. 4) The backscattered signal from RFID tag has properties like any other wireless devices, e.g., the received signal strength (RSS). 5) RFID tag is very cheap, which makes the large scaledeployment become possible. Nevertheless, there exist some significant challenges that must be overcome before the benefits are realized. The major challenges of managing RFID data are summarized as follows: 1) The communication link between the RFID reader and the tag is not stable, which is susceptible to issues like signal interference, multi-path effect and energy absorption. Unlike those battery-powered systems, the passive RFID system is especially vulnerable to the ambient noises and interferences due to the backscatter property. For example, the missed reading phenomenon is very common for RFID systems even in a relatively ideal situation close to free space. How to deal with the TABLE I CHALLENGES AND OPPORTUNITIES FOR RFID DATA MANAGEMENT (C DENOTES CHALLENGES, O DENOTES OPPORTUNITIES) PHY Layer MAC Layer APP Layer Anti-collision Algorithm C:(2),(3) O:(2) C:(2) O:(2) Authentication & Privacy Protection O:(1),(4) C:(2),(3) O:(2),(3) Localization & Activity Sensing C:(1) O:(4) O:(1),(5) Performance Tuning in Realistic Settings C:(1) C:(3) unstable communication is really a challenging problem for passive RFID systems. 2) Due to the scarce resource in the tag side, the logical processing in the tag is required to be simple enough, the storage requirement in the tag should be limited to several kilobytes. Therefore, it is actually very difficult to implement comprehensive and powerful functions in the RFID systems while satisfying the scarce resource constraint. 3) There already exists an industry standard ISO 18000- 6C (also known as EPC Class 1 Generation 2) which specifies how RFID readers identify and read RFID tags, the design of protocols and algorithms over RFID systems should conform to the standard, or at least be subject to minor changes. In contrast with the other wireless devices like sensor nodes, smart phones, etc, since the chips in the tags conventionally cannot be reprogrammed, the only part we can freely reprogram is the RFID reader, still the standard should be conformed. Any solutions which requires to substantially change the tag’s execution logics are considered to be impractical for RFID systems. Therefore, in order to effectively investigate into the techniques of managing RFID data, one intuitive thinking is to sufficiently leverage the potential opportunities to overcome the emerging challenges. In order to distinguish these opportunities and challenges more closely, Table I has categorized the above challenges and opportunities according to the research topics and involved layers in the protocol stack. In this way, we can get a better understanding of how to deal with these challenges and opportunities. III. ANTI-COLLISION ALGORITHMS IN RFID SYSTEMS In conventional RFID applications, a large number of RFID tags are widely deployed in the specified regions. In order to identify these tags in a fast approach, it is essential for the RFID reader to utilize an anti-collision algorithm to effectively read these tags one by one. In wireless environment, conventional wireless devices mainly leverage CSMA/CA to realize the communication among multiple devices, e.g., the 802.11 protocol series. Being different from the conventional wireless devices, the RFID tag is a rather simple wireless device with scarce resource. The tags are unable to self-regulate their radio transmissions to avoid collisions. Specifically, the tag does not have enough processing capacity and energy to realize the above competition mechanisms to avoid transmission This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. XIE er al:MANAGING RFID DATA:CHALLENGES.OPPORTUNITIES AND SOLUTIONS 3 Nonfunctional Research Topics in Protocol Stack Requirements RFID Data Management Efficiency Anti-collision Algorithms PHY Layer Authentication and Privacy Trustworthy Protection Protocols MAC Layer Localization and Activity Localizability Sensing APP Layer Reliability Performance Tuning in Realistic Settings Fig.1.New Research Topics in RFID Data Management collisions.Therefore,effective anti-collision algorithms should TABLE II be customized for RFID systems.Klair et al.provided an COMPARATIVE STUDY OF THE TWO KINDS OF ANTI-COLLISION ALGORITHMS overview of RFID anti-collision protocols [10].Here,to be more specifically,we further classify and elaborate the princi- Binary tree-based anti- Slotted ALOHA-based ples of the most recent research progresses.In the following collision algorithms anti-collision algorithms Advantages Normally deterministic al- The randomized algorithm subsections,we first introduce the anti-collision algorithms in gorithm is used. is used with a good perfor- tag identification,then,two kinds of ingenious techniques are The query tree-based al- mance on average. respectively introduced,i.e.,estimating the tag size and polling gorithms do not require The randomization makes to store the intermediate the results over slots meet the tags based on the anti-collision algorithms. state variable. a certain probability distri- bution,facilitating various statistical analysis A.Tag Identification Disadvantages For the query tree-based The randomness of the al- In regard to the features of RFID system,an effective algorithms,the delay for gorithm makes the "star- tag identification is sub. vation problem"possible. tag identification protocol is expected to have the following ject to the length and dis- some tags can never se properties:1)Simple:the processing logic of the protocol tribution of tag IDs. lect a singleton slot to re- (including the execution flow and the state transition)should spond.In the worst case. the delay approaches+. be as simple as possible,due to the scarce resource in the tag; 2)Efficient:when interrogating a large number of tags,the protocol should provide a light-weight communication mech- anism to avoid unnecessary transmissions of control messages. range.At the beginning of each frame,the reader broadcasts In this way,the high throughput and low transmission delay the frame size f to the surrounding tags,i.e.,the number can be guaranteed. of slots in the following frame.After each tag receives the The current anti-collision algorithms can be divided into frame size f,it randomly selects a slot from the Ist slot to two categories:the tree based anti-collision algorithms [11,12] the fth slot in this frame and responds to the reader.If the tag and the slotted ALOHA based anti-collision algorithms [13- successfully responds,i.e.,no collision happens in the selected 18].The former leverages the binary search tree to divide the slot,then the specified tag is kept silent in the following query collided tag set into two subsets in a recursive approach.In rounds;otherwise,the tag will continue to select another slot regard to those tag sets with opportunities of collisions,the in the next frame to resend the ID.Therefore,there exists one "keeping silent mechanism is used to resolve the collision of the following situations for each slot within the frame:1) problems.The method to divide the sets is comprised of empty slot:no tag responds in this slot;2)singleton slot:only the random binary tree algorithms and the query binary tree one unique tag responds in this slot;3)collision slot:multiple algorithms.Myung et al.[11]proposed an adaptive tree-based tags respond in this slot.Each kind of slot gives different anti-collision algorithm to efficiently identify the tags.Pan et information.Currently,the slotted ALOHA protocol has been al.[12]proposed a smart tree traversal mechanism based on adopted in the EPC C1G2 standard. the query tree,which conducts the tag identification with low In Table II,we respectively compare the two kinds of anti- delay. collision algorithms in terms of advantages and disadvantages. The ALOHA protocol is first used for random access in Generally speaking,there are pros and cons to both kinds of packet radio network.In order to improve the efficiency of tag algorithms. identification in RFID systems,the slotted ALOHA protocol In regard to the slotted ALOHA protocol,the frame size [13,14]is proposed to effectively resolve the collisions.The for each query round is of crucial importance to the overall slotted ALOHA protocol combines a certain number of time performance for identification.For a given tag set,if the frame slots into a"frame".During the interrogation,the reader sends size is too large,then most of the slots in this frame will a continuous wave to energize all tags in the effective scanning be empty slots,causing the waste of slots;if the frame size

XIE et al.: MANAGING RFID DATA: CHALLENGES, OPPORTUNITIES AND SOLUTIONS 3 Nonfunctional Requirements Efficiency Trustworthy Localizability Reliability Research Topics in RFID Data Management Anti-collision Algorithms Authentication and Privacy Protection Protocols Localization and Activity Sensing Performance Tuning in Realistic Settings Protocol Stack PHY Layer MAC Layer APP Layer Fig. 1. New Research Topics in RFID Data Management collisions. Therefore, effective anti-collision algorithms should be customized for RFID systems. Klair et al. provided an overview of RFID anti-collision protocols [10]. Here, to be more specifically, we further classify and elaborate the principles of the most recent research progresses. In the following subsections, we first introduce the anti-collision algorithms in tag identification, then, two kinds of ingenious techniques are respectively introduced, i.e., estimating the tag size and polling the tags based on the anti-collision algorithms. A. Tag Identification In regard to the features of RFID system, an effective tag identification protocol is expected to have the following properties: 1) Simple: the processing logic of the protocol (including the execution flow and the state transition) should be as simple as possible, due to the scarce resource in the tag; 2) Efficient: when interrogating a large number of tags, the protocol should provide a light-weight communication mechanism to avoid unnecessary transmissions of control messages. In this way, the high throughput and low transmission delay can be guaranteed. The current anti-collision algorithms can be divided into two categories: the tree based anti-collision algorithms [11, 12] and the slotted ALOHA based anti-collision algorithms [13– 18]. The former leverages the binary search tree to divide the collided tag set into two subsets in a recursive approach. In regard to those tag sets with opportunities of collisions, the “keeping silent” mechanism is used to resolve the collision problems. The method to divide the sets is comprised of the random binary tree algorithms and the query binary tree algorithms. Myung et al. [11] proposed an adaptive tree-based anti-collision algorithm to efficiently identify the tags. Pan et al. [12] proposed a smart tree traversal mechanism based on the query tree, which conducts the tag identification with low delay. The ALOHA protocol is first used for random access in packet radio network. In order to improve the efficiency of tag identification in RFID systems, the slotted ALOHA protocol [13, 14] is proposed to effectively resolve the collisions. The slotted ALOHA protocol combines a certain number of time slots into a “frame”. During the interrogation, the reader sends a continuous wave to energize all tags in the effective scanning TABLE II COMPARATIVE STUDY OF THE TWO KINDS OF ANTI-COLLISION ALGORITHMS Binary tree-based anticollision algorithms Slotted ALOHA-based anti-collision algorithms Advantages Normally deterministic algorithm is used. The query tree-based algorithms do not require to store the intermediate state variable. The randomized algorithm is used with a good performance on average. The randomization makes the results over slots meet a certain probability distribution, facilitating various statistical analysis. Disadvantages For the query tree-based algorithms, the delay for tag identification is subject to the length and distribution of tag IDs. The randomness of the algorithm makes the “starvation problem” possible, some tags can never select a singleton slot to respond. In the worst case, the delay approaches +∞. range. At the beginning of each frame, the reader broadcasts the frame size f to the surrounding tags, i.e., the number of slots in the following frame. After each tag receives the frame size f, it randomly selects a slot from the 1st slot to the fth slot in this frame and responds to the reader. If the tag successfully responds, i.e., no collision happens in the selected slot, then the specified tag is kept silent in the following query rounds; otherwise, the tag will continue to select another slot in the next frame to resend the ID. Therefore, there exists one of the following situations for each slot within the frame: 1) empty slot: no tag responds in this slot; 2) singleton slot: only one unique tag responds in this slot; 3) collision slot: multiple tags respond in this slot. Each kind of slot gives different information. Currently, the slotted ALOHA protocol has been adopted in the EPC C1G2 standard. In Table II, we respectively compare the two kinds of anticollision algorithms in terms of advantages and disadvantages. Generally speaking, there are pros and cons to both kinds of algorithms. In regard to the slotted ALOHA protocol, the frame size for each query round is of crucial importance to the overall performance for identification. For a given tag set, if the frame size is too large, then most of the slots in this frame will be empty slots, causing the waste of slots; if the frame size This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. IEEE COMMUNICATIONS SURVEYS TUTORIALS.ACCEPTED FOR PUBLICATION 0.4 0.8 0.6 0.2 0.4 0.1 0.2 100 200 300 400 20 40 60 80 100 Frame Size f Frame size f (a) (b) Fig.2.(a)Given the tag size n100.the relationship between the read efficiencyand the frame size f.(b)As the tag size n varies,the maximum read efficiency obtained when f*=n. is too small,then there will exist transmission collisions for efficiency in a query round approaches to I when n>5,if most of the slots,causing most tags to retransmit in the next the optimal frame size f*=n is used.If this strategy is used query round.Hence,the frame size should be dynamically for each query round,then the overall efficiency is close to adjusted according to the number of tags in the current Since I is the upper bound of the overall efficiency,then the query round.Therefore,many researchers have investigated global optimal efficiency is also achieved with this strategy. into this problem.Schoute et al.analyze the impact of the The above anti-collision algorithms are conventionally de- dynamic frame size selection on the reading performance in vised towards fairly idealized settings,without sufficiently ad- slotted ALOHA protocol [15].Floerkemeier et al.propose the dressing the difficulties in real applications,e.g.,the frequent strategy of optimizing the dynamic frame size,according to the movement of tags.the signal interference among multiple Bayesian probability model [16].Vogt et al.leverage Markov RFID readers,the signal attenuation and multi-path effect, process to model the identification process,and calculate a etc.Therefore,a few research work start to focus on and series of the optimized frame sizes during the interrogation try to solve the above problems.Due to the limited scanning process[17刀]. range of a single RFID reader.multiple readers are con- Lee et al.further derive the optimal dynamic frame sizes ventionally deployed in the application scenarios.Since the to maximize the channel efficiency [18.Their research con- former research work mainly focus on resolving the collisions clusions indicate that,if the frame size used in each query among multiple tag transmissions,without considering the round is equivalent to the number of tags to be identified, signal interference problem among multiple readers,literature then the maximum efficiency can be achieved for the channel.[19,20]propose optimized activating and scheduling schemes The detail principle is described as follows:assume the current for multiple readers.In this way,multiple readers are able number of tags within the effective scanning range is n,the to collaboratively avoid the signal transmission collisions. frame size is f.As the number of tags in each slot conforms Furthermore,by utilizing the mobile RFID reader,Sheng et al. to the Binomial distribution,then,the expected number of sin-develop efficient schemes for continuous scanning operations gleton slots in this frame is E[m]=nx(1-1/f)"-1.In order defined in both spatial and temporal domains [21].Their to maximize the channel efficiency i.e.,the ratio of the basic idea is to fully utilize the information gathered in the number of singleton slots to the frame size,we get the value previous scanning operations to reduce the scanning time of of f through the extremum of':aE业=0→f*=n, the succeeding ones.The former research work mainly devise then the maximum channel efficiency is n/f*1/e.In optimized schemes to identify statically deployed tags in fairly Fig.2(a)and Fig.2(b),we provide more detail descriptions for idealized settings,without considering the impact of some the above properties.Fig.2(a)depicts the impact of the frame ubiquitous issues like path loss in the mobile environment.In size f on the channel efficiency,when the tag size n =100. regard to this problem,Xie et al.propose a probabilistic model Note that as the frame size gradually increases from 1 to 400, for RFID tag identification [22].according to the continuous the read efficiency gradually increases to the maximum value changing properties of signal attenuation.Based on this model, and then decreases.The maximum efficiency is achieved they devise optimal parameters for the tag identification, when f=100.Fig.2(b)depicts the impact of the tag size n which conforms to the EPC Gen2 standard.Moreover,in on the channel efficiency,when the optimal frame size f*=n order to execute the continuous scanning with mobile reader is used.Note that when the value of n is small,the optimal for tag identification in realistic settings,Xie et al.conduct efficiency is larger than 1,e.g.,when the tag size n =1,comprehensive experimental study on mobile RFID reading, the optimal efficiency is 100%if the frame size is set to 1.and design very efficient algorithms to maximize the time- As the value of n gradually increases,the optimal efficiency efficiency and energy-efficiency by skillfully adjusting the quickly converges to It implies that,the local optimal read readers power and moving speed [23].In order to identify

4 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION 0 100 200 300 400 0 0.1 0.2 0.3 0.4 Frame Size f Read Efficiency n1/f (a) 20 40 60 80 100 0 0.2 0.4 0.6 0.8 1 The maximum of read efficiency n1/f Frame size f (b) Fig. 2. (a) Given the tag size n = 100, the relationship between the read efficiency n1 f and the frame size f. (b) As the tag size n varies, the maximum read efficiency obtained when f∗ = n. is too small, then there will exist transmission collisions for most of the slots, causing most tags to retransmit in the next query round. Hence, the frame size should be dynamically adjusted according to the number of tags in the current query round. Therefore, many researchers have investigated into this problem. Schoute et al. analyze the impact of the dynamic frame size selection on the reading performance in slotted ALOHA protocol [15]. Floerkemeier et al. propose the strategy of optimizing the dynamic frame size, according to the Bayesian probability model [16]. Vogt et al. leverage Markov process to model the identification process, and calculate a series of the optimized frame sizes during the interrogation process [17]. Lee et al. further derive the optimal dynamic frame sizes to maximize the channel efficiency [18]. Their research conclusions indicate that, if the frame size used in each query round is equivalent to the number of tags to be identified, then the maximum efficiency can be achieved for the channel. The detail principle is described as follows: assume the current number of tags within the effective scanning range is n, the frame size is f. As the number of tags in each slot conforms to the Binomial distribution, then, the expected number of singleton slots in this frame is E[n1] = n×(1−1/f)n−1. In order to maximize the channel efficiency n1 f , i.e., the ratio of the number of singleton slots to the frame size, we get the value of f through the extremum of n1 f : ∂E[n1]/f ∂f = 0 → f ∗ = n, then the maximum channel efficiency is n1/f ∗ → 1/e. In Fig.2(a) and Fig.2(b), we provide more detail descriptions for the above properties. Fig.2(a) depicts the impact of the frame size f on the channel efficiency, when the tag size n = 100. Note that as the frame size gradually increases from 1 to 400, the read efficiency gradually increases to the maximum value 1 e and then decreases. The maximum efficiency is achieved when f = 100. Fig.2(b) depicts the impact of the tag size n on the channel efficiency, when the optimal frame size f ∗ = n is used. Note that when the value of n is small, the optimal efficiency is larger than 1 e , e.g., when the tag size n = 1, the optimal efficiency is 100% if the frame size is set to 1. As the value of n gradually increases, the optimal efficiency quickly converges to 1 e . It implies that, the local optimal read efficiency in a query round approaches to 1 e when n > 5, if the optimal frame size f ∗ = n is used. If this strategy is used for each query round, then the overall efficiency is close to 1 e . Since 1 e is the upper bound of the overall efficiency, then the global optimal efficiency is also achieved with this strategy. The above anti-collision algorithms are conventionally devised towards fairly idealized settings, without sufficiently addressing the difficulties in real applications, e.g., the frequent movement of tags, the signal interference among multiple RFID readers, the signal attenuation and multi-path effect, etc. Therefore, a few research work start to focus on and try to solve the above problems. Due to the limited scanning range of a single RFID reader, multiple readers are conventionally deployed in the application scenarios. Since the former research work mainly focus on resolving the collisions among multiple tag transmissions, without considering the signal interference problem among multiple readers, literature [19, 20] propose optimized activating and scheduling schemes for multiple readers. In this way, multiple readers are able to collaboratively avoid the signal transmission collisions. Furthermore, by utilizing the mobile RFID reader, Sheng et al. develop efficient schemes for continuous scanning operations defined in both spatial and temporal domains [21]. Their basic idea is to fully utilize the information gathered in the previous scanning operations to reduce the scanning time of the succeeding ones. The former research work mainly devise optimized schemes to identify statically deployed tags in fairly idealized settings, without considering the impact of some ubiquitous issues like path loss in the mobile environment. In regard to this problem, Xie et al. propose a probabilistic model for RFID tag identification [22], according to the continuous changing properties of signal attenuation. Based on this model, they devise optimal parameters for the tag identification, which conforms to the EPC Gen2 standard. Moreover, in order to execute the continuous scanning with mobile reader for tag identification in realistic settings, Xie et al. conduct comprehensive experimental study on mobile RFID reading, and design very efficient algorithms to maximize the timeefficiency and energy-efficiency by skillfully adjusting the readers power and moving speed [23]. In order to identify This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. XIE er al:MANAGING RFID DATA:CHALLENGES.OPPORTUNITIES AND SOLUTIONS TABLE III COMPARISON OF ALGORITHMS FOR TAG SIZE ESTIMATION Empty slots Estimation Compatibility Indicator for Estima- Probability Model 0.8 Singleton slots Algorithm with Slotted tion -Collision slots ALOHA 25 Compatible The number of Binomial 0.6 empty and collision distribution slots 0.4 26 Compatible The number of Posterior probability empty,singleton and model based on bi- collision slots nomial distribution 0.2 127 Compatible The first slot with tag Binomial response distribution T2829] Not The fringe of colli- Geometric distribu- 0 Compatible sion slots tion 2 6 10 [30 Compatible The average run- Binomial The ratio of tag size to frame size n/f length of ones in the distribution bit string received Fig.3.The normalized expected number of empty/singleton/collision slots as the ratio of子varies Binomial distribution,the authors present unified estimation algorithms according to the empty slots and collision slots, the RFID tags towards the specified area,Yin et al.propose a and they further reduce the estimation errors through repeated "focus and shoot"method for efficient tag identification,based sampling on extensive empirical study on RFID systems [24]. Although the number of singleton slots is not monotonic to the overall tag size,the tag size cannot be estimated purely B.Estimating the Tag Size from the number of singleton slots,nevertheless,the number With the further extension of RFID applications,a number of three kinds of slots together can help to estimate the tag of applications only require some statistical information for size in a more accurate approach.Hence,Chen et al.present a data analysis and mining.In this situation,the RFID system posterior probability model based on the observed number of only requires to quickly estimate the statistics,instead of empty/singleton/collision slots [26].They propose an accurate identifying the tags one by one.One of the most important tag estimate method by maximizing the posterior probability. statistics is the tag size,i.e.,the overall number of tags. The above mechanisms all require a large number of sam- Moreover,the slotted ALOHA protocol based on dynamic plings over the three kinds of slots to improve the estimation frame sizes also requires to estimate the current tag size for accuracy.In fact,some other properties can be leveraged to determining the frame size.Therefore,recently there are many accurately estimate the tag size in a faster approach.Han research work which focus on how to fast and accurately et al.propose an efficient and anonymous scheme for tag estimate the overall tag size.The core idea is as follows:since size estimation [27],which leverages the position of the first the slotted ALOHA protocol requires the tags to randomly reply from a group of tags in a frame.Moreover,in order select the slots to respond inside the frame,then it is possible to tackle with the multiple reading problem,i.e.,one single to leverage the statistics regularities in the randomized algo- tag can be interrogated multiple times with multiple readers, rithm to estimate the tag size.Specifically,according to the Chen et al.present a replicate-insensitive estimation scheme current protocol in EPC Gen 2 standard,the number of empty [28],which eliminates the multiple reading in multi-reader slots,singleton slots and collision slots actually conform to scenarios.The authors further propose an adaptively splitting- the Binomial distribution in a statistical approach.When the based arbitration protocol [29].This protocol requires a single number of samplings for slots is large enough,the number of tag to select multiple slots in a frame to respond,according tags can be effectively estimated according to the Binomial to the Geometric distribution.Specifically,each tag have a distribution. probability of to respond in the t-1th slot.Then,the Based on the above idea,Kodialam et al.propose a fast tag size can be effectively estimated according to the fringe and reliable estimation mechanisms for tag size in a practical of collision slots in the frame.Shahzad et al.propose a new approach [25].The major idea is as follows:assume in a scheme for estimating tag population size called average run certain query round with fixed frame size f,as the overall based tag estimation [30].The technique is based on the number of tags n increases,the number of empty slots is average run-length of ones in the bit string received using the expected to decrease,the number of collision slots is expected standardized framed slotted ALOHA protocol.It is easy to to increase,while the number of singleton slots is expected deploy because it neither requires modification to tags nor to to first increase and then decrease.Therefore,the number the communication protocol between tags and readers.Table of empty slots (collision slots)is monotonically decreasing III provides a comparison of the above algorithms for tag size (increasing)with the overall number of tags.Fig.3 depicts estimation. the normalized expected number of empty/singleton/collision Instead of simply estimating the tag size,a number of slots as the ratio of 4 varies,here the number of slots is researchers start to investigate how to realize more compli- normalized to a real number in the range 0 through I by cated data analysis and mining over RFID systems.Sheng et dividing the frame size f.Based on the probabilistic model of al.consider the problem of identifying popular categories of

XIE et al.: MANAGING RFID DATA: CHALLENGES, OPPORTUNITIES AND SOLUTIONS 5 0 2 4 6 8 10 0 0.2 0.4 0.6 0.8 1 The ratio of tag size to frame size n/f Normalized expected number of slots Empty slots Singleton slots Collision slots Fig. 3. The normalized expected number of empty/singleton/collision slots as the ratio of n f varies the RFID tags towards the specified area, Yin et al. propose a “focus and shoot” method for efficient tag identification, based on extensive empirical study on RFID systems [24]. B. Estimating the Tag Size With the further extension of RFID applications, a number of applications only require some statistical information for data analysis and mining. In this situation, the RFID system only requires to quickly estimate the statistics, instead of identifying the tags one by one. One of the most important statistics is the tag size, i.e., the overall number of tags. Moreover, the slotted ALOHA protocol based on dynamic frame sizes also requires to estimate the current tag size for determining the frame size. Therefore, recently there are many research work which focus on how to fast and accurately estimate the overall tag size. The core idea is as follows: since the slotted ALOHA protocol requires the tags to randomly select the slots to respond inside the frame, then it is possible to leverage the statistics regularities in the randomized algorithm to estimate the tag size. Specifically, according to the current protocol in EPC Gen 2 standard, the number of empty slots, singleton slots and collision slots actually conform to the Binomial distribution in a statistical approach. When the number of samplings for slots is large enough, the number of tags can be effectively estimated according to the Binomial distribution. Based on the above idea, Kodialam et al. propose a fast and reliable estimation mechanisms for tag size in a practical approach [25]. The major idea is as follows: assume in a certain query round with fixed frame size f, as the overall number of tags n increases, the number of empty slots is expected to decrease, the number of collision slots is expected to increase, while the number of singleton slots is expected to first increase and then decrease. Therefore, the number of empty slots (collision slots) is monotonically decreasing (increasing) with the overall number of tags. Fig.3 depicts the normalized expected number of empty/singleton/collision slots as the ratio of n f varies, here the number of slots is normalized to a real number in the range 0 through 1 by dividing the frame size f. Based on the probabilistic model of TABLE III COMPARISON OF ALGORITHMS FOR TAG S IZE ESTIMATION Estimation Algorithm Compatibility with Slotted ALOHA Indicator for Estimation Probability Model [25] Compatible The number of empty and collision slots Binomial distribution [26] Compatible The number of empty, singleton and collision slots Posterior probability model based on binomial distribution [27] Compatible The first slot with tag response Binomial distribution [28][29] Not Compatible The fringe of collision slots Geometric distribution [30] Compatible The average runlength of ones in the bit string received Binomial distribution Binomial distribution, the authors present unified estimation algorithms according to the empty slots and collision slots, and they further reduce the estimation errors through repeated sampling. Although the number of singleton slots is not monotonic to the overall tag size, the tag size cannot be estimated purely from the number of singleton slots, nevertheless, the number of three kinds of slots together can help to estimate the tag size in a more accurate approach. Hence, Chen et al. present a posterior probability model based on the observed number of empty/singleton/collision slots [26]. They propose an accurate tag estimate method by maximizing the posterior probability. The above mechanisms all require a large number of samplings over the three kinds of slots to improve the estimation accuracy. In fact, some other properties can be leveraged to accurately estimate the tag size in a faster approach. Han et al. propose an efficient and anonymous scheme for tag size estimation [27], which leverages the position of the first reply from a group of tags in a frame. Moreover, in order to tackle with the multiple reading problem, i.e., one single tag can be interrogated multiple times with multiple readers, Chen et al. present a replicate-insensitive estimation scheme [28], which eliminates the multiple reading in multi-reader scenarios. The authors further propose an adaptively splittingbased arbitration protocol [29]. This protocol requires a single tag to select multiple slots in a frame to respond, according to the Geometric distribution. Specifically, each tag have a probability of 1 2t to respond in the t − 1th slot. Then, the tag size can be effectively estimated according to the fringe of collision slots in the frame. Shahzad et al. propose a new scheme for estimating tag population size called average run based tag estimation [30]. The technique is based on the average run-length of ones in the bit string received using the standardized framed slotted ALOHA protocol. It is easy to deploy because it neither requires modification to tags nor to the communication protocol between tags and readers. Table III provides a comparison of the above algorithms for tag size estimation. Instead of simply estimating the tag size, a number of researchers start to investigate how to realize more complicated data analysis and mining over RFID systems. Sheng et al. consider the problem of identifying popular categories of This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. 6 IEEE COMMUNICATIONS SURVEYS TUTORIALS.ACCEPTED FOR PUBLICATION RFID tags out of a large collection of tags,without reading to be singleton/collision slot but turns out to be empty,then all the tag data [31].They propose effective algorithms based it implies that the tag(s)mapping to this slot is(are)missing on the idea of group testing.which can efficiently derive Fig.4 gives an example of the polling mechanism.As the figure popular categories of tags.Kodialam et al.propose a privacy- shows,the dotted box denotes the missing tag,while the solid preserving estimation mechanism over dynamically moving box denotes the existing tag.Assume that the tag E,F and H tags in applications like object tracking [32].This mechanism are missing,if the corresponding slots turn out to be empty, can fast count the number of tags moving from location A then the tags can be determined missing.However,if the tag C to B during the time interval(t1,t2).In this way,the traffic is missing,the corresponding slot still turns out to be collision can be effectively tracked without exactly reading the tag IDs. slot due to the existence of the tag B and D.In this situation, In general,research work in this area are relatively rare.With the tag C cannot be identified as missing,resulting in the the further extension of RFID applications,efficient techniques false positive error.The above polling mechanism provides a for data analysis and mining over RFID systems will attract theoretical foundation to recent research work. widespread attentions and in-depth studies Based on the above understanding,Li et al.study a practically important problem of monitoring a large set of C.Polling the Tags RFID tags,i.e.,efficiently identifying the missing tags in a large RFID system [33].Based on the"pseudo randomness", Instead of obtaining all tags'IDs in the effective scanning they design a series of missing-tag identification protocols range,some applications only require to poll the tags in a that employ novel techniques to reduce the execution time. specified tag set,verifying whether these tags are missing. In order to deal with the problem of collecting real-time For example,in regard to the applications like warehouse information from a set of battery-powered active tags,Qiao management,assume that each of the goods is attached with et al.propose a tag-ordering polling protocol [34]that can an RFID tag with unique ID,the administrator should check reduce per-tag energy consumption by more than an order the specified inventories according to the list of goods.In this of magnitude.Moreover,they apply partitioned Bloom filters situation.it becomes an essential problem to devise efficient to further enhance the performance.such that it can achieve polling protocols over the RFID tags,such that both the much better energy efficiency without degradation in protocol time efficiency and energy efficiency can be achieved.A execution time.Chen et al.respectively propose a single- straightforward solution is to realize the following"roll call" hash and a multi-hash based information collection protocol to mechanism:the reader continuously broadcasts the tags'IDs address the above problem [35],which dramatically reduces according to the check list.After each tag ID is broadcasted. the expected execution time.In order to meet the requirement the reader will wait for the specified tag to return a short of prompt and reliable batch authentications in large scale response.If a short response is obtained,then it implies that RFID applications,Yang et al.propose an identification-free this tag exists in the scanning range;otherwise,this tag is batch authentication protocol based on the polling mechanism expected to be missing.However,there are a number of [36].Conventionally,in order to authenticate a large number disadvantages for the above mechanism:1)it is not compatible of tags,the reader should identify and authenticate the tags with the slotted ALOHA:2)the transmission delay of the one by one,which requires a large amount of scanning tag ID with 96 bits is fairly large,the efficiency is greatly time.Based on the "pseudo randomness",this batch authen- reduced.Therefore,many researchers have investigated into tication protocol greatly reduces the overall scanning time, this problem,and tried to design efficient polling protocols while guaranteeing the probability of false positive error to based on the slotted ALOHA,aiming to sufficiently reduce be below the specified threshold.Being different from the the overall scanning time. above research work,which mainly poll over all tags in Note that during each query round of the interrogation,each the scanning area,Zheng et al.address the problem of fast of the activated tags will "randomly"select a slot to respond searching a particular subset in a large number of RFID tags inside the frame.However,due to the simplicity of the tag's [37].They utilize Bloom filter-based compact approximators design,it is rather difficult to realize the "pure randomness". to efficiently aggregate and exchange the tag information Instead,the "pseudo randomness"is used as follows:For each with a two-phase approximation protocol,which significantly query round the reader first broadcasts a random number r to reduces the searching time.Generally speaking,these polling the tags.After that,each of the activated tags will compute mechanisms mainly leverage the pseudo randomness in the a pseudo random number s as the selected slot number,here slotted ALOHA protocol to effectively reduce the additional s=hash(ID,r)mod f.ID and f respectively denote the tag cost of uniquely identifying the tags.Nevertheless,since there ID and frame size.The above"pseudo randomness"implies exist the false positive errors for the polling mechanisms. that,in regard to a specified tag,once the random number r optimized algorithms are proposed to reduce the probability and the frame size f is set,the corresponding slot selected by of false positive errors. this tag inside the frame is determined.This property makes it possible for efficient polling over the tags.Since the tag IDs in the specified tag set can be known in advance,then D.Summing Up Challenges and Opportunities before interrogating the tags,the reader can precompute the According to the above recent research progress,we sum- expected state for each slot accordingly.The expected state marize the challenges and opportunities for anti-collision for each slot can be empty slot,singleton slot or collision slot. algorithms in Table IV,respectively for the tag identification, Therefore,in regard to a specified slot,if this slot is expected estimating the tag size,as well as polling the tags

6 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION RFID tags out of a large collection of tags, without reading all the tag data [31]. They propose effective algorithms based on the idea of group testing, which can efficiently derive popular categories of tags. Kodialam et al. propose a privacypreserving estimation mechanism over dynamically moving tags in applications like object tracking [32]. This mechanism can fast count the number of tags moving from location A to B during the time interval (t1, t2). In this way, the traffic can be effectively tracked without exactly reading the tag IDs. In general, research work in this area are relatively rare. With the further extension of RFID applications, efficient techniques for data analysis and mining over RFID systems will attract widespread attentions and in-depth studies. C. Polling the Tags Instead of obtaining all tags’ IDs in the effective scanning range, some applications only require to poll the tags in a specified tag set, verifying whether these tags are missing. For example, in regard to the applications like warehouse management, assume that each of the goods is attached with an RFID tag with unique ID, the administrator should check the specified inventories according to the list of goods. In this situation, it becomes an essential problem to devise efficient polling protocols over the RFID tags, such that both the time efficiency and energy efficiency can be achieved. A straightforward solution is to realize the following “roll call” mechanism: the reader continuously broadcasts the tags’ IDs according to the check list. After each tag ID is broadcasted, the reader will wait for the specified tag to return a short response. If a short response is obtained, then it implies that this tag exists in the scanning range; otherwise, this tag is expected to be missing. However, there are a number of disadvantages for the above mechanism: 1) it is not compatible with the slotted ALOHA; 2) the transmission delay of the tag ID with 96 bits is fairly large, the efficiency is greatly reduced. Therefore, many researchers have investigated into this problem, and tried to design efficient polling protocols based on the slotted ALOHA, aiming to sufficiently reduce the overall scanning time. Note that during each query round of the interrogation, each of the activated tags will “randomly” select a slot to respond inside the frame. However, due to the simplicity of the tag’s design, it is rather difficult to realize the “pure randomness”. Instead, the “pseudo randomness” is used as follows: For each query round the reader first broadcasts a random number r to the tags. After that, each of the activated tags will compute a pseudo random number s as the selected slot number, here s = hash(ID, r)mod f, ID and f respectively denote the tag ID and frame size. The above “pseudo randomness” implies that, in regard to a specified tag, once the random number r and the frame size f is set, the corresponding slot selected by this tag inside the frame is determined. This property makes it possible for efficient polling over the tags. Since the tag IDs in the specified tag set can be known in advance, then before interrogating the tags, the reader can precompute the expected state for each slot accordingly. The expected state for each slot can be empty slot, singleton slot or collision slot. Therefore, in regard to a specified slot, if this slot is expected to be singleton/collision slot but turns out to be empty, then it implies that the tag(s) mapping to this slot is(are) missing. Fig.4 gives an example of the polling mechanism. As the figure shows, the dotted box denotes the missing tag, while the solid box denotes the existing tag. Assume that the tag E,F and H are missing, if the corresponding slots turn out to be empty, then the tags can be determined missing. However, if the tag C is missing, the corresponding slot still turns out to be collision slot due to the existence of the tag B and D. In this situation, the tag C cannot be identified as missing, resulting in the false positive error. The above polling mechanism provides a theoretical foundation to recent research work. Based on the above understanding, Li et al. study a practically important problem of monitoring a large set of RFID tags, i.e., efficiently identifying the missing tags in a large RFID system [33]. Based on the “pseudo randomness”, they design a series of missing-tag identification protocols that employ novel techniques to reduce the execution time. In order to deal with the problem of collecting real-time information from a set of battery-powered active tags, Qiao et al. propose a tag-ordering polling protocol [34] that can reduce per-tag energy consumption by more than an order of magnitude. Moreover, they apply partitioned Bloom filters to further enhance the performance, such that it can achieve much better energy efficiency without degradation in protocol execution time. Chen et al. respectively propose a singlehash and a multi-hash based information collection protocol to address the above problem [35], which dramatically reduces the expected execution time. In order to meet the requirement of prompt and reliable batch authentications in large scale RFID applications, Yang et al. propose an identification-free batch authentication protocol based on the polling mechanism [36]. Conventionally, in order to authenticate a large number of tags, the reader should identify and authenticate the tags one by one, which requires a large amount of scanning time. Based on the “pseudo randomness”, this batch authentication protocol greatly reduces the overall scanning time, while guaranteeing the probability of false positive error to be below the specified threshold. Being different from the above research work, which mainly poll over all tags in the scanning area, Zheng et al. address the problem of fast searching a particular subset in a large number of RFID tags [37]. They utilize Bloom filter-based compact approximators to efficiently aggregate and exchange the tag information with a two-phase approximation protocol, which significantly reduces the searching time. Generally speaking, these polling mechanisms mainly leverage the pseudo randomness in the slotted ALOHA protocol to effectively reduce the additional cost of uniquely identifying the tags. Nevertheless, since there exist the false positive errors for the polling mechanisms, optimized algorithms are proposed to reduce the probability of false positive errors. D. Summing Up Challenges and Opportunities According to the above recent research progress, we summarize the challenges and opportunities for anti-collision algorithms in Table IV, respectively for the tag identification, estimating the tag size, as well as polling the tags. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. XIE er al:MANAGING RFID DATA:CHALLENGES.OPPORTUNITIES AND SOLUTIONS Tag A 1Ta熙H Expected State Observed ) State Polling Cannot decide if tag Tag E,F are Tag His C is missing,false Results missing positive error. missing. Fig.4.An example of the polling mechanism in slotted ALOHA-based protocol TABLE IV bits,while the user memory can only store 512 bits.Hence, CHALLENGES AND OPPORTUNITIES FOR ANTI-COLLISION ALGORITHMS the tag's scarce resource cannot support the implementation Mechanism Challenges Opportunities of the above encryption and decryption operations.Therefore, Tag Iden- Ensuring time-efficiency Dynamically adjusting the greatest challenge to the security and privacy protection tification in realistic situations. the parameters like in RFID systems lies in how to implement the authentication the reader's power,the scanning angle,and the and privacy protection protocols in a lightweight approach.In frame size can effectively the following subsections,we elaborate more on the related improve the actual research work respectively in physical mechanism-based so- performance. Estimating Ensuring both the The regular probability lutions,symmetric-key encryption-based solutions,and hash the Tag accuracy and the time- distribution in Monte function-based solutions. Size efficiency for estimation. Carlo method contributes to accurate estimation. Polling Ensuring time-efficiency Pseudo randomness can the Tags while reducing false be leveraged for effective A.Physical Mechanism-based Solutions positive/negative errors. polling. The privacy problem of RFID system is caused by the lack of authentication when the RFID reader interrogates the tags.Without the privacy protection,any RFID reader can IV.AUTHENTICATION AND PRIVACY PROTECTION privately interrogate the surrounding tags to obtain their ID. PROTOCOLS In regard to those encrypted tags which cannot be directly The premise of efficient RFID data management is to identified,they can still be tracked by the illegal readers guarantee the security and privacy of RFID data.The threats according to the backscattered encrypted messages.In order to of RFID systems mainly come from the unauthorized access protect the users'privacy in RFID systems,a straightforward to the tags and the existence of counterfeit tags.Specifically. approach is to utilize the physical mechanism-based solutions. the security problem is how to effectively authenticate the tags which mainly include tag killing,electrostatic screening,active when there exist counterfeit tags;the privacy problem is how jamming and blocking. to prevent the illegal access from the readers to preserve the The tag killing is actually a brute-force operation.In order users'privacy. to prevent a tag from illegal eavesdropping,the reader simply For conventional solutions in network security,there are deactivates a tag by sending a"kill"command with a tag- already some encryption and decryption algorithms like DES, specific PIN.When a tag receives a "kill"command from AES,RSA and ECC.They can implement functions like the reader,it renders itself permanently inoperative.In this encryption and authentication,such that the illegal access,respect,the killing operation leads to loss of primary functions spoofing,eavesdropping and replay attack can be effec- for RFID tags,while preventing arbitrary interrogation and tively resisted.However,they require a large number of tracking from illegal users,since "dead tags tell no tales" logic processing units on the chip,e.g..AES requires about Apparently,it is not a very reasonable solution to completely 20000~30000 logic gates,whereas RSA and ECC require deactivate a tag.Sarma et al.propose a solution to partially more logic gates to implement their functions. erase the unique part of identification code while keeping the Due to the low cost limitation of RFID tags,conventionally other part including the category ID [38].In this way,the one RFID tag can only have 5000~10000 logic gates.More-tag can be prevented from being tracked,while sacrificing the over,these logic gates are mainly used to implement the basic unique identification.Inoue et al.propose to use a new identifi- functions of the tag,leaving very few gates for the security cation code to replace the former code for unique identification functions.Besides,the tag's on-board memory is also rather [39].The former identification code can be reactivated when limited,conventionally the EPC memory can only store 96 the tag is recycled.In consideration of reusing the tags,the

XIE et al.: MANAGING RFID DATA: CHALLENGES, OPPORTUNITIES AND SOLUTIONS 7 0 0 1 0 C 0 0 C 1 1 0 0 Tag A Tag B Tag C Tag D Tag E Tag F Tag G Tag H Ă 0 0 1 0 C 0 0 Ă 0 1 0 0 0 Expected State Observed State Polling Results Tag E,F are missing Tag H is missing. Cannot decide if tag C is missing, false positive error. Fig. 4. An example of the polling mechanism in slotted ALOHA-based protocol TABLE IV CHALLENGES AND OPPORTUNITIES FOR ANTI-COLLISION ALGORITHMS Mechanism Challenges Opportunities Tag Identification Ensuring time-efficiency in realistic situations. Dynamically adjusting the parameters like the reader’s power, the scanning angle, and the frame size can effectively improve the actual performance. Estimating the Tag Size Ensuring both the accuracy and the timeefficiency for estimation. The regular probability distribution in Monte Carlo method contributes to accurate estimation. Polling the Tags Ensuring time-efficiency while reducing falsepositive/negative errors. Pseudo randomness can be leveraged for effective polling. IV. AUTHENTICATION AND PRIVACY PROTECTION PROTOCOLS The premise of efficient RFID data management is to guarantee the security and privacy of RFID data. The threats of RFID systems mainly come from the unauthorized access to the tags and the existence of counterfeit tags. Specifically, the security problem is how to effectively authenticate the tags when there exist counterfeit tags; the privacy problem is how to prevent the illegal access from the readers to preserve the users’ privacy. For conventional solutions in network security, there are already some encryption and decryption algorithms like DES, AES, RSA and ECC. They can implement functions like encryption and authentication, such that the illegal access, spoofing, eavesdropping and replay attack can be effectively resisted. However, they require a large number of logic processing units on the chip, e.g., AES requires about 20000∼30000 logic gates, whereas RSA and ECC require more logic gates to implement their functions. Due to the low cost limitation of RFID tags, conventionally one RFID tag can only have 5000∼10000 logic gates. Moreover, these logic gates are mainly used to implement the basic functions of the tag, leaving very few gates for the security functions. Besides, the tag’s on-board memory is also rather limited, conventionally the EPC memory can only store 96 bits, while the user memory can only store 512 bits. Hence, the tag’s scarce resource cannot support the implementation of the above encryption and decryption operations. Therefore, the greatest challenge to the security and privacy protection in RFID systems lies in how to implement the authentication and privacy protection protocols in a lightweight approach. In the following subsections, we elaborate more on the related research work respectively in physical mechanism-based solutions, symmetric-key encryption-based solutions, and hash function-based solutions. A. Physical Mechanism-based Solutions The privacy problem of RFID system is caused by the lack of authentication when the RFID reader interrogates the tags. Without the privacy protection, any RFID reader can privately interrogate the surrounding tags to obtain their ID. In regard to those encrypted tags which cannot be directly identified, they can still be tracked by the illegal readers according to the backscattered encrypted messages. In order to protect the users’ privacy in RFID systems, a straightforward approach is to utilize the physical mechanism-based solutions, which mainly include tag killing, electrostatic screening, active jamming and blocking. The tag killing is actually a brute-force operation. In order to prevent a tag from illegal eavesdropping, the reader simply deactivates a tag by sending a “kill” command with a tagspecific PIN. When a tag receives a “kill” command from the reader, it renders itself permanently inoperative. In this respect, the killing operation leads to loss of primary functions for RFID tags, while preventing arbitrary interrogation and tracking from illegal users, since “dead tags tell no tales”. Apparently, it is not a very reasonable solution to completely deactivate a tag. Sarma et al. propose a solution to partially erase the unique part of identification code while keeping the other part including the category ID [38]. In this way, the tag can be prevented from being tracked, while sacrificing the unique identification. Inoue et al. propose to use a new identifi- cation code to replace the former code for unique identification [39]. The former identification code can be reactivated when the tag is recycled. In consideration of reusing the tags, the This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. IEEE COMMUNICATIONS SURVEYS TUTORIALS.ACCEPTED FOR PUBLICATION Query- B.Symmetric-key Encryption-based Solutions 1)Symmetric-key Encryption:The public key encryption RFID Reader RFID (legal &ilegal) Tag is a powerful technique which can effectively implement -Tag ID the encryption and electronic signature.However,due to its -Bit-collision masking- complex operations,the public key encryption cannot be implemented in the RFID system,since the resource in the Recover the bits RFID tag is too few to support it.Therefore,the RFID system Blocker Tag usually leverages the symmetric-key encryption to implement Legitimate or RFID Reader Privacy Enhancement Device the security and privacy protection,as it has a much simpler processing logic.Specifically,the symmetric-key encryption Fig.5.The framework for the jamming/blocking-based protocols can be used to authenticate the tags.Algorithm 1 depicts the protocol of authenticating the tags with the symmetric- key encryption [45].In this protocol,any tag shares a distinct symmetric-key with the RFID reader. "sleep"mechanism [39]can be used to render the tags only temporarily inactive.The tag can be appropriately"waked up" Algorithm 1 Authenticate the tags with the symmetric-key when it is needed. encryption The electrostatic screening is to place the tag into a con- 1:The tag transmits the identifier T to the reader to identify tainer which can physically screen the tags from interrogation. itself. This mechanism needs an additional device like Faraday cage 2:The reader generates a random bit string R and transmits T40]as a shield against the electromagnetic coupling.Active it to the tag jamming uses a device to actively broadcast the interfering 3:The tag encrypts the bit string R with its key ki,i.e.,C= signals to prevent the unauthorized read-write operations over EiR],and transmits C to the reader. tags.Lim et al.propose an active jamming mechanism relying 4:The reader locally computes C=Eki[R],and verifies if on masking of the identifier at the PHY layer [41].In their C=C.If yes,then the tag is authenticated. cross-layer framework,the bit-collisions are induced between the backscattered tag identifier and a protective mask,such Although the above protocol can authenticate the tags,it that a legitimate reader can be allowed to recover the tag cannot effectively protect the privacy of the tag.As in step identifier but an illegitimate party would not be able to do 1 the tag needs to transmit the identifier Ti to the reader, so.The blocking method uses a special blocking tag to all adjacent RFID readers can overhear this message,which prevent unwanted scanning of tags,by exploiting the anti- completely exposes the privacy of the tag.On the other hand, collision protocol.Juels et al.propose the above idea of if the tag does not transmit the identifier T to the reader, blocking tag based on the tree-based anti-collision protocol then the reader cannot quickly find the corresponding key ki [42].Specifically,a blocking tag impedes RFID scanning by to support the following operations.In order to protect the simulating collisions in the singulation tree.Fig.5 depicts the privacy during the authentication,in fact there exists a straight- framework of the above jamming or blocking-based protocols. forward solution with poor performance:In comparison to Generally speaking,a blocker tag or a privacy enhanced device Algorithm 1,the first step is omitted,the tag does not need is used to actively broadcast a collision-bit mask when the to transmit the identifier Ti to the reader,instead it directly legal or illegal readers are interrogating the tags.Then the transmits the ciphertext C=Ei[R]according to the received legitimate reader further communicates with this device to bit string R.After receiving the ciphertext C,the reader locally recover the ID from the collision bits.Therefore,this device enumerates all possible ki to compute C=Eki[R],if there works as a proxy for the privacy-preserving interrogation. exists some key ki to satisfy Ci=C,then the tag with the Moreover,recently some researchers are using the sig- corresponding key is the one being authenticated.Assume nal spectral feature in physical layer to implement secure that there are n tags in the search space,then the compute authentication in RFID systems.Kulseng et al.propose a complexity of the search operation is O(n).When the value lightweight solution to mutual authentication for RFID sys- of n is large,the time cost of the search operation is unac- tems in which only the authenticated readers and tags can ceptable.In order to address this problem,many researchers successfully communicate with each other [43].Their proto- start to focus on how to fast search the data according to the cols are realized utilizing minimalistic cryptography such as received ciphertext.Song et al.propose practical techniques Physically Unclonable Functions(PUF)and Linear Feedback for searches on encrypted data [46].Wang et al.propose Shift Registers(LFSR),which are very efficient in hardware private information retrieval techniques using trusted hardware and particularly suitable for the low-cost RFID tags.By [47.48].In order to fast search over the encrypted data,Chiu et utilizing the dynamic bit encoding in the physical layer,Sakai al.maintain a monotonically increasing counter in the tag,and et al.propose two novel RFID backward channel protection use it to conduct fast searching based on the binary splitting protocols for privacy protection against correlation attacks in method [49]. RFID backward channel [44].By leveraging the physical layer 2)Light-weight Solution:When the length of the key ki is features,these protocols are able to greatly reduce the compute large enough(more than 128 bits),it is rather difficult to derive complexities in privacy protection and authentication. the key ki within a short time simply according to the bit string

8 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION Fig. 5. The framework for the jamming/blocking-based protocols “sleep” mechanism [39] can be used to render the tags only temporarily inactive. The tag can be appropriately “waked up” when it is needed. The electrostatic screening is to place the tag into a container which can physically screen the tags from interrogation. This mechanism needs an additional device like Faraday cage [40] as a shield against the electromagnetic coupling. Active jamming uses a device to actively broadcast the interfering signals to prevent the unauthorized read-write operations over tags. Lim et al. propose an active jamming mechanism relying on masking of the identifier at the PHY layer [41]. In their cross-layer framework, the bit-collisions are induced between the backscattered tag identifier and a protective mask, such that a legitimate reader can be allowed to recover the tag identifier but an illegitimate party would not be able to do so. The blocking method uses a special blocking tag to prevent unwanted scanning of tags, by exploiting the anticollision protocol. Juels et al. propose the above idea of blocking tag based on the tree-based anti-collision protocol [42]. Specifically, a blocking tag impedes RFID scanning by simulating collisions in the singulation tree. Fig.5 depicts the framework of the above jamming or blocking-based protocols. Generally speaking, a blocker tag or a privacy enhanced device is used to actively broadcast a collision-bit mask when the legal or illegal readers are interrogating the tags. Then the legitimate reader further communicates with this device to recover the ID from the collision bits. Therefore, this device works as a proxy for the privacy-preserving interrogation. Moreover, recently some researchers are using the signal spectral feature in physical layer to implement secure authentication in RFID systems. Kulseng et al. propose a lightweight solution to mutual authentication for RFID systems in which only the authenticated readers and tags can successfully communicate with each other [43]. Their protocols are realized utilizing minimalistic cryptography such as Physically Unclonable Functions (PUF) and Linear Feedback Shift Registers (LFSR), which are very efficient in hardware and particularly suitable for the low-cost RFID tags. By utilizing the dynamic bit encoding in the physical layer, Sakai et al. propose two novel RFID backward channel protection protocols for privacy protection against correlation attacks in RFID backward channel [44]. By leveraging the physical layer features, these protocols are able to greatly reduce the compute complexities in privacy protection and authentication. B. Symmetric-key Encryption-based Solutions 1) Symmetric-key Encryption: The public key encryption is a powerful technique which can effectively implement the encryption and electronic signature. However, due to its complex operations, the public key encryption cannot be implemented in the RFID system, since the resource in the RFID tag is too few to support it. Therefore, the RFID system usually leverages the symmetric-key encryption to implement the security and privacy protection, as it has a much simpler processing logic. Specifically, the symmetric-key encryption can be used to authenticate the tags. Algorithm 1 depicts the protocol of authenticating the tags with the symmetrickey encryption [45]. In this protocol, any tag shares a distinct symmetric-key with the RFID reader. Algorithm 1 Authenticate the tags with the symmetric-key encryption 1: The tag transmits the identifier Ti to the reader to identify itself. 2: The reader generates a random bit string R and transmits it to the tag. 3: The tag encrypts the bit string R with its key ki, i.e.,C = Eki[R], and transmits C to the reader. 4: The reader locally computes C = Eki[R], and verifies if C = C. If yes, then the tag is authenticated. Although the above protocol can authenticate the tags, it cannot effectively protect the privacy of the tag. As in step 1 the tag needs to transmit the identifier Ti to the reader, all adjacent RFID readers can overhear this message, which completely exposes the privacy of the tag. On the other hand, if the tag does not transmit the identifier Ti to the reader, then the reader cannot quickly find the corresponding key ki to support the following operations. In order to protect the privacy during the authentication, in fact there exists a straightforward solution with poor performance: In comparison to Algorithm 1, the first step is omitted, the tag does not need to transmit the identifier Ti to the reader, instead it directly transmits the ciphertext C = Eki[R] according to the received bit string R. After receiving the ciphertext C, the reader locally enumerates all possible ki to compute C = Eki[R], if there exists some key ki to satisfy Ci = C, then the tag with the corresponding key is the one being authenticated. Assume that there are n tags in the search space, then the compute complexity of the search operation is O(n). When the value of n is large, the time cost of the search operation is unacceptable. In order to address this problem, many researchers start to focus on how to fast search the data according to the received ciphertext. Song et al. propose practical techniques for searches on encrypted data [46]. Wang et al. propose private information retrieval techniques using trusted hardware [47, 48]. In order to fast search over the encrypted data, Chiu et al. maintain a monotonically increasing counter in the tag, and use it to conduct fast searching based on the binary splitting method [49]. 2) Light-weight Solution: When the length of the key ki is large enough (more than 128 bits), it is rather difficult to derive the key ki within a short time simply according to the bit string This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. XIE er al:MANAGING RFID DATA:CHALLENGES.OPPORTUNITIES AND SOLUTIONS 9 R and the ciphertext C.In this situation.the symmetric-key which employs a novel sparse tree architecture,such that the encryption-based protocols have high security.However,due key of every tag is independent from one another. to the limitation of manufacturing cost,the off-the-shelf RFID The group-based approaches are another novel authentica- tags often have limited memory which is less than 512 bits, tion schemes which improves the tradeoff between scalability the number of logical gates is also very limited.In practical and privacy by dividing the tags into a number of tags.Hoque use of RFID systems,the length of bit string R.C and ki et al.propose a group-based anonymous private authentication is much less than the expected value to achieve the security protocol,which provides unlinkability and thereby preserves standard.For example,Texas Instruments Inc.has devised the privacy [58].The adversary cannot link the responses with encrypted RFID tags used for vehicle security alarm systems. the tags in the protocol,even if he/she can learn the identifier In consideration of the tag cost,the length of the bit string R that the tags are using to produce the response.Based on the and the key k;is only 40 bits,the length of tag response C group-based method,Gong et al.design a fine-grained batch is only 24 bits.In this situation,techniques like the reverse authentication scheme [59],which provides authentication engineering and password cracking can be used to crack the results with accurate estimates of the number of counterfeiting encryption systems.In order to address the above problem, tags and genuine tags. the researchers have proposed various kinds of lightweight While a tree-based approach achieves high performance in solutions to guarantee security.DESL [50]is a lightweight key authentication,it suffers from the issue of low privacy extension based on the traditional encryption protocol DES.It should a fraction of tags be compromised.On the contrary, is specially devised to accommodate the resource requirement while group-based key authentication is relatively invulnerable for those tiny computing devices like the RFID tags.HIGHT to compromise attacks,it is not scalable to the large number of [51]is a protocol based on block encryption algorithm,which tags.Therefore,recently several new techniques are proposed utilizes the 64-bit block and 128-bit key.The subkeys are for private authentication based on various structures.Sakai generated during the process of encryption and decryption,et al.propose a new private tag authentication protocol based which has a low requirement for the hardware resources. on skip lists [60].Without sacrificing the authentication per- Realizing that most lightweight protocols are not fully con- formance,their scheme provides a strong privacy preserving forming to the Gen2 standard,Sun et al.propose a novel mechanism.In order to achieve forward secrecy and resistance authentication protocol based on Gen2,called Gen2+,for low- to attacks,Yao et al.propose a lightweight RFID private cost RFID tags.Gen2+is a multiple round protocol using authentication protocol based on the random walk concept shared pseudonyms and Cyclic Redundancy Check (CRC)to [611 achieve reader-to-tag authentication.Their protocol follows every message flow in Gen2 to provide backward compatibility C.Hash Function-based Solutions 52]. 3)Efficient Key Management:Recently a number of re- In comparison to the symmetric-key encryption,in most searchers have turned their attention to the privacy-preserving cases an equivalent security mechanism can be implemented authentication in RFID systems.The key technical issue is using the hash functions,but the implementation logic can be how can a reader and tag that share a secret efficiently greatly simplified.Therefore,in recent years many researchers authenticate each other without revealing their identities to have focused on implementing a lightweight security mech- an adversary [53].In order to tackle the above problem,an anism using the hash function-based solutions.In regard to essential method is to implement the efficient key management the RFID systems,although the implementation logic of hash in RFID systems.Based on how keys are managed in the function is fairly simple,it still exceeds the resource limitation system,the privacy preserving tag authentications proposed in of RFID tags.Therefore,the hash function in the RFID system the past can be mainly categorized into tree-based and group- should be further simplified.It is found that,the hash value based approaches. can be derived from a pool of random bits pre-stored in the The tree-based approaches employ tree structures to achieve tag's onboard memory [33].First,a string of 200 random bits fast authentication,which allow any pair of tags to share a can be generated for each tag by an offline random number number of key components.Dimitriou propose a lightweight generator,using the tag ID as the seed.The random bits are protocol to the RFID privacy problem,which has the potential then stored in the tag.These bits form a logical ring.Then the to guarantee user privacy without requiring changes to existing hash function H(ID,r)returns a certain number of bits after infrastructure or reducing business value from the use of RFID the rth bit in the ring.200 random bits provide 200 different technology [54].Lu et al.propose a strong and lightweight hash values,which are sufficient for general purpose usage. RFID private authentication protocol,SPA [55].By designing The hash function-based protocols mainly include the hash- a novel key updating method,they achieve the forward secrecy lock protocol,the randomized hash-lock protocol,and the hash in SPA with an efficient key search algorithm.To address the chain protocol.Instead of using the real tag ID Ti,the hash- heavy computational demand for the tree-based authentication,lock protocol [62]utilizes the metalD (the hash value of the Li et al.design two privacy-preserving protocols based on tag's key)for effective authentication to the RFID reader.In cryptographical encoding [56].which significantly reduces this way,the tag's ID can avoid being revealed.However, both authentication data transmitted by each tag and computa-as the metalD keeps unchanged for any tag,if the fixed tion overhead incurred at the reader.To address compromising metalD is used in each response for a specified tag,then the attacks in the tree-based key management structure,Lu et al.RFID system is vulnerable to malicious tracking and replay propose an anti-compromising authentication protocol [57]. attacks.The randomized hash-lock protocol [63]utilizes a

XIE et al.: MANAGING RFID DATA: CHALLENGES, OPPORTUNITIES AND SOLUTIONS 9 R and the ciphertext C. In this situation, the symmetric-key encryption-based protocols have high security. However, due to the limitation of manufacturing cost, the off-the-shelf RFID tags often have limited memory which is less than 512 bits, the number of logical gates is also very limited. In practical use of RFID systems, the length of bit string R, C and ki is much less than the expected value to achieve the security standard. For example, Texas Instruments Inc. has devised the encrypted RFID tags used for vehicle security alarm systems. In consideration of the tag cost, the length of the bit string R and the key ki is only 40 bits, the length of tag response C is only 24 bits. In this situation, techniques like the reverse engineering and password cracking can be used to crack the encryption systems. In order to address the above problem, the researchers have proposed various kinds of lightweight solutions to guarantee security. DESL [50] is a lightweight extension based on the traditional encryption protocol DES. It is specially devised to accommodate the resource requirement for those tiny computing devices like the RFID tags. HIGHT [51] is a protocol based on block encryption algorithm, which utilizes the 64-bit block and 128-bit key. The subkeys are generated during the process of encryption and decryption, which has a low requirement for the hardware resources. Realizing that most lightweight protocols are not fully conforming to the Gen2 standard, Sun et al. propose a novel authentication protocol based on Gen2, called Gen2+, for lowcost RFID tags. Gen2+ is a multiple round protocol using shared pseudonyms and Cyclic Redundancy Check (CRC) to achieve reader-to-tag authentication. Their protocol follows every message flow in Gen2 to provide backward compatibility [52]. 3) Efficient Key Management: Recently a number of researchers have turned their attention to the privacy-preserving authentication in RFID systems. The key technical issue is how can a reader and tag that share a secret efficiently authenticate each other without revealing their identities to an adversary [53]. In order to tackle the above problem, an essential method is to implement the efficient key management in RFID systems. Based on how keys are managed in the system, the privacy preserving tag authentications proposed in the past can be mainly categorized into tree-based and groupbased approaches. The tree-based approaches employ tree structures to achieve fast authentication, which allow any pair of tags to share a number of key components. Dimitriou propose a lightweight protocol to the RFID privacy problem, which has the potential to guarantee user privacy without requiring changes to existing infrastructure or reducing business value from the use of RFID technology [54]. Lu et al. propose a strong and lightweight RFID private authentication protocol, SPA [55]. By designing a novel key updating method, they achieve the forward secrecy in SPA with an efficient key search algorithm. To address the heavy computational demand for the tree-based authentication, Li et al. design two privacy-preserving protocols based on cryptographical encoding [56], which significantly reduces both authentication data transmitted by each tag and computation overhead incurred at the reader. To address compromising attacks in the tree-based key management structure, Lu et al. propose an anti-compromising authentication protocol [57], which employs a novel sparse tree architecture, such that the key of every tag is independent from one another. The group-based approaches are another novel authentication schemes which improves the tradeoff between scalability and privacy by dividing the tags into a number of tags. Hoque et al. propose a group-based anonymous private authentication protocol, which provides unlinkability and thereby preserves privacy [58]. The adversary cannot link the responses with the tags in the protocol, even if he/she can learn the identifier that the tags are using to produce the response. Based on the group-based method, Gong et al. design a fine-grained batch authentication scheme [59], which provides authentication results with accurate estimates of the number of counterfeiting tags and genuine tags. While a tree-based approach achieves high performance in key authentication, it suffers from the issue of low privacy should a fraction of tags be compromised. On the contrary, while group-based key authentication is relatively invulnerable to compromise attacks, it is not scalable to the large number of tags. Therefore, recently several new techniques are proposed for private authentication based on various structures. Sakai et al. propose a new private tag authentication protocol based on skip lists [60]. Without sacrificing the authentication performance, their scheme provides a strong privacy preserving mechanism. In order to achieve forward secrecy and resistance to attacks, Yao et al. propose a lightweight RFID private authentication protocol based on the random walk concept [61]. C. Hash Function-based Solutions In comparison to the symmetric-key encryption, in most cases an equivalent security mechanism can be implemented using the hash functions, but the implementation logic can be greatly simplified. Therefore, in recent years many researchers have focused on implementing a lightweight security mechanism using the hash function-based solutions. In regard to the RFID systems, although the implementation logic of hash function is fairly simple, it still exceeds the resource limitation of RFID tags. Therefore, the hash function in the RFID system should be further simplified. It is found that, the hash value can be derived from a pool of random bits pre-stored in the tag’s onboard memory [33]. First, a string of 200 random bits can be generated for each tag by an offline random number generator, using the tag ID as the seed. The random bits are then stored in the tag. These bits form a logical ring. Then the hash function H(ID, r) returns a certain number of bits after the rth bit in the ring. 200 random bits provide 200 different hash values, which are sufficient for general purpose usage. The hash function-based protocols mainly include the hashlock protocol, the randomized hash-lock protocol, and the hash chain protocol. Instead of using the real tag ID Ti, the hashlock protocol [62] utilizes the metaID (the hash value of the tag’s key) for effective authentication to the RFID reader. In this way, the tag’s ID can avoid being revealed. However, as the metaID keeps unchanged for any tag, if the fixed metaID is used in each response for a specified tag, then the RFID system is vulnerable to malicious tracking and replay attacks. The randomized hash-lock protocol [63] utilizes a This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination

This article has been accepted for inclusion in a future issue of this journal.Content is final as presented,with the exception of pagination. 10 IEEE COMMUNICATIONS SURVEYS TUTORIALS.ACCEPTED FOR PUBLICATION TABLE V COMPARISON OF AUTHENTICATION AND PRIVACY PROTECTION SOLUTIONS Compute Complexity in Scope of Applications Resource Requirement Flexibility Tag Side Physical Very simple.The tag is Passive tags with very Very few requirements. Not flexible.The tag is ei- Mechanism- not required to implement limited processing capac. but additional devices are ther screened or fully ex- based Solutions the logic for security and ity. required to provide secu- posed. privacy protection. rity and privacy protec- tion Symmetric-key Complex.20000-30000 Active tags or passive tags High requirement.Enough Very flexible.Encryption Encryption- logical gates are required. with higher processing ca- processing and storage ca- and decryption can be ef- based Solutions pacity to conduct encryp- pability should be guaran- fectively implemented. tion and decryption. teed. Hash Function- Simple.A simple hash Passive tags to conduct Few requirements.A hash Flexible."Forward secu- based Solutions function can be easily im- authentication. function can be realized rity"can be guaranteed, plemented. by 200 bit random series but encryption and de- stored in the tag side cryption cannot be imple- mented. Ouery metalD metalD- (key,ID)- -key Database Reader -ID- Tag The hash-lock protocol -Ouery Search and compare all +R,Hkey(IDR)- the possible -R,Hkey(IDR)- values of ID Hkey(IDR) ID Database Reader Tag The randomized hash-lock protocol Fig.6.The diagram of the hash-lock protocol and the randomized hash-lock protocol random number-based inquiry and response mechanism.When leverages the "forward security"property of hash functions the reader interrogates the tag,the tag first generates a random to update the tag response with each interrogation.In this number R using the pseudo random number generator,then it way,the replay attack can be effectively avoided.However, computes the hash value V=Hke(IDR),finally it sends the attacker can maliciously interrogate the tag multiple times, the hash value V and R to the reader.The reader conducts which breaks the synchronization between the reader and the exhaustive search over the tags and computes all possible hash tag.In this case,the authentication mechanism cannot work values V'=Hkey (IDR)according to the tag ID.If V'=V properly. for a certain tag ID,then the reader successfully identifies the tag.The above mechanism can prevent the tag from replying a fixed message in each response,which can effectively avoid D.Comparative Study malicious tracking.But it cannot prevent the replay attacks, In Table V,we compare and analyze the three kinds of since the illegal user can overhear the random number R and solutions from four aspects:compute complexity,scope of the corresponding hash value V,and replay this message when applications,resource requirement and flexibility.Note that the reader interrogates the tags,pretending to be the specified there are pros and cons for these three kinds of solutions.It tag.Fig.6 provides the diagram of the hash-lock protocol and is essential to choose a reasonable solution according to the the randomized hash-lock protocol. specific application requirements and resource limitations. The hash-chain protocol [64]utilizes a shared secret-based inquiry and response mechanism.In this protocol,the reader and the tag share two hash functions G and H,as well as E.Summing Up Challenges and Opportunities a random initial identifier s.When the reader interrogates a According to the above recent research progress,we sum- tag,the tag responds the current identifier rk=G(sk)to the marize the challenges and opportunities for authentication and reader and locally updates sk+1=H(sk).The reader searches privacy protection in Table VI,respectively for the physical the database with the identifier rk to authenticates the tag, mechanism-based solution,symmetric-key encryption-based and updates sk to synchronize with the tag.This mechanism solution,as well as Hash function-based solution

10 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION TABLE V COMPARISON OF AUTHENTICATION AND PRIVACY PROTECTION SOLUTIONS Compute Complexity in Tag Side Scope of Applications Resource Requirement Flexibility Physical Mechanismbased Solutions Very simple. The tag is not required to implement the logic for security and privacy protection. Passive tags with very limited processing capacity. Very few requirements, but additional devices are required to provide security and privacy protection. Not flexible. The tag is either screened or fully exposed. Symmetric-key Encryptionbased Solutions Complex. 20000-30000 logical gates are required. Active tags or passive tags with higher processing capacity to conduct encryption and decryption. High requirement. Enough processing and storage capability should be guaranteed. Very flexible. Encryption and decryption can be effectively implemented. Hash Functionbased Solutions Simple. A simple hash function can be easily implemented. Passive tags to conduct authentication. Few requirements. A hash function can be realized by 200 bit random series stored in the tag side. Flexible. “Forward security” can be guaranteed, but encryption and decryption cannot be implemented. Search and compare all the possible values of Hkey(ID||R) Fig. 6. The diagram of the hash-lock protocol and the randomized hash-lock protocol random number-based inquiry and response mechanism. When the reader interrogates the tag, the tag first generates a random number R using the pseudo random number generator, then it computes the hash value V = Hkey(ID||R), finally it sends the hash value V and R to the reader. The reader conducts exhaustive search over the tags and computes all possible hash values V = Hkey(ID||R) according to the tag ID. If V = V for a certain tag ID, then the reader successfully identifies the tag. The above mechanism can prevent the tag from replying a fixed message in each response, which can effectively avoid malicious tracking. But it cannot prevent the replay attacks, since the illegal user can overhear the random number R and the corresponding hash value V , and replay this message when the reader interrogates the tags, pretending to be the specified tag. Fig.6 provides the diagram of the hash-lock protocol and the randomized hash-lock protocol. The hash-chain protocol [64] utilizes a shared secret-based inquiry and response mechanism. In this protocol, the reader and the tag share two hash functions G and H, as well as a random initial identifier s1. When the reader interrogates a tag, the tag responds the current identifier rk = G(sk) to the reader and locally updates sk+1 = H(sk). The reader searches the database with the identifier rk to authenticates the tag, and updates sk to synchronize with the tag. This mechanism leverages the “forward security” property of hash functions to update the tag response with each interrogation. In this way, the replay attack can be effectively avoided. However, the attacker can maliciously interrogate the tag multiple times, which breaks the synchronization between the reader and the tag. In this case, the authentication mechanism cannot work properly. D. Comparative Study In Table V, we compare and analyze the three kinds of solutions from four aspects: compute complexity, scope of applications, resource requirement and flexibility. Note that there are pros and cons for these three kinds of solutions. It is essential to choose a reasonable solution according to the specific application requirements and resource limitations. E. Summing Up Challenges and Opportunities According to the above recent research progress, we summarize the challenges and opportunities for authentication and privacy protection in Table VI, respectively for the physical mechanism-based solution, symmetric-key encryption-based solution, as well as Hash function-based solution. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

点击下载完整版文档（PDF格式）

共18页，试读已结束，阅读完整版请下载

点击下载（PDF格式）

浏览记录