XIE er al:MANAGING RFID DATA:CHALLENGES.OPPORTUNITIES AND SOLUTIONS 1299 RFID tags out of a large collection of tags,without reading to be singleton/collision slot but turns out to be empty,then all the tag data [31].They propose effective algorithms based it implies that the tag(s)mapping to this slot is(are)missing on the idea of group testing.which can efficiently derive Fig.4 gives an example of the polling mechanism.As the figure popular categories of tags.Kodialam et al.propose a privacy- shows,the dotted box denotes the missing tag,while the solid preserving estimation mechanism over dynamically moving box denotes the existing tag.Assume that the tag E,F and H tags in applications like object tracking [32].This mechanism are missing,if the corresponding slots turn out to be empty, can fast count the number of tags moving from location A then the tags can be determined missing.However,if the tag C to B during the time interval(t1,t2).In this way,the traffic is missing,the corresponding slot still turns out to be collision can be effectively tracked without exactly reading the tag IDs. slot due to the existence of the tag B and D.In this situation, In general,research work in this area are relatively rare.With the tag C cannot be identified as missing,resulting in the the further extension of RFID applications,efficient techniques false positive error.The above polling mechanism provides a for data analysis and mining over RFID systems will attract theoretical foundation to recent research work. widespread attentions and in-depth studies Based on the above understanding,Li et al.study a practically important problem of monitoring a large set of C.Polling the Tags RFID tags,i.e.,efficiently identifying the missing tags in a large RFID system [33].Based on the"pseudo randomness", Instead of obtaining all tags'IDs in the effective scanning they design a series of missing-tag identification protocols range,some applications only require to poll the tags in a that employ novel techniques to reduce the execution time. specified tag set,verifying whether these tags are missing. In order to deal with the problem of collecting real-time For example,in regard to the applications like warehouse information from a set of battery-powered active tags,Qiao management,assume that each of the goods is attached with et al.propose a tag-ordering polling protocol [34]that can an RFID tag with unique ID,the administrator should check reduce per-tag energy consumption by more than an order the specified inventories according to the list of goods.In this of magnitude.Moreover,they apply partitioned Bloom filters situation,it becomes an essential problem to devise efficient to further enhance the performance.such that it can achieve polling protocols over the RFID tags,such that both the much better energy efficiency without degradation in protocol time efficiency and energy efficiency can be achieved.A execution time.Chen et al.respectively propose a single- straightforward solution is to realize the following"roll call" hash and a multi-hash based information collection protocol to mechanism:the reader continuously broadcasts the tags'IDs address the above problem [35],which dramatically reduces according to the check list.After each tag ID is broadcasted. the expected execution time.In order to meet the requirement the reader will wait for the specified tag to return a short of prompt and reliable batch authentications in large scale response.If a short response is obtained,then it implies that RFID applications,Yang et al.propose an identification-free this tag exists in the scanning range;otherwise,this tag is batch authentication protocol based on the polling mechanism expected to be missing.However,there are a number of [36].Conventionally,in order to authenticate a large number disadvantages for the above mechanism:1)it is not compatible of tags,the reader should identify and authenticate the tags with the slotted ALOHA:2)the transmission delay of the one by one,which requires a large amount of scanning tag ID with 96 bits is fairly large,the efficiency is greatly time.Based on the "pseudo randomness",this batch authen- reduced.Therefore,many researchers have investigated into tication protocol greatly reduces the overall scanning time, this problem,and tried to design efficient polling protocols while guaranteeing the probability of false positive error to based on the slotted ALOHA,aiming to sufficiently reduce be below the specified threshold.Being different from the the overall scanning time. above research work,which mainly poll over all tags in Note that during each query round of the interrogation,each the scanning area,Zheng et al.address the problem of fast of the activated tags will "randomly"select a slot to respond searching a particular subset in a large number of RFID tags inside the frame.However,due to the simplicity of the tag's [37].They utilize Bloom filter-based compact approximators design,it is rather difficult to realize the "pure randomness". to efficiently aggregate and exchange the tag information Instead,the"pseudo randomness"is used as follows:For each with a two-phase approximation protocol,which significantly query round the reader first broadcasts a random number r to reduces the searching time.Generally speaking,these polling the tags.After that,each of the activated tags will compute mechanisms mainly leverage the pseudo randomness in the a pseudo random number s as the selected slot number.here slotted ALOHA protocol to effectively reduce the additional s hash(ID,r)mod f,ID and f respectively denote the tag cost of uniquely identifying the tags.Nevertheless,since there ID and frame size.The above "pseudo randomness"implies exist the false positive errors for the polling mechanisms. that,in regard to a specified tag,once the random number r optimized algorithms are proposed to reduce the probability and the frame size f is set,the corresponding slot selected by of false positive errors. this tag inside the frame is determined.This property makes it possible for efficient polling over the tags.Since the tag IDs in the specified tag set can be known in advance,then D.Summing Up Challenges and Opportunities before interrogating the tags,the reader can precompute the According to the above recent research progress,we sum- expected state for each slot accordingly.The expected state marize the challenges and opportunities for anti-collision for each slot can be empty slot,singleton slot or collision slot. algorithms in Table IV,respectively for the tag identification, Therefore,in regard to a specified slot,if this slot is expected estimating the tag size,as well as polling the tags.XIE et al.: MANAGING RFID DATA: CHALLENGES, OPPORTUNITIES AND SOLUTIONS 1299 RFID tags out of a large collection of tags, without reading all the tag data [31]. They propose effective algorithms based on the idea of group testing, which can efficiently derive popular categories of tags. Kodialam et al. propose a privacypreserving estimation mechanism over dynamically moving tags in applications like object tracking [32]. This mechanism can fast count the number of tags moving from location A to B during the time interval (t1, t2). In this way, the traffic can be effectively tracked without exactly reading the tag IDs. In general, research work in this area are relatively rare. With the further extension of RFID applications, efficient techniques for data analysis and mining over RFID systems will attract widespread attentions and in-depth studies. C. Polling the Tags Instead of obtaining all tags’ IDs in the effective scanning range, some applications only require to poll the tags in a specified tag set, verifying whether these tags are missing. For example, in regard to the applications like warehouse management, assume that each of the goods is attached with an RFID tag with unique ID, the administrator should check the specified inventories according to the list of goods. In this situation, it becomes an essential problem to devise efficient polling protocols over the RFID tags, such that both the time efficiency and energy efficiency can be achieved. A straightforward solution is to realize the following “roll call” mechanism: the reader continuously broadcasts the tags’ IDs according to the check list. After each tag ID is broadcasted, the reader will wait for the specified tag to return a short response. If a short response is obtained, then it implies that this tag exists in the scanning range; otherwise, this tag is expected to be missing. However, there are a number of disadvantages for the above mechanism: 1) it is not compatible with the slotted ALOHA; 2) the transmission delay of the tag ID with 96 bits is fairly large, the efficiency is greatly reduced. Therefore, many researchers have investigated into this problem, and tried to design efficient polling protocols based on the slotted ALOHA, aiming to sufficiently reduce the overall scanning time. Note that during each query round of the interrogation, each of the activated tags will “randomly” select a slot to respond inside the frame. However, due to the simplicity of the tag’s design, it is rather difficult to realize the “pure randomness”. Instead, the “pseudo randomness” is used as follows: For each query round the reader first broadcasts a random number r to the tags. After that, each of the activated tags will compute a pseudo random number s as the selected slot number, here s = hash(ID, r)mod f, ID and f respectively denote the tag ID and frame size. The above “pseudo randomness” implies that, in regard to a specified tag, once the random number r and the frame size f is set, the corresponding slot selected by this tag inside the frame is determined. This property makes it possible for efficient polling over the tags. Since the tag IDs in the specified tag set can be known in advance, then before interrogating the tags, the reader can precompute the expected state for each slot accordingly. The expected state for each slot can be empty slot, singleton slot or collision slot. Therefore, in regard to a specified slot, if this slot is expected to be singleton/collision slot but turns out to be empty, then it implies that the tag(s) mapping to this slot is(are) missing. Fig.4 gives an example of the polling mechanism. As the figure shows, the dotted box denotes the missing tag, while the solid box denotes the existing tag. Assume that the tag E,F and H are missing, if the corresponding slots turn out to be empty, then the tags can be determined missing. However, if the tag C is missing, the corresponding slot still turns out to be collision slot due to the existence of the tag B and D. In this situation, the tag C cannot be identified as missing, resulting in the false positive error. The above polling mechanism provides a theoretical foundation to recent research work. Based on the above understanding, Li et al. study a practically important problem of monitoring a large set of RFID tags, i.e., efficiently identifying the missing tags in a large RFID system [33]. Based on the “pseudo randomness”, they design a series of missing-tag identification protocols that employ novel techniques to reduce the execution time. In order to deal with the problem of collecting real-time information from a set of battery-powered active tags, Qiao et al. propose a tag-ordering polling protocol [34] that can reduce per-tag energy consumption by more than an order of magnitude. Moreover, they apply partitioned Bloom filters to further enhance the performance, such that it can achieve much better energy efficiency without degradation in protocol execution time. Chen et al. respectively propose a singlehash and a multi-hash based information collection protocol to address the above problem [35], which dramatically reduces the expected execution time. In order to meet the requirement of prompt and reliable batch authentications in large scale RFID applications, Yang et al. propose an identification-free batch authentication protocol based on the polling mechanism [36]. Conventionally, in order to authenticate a large number of tags, the reader should identify and authenticate the tags one by one, which requires a large amount of scanning time. Based on the “pseudo randomness”, this batch authentication protocol greatly reduces the overall scanning time, while guaranteeing the probability of false positive error to be below the specified threshold. Being different from the above research work, which mainly poll over all tags in the scanning area, Zheng et al. address the problem of fast searching a particular subset in a large number of RFID tags [37]. They utilize Bloom filter-based compact approximators to efficiently aggregate and exchange the tag information with a two-phase approximation protocol, which significantly reduces the searching time. Generally speaking, these polling mechanisms mainly leverage the pseudo randomness in the slotted ALOHA protocol to effectively reduce the additional cost of uniquely identifying the tags. Nevertheless, since there exist the false positive errors for the polling mechanisms, optimized algorithms are proposed to reduce the probability of false positive errors. D. Summing Up Challenges and Opportunities According to the above recent research progress, we summarize the challenges and opportunities for anti-collision algorithms in Table IV, respectively for the tag identification, estimating the tag size, as well as polling the tags