to tackle the top-k query problem.We use ensemble sampling be selected,the reader can provide multiple bit masks in the to quickly estimate the threshold corresponding to the k-th Select command. largest category,and reduce it to the iceberg query problem.3) While achieving time-efficiency,our solutions are completely B.The Impact of the Inter-cycle Overhead compatible with current industry standards,i.e.,the EPCglobal The MAC protocol for the C1G2 system is based on s- C1G2 standards,and do not require any modification to tags. lotted ALOHA.In order to accurately estimate the size of a specified set of tags,conventionally,the reader should issue II.RELATED WORK multiple query cycles over the same set of tags and take the In RFID systems,a reader needs to receive data from average of the estimates.The inter-cycle overhead consists of multiple tags,while the tags are unable to self-regulate their the time between cycles when the reader is powered down, radio transmissions to avoid collisions;then,a series of s-and the carrier time used to power the tags before beginning lotted ALOHA-based anti-collision protocols are designed to communication.According to the experiment results in [14], efficiently resolve collisions in RFID systems.In order to deal which are conducted in realistic settings,these times are 40 ms with the collision problems in multi-reader RFID systems, and 3 ms respectively,while the average time interval per slot scheduling protocols for reader activation are explored in [7,8]. is 1~2 ms.We have further measured the time interval for Recently,a number of polling protocols [9,10]are proposed, various slots with the USRP N210 platform,it is found that aiming to collect information from battery-powered active tags the average interval for empty slots is 1.6 ms per slot,and the in an energy efficient approach. average interval for singleton/collision slots is 5.1 ms per slot. Recent research is focused on the collection of statistical Note that if the powered-down interval is not long enough,it is information over the RFID tags [2,4,5,11-13].The authors possible that some surrounding tags will maintain the former mainly consider the problem of estimating the number of tags state for the inventoried flag with their local residual power, without collecting the tag IDs.In [4],Murali et al.provide very which causes them to keep silent in the upcoming query cycle. fast and reliable estimation mechanisms for tag quantity in a Therefore,the inter-cycle duration must be taken into account more practical approach.In [2],Shahzad et al.propose a new when considering overall reading performance.It is obvious scheme for estimating tag population size called Average Run that reading a larger number of tags per cycle amortizes the cost based Tag estimation.In [5],Sheng et al.consider the problem of inter-cycle overhead,resulting in lower per tag reading time, of identifying popular categories of RFID tags out of a large while for small tag sets the inter-cycle overhead is significant. collection of tags,while the set of category IDs are supposed to be known.In this paper,our goal is to collect the histograms IV.PROBLEM FORMULATION for all categories over RFID tags in a time-efficient approach, Suppose there are a large number of tags in the effective without any priori knowledge of the categories,including the scanning area of the RFID reader,the RFID system conforms number of categories and the category IDs. to EPCglobal C1G2 standards.i.e..the slotted ALOHA-based anti-collision scheme is used in the system model.The objective III.PRELIMINARY is to collect the histogram over RFID tags according to some A.The framed slotted ALOHA protocol categorized metric,e.g,the type of merchandise,while the In the Class 1 Gen 2 standard,the RFID system leverages present set of category IDs cannot be predicted in advance. the framed slotted ALOHA protocol to resolve the collisions for As we aim at a dynamic environment where the tags may tag identification.When a reader wishes to read a set of tags, frequently enter and leave the scanning area,a time-efficient it first powers up and transmits a continuous wave to energize strategy must be proposed.Therefore,the specified accuracy the tags.It then initiates a series of frames,varying the number can be relaxed in order to quickly collect the histogram. of slots in each frame to best accommodate the number of Assume that the overall tag size is n,there exists m categories tags.Each frame has a number of slots and each active tag C=[C1,C2,...,Cm),and the actual tag size for each category will reply in a randomly selected slot per frame.After all tags is n1,n2,...,nm. are read,the reader powers down.We refer to the series of In the Basic Histogram Collection,the RFID system needs frames between power down periods as a Query Cycle.Note to collect the histogram for all categories.Due to the inherent that,within each frame,tags may choose the same slot,which inaccurate property for RFID systems,users can specify the causes multiple tags to reply during a slot.Therefore,within accuracy requirement for the histogram collection.Suppose the each frame there exist three kinds of slots:(1)the empty slot estimated tag size for category Ci(1<i<m)is ni,then the where no tag replies;(2)the single slot where only one tag following accuracy constraint should be satisfied: replies;and (3)the collision slot where multiple tags reply. Pr[mi-nil≤e·ni≥1-B accuracy constraint.. (1) In regard to the tag ID,each tag has a unique 96-bit ID in its EPC memory,where the first s binary bits can be regarded as The accuracy constraint illustrates that,given the exact tag size the category ID(1 <s<96).According to the C1G2 standard,ni for a specified category,the estimated tag size n should for each QueryCycle,the reader is able to select the tags in a be in a confidence interval of width 2e[1 specified category by sending a Select command with an s-bit e,1+e with a probability greater than 1-B.For example,if mask in the category ID field.If multiple categories need to e=0.1,B=0.05,then in regard to a category with tag sizeto tackle the top-k query problem. We use ensemble sampling to quickly estimate the threshold corresponding to the 𝑘-th largest category, and reduce it to the iceberg query problem. 3) While achieving time-efficiency, our solutions are completely compatible with current industry standards, i.e., the EPCglobal C1G2 standards, and do not require any modification to tags. II. RELATED WORK In RFID systems, a reader needs to receive data from multiple tags, while the tags are unable to self-regulate their radio transmissions to avoid collisions; then, a series of slotted ALOHA-based anti-collision protocols are designed to efficiently resolve collisions in RFID systems. In order to deal with the collision problems in multi-reader RFID systems, scheduling protocols for reader activation are explored in [7, 8]. Recently, a number of polling protocols [9, 10] are proposed, aiming to collect information from battery-powered active tags in an energy efficient approach. Recent research is focused on the collection of statistical information over the RFID tags [2, 4, 5, 11–13]. The authors mainly consider the problem of estimating the number of tags without collecting the tag IDs. In [4], Murali et al. provide very fast and reliable estimation mechanisms for tag quantity in a more practical approach. In [2], Shahzad et al. propose a new scheme for estimating tag population size called Average Run based Tag estimation. In [5], Sheng et al. consider the problem of identifying popular categories of RFID tags out of a large collection of tags, while the set of category IDs are supposed to be known. In this paper, our goal is to collect the histograms for all categories over RFID tags in a time-efficient approach, without any priori knowledge of the categories, including the number of categories and the category IDs. III. PRELIMINARY A. The framed slotted ALOHA protocol In the Class 1 Gen 2 standard, the RFID system leverages the framed slotted ALOHA protocol to resolve the collisions for tag identification. When a reader wishes to read a set of tags, it first powers up and transmits a continuous wave to energize the tags. It then initiates a series of frames, varying the number of slots in each frame to best accommodate the number of tags. Each frame has a number of slots and each active tag will reply in a randomly selected slot per frame. After all tags are read, the reader powers down. We refer to the series of frames between power down periods as a Query Cycle. Note that, within each frame, tags may choose the same slot, which causes multiple tags to reply during a slot. Therefore, within each frame there exist three kinds of slots: (1) the empty slot where no tag replies; (2) the single slot where only one tag replies; and (3) the collision slot where multiple tags reply. In regard to the tag ID, each tag has a unique 96-bit ID in its EPC memory, where the first 𝑠 binary bits can be regarded as the category ID (1 <𝑠< 96). According to the C1G2 standard, for each Query Cycle, the reader is able to select the tags in a specified category by sending a Select command with an 𝑠-bit mask in the category ID field. If multiple categories need to be selected, the reader can provide multiple bit masks in the Select command. B. The Impact of the Inter-cycle Overhead The MAC protocol for the C1G2 system is based on slotted ALOHA. In order to accurately estimate the size of a specified set of tags, conventionally, the reader should issue multiple query cycles over the same set of tags and take the average of the estimates. The inter-cycle overhead consists of the time between cycles when the reader is powered down, and the carrier time used to power the tags before beginning communication. According to the experiment results in [14], which are conducted in realistic settings, these times are 40 ms and 3 ms respectively, while the average time interval per slot is 1 ∼ 2 ms. We have further measured the time interval for various slots with the USRP N210 platform, it is found that the average interval for empty slots is 1.6 ms per slot, and the average interval for singleton/collision slots is 5.1 ms per slot. Note that if the powered-down interval is not long enough, it is possible that some surrounding tags will maintain the former state for the inventoried flag with their local residual power, which causes them to keep silent in the upcoming query cycle. Therefore, the inter-cycle duration must be taken into account when considering overall reading performance. It is obvious that reading a larger number of tags per cycle amortizes the cost of inter-cycle overhead, resulting in lower per tag reading time, while for small tag sets the inter-cycle overhead is significant. IV. PROBLEM FORMULATION Suppose there are a large number of tags in the effective scanning area of the RFID reader, the RFID system conforms to EPCglobal C1G2 standards, i.e., the slotted ALOHA-based anti-collision scheme is used in the system model. The objective is to collect the histogram over RFID tags according to some categorized metric, e.g, the type of merchandise, while the present set of category IDs cannot be predicted in advance. As we aim at a dynamic environment where the tags may frequently enter and leave the scanning area, a time-efficient strategy must be proposed. Therefore, the specified accuracy can be relaxed in order to quickly collect the histogram. Assume that the overall tag size is 𝑛, there exists 𝑚 categories 𝐶 = {𝐶1, 𝐶2, ..., 𝐶𝑚}, and the actual tag size for each category is 𝑛1, 𝑛2, ..., 𝑛𝑚. In the Basic Histogram Collection, the RFID system needs to collect the histogram for all categories. Due to the inherent inaccurate property for RFID systems, users can specify the accuracy requirement for the histogram collection. Suppose the estimated tag size for category 𝐶𝑖(1 ≤ 𝑖 ≤ 𝑚) is 𝑛ˆ𝑖, then the following accuracy constraint should be satisfied: 𝑃 𝑟[∣𝑛ˆ𝑖 − 𝑛𝑖∣ ≤ 𝜖 ⋅ 𝑛𝑖] ≥ 1 − 𝛽 accuracy constraint. (1) The accuracy constraint illustrates that, given the exact tag size 𝑛𝑖 for a specified category, the estimated tag size 𝑛ˆ𝑖 should be in a confidence interval of width 2𝜖 ⋅ 𝑛𝑖, i.e., 𝑛ˆ𝑖 𝑛𝑖 ∈ [1 − 𝜖, 1 + 𝜖] with a probability greater than 1 − 𝛽. For example, if 𝜖 = 0.1, 𝛽 = 0.05, then in regard to a category with tag size