ni=100,the estimated tag size n should be within the range categories m can be fairly large,e.g.,in the order of hundreds [90,110]with a probability greater than 95%. the Select command and the fixed initial frame size for each In the Iceberg Ouery Problem,only those categories with a category,as well as the inter-cycle overhead among a large tag size over a specified threshold t are essential to be illustrated number of query cycles,make the overall scanning time rather in the histogram,while the accuracy requirement is satisfied. large. As the exact tag size ni for category Ci is unknown,then, Therefore,we consider an ensemble sampling-based estima- given the estimated value of tag size ni,it is possible to have tion scheme as follows:select a certain number of categories false negative error and false positive error in verifying the and issue a query cycle,obtain the empty/singleton/collision population constraint.Therefore,it is essential to guarantee that slots,and then estimate the tag size for each of the categories the false negative/positive rate is below B,that is: according to the sampling in the singleton slots.In this way,the Pr[m<tn:โฅt<B, ensemble sampling is more preferred than the separate counting (2) in terms of reading performance.Since more tags are involved Prmโฅtn:<t<B. (3) in one query cycle,more slots amortize the cost of inter-cycle In the Top-k Query Problem,we use the definition of the overhead,the Select command,as well as the fixed initial frame probabilistic threshold top-k query (PT-Topk query),i.e.,in size.Thus,the overall scanning time can be greatly reduced. regard to the tag size,only the set of categories where each A.The Estimator SE takes a probability of at least 1-B to be in the top-k list are In the slotted ALOHA-based protocol,besides the empty illustrated in the histogram,while the accuracy requirement is slots and the collision slots,the singleton slots can be obtained. satisfied.Much like the iceberg query problem,as the exact tag size ni for category Ci is unknown,then,given the estimated In the ensemble sampling-based estimation,according to the observed statistics of the empty/singleton/collision slots,we can value of tag size ni,it is possible to have false negative error use estimators in [4]12]15]etc.to estimate the overall tag and false positive error in verifying the population constraint, the following constraint must be satisfied: size.Then,according to the response in each singleton slot, the category ID is obtained from the first s bits in the tag ID. Pr[Ciis regarded out of top-k list|Ci E top-k list]<B,(4) Based on the sampling from the singleton slots,the tag size for Pr[C;is regarded in top-k list Ci top-k list]<B.(5) each category can be estimated.The reason is as follows: Assume that there exists m categories C1,C2,...,Cm,the In this paper,we aim to propose a series of novel solutions overall tag size is n,and the tag size for each category is to tackle the above problems,while satisfying the following n,n2,...,nm.We define an indicator variable Xij to denote properties:(1)Time-efficient.(2)Simple for the tag side in the whether one tag of category Ci selects a slot j inside the frame protocol.(3)Complies with the EPCglobal C1G2 standards. with the size f.We set Xi.j=1 if only one tag of category V.USE ENSEMBLE SAMPLING TO COLLECT HISTOGRAMS Ci selects the slot j,and Xi.j=0 otherwise.Moreover,we use Pr[Xi.j=1]to denote the probability that only one tag When collecting the histograms over a large number of of category Ci selects the slot j,then, categories,the objective is to minimize the overall scanning time while the corresponding accuracy/population constraints Pr[X=1ๅ= are satisfied.Two straightforward solutions are summarized as follows:(1)Basic Tag Identification:The histogram is collected If we use ns.i to denote the number of singleton slots selected by uniquely identifying each tag from the massive tag set and by tags of category Ci,thus ns.i= โ=lXi,then,the putting it into the corresponding categories,thus the accuracy expected value is 100%,and(2)Separate Counting:The histogram is collected by estimating the number of tags in each category one by E(nsi)= Pr[Xi=1=(1-F)n-1.n one.Specifically,for each category,the reader broadcasts a i= Select command to activate the tags in the specified category. According to the replies from the specified tags,the estimators Furthermore,let ns denote the number of singleton slots,the like [4][12][15]can be used to estimate the tag size for each expected value E(ns)=(1-1)n-1.n.Then, =ๅ
ด E(nใ) category.As the rough tag size for each category cannot be Thus we can approximate the tag size of category C as follows: predicted,a fixed initial frame size is used for each category. si.ไบข ๅ
= (6 Both of the above two solutions are not time-efficient.In Ta regard to the basic tag identification,uniquely identifying each Here,is the estimated value of the overall tag size.Let= tag in the massive set is not scalable,for as the tag size grows ,henๅบ=ๅ
into a huge number,the scanning time can be an unacceptable value.In regard to the separated counting,the reader needs to B.Accuracy Analysis scan each category with at least one query cycle,even if the 1)Accuracy of the SE Estimator:In the ensemble sampling- category is a minor category,which is not necessarily addressed based estimation,since the estimators such as [4][12][15]can in the iceberg query or the top-k query.As the number of be utilized for estimating the overall number of tags,we use 6๐๐ = 100, the estimated tag size ๐ห๐ should be within the range [90, 110] with a probability greater than 95%. In the Iceberg Query Problem, only those categories with a tag size over a specified threshold ๐ก are essential to be illustrated in the histogram, while the accuracy requirement is satisfied. As the exact tag size ๐๐ for category ๐ถ๐ is unknown, then, given the estimated value of tag size ๐ห๐, it is possible to have false negative error and false positive error in verifying the population constraint. Therefore, it is essential to guarantee that the false negative/positive rate is below ๐ฝ, that is: ๐ ๐[๐ห๐ < ๐กโฃ๐๐ โฅ ๐ก] < ๐ฝ, (2) ๐ ๐[๐ห๐ โฅ ๐กโฃ๐๐ < ๐ก] < ๐ฝ. (3) In the Top-k Query Problem, we use the definition of the probabilistic threshold top-๐ query (PT-Topk query), i.e., in regard to the tag size, only the set of categories where each takes a probability of at least 1 โ ๐ฝ to be in the top-๐ list are illustrated in the histogram, while the accuracy requirement is satisfied. Much like the iceberg query problem, as the exact tag size ๐๐ for category ๐ถ๐ is unknown, then, given the estimated value of tag size ๐ห๐, it is possible to have false negative error and false positive error in verifying the population constraint, the following constraint must be satisfied: ๐ ๐[๐ถ๐is regarded out of top-๐ listโฃ๐ถ๐ โ top-๐ list] < ๐ฝ, (4) ๐ ๐[๐ถ๐is regarded in top-๐ listโฃ๐ถ๐ โ/ top-๐ list] < ๐ฝ. (5) In this paper, we aim to propose a series of novel solutions to tackle the above problems, while satisfying the following properties: (1) Time-efficient. (2) Simple for the tag side in the protocol. (3) Complies with the EPCglobal C1G2 standards. V. USE ENSEMBLE SAMPLING TO COLLECT HISTOGRAMS When collecting the histograms over a large number of categories, the objective is to minimize the overall scanning time while the corresponding accuracy/population constraints are satisfied. Two straightforward solutions are summarized as follows: (1) Basic Tag Identification: The histogram is collected by uniquely identifying each tag from the massive tag set and putting it into the corresponding categories, thus the accuracy is 100%, and (2) Separate Counting: The histogram is collected by estimating the number of tags in each category one by one. Specifically, for each category, the reader broadcasts a Select command to activate the tags in the specified category. According to the replies from the specified tags, the estimators like [4][12][15] can be used to estimate the tag size for each category. As the rough tag size for each category cannot be predicted, a fixed initial frame size is used for each category. Both of the above two solutions are not time-efficient. In regard to the basic tag identification, uniquely identifying each tag in the massive set is not scalable, for as the tag size grows into a huge number, the scanning time can be an unacceptable value. In regard to the separated counting, the reader needs to scan each category with at least one query cycle, even if the category is a minor category, which is not necessarily addressed in the iceberg query or the top-๐ query. As the number of categories ๐ can be fairly large, e.g., in the order of hundreds, the Select command and the fixed initial frame size for each category, as well as the inter-cycle overhead among a large number of query cycles, make the overall scanning time rather large. Therefore, we consider an ensemble sampling-based estima๏ฟพtion scheme as follows: select a certain number of categories and issue a query cycle, obtain the empty/singleton/collision slots, and then estimate the tag size for each of the categories according to the sampling in the singleton slots. In this way, the ensemble sampling is more preferred than the separate counting in terms of reading performance. Since more tags are involved in one query cycle, more slots amortize the cost of inter-cycle overhead, the Select command, as well as the fixed initial frame size. Thus, the overall scanning time can be greatly reduced. A. The Estimator SE In the slotted ALOHA-based protocol, besides the empty slots and the collision slots, the singleton slots can be obtained. In the ensemble sampling-based estimation, according to the observed statistics of the empty/singleton/collision slots, we can use estimators in [4][12][15] etc. to estimate the overall tag size. Then, according to the response in each singleton slot, the category ID is obtained from the first ๐ bits in the tag ID. Based on the sampling from the singleton slots, the tag size for each category can be estimated. The reason is as follows: Assume that there exists ๐ categories ๐ถ1, ๐ถ2, ..., ๐ถ๐, the overall tag size is ๐, and the tag size for each category is ๐1, ๐2, ..., ๐๐. We define an indicator variable ๐๐,๐ to denote whether one tag of category ๐ถ๐ selects a slot ๐ inside the frame with the size ๐. We set ๐๐,๐ = 1 if only one tag of category ๐ถ๐ selects the slot ๐, and ๐๐,๐ = 0 otherwise. Moreover, we use ๐ ๐[๐๐,๐ = 1] to denote the probability that only one tag of category ๐ถ๐ selects the slot ๐, then, ๐ ๐[๐๐,๐ = 1] = 1 ๐ โ
(1 โ 1 ๐ ) ๐โ1 โ
๐๐. If we use ๐๐ ,๐ to denote the number of singleton slots selected by tags of category ๐ถ๐, thus ๐๐ ,๐ = โ๐ ๐=1 ๐๐,๐ , then, the expected value ๐ธ(๐๐ ,๐) = โ ๐ ๐=1 ๐ ๐[๐๐,๐ = 1] = (1 โ 1 ๐ ) ๐โ1 โ
๐๐. Furthermore, let ๐๐ denote the number of singleton slots, the expected value ๐ธ(๐๐ ) = (1 โ 1 ๐ )๐โ1 โ
๐. Then, ๐ธ(๐๐ ,๐) ๐ธ(๐๐ ) = ๐๐ ๐ . Thus we can approximate the tag size of category ๐ถ๐ as follows: ๐ห๐ = ๐๐ ,๐ ๐๐ โ
๐. ห (6) Here, ๐ห is the estimated value of the overall tag size. Let ๐ผห๐ = ๐๐ ,๐ ๐๐ , then ๐ห๐ = ๐ผห๐ โ
๐ห. B. Accuracy Analysis 1) Accuracy of the SE Estimator: In the ensemble sampling๏ฟพbased estimation, since the estimators such as [4][12][15] can be utilized for estimating the overall number of tags, we use ๐ฟ