𝑛𝑖 = 100, the estimated tag size 𝑛_中国高校课件下载中心

点击下载：计算机科学与技术（参考文献）Efficiently Collecting Histograms Over RFID Tags

正在加载图片...

ni=100,the estimated tag size n should be within the range categories m can be fairly large,e.g.,in the order of hundreds [90,110]with a probability greater than 95%. the Select command and the fixed initial frame size for each In the Iceberg Ouery Problem,only those categories with a category,as well as the inter-cycle overhead among a large tag size over a specified threshold t are essential to be illustrated number of query cycles,make the overall scanning time rather in the histogram,while the accuracy requirement is satisfied. large. As the exact tag size ni for category Ci is unknown,then, Therefore,we consider an ensemble sampling-based estima- given the estimated value of tag size ni,it is possible to have tion scheme as follows:select a certain number of categories false negative error and false positive error in verifying the and issue a query cycle,obtain the empty/singleton/collision population constraint.Therefore,it is essential to guarantee that slots,and then estimate the tag size for each of the categories the false negative/positive rate is below B,that is: according to the sampling in the singleton slots.In this way,the Pr[m<tn:≥t<B, ensemble sampling is more preferred than the separate counting (2) in terms of reading performance.Since more tags are involved Prm≥tn:<t<B. (3) in one query cycle,more slots amortize the cost of inter-cycle In the Top-k Query Problem,we use the definition of the overhead,the Select command,as well as the fixed initial frame probabilistic threshold top-k query (PT-Topk query),i.e.,in size.Thus,the overall scanning time can be greatly reduced. regard to the tag size,only the set of categories where each A.The Estimator SE takes a probability of at least 1-B to be in the top-k list are In the slotted ALOHA-based protocol,besides the empty illustrated in the histogram,while the accuracy requirement is slots and the collision slots,the singleton slots can be obtained. satisfied.Much like the iceberg query problem,as the exact tag size ni for category Ci is unknown,then,given the estimated In the ensemble sampling-based estimation,according to the observed statistics of the empty/singleton/collision slots,we can value of tag size ni,it is possible to have false negative error use estimators in [4]12]15]etc.to estimate the overall tag and false positive error in verifying the population constraint, the following constraint must be satisfied: size.Then,according to the response in each singleton slot, the category ID is obtained from the first s bits in the tag ID. Pr[Ciis regarded out of top-k list|Ci E top-k list]<B,(4) Based on the sampling from the singleton slots,the tag size for Pr[C;is regarded in top-k list Ci top-k list]<B.(5) each category can be estimated.The reason is as follows: Assume that there exists m categories C1,C2,...,Cm,the In this paper,we aim to propose a series of novel solutions overall tag size is n,and the tag size for each category is to tackle the above problems,while satisfying the following n,n2,...,nm.We define an indicator variable Xij to denote properties:(1)Time-efficient.(2)Simple for the tag side in the whether one tag of category Ci selects a slot j inside the frame protocol.(3)Complies with the EPCglobal C1G2 standards. with the size f.We set Xi.j=1 if only one tag of category V.USE ENSEMBLE SAMPLING TO COLLECT HISTOGRAMS Ci selects the slot j,and Xi.j=0 otherwise.Moreover,we use Pr[Xi.j=1]to denote the probability that only one tag When collecting the histograms over a large number of of category Ci selects the slot j,then, categories,the objective is to minimize the overall scanning time while the corresponding accuracy/population constraints Pr[X=1刂= are satisfied.Two straightforward solutions are summarized as follows:(1)Basic Tag Identification:The histogram is collected If we use ns.i to denote the number of singleton slots selected by uniquely identifying each tag from the massive tag set and by tags of category Ci,thus ns.i= ∑=lXi,then,the putting it into the corresponding categories,thus the accuracy expected value is 100%,and(2)Separate Counting:The histogram is collected by estimating the number of tags in each category one by E(nsi)= Pr[Xi=1=(1-F)n-1.n one.Specifically,for each category,the reader broadcasts a i= Select command to activate the tags in the specified category. According to the replies from the specified tags,the estimators Furthermore,let ns denote the number of singleton slots,the like [4][12][15]can be used to estimate the tag size for each expected value E(ns)=(1-1)n-1.n.Then, =兴 E(n。) category.As the rough tag size for each category cannot be Thus we can approximate the tag size of category C as follows: predicted,a fixed initial frame size is used for each category. si.亢元= (6 Both of the above two solutions are not time-efficient.In Ta regard to the basic tag identification,uniquely identifying each Here,is the estimated value of the overall tag size.Let= tag in the massive set is not scalable,for as the tag size grows ,hen应=元 into a huge number,the scanning time can be an unacceptable value.In regard to the separated counting,the reader needs to B.Accuracy Analysis scan each category with at least one query cycle,even if the 1)Accuracy of the SE Estimator:In the ensemble sampling- category is a minor category,which is not necessarily addressed based estimation,since the estimators such as [4][12][15]can in the iceberg query or the top-k query.As the number of be utilized for estimating the overall number of tags,we use 6𝑛𝑖 = 100, the estimated tag size 𝑛ˆ𝑖 should be within the range [90, 110] with a probability greater than 95%. In the Iceberg Query Problem, only those categories with a tag size over a specified threshold 𝑡 are essential to be illustrated in the histogram, while the accuracy requirement is satisfied. As the exact tag size 𝑛𝑖 for category 𝐶𝑖 is unknown, then, given the estimated value of tag size 𝑛ˆ𝑖, it is possible to have false negative error and false positive error in verifying the population constraint. Therefore, it is essential to guarantee that the false negative/positive rate is below 𝛽, that is: 𝑃 𝑟[𝑛ˆ𝑖 < 𝑡∣𝑛𝑖 ≥ 𝑡] < 𝛽, (2) 𝑃 𝑟[𝑛ˆ𝑖 ≥ 𝑡∣𝑛𝑖 < 𝑡] < 𝛽. (3) In the Top-k Query Problem, we use the definition of the probabilistic threshold top-𝑘 query (PT-Topk query), i.e., in regard to the tag size, only the set of categories where each takes a probability of at least 1 − 𝛽 to be in the top-𝑘 list are illustrated in the histogram, while the accuracy requirement is satisfied. Much like the iceberg query problem, as the exact tag size 𝑛𝑖 for category 𝐶𝑖 is unknown, then, given the estimated value of tag size 𝑛ˆ𝑖, it is possible to have false negative error and false positive error in verifying the population constraint, the following constraint must be satisfied: 𝑃 𝑟[𝐶𝑖is regarded out of top-𝑘 list∣𝐶𝑖 ∈ top-𝑘 list] < 𝛽, (4) 𝑃 𝑟[𝐶𝑖is regarded in top-𝑘 list∣𝐶𝑖 ∈/ top-𝑘 list] < 𝛽. (5) In this paper, we aim to propose a series of novel solutions to tackle the above problems, while satisfying the following properties: (1) Time-efficient. (2) Simple for the tag side in the protocol. (3) Complies with the EPCglobal C1G2 standards. V. USE ENSEMBLE SAMPLING TO COLLECT HISTOGRAMS When collecting the histograms over a large number of categories, the objective is to minimize the overall scanning time while the corresponding accuracy/population constraints are satisfied. Two straightforward solutions are summarized as follows: (1) Basic Tag Identification: The histogram is collected by uniquely identifying each tag from the massive tag set and putting it into the corresponding categories, thus the accuracy is 100%, and (2) Separate Counting: The histogram is collected by estimating the number of tags in each category one by one. Specifically, for each category, the reader broadcasts a Select command to activate the tags in the specified category. According to the replies from the specified tags, the estimators like [4][12][15] can be used to estimate the tag size for each category. As the rough tag size for each category cannot be predicted, a fixed initial frame size is used for each category. Both of the above two solutions are not time-efficient. In regard to the basic tag identification, uniquely identifying each tag in the massive set is not scalable, for as the tag size grows into a huge number, the scanning time can be an unacceptable value. In regard to the separated counting, the reader needs to scan each category with at least one query cycle, even if the category is a minor category, which is not necessarily addressed in the iceberg query or the top-𝑘 query. As the number of categories 𝑚 can be fairly large, e.g., in the order of hundreds, the Select command and the fixed initial frame size for each category, as well as the inter-cycle overhead among a large number of query cycles, make the overall scanning time rather large. Therefore, we consider an ensemble sampling-based estimation scheme as follows: select a certain number of categories and issue a query cycle, obtain the empty/singleton/collision slots, and then estimate the tag size for each of the categories according to the sampling in the singleton slots. In this way, the ensemble sampling is more preferred than the separate counting in terms of reading performance. Since more tags are involved in one query cycle, more slots amortize the cost of inter-cycle overhead, the Select command, as well as the fixed initial frame size. Thus, the overall scanning time can be greatly reduced. A. The Estimator SE In the slotted ALOHA-based protocol, besides the empty slots and the collision slots, the singleton slots can be obtained. In the ensemble sampling-based estimation, according to the observed statistics of the empty/singleton/collision slots, we can use estimators in [4][12][15] etc. to estimate the overall tag size. Then, according to the response in each singleton slot, the category ID is obtained from the first 𝑠 bits in the tag ID. Based on the sampling from the singleton slots, the tag size for each category can be estimated. The reason is as follows: Assume that there exists 𝑚 categories 𝐶1, 𝐶2, ..., 𝐶𝑚, the overall tag size is 𝑛, and the tag size for each category is 𝑛1, 𝑛2, ..., 𝑛𝑚. We define an indicator variable 𝑋𝑖,𝑗 to denote whether one tag of category 𝐶𝑖 selects a slot 𝑗 inside the frame with the size 𝑓. We set 𝑋𝑖,𝑗 = 1 if only one tag of category 𝐶𝑖 selects the slot 𝑗, and 𝑋𝑖,𝑗 = 0 otherwise. Moreover, we use 𝑃 𝑟[𝑋𝑖,𝑗 = 1] to denote the probability that only one tag of category 𝐶𝑖 selects the slot 𝑗, then, 𝑃 𝑟[𝑋𝑖,𝑗 = 1] = 1 𝑓 ⋅ (1 − 1 𝑓 ) 𝑛−1 ⋅ 𝑛𝑖. If we use 𝑛𝑠,𝑖 to denote the number of singleton slots selected by tags of category 𝐶𝑖, thus 𝑛𝑠,𝑖 = ∑𝑓 𝑗=1 𝑋𝑖,𝑗 , then, the expected value 𝐸(𝑛𝑠,𝑖) = ∑ 𝑓 𝑗=1 𝑃 𝑟[𝑋𝑖,𝑗 = 1] = (1 − 1 𝑓 ) 𝑛−1 ⋅ 𝑛𝑖. Furthermore, let 𝑛𝑠 denote the number of singleton slots, the expected value 𝐸(𝑛𝑠) = (1 − 1 𝑓 )𝑛−1 ⋅ 𝑛. Then, 𝐸(𝑛𝑠,𝑖) 𝐸(𝑛𝑠) = 𝑛𝑖 𝑛 . Thus we can approximate the tag size of category 𝐶𝑖 as follows: 𝑛ˆ𝑖 = 𝑛𝑠,𝑖 𝑛𝑠 ⋅ 𝑛. ˆ (6) Here, 𝑛ˆ is the estimated value of the overall tag size. Let 𝛼ˆ𝑖 = 𝑛𝑠,𝑖 𝑛𝑠 , then 𝑛ˆ𝑖 = 𝛼ˆ𝑖 ⋅ 𝑛ˆ. B. Accuracy Analysis 1) Accuracy of the SE Estimator: In the ensemble samplingbased estimation, since the estimators such as [4][12][15] can be utilized for estimating the overall number of tags, we use 𝛿

<<向上翻页向下翻页>>

点击下载：计算机科学与技术（参考文献）Efficiently Collecting Histograms Over RFID Tags