of the tag size 𝑛ˆ𝑖 is above/below _中国高校课件下载中心

点击下载：计算机科学与技术（参考文献）Efficiently Collecting Histograms Over RFID Tags

正在加载图片...

of the tag size ni is above/below the threshold t,the specified B.Algorithm for the Iceberg Query Problem category can be respectively identified as qualified/unqualifed, as both the false positive and false negative probability are less Algorithm 2 Algorithm for Iceberg Query than B;otherwise,the specified category is still undetermined. 1:Initialize R to all categories,set Q,U,V to Set I=1. According to the weighted statistical averaging method,as the 2:while R≠ado number of repeated tests increases,the averaged variance o;for 3 If I=1,set the initial frame size f. each category decreases,thus the confidence interval for each 4 Issue a query cycle over the tags,add those relatively category is shrinking.Therefore,after a certain number of query major categories into the set S.Set S=S. cycles,all categories can be determined as qualified/unqualifed 5 while S≠ado for the population constraint. 6: Compute the frame size fi for each category CiES suchthat the variance,=产可：f片>元e, Qualified Category then remove Ci from S to V.If fi>fmar,set fi= fmaz.Obtain the frame size f as the mid-value among Unqualified Category the series of fi. Threshold 7: Select all tags in S,issue a query cycle with the frame size f,compute the estimated tag size n and the averaged standard deviation o;for each category 内由由一 C S.Detect the qualified category set Q and CI C2 C3 C4 C5 C6 C7 C8 C9 CI0CII CI2 unqualified category set U.Set S=S-Q-U. 8 ifU≠then Fig.1.Histogram with confidence interval annotated 9: Wipe out all categories unexplored in the singleton slots from S Note that when the estimated value元；>t or《t, 10: end if the required variance in the population constraint is much 11: end while larger than the specifications of the accuracy constraint.In 12: n=元-∑c,es元.R=R-S,1=1+1 this situation,these categories can be quickly identified as 13:end while qualified/unqualified,and can be wiped out immediately from 14:Further verify the categories in V and Q for the accuracy the ensemble sampling for verifying the population constraint. constraint Thus,those undetermined categories can be further involved in the ensemble sampling with a much smaller tag size,verifying We propose the algorithm for the iceberg query problem in the population constraint in a faster approach. Algorithm 2.Steps 1-4 are quite similar to steps 1-4 in Algo- Sometimes the tag sizes of various categories are subject rithm 1,due to lack of space we omit the detailed statements to some skew distributions with a "long tail".The long tail for these steps.Assume that the current set of categories is represents those categories each of which occupies a rather R,during the query cycles of ensemble sampling,the reader small percentage among the total categories,but all together continuously updates the statistical value of n as well as the they occupy a substantial proportion of the overall tag sizes.In standard deviation oi for each category CiE R.After each regard to the iceberg query,conventionally,the categories in the query cycle,the categories in R can be further divided into the long tail are unqualified for the population constraint.However, following categories according to the population constraint: due to the small tag size,most of them may not have the Qualified categories O:They refer to the categories whose opportunity to occupy even one singleton slot when contending tag sizes are determined to be over the specified threshold with those major categories during the ensemble sampling. t.f元之t and o:≤已？then category C is They remain undetermined without being immediately wiped identified as qualified for the population constraint. out,leading to inefficiency in scanning the other categories.We Unqualified categories U:They refer to the categories rely on the following theorem to quickly wipe out the categories whose tag sizes are determined to be below the specified in the long tail. t-ni threshold t..Ifm<t andoi≤o--可，then category C Theorem 4:For any two categories Ci and Ci that ns.i< is identified as unqualified for the population constraint. ns.j satisfies for each query cycle of ensemble sampling,if Ci Undetermined categories R:The remaining categories to is determined to be unqualified for the population constraint, be verified are undetermined categories. then C;is also unqualified. Therefore,after each query cycle of ensemble sampling, Due to lack of space,we omit the proof of Theorem 4.The those ungualified categories and qualified categories can be detailed proof is given in [17]. immediately wiped out from the ensemble sampling.When According to Theorem 4,after a number of query cycles of at least one category is determined as unqualified,all of the ensemble sampling,if a category Ci is determined unqualified categories in the current group which have not been explored for the population constraint,then for any category Ci which in the singleton slots are wiped out immediately.The query has not appeared once in the singleton slots,ns.j>ns.i=0,cycles are then continuously issued over those undetermined it can be wiped out immediately as an unqualified category. categories in R until R=0.of the tag size 𝑛ˆ𝑖 is above/below the threshold 𝑡, the specified category can be respectively identified as qualified/unqualifed, as both the false positive and false negative probability are less than 𝛽; otherwise, the specified category is still undetermined. According to the weighted statistical averaging method, as the number of repeated tests increases, the averaged variance 𝜎𝑖 for each category decreases, thus the confidence interval for each category is shrinking. Therefore, after a certain number of query cycles, all categories can be determined as qualified/unqualifed for the population constraint. Tag size for each category C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 Qualified Category Unqualified Category Undetermined Category Threshold t Fig. 1. Histogram with confidence interval annotated Note that when the estimated value 𝑛ˆ𝑖 ≫ 𝑡 or 𝑛ˆ𝑖 ≪ 𝑡, the required variance in the population constraint is much larger than the specifications of the accuracy constraint. In this situation, these categories can be quickly identified as qualified/unqualified, and can be wiped out immediately from the ensemble sampling for verifying the population constraint. Thus, those undetermined categories can be further involved in the ensemble sampling with a much smaller tag size, verifying the population constraint in a faster approach. Sometimes the tag sizes of various categories are subject to some skew distributions with a “long tail”. The long tail represents those categories each of which occupies a rather small percentage among the total categories, but all together they occupy a substantial proportion of the overall tag sizes. In regard to the iceberg query, conventionally, the categories in the long tail are unqualified for the population constraint. However, due to the small tag size, most of them may not have the opportunity to occupy even one singleton slot when contending with those major categories during the ensemble sampling. They remain undetermined without being immediately wiped out, leading to inefficiency in scanning the other categories. We rely on the following theorem to quickly wipe out the categories in the long tail. Theorem 4: For any two categories 𝐶𝑖 and 𝐶𝑗 that 𝑛𝑠,𝑖 < 𝑛𝑠,𝑗 satisfies for each query cycle of ensemble sampling, if 𝐶𝑗 is determined to be unqualified for the population constraint, then 𝐶𝑖 is also unqualified. Due to lack of space, we omit the proof of Theorem 4. The detailed proof is given in [17]. According to Theorem 4, after a number of query cycles of ensemble sampling, if a category 𝐶𝑗 is determined unqualified for the population constraint, then for any category 𝐶𝑖 which has not appeared once in the singleton slots, 𝑛𝑠,𝑗 > 𝑛𝑠,𝑖 = 0, it can be wiped out immediately as an unqualified category. B. Algorithm for the Iceberg Query Problem Algorithm 2 Algorithm for Iceberg Query 1: Initialize 𝑅 to all categories, set 𝑄, 𝑈, 𝑉 to ∅. Set 𝑙 = 1. 2: while 𝑅 ∕= ∅ do 3: If 𝑙 = 1, set the initial frame size 𝑓. 4: Issue a query cycle over the tags, add those relatively major categories into the set 𝑆. Set 𝑆′ = 𝑆. 5: while 𝑆 ∕= ∅ do 6: Compute the frame size 𝑓𝑖 for each category 𝐶𝑖 ∈ 𝑆 such that the variance 𝜎𝑖 = ∣𝑡−𝑛ˆ𝑖∣ Φ−1(1−𝛽) . If 𝑓𝑖 > 𝑛ˆ𝑖 ⋅ 𝑒, then remove 𝐶𝑖 from 𝑆 to 𝑉 . If 𝑓𝑖 > 𝑓𝑚𝑎𝑥, set 𝑓𝑖 = 𝑓𝑚𝑎𝑥. Obtain the frame size 𝑓 as the mid-value among the series of 𝑓𝑖. 7: Select all tags in 𝑆, issue a query cycle with the frame size 𝑓, compute the estimated tag size 𝑛ˆ𝑖 and the averaged standard deviation 𝜎𝑖 for each category 𝐶𝑖 ∈ 𝑆. Detect the qualified category set 𝑄 and unqualified category set 𝑈. Set 𝑆 = 𝑆 − 𝑄 − 𝑈. 8: if 𝑈 ∕= ∅ then 9: Wipe out all categories unexplored in the singleton slots from 𝑆. 10: end if 11: end while 12: 𝑛ˆ = 𝑛ˆ − ∑ 𝐶𝑖∈𝑆′ 𝑛ˆ𝑖. 𝑅 = 𝑅 − 𝑆′ , 𝑙 = 𝑙 + 1. 13: end while 14: Further verify the categories in 𝑉 and 𝑄 for the accuracy constraint. We propose the algorithm for the iceberg query problem in Algorithm 2. Steps 1-4 are quite similar to steps 1-4 in Algorithm 1, due to lack of space we omit the detailed statements for these steps. Assume that the current set of categories is 𝑅, during the query cycles of ensemble sampling, the reader continuously updates the statistical value of 𝑛ˆ𝑖 as well as the standard deviation 𝜎𝑖 for each category 𝐶𝑖 ∈ 𝑅. After each query cycle, the categories in 𝑅 can be further divided into the following categories according to the population constraint: ∙ Qualified categories 𝑄: They refer to the categories whose tag sizes are determined to be over the specified threshold 𝑡. If 𝑛ˆ𝑖 ≥ 𝑡 and 𝜎𝑖 ≤ 𝑛ˆ𝑖−𝑡 Φ−1(1−𝛽) , then category 𝐶𝑖 is identified as qualified for the population constraint. ∙ Unqualified categories 𝑈: They refer to the categories whose tag sizes are determined to be below the specified threshold 𝑡. If 𝑛ˆ𝑖 < 𝑡 and 𝜎𝑖 ≤ 𝑡−𝑛ˆ𝑖 Φ−1(1−𝛽) , then category 𝐶𝑖 is identified as unqualified for the population constraint. ∙ Undetermined categories 𝑅: The remaining categories to be verified are undetermined categories. Therefore, after each query cycle of ensemble sampling, those unqualified categories and qualified categories can be immediately wiped out from the ensemble sampling. When at least one category is determined as unqualified, all of the categories in the current group which have not been explored in the singleton slots are wiped out immediately. The query cycles are then continuously issued over those undetermined categories in 𝑅 until 𝑅 = ∅

<<向上翻页向下翻页>>

点击下载：计算机科学与技术（参考文献）Efficiently Collecting Histograms Over RFID Tags