2430 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,VOL.26.NO.9.SEPTEMBER 2015 efficiency.In regard to the basic histogram collection,the number of tags for various categories.However,since the time delay is mainly impacted by two factors:1)the number proposed scheme needs to identify the tag in singleton slots of categories m,2)the category with the smallest tag size, and read 96-bit EPC from the tag,it may incur high commu- say ni,inside the group for ensemble sampling.Generally, nication overheard for ensemble sampling.We thus conduct as the number of categories m increases,the number of real experiments with the USRP N210 platform to evaluate groups and the essential number of slots for each ensemble the ratio of tags that are identified during the whole process sampling is increasing,causing the time delay to increase. of collecting histograms.We respectively test the slot ratio Besides,the category with the smallest tag size ni directly (the ratio of the number of singleton slots to total number of decides the essential frame size inside the group,the larger slots)and time ratio (the ratio of the overall time interval for the gap among the tag sizes of each category in the same the singleton slots to total time duration).In the experiment, group,the lower the time efficiency that is achieved. we use the Alien reader to interrogate 50 tags and use USRP In regard to the iceberg query and the top-k query,the N210 as a sniffer to capture the detailed information in the time delay mainly depends on the number of categories physical layer,we average the experiment results via 50 with the tag size close to the threshold t.Due to the variance repeated test.According to the real experiment results,we in tag size estimation,a relatively large number of slots are find that the average slot ratio is 33 percent,which is lower required to verify whether the specified categories have tag than 36.8 percent in ideal case when the frame size is set to sizes over the threshold t.For the top-k query,additional an optimal value.We further find that the average time ratio time delay is required to estimate the threshold t corre- is 62 percent,it implies that the singleton slots occupy a con- sponding to the top-k query. siderable proportion of the overall scanning time. In order to sufficiently reduce the identification over- 8.2 Interference Factors in Realistic Settings head in singleton slots,we can make a slight modification In realistic settings of various applications,there might exist for the C1G2 protocol as follows:each tag can embed the several interference factors which hinder the actual perfor- category ID into the RN16 response,in this way,during mance of histogram collection.These practical issues mainly the process of collecting histograms,each tag only need to include path loss,multi-path effect,and mutual interfer- reply the RN16 random number in the selected slot instead ence.In the following we elaborate on the detail techniques of the exact EPC ID,the high overhead for identification to effectively tackle these problems. can be effectively avoided.We further evaluate the average Path loss.Path loss is common in RFID-based applica- time ratio for this new method,we find that the average tions,which may lead to the probabilistic backscattering [7] time ratio can be reduced from 62 to 44 percent,which is in RFID systems,even if the tags are placed in the reader's much closer to the slot ratio effective scanning range.In such scenario,the tags may reply in each query cycle with a certain probability instead 9 PERFORMANCE EVALUATION of 100 percent.Therefore,in regard to the tag-counting pro- tocols in our solutions,we need to essentially estimate the We have conducted simulations in Matlab,and the scenario probability via statistical tests in the particular application is as follows:there exist m categories in total,and we ran- scenarios.In this way,we can accurately estimate the num- domly generate the tag size for each category according to ber of tags according to the probability obtained in advance. the normal distribution N(u.o).We set the default values Multi-path effect.Multi-path effect is especially common for the following parameters:in regard to the accuracy con- for indoor applications.Due to multi-path effect,some tags straint and the population constraint,we set 1-B=95%, cannot be effectively activated as the forwarding waves and e=0.2.The average time interval for each slot is may offset each other,even in the effective scanning range ts=1ms,and the inter-cycle overhead is te=43ms.We of RFID systems.To mitigate the multi-path effect,we can compare our solutions with two basic strategies:the basic use the mobile reader to continuously interrogate the sur- tag identification (BI)and the separate counting (SC) rounding tags such that the multi-path profile can be contin- (explained in Section 5).All results are the averaged results uously changing.In this way,the tags are expected to have of 500 independent trials. more chances to be activated for at least once during the continuous scanning [8]. 9.1 Evaluate the Actual Variance in Ensemble Mutual interference:If the tags are placed too close,they Sampling may have a critical state of mutual interference [34]such In order to verify the correctness of the derivation in the var- that neither of the tags can be effectively activated.This is iance of the SE estimator,i.e.,8;in Eq.(7),we conduct simu- mainly caused by the coupling effect when the reader's lations and evaluate the actual variances in ensemble power is adjusted to a certain value.Hence,in order to miti- sampling,thus quantifying the tightness between the gate the mutual interference among RFID tags,we should derived value of 8;and the measured value in simulation skillfully tune the transmission power of the reader so as to studies.We conduct ensemble sampling on 5,500 tags for avoid the critical state among tags.A suitable power step- 200 cycles.For each query cycle,the frame size f is set to ping method should be leveraged to sufficiently reduce the 5,500.We look into a category Ci with tag size ni=100.In mutual interference among all tags Fig.4a,we plot the estimated value of n;in each cycle,while the expected values of ni-oi and ni+oi are respectively 8.3 Overhead from Tag ldentification illustrated in the red line and the green line.We observe In our ensemble sampling-based solution,we conduct effi- that the estimated value n;majorly vibrates between the cient sampling over the singleton slots to estimate the interval (ni-oi,ni+o:).In Fig.4b,we further compare theefficiency. In regard to the basic histogram collection, the time delay is mainly impacted by two factors: 1) the number of categories m, 2) the category with the smallest tag size, say ni, inside the group for ensemble sampling. Generally, as the number of categories m increases, the number of groups and the essential number of slots for each ensemble sampling is increasing, causing the time delay to increase. Besides, the category with the smallest tag size ni directly decides the essential frame size inside the group, the larger the gap among the tag sizes of each category in the same group, the lower the time efficiency that is achieved. In regard to the iceberg query and the top-k query, the time delay mainly depends on the number of categories with the tag size close to the threshold t. Due to the variance in tag size estimation, a relatively large number of slots are required to verify whether the specified categories have tag sizes over the threshold t. For the top-k query, additional time delay is required to estimate the threshold t corresponding to the top-k query. 8.2 Interference Factors in Realistic Settings In realistic settings of various applications, there might exist several interference factors which hinder the actual performance of histogram collection. These practical issues mainly include path loss, multi-path effect, and mutual interference. In the following we elaborate on the detail techniques to effectively tackle these problems. Path loss. Path loss is common in RFID-based applications, which may lead to the probabilistic backscattering [7] in RFID systems, even if the tags are placed in the reader’s effective scanning range. In such scenario, the tags may reply in each query cycle with a certain probability instead of 100 percent. Therefore, in regard to the tag-counting protocols in our solutions, we need to essentially estimate the probability via statistical tests in the particular application scenarios. In this way, we can accurately estimate the number of tags according to the probability obtained in advance. Multi-path effect. Multi-path effect is especially common for indoor applications. Due to multi-path effect, some tags cannot be effectively activated as the forwarding waves may offset each other, even in the effective scanning range of RFID systems. To mitigate the multi-path effect, we can use the mobile reader to continuously interrogate the surrounding tags such that the multi-path profile can be continuously changing. In this way, the tags are expected to have more chances to be activated for at least once during the continuous scanning [8]. Mutual interference: If the tags are placed too close, they may have a critical state of mutual interference [34] such that neither of the tags can be effectively activated. This is mainly caused by the coupling effect when the reader’s power is adjusted to a certain value. Hence, in order to mitigate the mutual interference among RFID tags, we should skillfully tune the transmission power of the reader so as to avoid the critical state among tags. A suitable power stepping method should be leveraged to sufficiently reduce the mutual interference among all tags. 8.3 Overhead from Tag Identification In our ensemble sampling-based solution, we conduct effi- cient sampling over the singleton slots to estimate the number of tags for various categories. However, since the proposed scheme needs to identify the tag in singleton slots and read 96-bit EPC from the tag, it may incur high communication overheard for ensemble sampling. We thus conduct real experiments with the USRP N210 platform to evaluate the ratio of tags that are identified during the whole process of collecting histograms. We respectively test the slot ratio (the ratio of the number of singleton slots to total number of slots) and time ratio (the ratio of the overall time interval for the singleton slots to total time duration). In the experiment, we use the Alien reader to interrogate 50 tags and use USRP N210 as a sniffer to capture the detailed information in the physical layer, we average the experiment results via 50 repeated test. According to the real experiment results, we find that the average slot ratio is 33 percent, which is lower than 36.8 percent in ideal case when the frame size is set to an optimal value. We further find that the average time ratio is 62 percent, it implies that the singleton slots occupy a considerable proportion of the overall scanning time. In order to sufficiently reduce the identification overhead in singleton slots, we can make a slight modification for the C1G2 protocol as follows: each tag can embed the category ID into the RN16 response, in this way, during the process of collecting histograms, each tag only need to reply the RN16 random number in the selected slot instead of the exact EPC ID, the high overhead for identification can be effectively avoided. We further evaluate the average time ratio for this new method, we find that the average time ratio can be reduced from 62 to 44 percent, which is much closer to the slot ratio. 9 PERFORMANCE EVALUATION We have conducted simulations in Matlab, and the scenario is as follows: there exist m categories in total, and we randomly generate the tag size for each category according to the normal distribution Nðm; sÞ. We set the default values for the following parameters: in regard to the accuracy constraint and the population constraint, we set 1 b ¼ 95%, and ¼ 0:2. The average time interval for each slot is ts ¼ 1 ms, and the inter-cycle overhead is tc ¼ 43 ms. We compare our solutions with two basic strategies: the basic tag identification (BI) and the separate counting (SC) (explained in Section 5). All results are the averaged results of 500 independent trials. 9.1 Evaluate the Actual Variance in Ensemble Sampling In order to verify the correctness of the derivation in the variance of the SE estimator, i.e., di in Eq. (7), we conduct simulations and evaluate the actual variances in ensemble sampling, thus quantifying the tightness between the derived value of di and the measured value in simulation studies. We conduct ensemble sampling on 5,500 tags for 200 cycles. For each query cycle, the frame size f is set to 5,500. We look into a category Ci with tag size ni ¼ 100. In Fig. 4a, we plot the estimated value of ni in each cycle, while the expected values of ni si and ni þ si are respectively illustrated in the red line and the green line. We observe that the estimated value nbi majorly vibrates between the interval ðni si; ni þ siÞ. In Fig. 4b, we further compare the 2430 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 9, SEPTEMBER 2015