Energy Efficient Algorithms for the RFID Estimation Problem Tao Lif Samuel Wuts Shigang Chent Mark Yang§ Department of Computer Information Science Engineering Department of Epidemiology and Health Policy Research SDepartment of Statistics University of Florida,Gainesville,FL 32611,USA Abstract-RFID has been gaining popularity for inventory methods that estimate the number of tags with an accuracy control,object tracking,and supply chain management in ware- that can be arbitrarily set.This is called the RFID estimation houses,retail stores,hospitals,etc.Periodically and automatically problem.The follow-up work by Qian et al.[5]significantly estimating the number of RFID tags deployed in a large area has many important applications in inventory management and reduces the estimation time.It can be shown that even for theft detection.The prior work focuses on designing time-efficient the applications that require reading the actual IDs of all tags, algorithms that can estimate tens of thousands of tags in seconds. estimating the number of tags as a pre-processing step will We observe that,for a RFID reader to access tags in a large help make the main procedure of reading the IDs much more area,active tags are likely to be used.These tags are battery- efficient [3].Another advantage of estimating the number of powered and use their own energy for information transmission. However,recharging batteries for tens of thousands of tags is tags without reading the IDs is that it ensures the anonymity laborious.Unlike the prior work,this paper studies the RFID of the tags,which may be useful in privacy-sensitive scenarios estimation problem from the energy angle.Our goal is to reduce involving RFID-enhanced passports or driver's licences,where the amount of energy that is consumed by the tags during counting the number of people present is needed but revealing the estimation procedure.We design several energy-efficient their identities is not necessary. probabilistic algorithms that iteratively refine a control parameter to optimize the information carried in the transmissions from the Is time efficiency the only performance metric for the RFID tags,such that both the number and the size of the transmissions estimation problem?We argue that energy cost is also an are minimized. important issue that must be carefully dealt with.In any application that requires a RFID reader to access tags in a large I.INTRODUCTION area,it is likely that battery-powered active tags will be used. The radio-frequency identification (RFID)technology has Passive tags harvest energy from the radio signal of the reader been widely used in various commercial applications,including and use such minute amount of energy to deliver information inventory control,object tracking,and supply chain manage- back to the reader.Their typical reading range is only several ment.RFID tags (each storing a unique ID)are attached meters,which do not fit well with the big warehouse scenario. to goods at warehouses,merchandizes at retail stores,or Active tags use their own power to transmit.A longer reading equipment at hospitals,allowing an authenticated RFID reader range can be achieved by transmitting at higher power.They are to quickly access the properties of each individual item or also richer in resources for implementing advanced functions. collect the statistical information about a large group of items. Their price becomes less of a concern if they are used for This paper focuses on a RFID-enabled function that is very expensive merchandizes (such as refrigerators)or reused many useful in inventory management.Imagine a large warehouse times as goods moving in and out of the warehouse.But storing thousands of refrigerators,tens of thousands of furni- active tags also have a problem.They are powered by batteries. ture pieces,or hundreds of thousands of footwear.A national Recharging batteries for tens of thousands of tags is a laborious retail survey showed that administration error,vendor fraud operation,considering that the tagged products may stack up, and employee theft caused about 20 billion dollars lost a year making tags not easily accessible.To prolong the tags'lifetime [1].Hence,it is desirable to have a quick way of counting the and reduce the frequency of battery recharge,all functions that number of items in the warehouse or in each section of the involve large-scale transmission by many tags should be made warehouse.To timely detect theft or management errors,such energy-efficient.The prior work focuses on energy-efficient counting may be performed frequently. anti-collision protocols that minimize the energy consumption If each item is attached with a RFID tag,the counting of a mobile reader [6],[7]when the reader collects the IDs of problem can be solved by a RFID reader that receives the the tags.We believe this paper is the first to study energy- IDs transmitted (or backscattered)from the tags [2].However. efficient solutions for the RFID estimation problem (which reading the actual IDs of the tags can be time-consuming does not require reading the IDs of all tags). because so many of them have to be delivered in the same The paper has three major contributions.First,we observe low-rate channel and collisions caused by simultaneous trans- that there exists an asymmetry in energy cost.Solving the RFID missions by different tags make the matter worse.To address estimation problem incurs energy cost both at the RFID reader this problem,Kodialam and Nandagopal [3],[4]showed that and at the active tags.The asymmetry is that the energy cost the reading time can be greatly reduced through probabilistic at the tags should be minimized while the energy cost at the
Energy Efficient Algorithms for the RFID Estimation Problem Tao Li† Samuel Wu‡§ Shigang Chen† Mark Yang§ †Department of Computer & Information Science & Engineering ‡Department of Epidemiology and Health Policy Research §Department of Statistics University of Florida, Gainesville, FL 32611, USA Abstract—RFID has been gaining popularity for inventory control, object tracking, and supply chain management in warehouses, retail stores, hospitals, etc. Periodically and automatically estimating the number of RFID tags deployed in a large area has many important applications in inventory management and theft detection. The prior work focuses on designing time-efficient algorithms that can estimate tens of thousands of tags in seconds. We observe that, for a RFID reader to access tags in a large area, active tags are likely to be used. These tags are batterypowered and use their own energy for information transmission. However, recharging batteries for tens of thousands of tags is laborious. Unlike the prior work, this paper studies the RFID estimation problem from the energy angle. Our goal is to reduce the amount of energy that is consumed by the tags during the estimation procedure. We design several energy-efficient probabilistic algorithms that iteratively refine a control parameter to optimize the information carried in the transmissions from the tags, such that both the number and the size of the transmissions are minimized. I. INTRODUCTION The radio-frequency identification (RFID) technology has been widely used in various commercial applications, including inventory control, object tracking, and supply chain management. RFID tags (each storing a unique ID) are attached to goods at warehouses, merchandizes at retail stores, or equipment at hospitals, allowing an authenticated RFID reader to quickly access the properties of each individual item or collect the statistical information about a large group of items. This paper focuses on a RFID-enabled function that is very useful in inventory management. Imagine a large warehouse storing thousands of refrigerators, tens of thousands of furniture pieces, or hundreds of thousands of footwear. A national retail survey showed that administration error, vendor fraud and employee theft caused about 20 billion dollars lost a year [1]. Hence, it is desirable to have a quick way of counting the number of items in the warehouse or in each section of the warehouse. To timely detect theft or management errors, such counting may be performed frequently. If each item is attached with a RFID tag, the counting problem can be solved by a RFID reader that receives the IDs transmitted (or backscattered) from the tags [2]. However, reading the actual IDs of the tags can be time-consuming because so many of them have to be delivered in the same low-rate channel and collisions caused by simultaneous transmissions by different tags make the matter worse. To address this problem, Kodialam and Nandagopal [3], [4] showed that the reading time can be greatly reduced through probabilistic methods that estimate the number of tags with an accuracy that can be arbitrarily set. This is called the RFID estimation problem. The follow-up work by Qian et al. [5] significantly reduces the estimation time. It can be shown that even for the applications that require reading the actual IDs of all tags, estimating the number of tags as a pre-processing step will help make the main procedure of reading the IDs much more efficient [3]. Another advantage of estimating the number of tags without reading the IDs is that it ensures the anonymity of the tags, which may be useful in privacy-sensitive scenarios involving RFID-enhanced passports or driver’s licences, where counting the number of people present is needed but revealing their identities is not necessary. Is time efficiency the only performance metric for the RFID estimation problem? We argue that energy cost is also an important issue that must be carefully dealt with. In any application that requires a RFID reader to access tags in a large area, it is likely that battery-powered active tags will be used. Passive tags harvest energy from the radio signal of the reader and use such minute amount of energy to deliver information back to the reader. Their typical reading range is only several meters, which do not fit well with the big warehouse scenario. Active tags use their own power to transmit. A longer reading range can be achieved by transmitting at higher power. They are also richer in resources for implementing advanced functions. Their price becomes less of a concern if they are used for expensive merchandizes (such as refrigerators) or reused many times as goods moving in and out of the warehouse. But active tags also have a problem. They are powered by batteries. Recharging batteries for tens of thousands of tags is a laborious operation, considering that the tagged products may stack up, making tags not easily accessible. To prolong the tags’ lifetime and reduce the frequency of battery recharge, all functions that involve large-scale transmission by many tags should be made energy-efficient. The prior work focuses on energy-efficient anti-collision protocols that minimize the energy consumption of a mobile reader [6], [7] when the reader collects the IDs of the tags. We believe this paper is the first to study energyefficient solutions for the RFID estimation problem (which does not require reading the IDs of all tags). The paper has three major contributions. First, we observe that there exists an asymmetry in energy cost. Solving the RFID estimation problem incurs energy cost both at the RFID reader and at the active tags. The asymmetry is that the energy cost at the tags should be minimized while the energy cost at the
reader is relatively a less concern because the reader's battery respond in those slots.Using the probabilistic counting meth- can be easily replaced or it may be powered by an external ods,the reader estimates the number of tags based on the source.To exploit this asymmetry,our new algorithms follows number of empty slots or the number of collision slots.Their a common framework that trades more energy cost at the reader best estimator is called the Unified Probabilistic Estimator for less cost at the tags.The reader will continuously refine and (UPE).A follow-up work by the same authors proposes the broadcast a control parameter called contention probability, Enhanced Zero-Based Estimator (EZB)[4],which makes its which optimizes the amount of information the reader can estimation based on the number of empty slots.The Lottery- extract from the transmissions by the tags.This in turn reduces Frame scheme (LoF)[5]by Qian et al.employs a geometric the number of transmissions by the tags that are necessary to distribution-based scheme to determine which slot in a time achieve a certain estimation accuracy. frame each tag will respond.It significantly reduces the esti- Second,we investigate statistical methods,including the mation time.However,every tag must respond during each of maximum likelihood estimation (MLE)and the average sum the time frames(their number ranges from tens to thousands), estimation (ASE)that are different from the probabilistic resulting in large energy cost when active tags use their own counting methods [8]used by [3].[4].Our estimation al- power to transmit.Also related is a novel security protocol gorithms optimize their performance by iteratively applying proposed by Tan et al.to monitor the missing RFID tags in MLE or ASE with continuously refined parameters.The new the presence of dishonest RFID readers [15]. algorithms not only require fewer transmissions by the tags None of the above estimators are designed with energy but also minimize the size of each transmission.The number conservation in mind.In the following,we will present our of transmissions made by the tags in our best algorithm is less energy efficient estimators. than one third of the number in the state-of-the-art algorithms. III.PROBLEM DEFINITION AND SYSTEM MODEL In terms of the total number of bits transmitted by the tags,it is more than an order of magnitude smaller. A.RFID Estimation Problem Third,we formally analyze the confidence intervals of the The problem is to design efficient algorithms to estimate the estimations made by the new algorithms and establish the number of RFID tags in a deployment area without actually termination conditions for any given accuracy requirement.We reading the ID of each tag.Let N be the actual number of tags perform extensive simulations to demonstrate that the measured and N be the estimate.The estimation accuracy is specified results match well with the analytical results and that the new by a confidence interval with two parameters:a probability algorithms perform far better in terms of energy saving than value a and an error bound B,both in the range of(0,1).The the best existing algorithms. requirement is that the probability forto fall in the interval The rest of the paper is organized as follows.Section 2 [1-B,1+B]is at least a,i.e., discusses the related work.Section 3 defines the problem to be solved and the system model.Sections 4-6 propose three new Prob(1-B)N≤N≤(1+3)N)≥a. solutions for the RFID estimation problem.Section 7 evaluates Our goal is to achieve the above estimation accuracy with the new algorithms through simulations.Section 8 draws the minimum energy overhead to the active tags. conclusion. B.System Model II.RELATED WORK The type of active RFID systems considered in this paper is Most existing work focuses on how to efficiently read the applicable to a large deployment area that are hundreds of feet IDs of the RFID tags.When queried by the RFID reader,the or more across.Passive tags are beyond the scope of this paper. tags will respond with their IDs.Since the communication If they were used,one would have to take the reader and move environment is wireless,inevitably collisions will happen when around the whole area,collecting tag information once every multiple tags choose the same slot.Collision arbitration pro- few feet.Active tags allow a RFID reader to collect information tocols mainly fall into two categories:framed ALOHA-based from one location.The tagged goods (such as apparel)may approach [9],[10],[11],[7]and tree-based approach [12],[13]. stack in piles,and there may be obstacles,such as racks filled [14],[6].In the former approach,the polling request carries with merchandize,between a tag and the reader.We expect the the frame length,and each tag individually chooses a slot in active tags are designed to transmit with significant power that the frame to transmit its ID.The process repeats until all the is high enough to ensure reliable information delivery in such tags successfully transmit their IDs to the reader.In the later a demanding environment.Hence,the energy cost due to the approach,the reader first sends out an ID prefix string.The tags'transmissions is the main concern in our algorithm design; tags whose IDs match the string will respond.If a collision it increases at least in the square of the maximum distance to be happens,the reader will append a'0'or'1'to the prefix string covered by the RFID system.Energy consumption that powers and send out the new string.This process repeats until only the tag's circuit for computing and receiving information is not one tag responds.Essentially the approach traverses a binary affected by long distance and obstacles.The energy consumed tree with the IDs of the tags being the leaf nodes. by the RFID reader is of less concern.We assume the reader Instead of identifying individual RFID tags,Kodialam and transmits at sufficiently high power. Nandagopal [3]study the problem of estimating the cardinality We use the following communication protocol between the of a tag set.In their approach,information from tags are reader and the tags.The reader first synchronizes the clocks collected by a RFID reader in a series of time frames.Each of the tags and then performs a sequence of pollings.In each frame consists of a number of slots,and tags probabilistically polling,the reader sends out a request,which is followed by a
reader is relatively a less concern because the reader’s battery can be easily replaced or it may be powered by an external source. To exploit this asymmetry, our new algorithms follows a common framework that trades more energy cost at the reader for less cost at the tags. The reader will continuously refine and broadcast a control parameter called contention probability, which optimizes the amount of information the reader can extract from the transmissions by the tags. This in turn reduces the number of transmissions by the tags that are necessary to achieve a certain estimation accuracy. Second, we investigate statistical methods, including the maximum likelihood estimation (MLE) and the average sum estimation (ASE) that are different from the probabilistic counting methods [8] used by [3], [4]. Our estimation algorithms optimize their performance by iteratively applying MLE or ASE with continuously refined parameters. The new algorithms not only require fewer transmissions by the tags but also minimize the size of each transmission. The number of transmissions made by the tags in our best algorithm is less than one third of the number in the state-of-the-art algorithms. In terms of the total number of bits transmitted by the tags, it is more than an order of magnitude smaller. Third, we formally analyze the confidence intervals of the estimations made by the new algorithms and establish the termination conditions for any given accuracy requirement. We perform extensive simulations to demonstrate that the measured results match well with the analytical results and that the new algorithms perform far better in terms of energy saving than the best existing algorithms. The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 defines the problem to be solved and the system model. Sections 4-6 propose three new solutions for the RFID estimation problem. Section 7 evaluates the new algorithms through simulations. Section 8 draws the conclusion. II. RELATED WORK Most existing work focuses on how to efficiently read the IDs of the RFID tags. When queried by the RFID reader, the tags will respond with their IDs. Since the communication environment is wireless, inevitably collisions will happen when multiple tags choose the same slot. Collision arbitration protocols mainly fall into two categories: framed ALOHA-based approach [9], [10], [11], [7] and tree-based approach [12], [13], [14], [6]. In the former approach, the polling request carries the frame length, and each tag individually chooses a slot in the frame to transmit its ID. The process repeats until all the tags successfully transmit their IDs to the reader. In the later approach, the reader first sends out an ID prefix string. The tags whose IDs match the string will respond. If a collision happens, the reader will append a ‘0’ or ‘1’ to the prefix string and send out the new string. This process repeats until only one tag responds. Essentially the approach traverses a binary tree with the IDs of the tags being the leaf nodes. Instead of identifying individual RFID tags, Kodialam and Nandagopal [3] study the problem of estimating the cardinality of a tag set. In their approach, information from tags are collected by a RFID reader in a series of time frames. Each frame consists of a number of slots, and tags probabilistically respond in those slots. Using the probabilistic counting methods, the reader estimates the number of tags based on the number of empty slots or the number of collision slots. Their best estimator is called the Unified Probabilistic Estimator (UPE). A follow-up work by the same authors proposes the Enhanced Zero-Based Estimator (EZB) [4], which makes its estimation based on the number of empty slots. The LotteryFrame scheme (LoF) [5] by Qian et al. employs a geometric distribution-based scheme to determine which slot in a time frame each tag will respond. It significantly reduces the estimation time. However, every tag must respond during each of the time frames (their number ranges from tens to thousands), resulting in large energy cost when active tags use their own power to transmit. Also related is a novel security protocol proposed by Tan et al. to monitor the missing RFID tags in the presence of dishonest RFID readers [15]. None of the above estimators are designed with energy conservation in mind. In the following, we will present our energy efficient estimators. III. PROBLEM DEFINITION AND SYSTEM MODEL A. RFID Estimation Problem The problem is to design efficient algorithms to estimate the number of RFID tags in a deployment area without actually reading the ID of each tag. Let N be the actual number of tags and Nˆ be the estimate. The estimation accuracy is specified by a confidence interval with two parameters: a probability value α and an error bound β, both in the range of (0, 1). The requirement is that the probability for N Nˆ to fall in the interval [1 − β, 1 + β] is at least α, i.e., P rob((1 − β)Nˆ ≤ N ≤ (1 + β)Nˆ) ≥ α. Our goal is to achieve the above estimation accuracy with minimum energy overhead to the active tags. B. System Model The type of active RFID systems considered in this paper is applicable to a large deployment area that are hundreds of feet or more across. Passive tags are beyond the scope of this paper. If they were used, one would have to take the reader and move around the whole area, collecting tag information once every few feet. Active tags allow a RFID reader to collect information from one location. The tagged goods (such as apparel) may stack in piles, and there may be obstacles, such as racks filled with merchandize, between a tag and the reader. We expect the active tags are designed to transmit with significant power that is high enough to ensure reliable information delivery in such a demanding environment. Hence, the energy cost due to the tags’ transmissions is the main concern in our algorithm design; it increases at least in the square of the maximum distance to be covered by the RFID system. Energy consumption that powers the tag’s circuit for computing and receiving information is not affected by long distance and obstacles. The energy consumed by the RFID reader is of less concern. We assume the reader transmits at sufficiently high power. We use the following communication protocol between the reader and the tags. The reader first synchronizes the clocks of the tags and then performs a sequence of pollings. In each polling, the reader sends out a request, which is followed by a
slotted time frame during which the tags respond.The polling A.Overview request from the reader carries a contention probability 01,forms the response vector. For example,a company's inventory of certain goods may be At the ith polling,each tag has a probability of pi to transmit routinely in thousands and never be reduced below a certain and,if any tag transmits,z;will be one.Hence, number before,or the company has a policy on the minimum inventory,or the RFID estimation becomes unnecessary when Prob{zi=1}=1-(1-pi)N1-e-NpL, (1) the number of tags is below a threshold.In these cases,we will have a lower bound Nmin,which can be much smaller than N. where N is the the actual number of tags. At the beginning of a polling,each tag makes a probabilistic If the contention probabilities of the pollings are picked decision:It goes to sleep for the whole polling period with too small,the response vector will contain mostly zeros.If probability 1-N and wakes up until the next period starts. the contention probabilities are picked too large,the response or it stays awaketo receive the polling request with probability vector will contain mostly ones.Both cases do not provide 1and then decides to respond with probability px Nmin. sufficient statistical information for accurate estimation.As Nmin For example,if N =10,000 and Nmin =1,000,then only will be discussed shortly,our analysis shows that the optimal 10 tags stay awake in each polling.In Section IV-D,another contention probability is pi=1.594/N.The problem is that energy-reduction method,called request-less pollings,will be we do not know N(which is the quantity we want to estimate). proposed to eliminate most polling requests. In order to determine pi,MLEA consists of an initialization Optionally,the reader's request may include a predicate phase and an iterative phase.The former quickly produces and only tags that satisfy the predicate will participate in the a coarse estimation of N.The latter refines the contention polling.For example,suppose all tags deployed in one section probability and generates the estimation result of a warehouse carry the 96-bit GEN2 IDs that begin with "000"in the Serial Number field.In order to estimate the B.Initialization Phase number of tags in this section,the request carries a predicate We want to pick a small value for the initial contention testing whether the first three bits of a tag's Serial Number is probability.The expected number of responding tags at the “000*」 first polling is Np1.If p is picked too large,a lot of A slot is said to be empty if no tag responds (transmits) tags will respond,which is wasteful because one response in the slot.It is called a singleton slot if exactly one tag or many responses produce the same information-a non- responds.It is a collision slot if more than one tag responds.A empty slot.Suppose we know an upper bound Nmax of N. singleton or collision slot is also called a non-empty slot.The This information is often available in practice.For example, Philips I-Code system [16]requires a slot length of 10 bits we know Nmaz is 10,000 if the warehouse is designed to in order to distinguish singleton slots from collision slots.On hold no more than 10,000 microwaves (each tagged with a the contrary.one bit is enough if we only need to distinguish RFID),or the company's inventory policy requires that in-store empty slots from non-empty slots-'0'means empty and'1' microwaves should not exceed 10,000,or the warehouse only means non-empty.Hence,the response will be much shorter has 10,000 RFID tags in use.Nmar can be much bigger than (or consume much less energy)if an algorithm only needs to N.We pick p=such that the expected number of know empty/non-empty slots,instead of all three types of slots responding tags is no more than one.If=0.we multiply the as required by [3]. contention probability by a constant C(>1),i.e.,p2=p1x C In order to prolong the lifetime of the tags,there are for the second polling.We continue multiplying the contention two ways to reduce their energy consumption:reducing the probability by C after each polling until a non-empty slot is size of each response and reducing the number of responses. observed.When that happens (say,at the Ith polling),we have We will design algorithms that require only the knowledge a coarse estimation of N to be 1/p.Then we move to the of empty/non-empty slots and employ statistical methods to next phase.When C is relatively large,the initialization phase minimize the transmissions needed from the tags. only takes a few pollings to complete due to the exponential increase of the contention probability. IV.MAXIMUM LIKELIHOOD ESTIMATION ALGORITHM Our first estimator for the number of RFID tags is called C.Iterative Phase the maximum likelihood estimation algorithm (MLEA).It fully This phase iteratively refines the estimation result after each utilizes the information from all pollings in order to minimize polling,and terminates when the specified accuracy require- the number of tag responses it needs to meet the accuracy ment is met.Let N;be the estimated number of tags after the requirement ith polling.The reader performs three tasks.First,it sets the
slotted time frame during which the tags respond. The polling request from the reader carries a contention probability 0 1), i.e., p2 = p1 ×C for the second polling. We continue multiplying the contention probability by C after each polling until a non-empty slot is observed. When that happens (say, at the lth polling), we have a coarse estimation of N to be 1/pl . Then we move to the next phase. When C is relatively large, the initialization phase only takes a few pollings to complete due to the exponential increase of the contention probability. C. Iterative Phase This phase iteratively refines the estimation result after each polling, and terminates when the specified accuracy requirement is met. Let Nˆ i be the estimated number of tags after the ith polling. The reader performs three tasks. First, it sets the
contention probability as follows before sending out the polling 3)Termination Condition:We show in Appendix A that, request: when i is large,N;approximately follows the Gaussian distri- Pi= (2) bution: N-1 Norm( (1-(1-p)) i(1-p)Nln2(1-p) where N-1 is the estimate after the previous polling and w is a constant,which will be determined shortly.Second,based According to (6),the variance of N can be written as p on the received z;and the history information,the reader finds Hence,the confidence interval of N is the new estimate of N that maximizes the following likelihood function: eNiPI-1 N:±Za·1 ip明 (9) L:=Π1-p)1-1-(1-p)), (3) where Z is the a percentile for the standard Gaussian distribu- =1 tion.For example,when a=95%,Za =1.96.Because N is where (1-p)N(1-)(1-(1-pj)N)=is the probability that undetermined,we use N;as an approximation when computing the standard deviation in (9) the state of the slot in the jth polling is zj.Namely,we want The termination condition for EMLA is therefore to find Ni arg max{Li} eNiP -1 (4) N ≤N·B, (10) 评 Third,after computing Ni,the reader has to determine if the where B is the error bound.The above inequality can be confidence interval of the estimation meets the requirement.In rewritten as the following,we show how the above tasks can be achieved. 1)Determine the Contention Probability:Using the i≥ ZaVeNIp:-1 (11) 6-method [17],we derive the variance of Ni in Appendix NipiB A. Because p:=1.594/N,we have Var(N)≈ 1-(1-p)N (5) i(1-p)N1n2(1-p) 1.544.Z8 i≥ (12) 32 When N is large and pi is small,we can approximate(1-p:)N as e-Npe and In(1-pi)as pi.The above variance becomes Hence we can theoretically compute the approximate number of pollings that is required in order to meet the accuracy Var()≈e-l requirements.For example,if a =95%and B =5%,2372 (6) ip pollings will be required.Note that (12)is independent with the actual number of tags,N.Hence,our approach has perfect We pick the value of pi that minimizes Var(Ni).Differen- scalability. tiating the right side of (6)and letting it be zero,we have Fig.I shows the simulation result of MLEA when N pi=1.594/N.Because MLE approach provides statistically 10.000.a =95%and 3=5%.The simulation setup can be consistent estimate,when i is large,N can be closely approx- found in Section VII.The middle curve is the estimated number imated by Ni-1.Hence,we set the value of pi as follows: of tags,Ni,with respect to the number pollings.It converges 1.594 to the true value N represented by the central straight line.The P1= (7) upper and lower curves represent the 95%confidence interval, N-1 which shrinks as the number of pollings increases. 2)Compute the value of Ni:We compute the new estimate D.Request-less Pollings of N that maximizes (3).Since the maxima is not affected by We observe that,after a number of pollings,the value of pi monotone transformations,we use logarithm to turn the right will stay in an increasingly small range and does not change side of the equation from product to summation: much.It becomes unnecessary for the reader to transmit it at each polling.Hence,we improve MLEA as follows:If In(Li 1-z)ln(1-p)+n(1-(1-p) the percentage change in pi during a certain number M of consecutive pollings is within a small threshold,the reader will broadcast the latest value of pi and indicate that it will no To find the maxima,we differentiate both sides: longer transmit polling requests (unless the value of p:changes beyond the threshold,in which case the reader will broadcast aln(Li) )血(1-)--p)1- the new value of pi one more time).Without receiving further 1-(1-p5)N polling requests,the tags will respond with the same contention (8) probability in the subsequent slots.This is called the request- We then set the right side to zero to solve for estimate Ni. less pollings.Note that the termination condition in (10) Note that the derivative is a monotone function of N.we can remains correct.With the threshold being 10%and M=10, numerically obtain Ni through the bisection search. the simulation results (which are omitted to save space)show
contention probability as follows before sending out the polling request: pi = ω Nˆ i−1 , (2) where Nˆ i−1 is the estimate after the previous polling and ω is a constant, which will be determined shortly. Second, based on the received zi and the history information, the reader finds the new estimate of N that maximizes the following likelihood function: Li = Y i j=1 (1 − pj ) N(1−zj ) (1 − (1 − pj ) N ) zj , (3) where (1−pj ) N(1−zj ) (1−(1−pj ) N ) zj is the probability that the state of the slot in the jth polling is zj . Namely, we want to find Nˆ i = arg max{Li} N . (4) Third, after computing Nˆ i , the reader has to determine if the confidence interval of the estimation meets the requirement. In the following, we show how the above tasks can be achieved. 1) Determine the Contention Probability: Using the δ−method [17], we derive the variance of Nˆ i in Appendix A. V ar(Nˆ i) ≈ 1 − (1 − pi) N i(1 − pi)N ln2 (1 − pi) . (5) When N is large and pi is small, we can approximate (1−pi) N as e −Npi and ln(1 − pi) as pi . The above variance becomes V ar(Nˆ i) ≈ e Npi − 1 ip2 i . (6) We pick the value of pi that minimizes V ar(Nˆ i). Differentiating the right side of (6) and letting it be zero, we have pi = 1.594/N. Because MLE approach provides statistically consistent estimate, when i is large, N can be closely approximated by Nˆ i−1. Hence, we set the value of pi as follows: pi = 1.594 Nˆ i−1 . (7) 2) Compute the value of Nˆ i: We compute the new estimate of N that maximizes (3). Since the maxima is not affected by monotone transformations, we use logarithm to turn the right side of the equation from product to summation: ln(Li) = X i j=1 · N(1 − zj ) ln(1 − pj ) + zj ln(1 − (1 − pj ) N ) ¸ . To find the maxima, we differentiate both sides: ∂ ln(Li) ∂N = Xi j=1 · (1 − zj ) ln(1 − pj ) − zj (1 − pj ) N ln(1 − pj ) 1 − (1 − pj )N ¸ . (8) We then set the right side to zero to solve for estimate Nˆ i . Note that the derivative is a monotone function of N, we can numerically obtain Nˆ i through the bisection search. 3) Termination Condition: We show in Appendix A that, when i is large, Nˆ i approximately follows the Gaussian distribution: Normµ N, (1 − (1 − pi) N ) i(1 − pi)N ln2 (1 − pi) ¶ . According to (6), the variance of Nˆ i can be written as e Npi−1 ip2 i . Hence, the confidence interval of N is Nˆ i ± Zα · s eNˆipi − 1 ip2 i , (9) where Zα is the α percentile for the standard Gaussian distribution. For example, when α = 95%, Zα = 1.96. Because N is undetermined, we use Nˆ i as an approximation when computing the standard deviation in (9). The termination condition for EMLA is therefore Zα · s eNˆipi − 1 ip2 i ≤ Nˆ i · β, (10) where β is the error bound. The above inequality can be rewritten as √ i ≥ Zα p eNˆipi − 1 Nˆ ipiβ . (11) Because pi = 1.594/Nˆ i , we have i ≥ 1.544 · Z 2 α β 2 . (12) Hence we can theoretically compute the approximate number of pollings that is required in order to meet the accuracy requirements. For example, if α = 95% and β = 5%, 2372 pollings will be required. Note that (12) is independent with the actual number of tags, N. Hence, our approach has perfect scalability. Fig. 1 shows the simulation result of MLEA when N = 10, 000, α = 95% and β = 5%. The simulation setup can be found in Section VII. The middle curve is the estimated number of tags, Nˆ i , with respect to the number pollings. It converges to the true value N represented by the central straight line. The upper and lower curves represent the 95% confidence interval, which shrinks as the number of pollings increases. D. Request-less Pollings We observe that, after a number of pollings, the value of pi will stay in an increasingly small range and does not change much. It becomes unnecessary for the reader to transmit it at each polling. Hence, we improve MLEA as follows: If the percentage change in pi during a certain number M of consecutive pollings is within a small threshold, the reader will broadcast the latest value of pi and indicate that it will no longer transmit polling requests (unless the value of pi changes beyond the threshold, in which case the reader will broadcast the new value of pi one more time). Without receiving further polling requests, the tags will respond with the same contention probability in the subsequent slots. This is called the requestless pollings. Note that the termination condition in (10) remains correct. With the threshold being 10% and M = 10, the simulation results (which are omitted to save space) show
20000 ASEA also consists of an initialization phase and an iterative phase.The initialization phase of ASEA is the same as the 15000 initialization phase of MLEA,except that when the reader 10000 obtains the first non-zero result z at the lth polling with probability pi.it computes a coarse estimation of N as 000 Then it moves to the next phase,which is described below. B.Iterative Phase 500100015002000 number of pollings This phase iteratively refines the estimation after each polling,and terminates when the specified accuracy require- Fig.1.The middle curve shows the estimated number of tags with respect ment is met.The reader performs four tasks in the ith iteration. to the number of pollings.The upper and lower curves show the confidence interval.The straightline shows the true number of tags First,it computes the contention probability pi by (13)before sending out the polling request. 1 that the performance difference caused by request-less pollings =- (13) is negligibly small even though the contention probability during request-less pollings may be slightly off the value set by (7).Request-less pollings can also be applied to the algorithms where Nis the estimation after the previous polling. in the next two sections. Second,the reader computes the number of responses zi in the current frame. E.Information Loss due to Collision Third,it computes Ni as follows: MLEA has a frame size of one slot.It obtains only binary information at each polling.Since the target contention prob- N= ∑=1+1z(1+ (14) ability is set to be 1.594/N,the probability for more than one ∑=+1 tag to respond in each polling is significant.However,no matter where s is introduced to compensate for collision and the how many tags respond,the information that the reader receives is always the same,i.e.,zi=1,which implies information iterative phase begins from the(1+1)th polling.The intuition loss when two or more tags decide to transmit at a polling behind (14)is given as follows:All pollings are performed Let's compare two scenarios.In one scenario,only one tag independent of each other.Hence,the previous pollings with responds at a polling.In the other,two tags respond.These the contention probability pi(l4}is only about 0.0037.Next we compute the polling are likely to respond at different slots in the frame.We probability for collisions to happen at the ith polling,which is pick values for pi and f such that the collision probability is denoted as Probifcollision}. negligibly small.Third,we compensate the impact of collision 1We could also use (7).In that case,the frame size f would increase in our computation. accordingly to keep a small collision probability
0 5000 10000 15000 20000 0 500 1000 1500 2000 number of pollings Fig. 1. The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval. The straightline shows the true number of tags. that the performance difference caused by request-less pollings is negligibly small even though the contention probability during request-less pollings may be slightly off the value set by (7). Request-less pollings can also be applied to the algorithms in the next two sections. E. Information Loss due to Collision MLEA has a frame size of one slot. It obtains only binary information at each polling. Since the target contention probability is set to be 1.594/N, the probability for more than one tag to respond in each polling is significant. However, no matter how many tags respond, the information that the reader receives is always the same, i.e., zi = 1, which implies information loss when two or more tags decide to transmit at a polling. Let’s compare two scenarios. In one scenario, only one tag responds at a polling. In the other, two tags respond. These two scenarios generate the same information but the energy cost of the second scenario is twice of the first. To address this problem, we design two more algorithms that reduce the probability of collision and, moreover, compensate the impact of collision in its computation. V. AVERAGE SUM ESTIMATION ALGORITHM The average sum estimation algorithm (ASEA) is our second estimator for the number of RFID tags. It also utilizes history information from previous pollings. However, instead of only obtaining binary information, it computes the number of responses in each polling. Because more information can be extracted, it requires a fewer number of responses than MLEA. Moreover, the computation performed by ASEA after each polling is much simpler than MLEA’s complicated bisection search. A. Overview ASEA uses the same polling protocol as MLEA does, except that its frame size f is larger than one in order to reduce the probability of collision. The result of the ith polling, xi , is no longer a binary value. Instead, it is an estimate of the number of tags that respond at the polling. ASEA takes three steps to solve the collision. First, it slightly reduces the contention probability pi . Second, it increases the frame size f such that the tags that decide to respond at a polling are likely to respond at different slots in the frame. We pick values for pi and f such that the collision probability is negligibly small. Third, we compensate the impact of collision in our computation. ASEA also consists of an initialization phase and an iterative phase. The initialization phase of ASEA is the same as the initialization phase of MLEA, except that when the reader obtains the first non-zero result xl at the lth polling with probability pl , it computes a coarse estimation of N as xl pl . Then it moves to the next phase, which is described below. B. Iterative Phase This phase iteratively refines the estimation after each polling, and terminates when the specified accuracy requirement is met. The reader performs four tasks in the ith iteration. First, it computes the contention probability pi by (13) before sending out the polling request. pi = 1 Nˆ i−1 , (13) where Nˆ i−1 is the estimation after the previous polling. 1 Second, the reader computes the number of responses xi in the current frame. Third, it computes Nˆ i as follows: Nˆ i = Pi j=l+1 xj (1 + ε) Pi j=l+1 pj , (14) where ε is introduced to compensate for collision and the iterative phase begins from the (l + 1)th polling. The intuition behind (14) is given as follows: All pollings are performed independent of each other. Hence, the previous pollings with the contention probability pj (l 4} is only about 0.0037. Next we compute the probability for collisions to happen at the ith polling, which is denoted as P robi{collision}. 1We could also use (7). In that case, the frame size f would increase accordingly to keep a small collision probability
20000 1500 10000 5000 0.1 0 0 10 20 0200400600800100012001400 frame size number of pollings Fig.2.The collision probability with respect to the frame size f. Fig.3.The middle curve shows the estimated number of tags with respect to the number of pollings.The upper and lower curves show the confidence interval.The straightline shows the true number of tags where B is the error bound. Probifcollisionlri=k}x Probfi=k} Fig.3 shows the simulation result of ASEA when N k=2 10,000,a =95%and B =5%.The middle curve is the (1-P()x 1×Prob{x=k} value of Ni,which converges to the value of N represented by k=2 the central straight line.The upper and lower curves represent k=f+ the 95%confidence interval,which shrinks as the number f! where P(f,)is the permutation function.Fig.2 of pollings increases.The algorithm terminates after 1517 shows the collision probability Probfcollision}with respect pollings. to f.It diminishes quickly as f increases.When f =10(which VI.ENHANCED MAXIMUM LIKELIHOOD ESTIMATION is what we will choose in our simulations),Probifcollision ALGORITHM is just 0.046.With such a small probability,the chance for more than two tags involved in a collision or more than one collision This section presents our third estimator,the enhanced at one polling is exceedingly small and thus ignored.Therefore, maximum likelihood estimation algorithm(EMLEA).EMLEA to approximate,we multiply zi by 1.046 to compensate the computes the number of responses ri at each polling in the impact of collision.Namely,=0.046 in (14). same way as ASEA does.But it uses the maximum likelihood 2)Termination Condition:Consider an arbitrary polling method instead of the average sum method to estimate the result,j,j<i.By our previous analysis,conditioned on number of tags.It is able to achieve the best energy efficiency pj,we know that has a binomial distribution Bino(N,pj) among the three. and that(1+)j approximates Hence,when N is large EMLEA uses the same polling protocol as ASEA does.It enough,we can use the Gaussian distribution Norm(uj,oj) sets the same contention probability as ASEA does.It uses the with parameters j=NPj and oj=√/Npi(1-pi)asa same formula to estimate the number of tags that respond close approximation.Namely, in the jth polling.Namely z(1+s)zj,where xi is the number of non-empty slots in the frame and s=0.046 is used (1+E)zj~Norm(Npj,Npj(1-pj)). (16) to compensate the impact of collision. Consequently.(1+)approximately follows Gaus- EMLEA differs from ASEA by using the maximum likeli- hood method instead of the average sum method to estimate sian distribution the number of tags.Its termination condition is also different. Norm( N∑Pv∑,-》 2 1)Compute the value of Ni:Suppose the iterative phase (17) starts at the (+1)th polling.After the ith polling,the j=l+1 j=1+1 reader has collected the values of xj,I<j i.From Hence,the distribution of N=(1+)/ (16),we know that the number of responding tags in the can be approximated by jth polling approximately follows a Gaussian distribution, Norm(x,N∑i=441-》 Norm(Npj,Np;(1 -pj)).Hence,the probability for the measured number of responding tags,(1+s)xj,to occur under (∑=1+1PP this distribution is ep.The 1 2NP1-P%) The confidence interval is likelihood function for all measured numbers of responding tags in the pollings,(1+)xj,I<j<i,to occur is N:±Za :∑=+1p1-pm》 (18) (∑-+1PP Li +1lV2πNp(1-p e (20) where Zo is the a percentile for the standard Gaussian distribution.Hence,the termination condition is We want to find the value Ni that maximizes the likelihood function.Namely, Z :∑=+1(-p》 ≤N·B, (19 (∑=+1PP Ni arg max{Li} (21)
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 5 10 15 20 collision probability frame size Fig. 2. The collision probability with respect to the frame size f. P robi{collision} = XN k=2 P robi{collision|x ∗ i = k} × P rob{x ∗ i = k} = X f k=2 (1 − P(f, k) f k ) × P rob{x ∗ i = k} + XN k=f+1 1 × P rob{x ∗ i = k} where P(f, k) = f! (f−k)! is the permutation function. Fig. 2 shows the collision probability P robi{collision} with respect to f. It diminishes quickly as f increases. When f = 10 (which is what we will choose in our simulations), P robi{collision} is just 0.046. With such a small probability, the chance for more than two tags involved in a collision or more than one collision at one polling is exceedingly small and thus ignored. Therefore, to approximate x ∗ i , we multiply xi by 1.046 to compensate the impact of collision. Namely, ε = 0.046 in (14). 2) Termination Condition: Consider an arbitrary polling result, xj , j ≤ i. By our previous analysis, conditioned on pj , we know that x ∗ j has a binomial distribution Bino(N, pj ) and that (1 + ε)xj approximates x ∗ j . Hence, when N is large enough, we can use the Gaussian distribution Norm(µj , σj ) with parameters µj = N pj and σj = p N pj (1 − pj ) as a close approximation. Namely, x ∗ j ≈ (1 + ε)xj ∼ Norm(N pj , N pj (1 − pj )). (16) Consequently, Pi j=l+1 xj (1 +ε) approximately follows Gaussian distribution Normµ N X i j=l+1 pj , N X i j=l+1 (pj (1 − pj ))¶ . (17) Hence, the distribution of Nˆ i = Pi j=l+1 xj (1 + ε)/ Pi j=l+1 pj can be approximated by Normµ N, N Pi j=l+1(pj (1 − pj )) ( Pi j=l+1 pj ) 2 ¶ . The confidence interval is Nˆi ± Zα · vuut Nˆi Pi j=l+1(pj (1 − pj )) ( Pi j=l+1 pj ) 2 , (18) where Zα is the α percentile for the standard Gaussian distribution. Hence, the termination condition is Zα · vuut Nˆi Pi j=l+1(pj (1 − pj )) ( Pi j=l+1 pj ) 2 ≤ Nˆi · β, (19) 0 5000 10000 15000 20000 0 200 400 600 800 1000 1200 1400 number of pollings Fig. 3. The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval. The straightline shows the true number of tags. where β is the error bound. Fig. 3 shows the simulation result of ASEA when N = 10, 000, α = 95% and β = 5%. The middle curve is the value of Nˆ i , which converges to the value of N represented by the central straight line. The upper and lower curves represent the 95% confidence interval, which shrinks as the number of pollings increases. The algorithm terminates after 1517 pollings. VI. ENHANCED MAXIMUM LIKELIHOOD ESTIMATION ALGORITHM This section presents our third estimator, the enhanced maximum likelihood estimation algorithm (EMLEA). EMLEA computes the number of responses xi at each polling in the same way as ASEA does. But it uses the maximum likelihood method instead of the average sum method to estimate the number of tags. It is able to achieve the best energy efficiency among the three. EMLEA uses the same polling protocol as ASEA does. It sets the same contention probability as ASEA does. It uses the same formula to estimate the number x ∗ j of tags that respond in the jth polling. Namely x ∗ j ≈ (1 + ε)xj , where xj is the number of non-empty slots in the frame and ε = 0.046 is used to compensate the impact of collision. EMLEA differs from ASEA by using the maximum likelihood method instead of the average sum method to estimate the number of tags. Its termination condition is also different. 1) Compute the value of Nˆ i: Suppose the iterative phase starts at the (l + 1)th polling. After the ith polling, the reader has collected the values of xj , l < j ≤ i. From (16), we know that the number of responding tags in the jth polling approximately follows a Gaussian distribution, Norm(N pj , N pj (1 − pj )). Hence, the probability for the measured number of responding tags, (1+ε)xj , to occur under this distribution is √ 1 2πNpj (1−pj ) · exp[− ((1+ε)xj−Npj ) 2 2Npj (1−pj ) ]. The likelihood function for all measured numbers of responding tags in the pollings, (1 + ε)xj , l < j ≤ i, to occur is Li = Y i j=l+1 · 1 p 2πN pj (1 − pj ) · e − ((1+ε)xj−Npj ) 2 2Npj (1−pj ) ¸ . (20) We want to find the value Nˆ i that maximizes the likelihood function. Namely, Nˆ i = arg max{Li} N . (21)
We first take logarithm on both sides of(20) 20000 In(L:)= 1 (1+e)z-Np)2 15000 =l+1 /2πNp(1-p5) 2Np5(1-p5) (22) 10000 We then differentiate both sides. 5000 = 1 (1+e)2x号-(Np)2 =141 2N2p5(1-P) 2004006008001000 number of pollings (1+e)2x号-(Np)2i-l = 2N (23) Fig.4.The middle curve shows the estimated number of tags with respect =l41 2N2p5(1-P5) to the number of pollings.The upper and lower curves show the confidence interval.The straightline shows the true number of tags. Finally we set the right side to be zero and numerically compute the value of Ni. 2)Termination Condition:The fisher information2 I(Ni) TABLE I of Li is defined as follows NUMBER OF RESPONSES WHEN a =90%,B=5% I(Ni)=-E 「a21n(L) (24) Total number of responses N ON2 MLEAT ASEA I EMLEAT UPE-OT UPE-M EZB 5000 2669S 1083S 768S 783L 2415 1030S According to (23),we have 10000 2659S 1083S 763S 13498L 2432L 20261S 200002654S1081S 775S24455L2431L 40521S Z)=E∑ (1+e)2x号 i-1 TABLE II =1+1 N3p1-p5)-2N2 NUMBER OF RESPONSES WHEN Q =95%,B=5% N Total number of responses (Npj)2+Npi(1-Pi)i-l MLEA ASEA EMLEA UPE-O UPE-M EZB 2N2 25 N3p5(1-pj) 5000 3798S 1543S 1083S 8740L 3437L 14472S 1I0000 3776S 1543S 1101S 14573L 3507L 28944S 200003789S 1545S■ 1099S25515L3450L 57888S 盆”+ Pi i-l TABLEⅢI NUMBER OF RESPONSES WHEN =99%.B=5% Above,we applied E((1+)2)=(Npj)2+Npj(1-Pi)in N Total number of responses MLEA ASEAI EMLEAI UPE-O UPE-MI EZB (25)since(1+s)zj~Norm(Npj;Npj(1-pj))and E(z2)= 5D0 6542S 2658S 1857S 11262L 5885L 24602S (E(z))2+Var(z). 10000 6546S 2654S 1862S 16969L 5935L 49205S Following the classical theory for MLE,when i is suffi- 20000 6543S 2655S I887S 28379L5939L 98409S ciently large,the distribution of Ni is approximated by 1 Norm(N, (26) TABLE IV I(Ni) NUMBER OF RESPONSES WHEN Q =99%,B=9% Hence,the confidence interval is Total number of responses N MLEA ASEA EMLEA UPE-M EZB 1 5000 2005S 815S 557S 1802L 7236S N:±Za (27 I00002083S 839S 551s 1929L 14472S I(Ni) 200002027S 819S■ 556S 1755L 28944S Note that we use N;as an approximation for N in the com- TABLE V NUMBER OF RESPONSES WHEN Q =99%,B=6% putation when necessary since N is unknown.The termination condition for EMLEA to achieve the required accurary is N Total number of responses MLEA ASEA EMLEA UPE-M EZB 5000 4766S1835S 1260S 3942L 17366S 1 ≤N· (28) 10000 4756S1833S 240S 4063L0 34733S I(Ni) 200004775S■1852S1227S 3993L 69465S TABLE VI Fig.4 shows the simulation result of EMLEA when N NUMBER OF RESPONSES WHEN Q =99%,B=3% 10,000,a 95%and B =5%.The middle curve is the value of Ni,which converges to the value of N represented by N Total number of responses MLEAT ASEAT EMLEAT UPE-M EZB the central straight line.The upper and lower curves represent 5000 16263S 7308S 5014S 15924L 65124S the 95%confidence interval,which shrinks as the number 10000 16211S 7345S 4942S 16830L 130247S of pollings increases.The algorithm terminates after 1081 2000016291S7376S 5055S 16647L 260495S pollings
We first take logarithm on both sides of (20). ln(Li) = Xi j=l+1 · ln 1 p 2πN pj (1 − pj ) − ((1 + ε)xj − N pj ) 2 2N pj (1 − pj ) ¸ . (22) We then differentiate both sides. ∂ln(Li) ∂N = Xi j=l+1 · − 1 2N + (1 + ε) 2x 2 j − (N pj ) 2 2N2pj (1 − pj ) ¸ = Xi j=l+1 (1 + ε) 2x 2 j − (N pj ) 2 2N2pj (1 − pj ) − i − l 2N . (23) Finally we set the right side to be zero and numerically compute the value of Nˆ i . 2) Termination Condition: The fisher information2 I(Nˆ i) of Li is defined as follows I(Nˆ i) = −E · ∂ 2 ln(Li) ∂N2 ¸ . (24) According to (23), we have I(Nˆ i) = E · X i j=l+1 (1 + ε) 2x 2 j N3pj (1 − pj ) − i − l 2N2 ¸ = X i j=l+1 (N pj ) 2 + N pj (1 − pj ) N3pj (1 − pj ) − i − l 2N2 (25) = X i j=l+1 pj N(1 − pj ) + i − l 2N2 . Above, we applied E((1+ε) 2x 2 j ) = (N pj ) 2+N pj (1−pj ) in (25) since (1+ε)xj ∼ Norm(N pj , N pj (1−pj )) and E(x 2 ) = (E(x))2 + V ar(x). Following the classical theory for MLE, when i is suffi- ciently large, the distribution of Nˆ i is approximated by Norm(N, 1 I(Nˆ i) ). (26) Hence, the confidence interval is Nˆ i ± Zα · s 1 I(Nˆ i) . (27) Note that we use Nˆ i as an approximation for N in the computation when necessary since N is unknown. The termination condition for EMLEA to achieve the required accurary is Zα · s 1 I(Nˆ i) ≤ Nˆ i · β. (28) Fig. 4 shows the simulation result of EMLEA when N = 10, 000, α = 95% and β = 5%. The middle curve is the value of Nˆ i , which converges to the value of N represented by the central straight line. The upper and lower curves represent the 95% confidence interval, which shrinks as the number of pollings increases. The algorithm terminates after 1081 pollings. 0 5000 10000 15000 20000 0 200 400 600 800 1000 number of pollings Fig. 4. The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval. The straightline shows the true number of tags. TABLE I NUMBER OF RESPONSES WHEN α = 90%, β = 5% N Total number of responses MLEA ASEA EMLEA UPE-O UPE-M EZB 5000 2669 S 1083 S 768 S 7833 L 2415 L 10130 S 10000 2659 S 1083 S 763 S 13498 L 2432 L 20261 S 20000 2654 S 1081 S 775 S 24455 L 2431 L 40521 S TABLE II NUMBER OF RESPONSES WHEN α = 95%, β = 5% N Total number of responses MLEA ASEA EMLEA UPE-O UPE-M EZB 5000 3798 S 1543 S 1083 S 8740 L 3437 L 14472 S 10000 3776 S 1543 S 1101 S 14573 L 3507 L 28944 S 20000 3789 S 1545 S 1099 S 25515 L 3450 L 57888 S TABLE III NUMBER OF RESPONSES WHEN α = 99%, β = 5% N Total number of responses MLEA ASEA EMLEA UPE-O UPE-M EZB 5000 6542 S 2658 S 1857 S 11262 L 5885 L 24602 S 10000 6546 S 2654 S 1862 S 16969 L 5935 L 49205 S 20000 6543 S 2655 S 1887 S 28379 L 5939 L 98409 S TABLE IV NUMBER OF RESPONSES WHEN α = 99%, β = 9% N Total number of responses MLEA ASEA EMLEA UPE-M EZB 5000 2005 S 815 S 557 S 1802 L 7236 S 10000 2083 S 839 S 551 S 1929 L 14472 S 20000 2027 S 819 S 556 S 1755 L 28944 S TABLE V NUMBER OF RESPONSES WHEN α = 99%, β = 6% N Total number of responses MLEA ASEA EMLEA UPE-M EZB 5000 4766 S 1835 S 1260 S 3942 L 17366 S 10000 4756 S 1833 S 1240 S 4063 L 34733 S 20000 4775 S 1852 S 1227 S 3993 L 69465 S TABLE VI NUMBER OF RESPONSES WHEN α = 99%, β = 3% N Total number of responses MLEA ASEA EMLEA UPE-M EZB 5000 16263 S 7308 S 5014 S 15924 L 65124 S 10000 16211 S 7345 S 4942 S 16830 L 130247 S 20000 16291 S 7376 S 5055 S 16647 L 260495 S
VII.SIMULATION RESULT B.Total Number of Bits Transmitted In this section,we evaluate the performance of MLEA, The second simulation evaluates the energy cost of the algorithms.As mentioned before,one bit is enough to separate ASEA and EMLEA by simulations.We compare the proposed algorithms with the state-of-the-art algorithms in the related empty/non-empty slot.Hence,the response of MLEA,ASEA, work.They are the Unified Probabilistic Estimator (UPE) EMLEA and EZB is one bit long.A response in UPE-M is 10 [3]and the Enhanced Zero-Based (ZEB)estimator [4].The bits long [3].We compare the total number of bits transmitted original UPE,denoted as UPE-O,is very energy-inefficient by all tags before each algorithm terminates.Fig.5 show because its contention probability begins from 100%and thus the simulation results with respect to N when a =95% all tags will respond.We modify it (denoted as UPE-M)to and B =3%,6%and 9%.For example,when a =95%, begin from a small initial contention probability We B=3%,and N =20,000,the ratio between the number run each simulation 100 times and average the outcomes. of bits transmitted by UPE-M (EZB)and that by our best In the initialization phase of our protocols,let Nmaz estimator EMLEA is 32.8(53.6).It should be noted that the number of bits transmitted is not an accurate measurement of 1.000.000 and C=2.The frame size in ASEA and EMLEA is 10 slots.The parameters for UPE and ZEB are picked based the energy cost because it ignores the energy spent to power up the radio and synchronize with the reader.However,combining on the original papers whenever possible.All algorithms except the number of bits and the number of transmissions (in the for UPE need only to identify empty and non-empty slots.UPE has to identify empty,singleton and collision slots.To set a previous subsection)still gives a good idea on how energy- efficient each algorithm is. non-empty slot apart from an empty slot,a tag only needs to respond a short bit string to make the channel busy.However, C.Estimation Time to set a singleton slot apart from a collision slot,many more bits (10 used by UPE)are necessary [19].For example,CRC The third simulation compares the time it takes for each algorithm to complete the estimation of N.Based on the may be used to detect collision. specification of the Philips I-Code system [16],after the The energy cost of an algorithm depends on (1)the num- ber of responses that all tags transmit before the algorithm required waiting times (e.g.,gap between transmissions)are included,it can be calculated that a reader needs 0.4ms to terminates and(2)the size of each response.We use 'S'to detect an empty slot,0.8ms to detect a collision or a singleton mean that the response is a short bit string(in the empty/non- slot,and 1ms to broadcast a polling request.Hence,MLEA, empty case),and 'L'to mean a long bit string (in the ASEA,EMLEA and EZB requires a slot length of 0.4ms, empty/singleton/collision case). while UPE-M requires a slot length of 0.8ms.Recall that We do not include the simulation results for LoF [5]because its energy cost is much higher than others.Its number of the contention probability takes the form ofwhere is responses transmitted by the tags is kN,where k is the number a known constant.Thus the reader transmits N instead of the of frames used in the estimation process. actual probability value in the polling requests.If we assume Nmax is no more than a million,then 20 bits for Ni are sufficient.MLEA has a fixed frame size of one slot.ASEA and A.Number of Responses EMLEA have a fixed frame size of 10 slots.EZB and UPE- The first simulation studies the number of responses in each M also have pre-determined frame sizes.Let a =95%and algorithm with respect to N,a and B.Table I shows the 3=5%.Fig.(6)shows the estimation times of the algorithms number of responses with respect to N when a =90%and with respect to the number of tags in the deployment.The B=5%.The proposed algorithms require fewer responses times grow very slowly as the number of tags increase,which than UPE and EZB.As predicted,UPE-O is energy-inefficient; suggests the algorithms all scale well.UPE-M takes the least UPE-M works much better.The best algorithm is EMLEA amount of time,only 1.1 seconds,to estimate 20,000 tags, whose number of responses is about one third of what UPE- while the other algorithms take between 3.4 to 7.7 seconds. EMLEA takes 5.3 seconds.Even though the new algorithms M requires and one fiftieth of what EZB when N is 20.000. Moveover,each response in UPE is much longer. take longer to complete,their estimation time is still small.We Tables II-III show similar comparison for different a values believe the extra time needed can be well justified for the large energy saving. Tables IV-VI show the comparison under different B values when a =99%.We omit the results for UPE-O(which are VIII.CONCLUSIONS much worse than the numbers of UPE-M)to keep the table width within a column.In all cases,the number of responses This paper proposes three new probabilistic algorithms for increases when a increases or B decreases,and except for EZB. estimating the number of RFID tags in a region.We believe the number does not vary much with respect to N,meaning the algorithms are the first of its kind that targets at prolonging that all algorithms except for EZB achieve good scalability the lifetime of the active RFIDs.Their energy cost is far less The ratio between the numbers for different algorithms appear than the state-of-the-art algorithms in the related work. to be quite stable under different parameter settings REFERENCES [1]R.Hollinger and J.Davis, "National Retail Security Survey," 2The fisher information [18]is a way of measuring the amount of informa- hup://diogenesllc.com/NRSS_2001.pdf.2001 tion that an observable random variable carries about an unknown parameter [2]R.Want,"An Introduction to RFID Technology,"Proc.IEEE PerCom, 6 upon which the likelihood function of 6,L()=f(x;)depends. Jan2006
VII. SIMULATION RESULT In this section, we evaluate the performance of MLEA, ASEA and EMLEA by simulations. We compare the proposed algorithms with the state-of-the-art algorithms in the related work. They are the Unified Probabilistic Estimator (UPE) [3] and the Enhanced Zero-Based (ZEB) estimator [4]. The original UPE, denoted as UPE-O, is very energy-inefficient because its contention probability begins from 100% and thus all tags will respond. We modify it (denoted as UPE-M) to begin from a small initial contention probability 1 Nmax . We run each simulation 100 times and average the outcomes. In the initialization phase of our protocols, let Nmax = 1, 000, 000 and C = 2. The frame size in ASEA and EMLEA is 10 slots. The parameters for UPE and ZEB are picked based on the original papers whenever possible. All algorithms except for UPE need only to identify empty and non-empty slots. UPE has to identify empty, singleton and collision slots. To set a non-empty slot apart from an empty slot, a tag only needs to respond a short bit string to make the channel busy. However, to set a singleton slot apart from a collision slot, many more bits (10 used by UPE) are necessary [19]. For example, CRC may be used to detect collision. The energy cost of an algorithm depends on (1) the number of responses that all tags transmit before the algorithm terminates and (2) the size of each response. We use ‘S’ to mean that the response is a short bit string (in the empty/nonempty case), and ‘L’ to mean a long bit string (in the empty/singleton/collision case). We do not include the simulation results for LoF [5] because its energy cost is much higher than others. Its number of responses transmitted by the tags is kN, where k is the number of frames used in the estimation process. A. Number of Responses The first simulation studies the number of responses in each algorithm with respect to N, α and β. Table I shows the number of responses with respect to N when α = 90% and β = 5%. The proposed algorithms require fewer responses than UPE and EZB. As predicted, UPE-O is energy-inefficient; UPE-M works much better. The best algorithm is EMLEA, whose number of responses is about one third of what UPEM requires and one fiftieth of what EZB when N is 20,000. Moveover, each response in UPE is much longer. Tables II-III show similar comparison for different α values. Tables IV-VI show the comparison under different β values when α = 99%. We omit the results for UPE-O (which are much worse than the numbers of UPE-M) to keep the table width within a column. In all cases, the number of responses increases when α increases or β decreases, and except for EZB, the number does not vary much with respect to N, meaning that all algorithms except for EZB achieve good scalability. The ratio between the numbers for different algorithms appear to be quite stable under different parameter settings. 2The fisher information [18] is a way of measuring the amount of information that an observable random variable x carries about an unknown parameter θ upon which the likelihood function of θ, L(θ) = f(x; θ), depends. B. Total Number of Bits Transmitted The second simulation evaluates the energy cost of the algorithms. As mentioned before, one bit is enough to separate empty/non-empty slot. Hence, the response of MLEA, ASEA, EMLEA and EZB is one bit long. A response in UPE-M is 10 bits long [3]. We compare the total number of bits transmitted by all tags before each algorithm terminates. Fig. 5 show the simulation results with respect to N when α = 95% and β = 3%, 6% and 9%. For example, when α = 95%, β = 3%, and N = 20, 000, the ratio between the number of bits transmitted by UPE-M (EZB) and that by our best estimator EMLEA is 32.8 (53.6). It should be noted that the number of bits transmitted is not an accurate measurement of the energy cost because it ignores the energy spent to power up the radio and synchronize with the reader. However, combining the number of bits and the number of transmissions (in the previous subsection) still gives a good idea on how energyefficient each algorithm is. C. Estimation Time The third simulation compares the time it takes for each algorithm to complete the estimation of N. Based on the specification of the Philips I-Code system [16], after the required waiting times (e.g., gap between transmissions) are included, it can be calculated that a reader needs 0.4ms to detect an empty slot, 0.8ms to detect a collision or a singleton slot, and 1ms to broadcast a polling request. Hence, MLEA, ASEA, EMLEA and EZB requires a slot length of 0.4ms, while UPE-M requires a slot length of 0.8ms. Recall that the contention probability takes the form of ω Nˆi , where ω is a known constant. Thus the reader transmits Nˆ i instead of the actual probability value in the polling requests. If we assume Nmax is no more than a million, then 20 bits for Nˆ i are sufficient. MLEA has a fixed frame size of one slot. ASEA and EMLEA have a fixed frame size of 10 slots. EZB and UPEM also have pre-determined frame sizes. Let α = 95% and β = 5%. Fig. (6) shows the estimation times of the algorithms with respect to the number of tags in the deployment. The times grow very slowly as the number of tags increase, which suggests the algorithms all scale well. UPE-M takes the least amount of time, only 1.1 seconds, to estimate 20,000 tags, while the other algorithms take between 3.4 to 7.7 seconds. EMLEA takes 5.3 seconds. Even though the new algorithms take longer to complete, their estimation time is still small. We believe the extra time needed can be well justified for the large energy saving. VIII. CONCLUSIONS This paper proposes three new probabilistic algorithms for estimating the number of RFID tags in a region. We believe the algorithms are the first of its kind that targets at prolonging the lifetime of the active RFIDs. Their energy cost is far less than the state-of-the-art algorithms in the related work. REFERENCES [1] R. Hollinger and J. Davis, “National Retail Security Survey,” http://diogenesllc.com/NRSS 2001.pdf, 2001. [2] R. Want, “An Introduction to RFID Technology,” Proc. IEEE PerCom, Jan 2006
20000 140000 MLEA 40000 MLEA 35000 120000 30000 15000 p 100000 2500 ZEB 80000 20000 MLEA 10000 60000 1500 40000 1000 UP 5000 20000 5000 ra四WIRATEONIEAMEE0 500010000 15000 20000 0 5000 10000 15000 20000 50001000015000 20000 the number of tags N the number of tags n the number of tags N Fig.5. a=95%,3=3%,6%and9%, MLEA is the estimation of the success probability g.It is known that asymptotically g follows a normal distribution: -UPE-M ZEB ~Norm( q(1-q q, (29) Because the MLE approach provides statistically consistent 2 estimate,when i is large,we can consider the contention probabilities in the later stage of the pooling process to be 0 0 5000 1000015000 20000 approximatly a constant.In addition,the number of polling the number of tags N results before stabilization of the contention probability is limited,and their impact will diminish as i becomes large Fig.6.Estimation times of the algorithms That is,they can be ignored when the asymptotic property of N;is considered.Hence,for the asymptotic property,we can let pj=pi,for1≤j≤i,and Eq.(8)becomes [3]M.Kodialam and T.Nandagopal,"Fast and Reliable Estimation Schemes in RFID Systems,"Proc.ACM MOBICOM,Los Angeles,2006. aln(L.) (1-p)N [4]M.Kodialam,T.Nandagopal,and W.Lau,"Anonymous Tracking using (30) RFID tags,"Proc.IEEE INFOCOM,2007 [5]C.Qian,H.Ngan,and Y.Liu,"Cardinality Estimation for Large-scale RFID Systems,"Proc.IEEE PerCom,2008. [6]V.Namboodiri and L.Gao,"Energy-Aware Tag Anti-Collision Protocols Therefore,the MLE N;that solves ain()0 satisfies for RFID Systems,"Proc.of IEEE PerCom,2007. [7]D.Klair,K.Chin,and R.Raad,"On the Energy Consumption of Pure and Slotted Aloha based RFID Anti-Collision Protocols,"Computer 1-p)=1-(∑Z)/i=1-g 31) Communications,2008. j=1 [8]K.Hwang.B.Vander-Zanden,and H.Taylor,"A Linear-time Probabilis- tic Counting Algorithm for Database Applications,"ACM Transactions Hence,from(29).(1-pi)Wasymptotically follows the fol- on Database Systems,vol.15,no.2.June 1990. lowing normal distribution [9]H.Vogt,"Efficient Object Identification with Passive RFID Tags,"Proc IEEE PerCom,2002. [10]J.Zhai and G.N.Wang,"An Anti-Collision Algorithm Using Two Norm (1-)N,L-1-p)1-)N (32) functioned Estimation for RFID Tags,"Proc./CCSA,2005. [11]J.Cha and J.Kim,"Novel Anti-collision Algorithms for Fast Object Identification in RFID System,"Proc.IEEE /CPADS.2005. According to the 6-method [17],if a random variable Xi [12]D.Hush and C.Wood,"Analysis of Tree Algorithm for RFID Arbitra- satisfies tion,"Proc.IEEE ISIT.1998. [13]J.Myung and W.Lee,"An adaptive memoryless tag anti-collision X:马Norm6,), (33) protocol for RFID networks,"Proc.IEEE ICC,2005 [14]H.Choi,J.Cha,and J.Kim,"Fast Wireless Anti-collision Algorithm in Ubiquitous ID System,"Proc.IEEE VTC.Sep 2004 whereand are finite constants andBmeans convergence [15]Chiu C.Tan,Bo Sheng.and Qun Li,"How to monitor for missing RFID in distribution,then we must have tags."Proc.IEEE ICDCS.June 2008. [16]Philips Semiconductors, "I-CODE Smart Label RFID Tags." htp://www.nxp.com/acrobat_download/other/identification/SL092030.pdf. g(Xi)B Norm g(0), o2Lg(0)]2 (34) Jan2004. [17]G.Casella and R.L.Berger,"Statistical Inference,"2nd edition,Duxbury for any function g such that g')exists and takes a non-zero Pres3,2002. [18]Lehmann and G.Casella,"Theory of Point Estimation,"Springer:2nd value.Based on (33)and (34),taking the logarithm of (32), edirion,1998. we have [19]"EPC Radio-Frequency Identity Protocols Class-1 Generation-2 UHF RFID Protocol for Communications at 860 MHz-960 MHz Version N:n(1-pi)~Norm NIn(1-p), 1-(1-p)) (35) 1.0.9"hmp://www.epcglobalinc.org/standards/uhfc1g2/uhfc1g2_1_0_9- (1-p) standard-20050126.pdf.EPCglobal,2005. That is, Ni~Norm (1-(1-)) (36) APPENDIX A:DISTRIBUTION AND VARIANCE OF Ni i(1-pi)Nn2(1-p) Let i be a large positive integer.Consider the sequence of Bernoulli random variables,Zj,1 <j<i,whose success Hence,we have Var(Ni) (1-(1-p)) probability is q=1-(1-p:)N.Let =(/i which i(1-p)Nn2(1-p)
0 20000 40000 60000 80000 100000 120000 140000 0 5000 10000 15000 20000 number of bits transmitted the number of tags N MLEA ASEA EMLEA UPE-M ZEB 0 5000 10000 15000 20000 25000 30000 35000 40000 0 5000 10000 15000 20000 number of bits transmitted the number of tags N MLEA ASEA EMLEA UPE-M ZEB 0 5000 10000 15000 20000 0 5000 10000 15000 20000 number of bits transmitted the number of tags N MLEA ASEA EMLEA UPE-M ZEB Fig. 5. α = 95%, β = 3%, 6% and 9%. 0 2 4 6 8 10 12 0 5000 10000 15000 20000 estimation time in seconds the number of tags N MLEA ASEA EMLEA UPE-M ZEB Fig. 6. Estimation times of the algorithms [3] M. Kodialam and T. Nandagopal, “Fast and Reliable Estimation Schemes in RFID Systems,” Proc. ACM MOBICOM, Los Angeles, 2006. [4] M. Kodialam, T. Nandagopal, and W. Lau, “Anonymous Tracking using RFID tags,” Proc. IEEE INFOCOM, 2007. [5] C. Qian, H. Ngan, and Y. Liu, “Cardinality Estimation for Large-scale RFID Systems,” Proc. IEEE PerCom, 2008. [6] V. Namboodiri and L. Gao, “Energy-Aware Tag Anti-Collision Protocols for RFID Systems,” Proc. of IEEE PerCom, 2007. [7] D. Klair, K. Chin, and R. Raad, “On the Energy Consumption of Pure and Slotted Aloha based RFID Anti-Collision Protocols,” Computer Communications, 2008. [8] K. Hwang, B. Vander-Zanden, and H. Taylor, “A Linear-time Probabilistic Counting Algorithm for Database Applications,” ACM Transactions on Database Systems, vol. 15, no. 2, June 1990. [9] H. Vogt, “Efficient Object Identification with Passive RFID Tags,” Proc. IEEE PerCom, 2002. [10] J. Zhai and G. N. Wang, “An Anti-Collision Algorithm Using Twofunctioned Estimation for RFID Tags,” Proc. ICCSA, 2005. [11] J. Cha and J. Kim, “Novel Anti-collision Algorithms for Fast Object Identification in RFID System,” Proc. IEEE ICPADS, 2005. [12] D. Hush and C. Wood, “Analysis of Tree Algorithm for RFID Arbitration,” Proc. IEEE ISIT, 1998. [13] J. Myung and W. Lee, “An adaptive memoryless tag anti-collision protocol for RFID networks,” Proc. IEEE ICC, 2005. [14] H. Choi, J. Cha, and J. Kim, “Fast Wireless Anti-collision Algorithm in Ubiquitous ID System,” Proc. IEEE VTC, Sep 2004. [15] Chiu C. Tan, Bo Sheng, and Qun Li, “How to monitor for missing RFID tags,” Proc. IEEE ICDCS, June 2008. [16] Philips Semiconductors, “I-CODE Smart Label RFID Tags,” http://www.nxp.com/acrobat download/other/identification/SL092030.pdf, Jan 2004. [17] G. Casella and R. L. Berger, “Statistical Inference,” 2nd edition, Duxbury Press, 2002. [18] Lehmann and G. Casella, “Theory of Point Estimation,” Springer, 2nd edition, 1998. [19] “EPC Radio-Frequency Identity Protocols Class-1 Generation-2 UHF RFID Protocol for Communications at 860 MHz - 960 MHz Version 1.0.9,” http://www.epcglobalinc.org/standards/uhfc1g2/uhfc1g2 1 0 9- standard-20050126.pdf, EPCglobal, 2005. APPENDIX A: DISTRIBUTION AND VARIANCE OF Nˆ i Let i be a large positive integer. Consider the sequence of Bernoulli random variables, Zj , 1 ≤ j ≤ i, whose success probability is q = 1−(1−pi) N . Let qˆ = (Pi j=1 Zj )/i, which is the estimation of the success probability q. It is known that asymptotically qˆ follows a normal distribution: qˆ ∼ Normµ q, q(1 − q) i ¶ . (29) Because the MLE approach provides statistically consistent estimate, when i is large, we can consider the contention probabilities in the later stage of the pooling process to be approximatly a constant. In addition, the number of polling results before stabilization of the contention probability is limited, and their impact will diminish as i becomes large. That is, they can be ignored when the asymptotic property of Nˆ i is considered. Hence, for the asymptotic property, we can let pj = pi , for 1 ≤ j ≤ i, and Eq. (8) becomes ∂ ln(Li) ∂N = ln(1−pi) · (i− Xi j=1 Zj )− (1 − pi) N 1 − (1 − pi)N Xi j=1 Zj ¸ . (30) Therefore, the MLE Nˆ i that solves ∂ ln(Li) ∂N = 0 satisfies (1 − pi) Nˆi = 1 − ( X i j=1 Zj )/i = 1 − q. ˆ (31) Hence, from (29), (1 − pi) Nˆ asymptotically follows the following normal distribution Normµ (1 − pi) N , (1 − (1 − pi) N )(1 − pi) N i ¶ . (32) According to the δ-method [17], if a random variable Xi satisfies Xi D→ Norm(θ, σ 2 i ), (33) where θ and σ are finite constants and D→ means convergence in distribution, then we must have g(Xi) D→ Normµ g(θ), σ 2 [g 0 (θ)]2 i ¶ , (34) for any function g such that g 0 (θ) exists and takes a non-zero value. Based on (33) and (34), taking the logarithm of (32), we have Nˆi · ln(1 − pi) ∼ Normµ N ln(1 − pi), (1 − (1 − pi) N ) i(1 − pi)N ¶ . (35) That is, Nˆi ∼ Normµ N, (1 − (1 − pi) N ) i(1 − pi)N ln2 (1 − pi) ¶ . (36) Hence, we have V ar(Nˆ i) ≈ (1 − (1 − pi) N ) i(1 − pi)N ln2 (1 − pi)