we can also define a discrete form of pi(r)as follows, IV.DESIGN OF DECENTRALIZED STORAGE SCHEME FOR we equally divide the region 0,1 into m sub-regions,and MULTI-DIMENSIONAL RANGE OUERIES leverage the histogram method to calculate the probability pi.j,(j=1,...,m)and we have: 01D0 ∑P=1 (3) j=1 Hence consider an index node over the k-d tree with multi- dimensional range[(a1,31),(a2,2),·,(ak,3k小,where a and B;respectively denote the lower bound and upper bound of specified range of attribute Ai,we have 0<ai<Bi<1, assume the above multi-dimensional range is associated with a two-dimensional spacial region Sj,therefore we can calculate (a)The K-d Tree based index structure (b)The corresponding zone node and the probability that a random event data falls into this region boundanes Si as: Fig.2.Decentralized Storage Scheme for Multi-Dimensional Range Queries ProbE(Sj)= Pi(x)dx (4) We visualize the deployment field of a sensor network as Assume the overall event frequency is fe,thus the frequency a number of n x n equal sized grid cells.As DIM mainly for which the specific storage region Sj is accessed is fe. applies for sparse sensor network where some zones may not Probe(Si). have any sensor node,in this paper we consider the situation C.Modelling Range Query Distribution where the node density is high enough for each cell to have at Unlike event distributions over attributes,which are mu- least one storage node candidates,these cells are nominated tually exclusive over each attribute Ai,the range query dis- as the storage units of the DCS scheme,at the center of each tributions over attributes are overlapped because each query cell one exact sensor node is selected as the storage node. is corresponding to a range over the attribute dimensions. A.Indexing Storages over Cells Users can specify an k dimensional range query in the form like ([L,U],[L2,U2],....[L,U])over k attributes,where We construct the index structure based on k-d tree over the 0≤L:≤U≤l,(l≤i≤k),for partial range query which cell-based storage deployment area,on one hand,each index does not include attribute A,,we can set Li=0,U;=1.Thus node over the index structure represents a multi-dimensional we utilize another method to depict the distributions:Given a range for the sensor data,on the other hand,it also associates training set of queries Q,we have Q=(Q1,Q2,...,QN}, with a specified spacial region over the two-dimensional where query Qi has a query range over each attribute Ai, deployment area.The definition of the index structure is con- which is (Li,Ui),hence we define an indicator variable Bi sistent with DIM,except that we utilize the cell as the storage for a specified range (i,Bi)of attribute A:as follows: unit instead of sensor node.each index node corresponds to a spacial region which covers one or more cells.Fig.2 shows an 了1if(a,3)n(Li,U)≠p B;0 otherwise (5) example of our decentralized storage scheme to support multi- dimensional range queries.Fig.2(a)depicts the k-d tree based And we further calculate the query frequency fi for the index structure and Fig.2(b)represents the corresponding zone specified range (i,Bi)over attribute Ai: code and boundaries.The black nodes denote the storage fi=∑B regions specified by the k-d tree and those cell(s)contained (6) in the corresponding regions become(s)the replicate storage HQ,∈Q cells,which means each cell will store the same copy of the Thus the specified attribute range will be queried with proba- multi-dimensional ranged data specified by the index node bility pi=fi/N. over the k-d tree. Assume query distributions over various attributes are mutu- ally independent,given a multi-dimensional data region which B.Controlling Granularity for k-d Tree based Index Structure is associated with a two-dimensional spacial region Sj,we can Now we investigate into the k-d tree based index structure, finally get the probability for the specified storage region S; as already mentioned in Section I,we note that controlling to be queried as: the granularity of the index nodes is very crucial for energy efficiency of multi-dimensional event insertion and range Proba(S)=Πn (7) query diffusion.If we split these sensor data into very refined =1 zones according to multi-dimensional data attributes,thus to Assume the overall query frequency is fo,thus the frequency resolve a conventional range query,we have to split it into for which the specified storage region Sj is queried is fa many sub-range queries for various zones,apparently it cannot Probo(Sj). be energy efficient due to the large number of routing cost for 168 Authorized licensed use limited to:Nanjing University.Downloaded on July 11,2010 at 07:42:17 UTC from IEEE Xplore.Restrictions apply.we can also define a discrete form of pi(x) as follows, we equally divide the region [0, 1] into m sub-regions, and leverage the histogram method to calculate the probability pi,j ,(j = 1, ..., m) and we have: Xm j=1 pi,j = 1 (3) Hence consider an index node over the k-d tree with multidimensional range [(α1, β1),(α2, β2), · · · ,(αk, βk)], where αi and βi respectively denote the lower bound and upper bound of specified range of attribute Ai , we have 0 ≤ αi < βi ≤ 1, assume the above multi-dimensional range is associated with a two-dimensional spacial region Sj , therefore we can calculate the probability that a random event data falls into this region Sj as: P robE (Sj ) = Y k i=1 Z βi αi pi(x)dx (4) Assume the overall event frequency is fe, thus the frequency for which the specific storage region Sj is accessed is fe · P robE (Sj ). C. Modelling Range Query Distribution Unlike event distributions over attributes, which are mutually exclusive over each attribute Ai , the range query distributions over attributes are overlapped because each query is corresponding to a range over the attribute dimensions. Users can specify an k dimensional range query in the form like h[L1, U1], [L2, U2], ..., [Lk, Uk]i over k attributes, where 0 ≤ Li ≤ Ui ≤ 1,(1 ≤ i ≤ k), for partial range query which does not include attribute Ai , we can set Li = 0, Ui = 1. Thus we utilize another method to depict the distributions: Given a training set of queries Q, we have Q = {Q1, Q2, ..., QN }, where query Qj has a query range over each attribute Ai , which is (Li , Ui), hence we define an indicator variable Bi for a specified range (αi , βi) of attribute Ai as follows: Bi = 1 if (αi , βi) ∩ (Li , Ui) 6= φ 0 otherwise (5) And we further calculate the query frequency fi for the specified range (αi , βi) over attribute Ai : fi = X ∀Qj∈Q Bi (6) Thus the specified attribute range will be queried with probability pi = fi/N. Assume query distributions over various attributes are mutually independent, given a multi-dimensional data region which is associated with a two-dimensional spacial region Sj , we can finally get the probability for the specified storage region Sj to be queried as: P robQ(Sj ) = Y k i=1 pi (7) Assume the overall query frequency is fq, thus the frequency for which the specified storage region Sj is queried is fq · P robQ(Sj ). IV. DESIGN OF DECENTRALIZED STORAGE SCHEME FOR MULTI-DIMENSIONAL RANGE QUERIES Fig. 2. Decentralized Storage Scheme for Multi-Dimensional Range Queries We visualize the deployment field of a sensor network as a number of n × n equal sized grid cells. As DIM mainly applies for sparse sensor network where some zones may not have any sensor node, in this paper we consider the situation where the node density is high enough for each cell to have at least one storage node candidates, these cells are nominated as the storage units of the DCS scheme, at the center of each cell one exact sensor node is selected as the storage node. A. Indexing Storages over Cells We construct the index structure based on k-d tree over the cell-based storage deployment area, on one hand, each index node over the index structure represents a multi-dimensional range for the sensor data, on the other hand, it also associates with a specified spacial region over the two-dimensional deployment area. The definition of the index structure is consistent with DIM, except that we utilize the cell as the storage unit instead of sensor node, each index node corresponds to a spacial region which covers one or more cells. Fig.2 shows an example of our decentralized storage scheme to support multidimensional range queries. Fig.2(a) depicts the k-d tree based index structure and Fig.2(b) represents the corresponding zone code and boundaries. The black nodes denote the storage regions specified by the k-d tree and those cell(s) contained in the corresponding regions become(s) the replicate storage cells, which means each cell will store the same copy of the multi-dimensional ranged data specified by the index node over the k-d tree. B. Controlling Granularity for k-d Tree based Index Structure Now we investigate into the k-d tree based index structure, as already mentioned in Section I, we note that controlling the granularity of the index nodes is very crucial for energy efficiency of multi-dimensional event insertion and range query diffusion. If we split these sensor data into very refined zones according to multi-dimensional data attributes, thus to resolve a conventional range query, we have to split it into many sub-range queries for various zones, apparently it cannot be energy efficient due to the large number of routing cost for 168 Authorized licensed use limited to: Nanjing University. Downloaded on July 11,2010 at 07:42:17 UTC from IEEE Xplore. Restrictions apply