query forwarding from query node to these various storage where Dauer(S)denotes the expected distance from an nodes.Based on the above observation,keeping these ranges arbitrary query node to the nearest storage node in the storage of data to be hosted in a relatively small number of zones region.And we can simply deduce that Dquery(Si)equals to may be more efficient,but obviously the extreme situation to Devent(Si). keep all ranges of data in one unique zone (with replicated storage scheme)is also not energy efficient,because although the query forwarding cost can be effectively reduced,large amount of energy will be drained for routing the same copy (X, (X,Y) of sensor data to all replicate storages.Based on the above analysis,we must find an optimized storage strategy to control ● the granularity for the index structure as a tradeoff between the two extremum situations to achieve best energy efficiency. Therefore to construct an optimized k-d tree based index Y (X..Y) structure over the sensor network,if we build the k-d tree ● from the root,then at each splitting step of the k-d tree,we have two optional strategies:one strategy is to conduct further splitting for the specified zones,another strategy is to stop Fig.3.Estimate the value for Dquery(Si)or Devent(i) further splitting and thus apply the replicate storage scheme to all cells covered by the specified zones.The most crucial We estimate the expectations of Dquery(Si)or Devent(Si) thing to construct the index structure is how to decide the as follows:for any event/query imposed inside the replicate storage region S;,as it only needs to find a nearest storage optimized strategy at each splitting step along the k-d tree,so as to achieve the best energy efficiency. node inside Si,the expected distance is less than half the cell size,which is d/2;for any event/query imposed outside To solve the above optimization problem,we start by the replicate storage region Si,as Fig.3 shows,we calculate first analyzing the energy cost for event insertion and query the distance according to its quadrant relative to Si,by processing over a specified storage region.In this paper we utilize the Euclidean Distance as the metric of energy summarizing the average distance from arbitrary points to consumption,which means the routing cost between two nodes the storage region,thus we can estimate the expectations of is represented by the Euclidean distance between them.For in- Dquery(Si)or Devent(Si). sertion of event,we leverage the double ruling scheme[9][10] As mentioned before,if we construct a k-d tree based index to distribute the event information to all relevant replicate structure by starting from the root,for each index node at the storage nodes in the storage region.For one specified storage splitting step,we have two optional strategies to choose:to region,the event node first routes event message to the nearest apply the replicate storage scheme over Si or conduct further storage node inside the region,and further forwards it both splitting?We know that the energy cost for replicate storage horizontally and vertically to all storage nodes covered by scheme over Si is: the storage region.Thus the expected energy cost for event Erep(Sa)=fe·ProbE(Si)·Eevent((Si) distribution over the specified storage region Si is: +fq·Probo(S)·Equery(S) (11) Eevent(Si)=Devent(Si)+d.(N(Si)-1) (8) The energy cost Erep(Si)includes the overall event inser- where Devent(Si)denotes the expected distance from an tion cost,as we know the event access frequency for Si arbitrary event node to the nearest storage node in storage is fe.Probe(Si)and the energy cost for each insertion is region Si,and d is the cell size,N(Si)denotes the number Eevent(Si),and it also includes the overall query diffusion of storage nodes inside Si,as each storage node inside Si cost with query access frequency faProb(S)and energy receives exactly one message except the first accessed storage cost for each query forwarding Eqer(S). node,thus the diffusion cost inside S;is d.(N(S;)-1). And consider the k-d tree based index structure,let us Assume the root of the k-d tree is at level 0,and the k-d tree denote Tichild(Si)as the zone associated with left child of indexes the cell unit at level h,(apparently h should be an Si,and Trchid(Si)as the zone associated with its right child, even number)),thus at level i(0≤i≤h),we have: and we utilize E(Si)to represent the optimized energy cost N(Si)=2h-i (9) for region Si,thus the energy cost for further splitting can be represented as: And for query diffusion,we only need to reach one of those specified replicate storage nodes,an intuitive strategy is for Esplit(Si)=E(Tichila(Si))+E(Trchita(Si))(12) the query node to find the nearest storage node in the storage region Si,later we will introduce an optimized routing scheme Therefore the optimized energy cost for region Si is of the for query diffusion,thus the expected energy cost for query following form: diffusion over the specified storage region Si is: E(Si)=min{Erep(Si),E(Tichila(Si))+E(Trchild(Si))} Equery(Si)=Dquery(Si) (10) (13) 169 Authorized licensed use limited to:Nanjing University.Downloaded on July 11,2010 at 07:42:17 UTC from IEEE Xplore.Restrictions apply.query forwarding from query node to these various storage nodes. Based on the above observation, keeping these ranges of data to be hosted in a relatively small number of zones may be more efficient, but obviously the extreme situation to keep all ranges of data in one unique zone (with replicated storage scheme) is also not energy efficient, because although the query forwarding cost can be effectively reduced, large amount of energy will be drained for routing the same copy of sensor data to all replicate storages. Based on the above analysis, we must find an optimized storage strategy to control the granularity for the index structure as a tradeoff between the two extremum situations to achieve best energy efficiency. Therefore to construct an optimized k-d tree based index structure over the sensor network, if we build the k-d tree from the root, then at each splitting step of the k-d tree, we have two optional strategies: one strategy is to conduct further splitting for the specified zones, another strategy is to stop further splitting and thus apply the replicate storage scheme to all cells covered by the specified zones. The most crucial thing to construct the index structure is how to decide the optimized strategy at each splitting step along the k-d tree, so as to achieve the best energy efficiency. To solve the above optimization problem, we start by first analyzing the energy cost for event insertion and query processing over a specified storage region. In this paper we utilize the Euclidean Distance as the metric of energy consumption, which means the routing cost between two nodes is represented by the Euclidean distance between them. For insertion of event, we leverage the double ruling scheme[9][10] to distribute the event information to all relevant replicate storage nodes in the storage region. For one specified storage region, the event node first routes event message to the nearest storage node inside the region, and further forwards it both horizontally and vertically to all storage nodes covered by the storage region. Thus the expected energy cost for event distribution over the specified storage region Si is: Eevent(Si) = Devent(Si) + d · (N(Si) − 1) (8) where Devent(Si) denotes the expected distance from an arbitrary event node to the nearest storage node in storage region Si , and d is the cell size, N(Si) denotes the number of storage nodes inside Si , as each storage node inside Si receives exactly one message except the first accessed storage node, thus the diffusion cost inside Si is d · (N(Si) − 1). Assume the root of the k-d tree is at level 0, and the k-d tree indexes the cell unit at level h, (apparently h should be an even number), thus at level i (0 ≤ i ≤ h), we have: N(Si) = 2h−i (9) And for query diffusion, we only need to reach one of those specified replicate storage nodes, an intuitive strategy is for the query node to find the nearest storage node in the storage region Si , later we will introduce an optimized routing scheme for query diffusion, thus the expected energy cost for query diffusion over the specified storage region Si is: Equery(Si) = Dquery(Si) (10) where Dquery(Si) denotes the expected distance from an arbitrary query node to the nearest storage node in the storage region. And we can simply deduce that Dquery(Si) equals to Devent(Si). Fig. 3. Estimate the value for Dquery(Si) or Devent(Si) We estimate the expectations of Dquery(Si) or Devent(Si) as follows: for any event/query imposed inside the replicate storage region Si , as it only needs to find a nearest storage node inside Si , the expected distance is less than half the cell size, which is d/2; for any event/query imposed outside the replicate storage region Si , as Fig.3 shows, we calculate the distance according to its quadrant relative to Si , by summarizing the average distance from arbitrary points to the storage region, thus we can estimate the expectations of Dquery(Si) or Devent(Si). As mentioned before, if we construct a k-d tree based index structure by starting from the root, for each index node at the splitting step, we have two optional strategies to choose: to apply the replicate storage scheme over Si or conduct further splitting? We know that the energy cost for replicate storage scheme over Si is: Erep(Si) = fe · P robE (Si) · Eevent(Si) +fq · P robQ(Si) · Equery(Si) (11) The energy cost Erep(Si) includes the overall event insertion cost, as we know the event access frequency for Si is fe · P robE(Si) and the energy cost for each insertion is Eevent(Si), and it also includes the overall query diffusion cost with query access frequency fq · P robQ(Si) and energy cost for each query forwarding Equery(Si). And consider the k-d tree based index structure, let us denote Tlchild(Si) as the zone associated with left child of Si , and Trchild(Si) as the zone associated with its right child, and we utilize E(Si) to represent the optimized energy cost for region Si , thus the energy cost for further splitting can be represented as: Esplit(Si) = E(Tlchild(Si)) + E(Trchild(Si)) (12) Therefore the optimized energy cost for region Si is of the following form: E(Si) = min{Erep(Si), E(Tlchild(Si)) + E(Trchild(Si))} (13) 169 Authorized licensed use limited to: Nanjing University. Downloaded on July 11,2010 at 07:42:17 UTC from IEEE Xplore. Restrictions apply