from the k-d tree index structure,and embeds the k-d tree at 0.25 and 0.75 which divide the ranges to [0,0.25),[0.25 into sensor networks in a decentralized approach.Furthermore, 0.5),[0.5,0.75),and [0.75,1).They repeat this procedure KDDCS [7]presents a load-balanced storage scheme over until it reaches the corresponding leaf node.As an example. sensor networks based on k-d tree for multi-dimensional range consider event E =(0.3,0.8),for this event,the zone code queries.POOL[8]addresses some weakness of DIM such over the aforementioned example k-d tree is code(ZE)=011. as hot spot and scalability problems,and further proposes When answering a range query,DIM finds all zones whose a storage scheme to group index nodes together as a data value ranges overlap with the query range and sends the pool to preserve the neighborhood property of nearby multi- query to index nodes of those zones.For example,consider dimensional data,however,their group mechanism are only a range query(0.3,0.7]0.5,0.d,[0.4,0.8,0.7,0.9]),the based on the greatest and the second greatest attribute values, storage regions of zone code 010,011,110,1111 will be which has not been proved to be efficient enough for high involved. dimensional data processing. Building upon this,DIM achieves energy efficiency by forcing events whose attribute values are "close"to be stored III.PRELIMINARIES at the same or nearby nodes.However,some performance A.Motivation issues still remain to be challenged for the traditional DIM: First,DIM constructs the k-d tree structure only based on the two-dimensional sensor locations,without leveraging the estimated distributions of various data attributes and range queries,thus it may cause inefficiency for non-uniform event 110 010 01 insertions and corresponding range queries,which are actually 品 conventional for sensor network applications.Second,DIM assigns a unique zone for each sensor node,hence as the network size increases,the zones have to be further split into smaller zones,more zones of sensors are inevitably involved 100 101 in the query processing,which cannot be energy efficient (a)DIM's zone tree (b)The zone node and boundaries of DIM for range query processing.Therefore how to control the granularity of the k-d tree based index structure is crucial Fig.1.Example of DIM(Distributed Index for Multi-dimensional data for energy efficiency issue.Third,DIM has not considered an optimized approach to route those split sub-queries to specified storage areas,since the query diffusion cost dominates the It is essential to construct an index structure to map overall energy consumption for multi-dimensional range query the multi-dimensional event data to two-dimensional sensors. processing,it is of utmost importance to devise an optimized Obviously a centralized index for multi-dimensional range routing scheme for query forwarding. queries is not feasible for energy-efficiency reasons of sen- Therefore in this paper we try to address the problems raised sor networks.DIM [6]is the first work to support multi- above and we propose a framework which can efficiently dimensional range queries in sensor networks,it utilizes a handle the event insertions and range query processing in an classical database index,the k-d tree [5],and embeds such optimized approach.Our storage framework is also based on an index in a sensor network.DIM divides sensor nodes into the k-d tree and we mainly focus on achieving the optimized separate zones and each zone contains one exact sensor node granularity of k-d tree structure over sensor networks for best as the index node.Fig.1(a)depicts an example of DIM's k-d energy efficiency,and we further investigate into the optimized tree for zones and Fig.I(b)shows the corresponding zones scheme for range query forwarding which dominates the over the sensor network.Each intermediate node over this overall energy consumption. k-d tree based index structure denotes the split parameters for further branching,which include the specified attribute B.Modelling Event Data Distribution Ai(1 i<m)from the attribute list (A1,...,Am),and Consider there exist k attributes A1,A2,...,Ak for sensor associative splitting ranges for Ai.Each leaf node represents data,for each attribute Ai,we have minimum value min(Vi) the event data within specified multi-dimensional ranges. and maximum value max(Vi),we normalize attribute A;'s We introduce the principle of k-d tree to process multi- value into the region [0,1]by calculating: dimensional attribute data with the following example:For = Vi-min(Vi) attributes (A1,...,Am),assume that the maximum depth of the max(V))-min(V (1) k-d tree is k,k is a multiple of m,and this value of k is known to every node.DIM assigns a k bit zone code to an event as And we represent the data distribution over attribute A;as follows.For i between I and m,if Ai<0.5,the i-th bit of Pi(r),which is exactly a probability density function (pdf) the zone code is assigned 0,else 1.For i between m+1 and over the region [0,1].thus we have 2m,if Ai-m <0.25 or Ai-m E 0.5,0.75),the i-th bit of the zone is assigned 0,else 1,because the next level divisions are Pi(x)dx =1 (2) 167 Authorized licensed use limited to:Nanjing University.Downloaded on July 11,2010 at 07:42:17 UTC from IEEE Xplore.Restrictions apply.from the k-d tree index structure, and embeds the k-d tree into sensor networks in a decentralized approach. Furthermore, KDDCS [7] presents a load-balanced storage scheme over sensor networks based on k-d tree for multi-dimensional range queries. POOL[8] addresses some weakness of DIM such as hot spot and scalability problems, and further proposes a storage scheme to group index nodes together as a data pool to preserve the neighborhood property of nearby multidimensional data, however, their group mechanism are only based on the greatest and the second greatest attribute values, which has not been proved to be efficient enough for high dimensional data processing. III. PRELIMINARIES A. Motivation Fig. 1. Example of DIM (Distributed Index for Multi-dimensional data It is essential to construct an index structure to map the multi-dimensional event data to two-dimensional sensors. Obviously a centralized index for multi-dimensional range queries is not feasible for energy-efficiency reasons of sensor networks. DIM [6] is the first work to support multidimensional range queries in sensor networks, it utilizes a classical database index, the k-d tree [5], and embeds such an index in a sensor network. DIM divides sensor nodes into separate zones and each zone contains one exact sensor node as the index node. Fig.1(a) depicts an example of DIM’s k-d tree for zones and Fig.1(b) shows the corresponding zones over the sensor network. Each intermediate node over this k-d tree based index structure denotes the split parameters for further branching, which include the specified attribute Ai(1 ≤ i ≤ m) from the attribute list hA1, ..., Ami, and associative splitting ranges for Ai . Each leaf node represents the event data within specified multi-dimensional ranges. We introduce the principle of k-d tree to process multidimensional attribute data with the following example: For attributes hA1, ..., Ami, assume that the maximum depth of the k-d tree is k, k is a multiple of m, and this value of k is known to every node. DIM assigns a k bit zone code to an event as follows. For i between 1 and m, if Ai < 0.5, the i-th bit of the zone code is assigned 0, else 1. For i between m + 1 and 2m, if Ai−m < 0.25 or Ai−m ∈ [0.5, 0.75), the i-th bit of the zone is assigned 0, else 1, because the next level divisions are at 0.25 and 0.75 which divide the ranges to [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1). They repeat this procedure until it reaches the corresponding leaf node. As an example, consider event E = (0.3, 0.8), for this event, the zone code over the aforementioned example k-d tree is code (ZE) = 011. When answering a range query, DIM finds all zones whose value ranges overlap with the query range and sends the query to index nodes of those zones. For example, consider a range query h[0.3, 0.7], [0.5, 0.6], [0.4, 0.8], [0.7, 0.9]i, the storage regions of zone code 010, 011, 110, 1111 will be involved. Building upon this, DIM achieves energy efficiency by forcing events whose attribute values are “close” to be stored at the same or nearby nodes. However, some performance issues still remain to be challenged for the traditional DIM: First, DIM constructs the k-d tree structure only based on the two-dimensional sensor locations, without leveraging the estimated distributions of various data attributes and range queries, thus it may cause inefficiency for non-uniform event insertions and corresponding range queries, which are actually conventional for sensor network applications. Second, DIM assigns a unique zone for each sensor node, hence as the network size increases, the zones have to be further split into smaller zones, more zones of sensors are inevitably involved in the query processing, which cannot be energy efficient for range query processing. Therefore how to control the granularity of the k-d tree based index structure is crucial for energy efficiency issue. Third, DIM has not considered an optimized approach to route those split sub-queries to specified storage areas, since the query diffusion cost dominates the overall energy consumption for multi-dimensional range query processing, it is of utmost importance to devise an optimized routing scheme for query forwarding. Therefore in this paper we try to address the problems raised above and we propose a framework which can efficiently handle the event insertions and range query processing in an optimized approach. Our storage framework is also based on the k-d tree and we mainly focus on achieving the optimized granularity of k-d tree structure over sensor networks for best energy efficiency, and we further investigate into the optimized scheme for range query forwarding which dominates the overall energy consumption. B. Modelling Event Data Distribution Consider there exist k attributes A1, A2, ..., Ak for sensor data, for each attribute Ai , we have minimum value min(Vi) and maximum value max(Vi), we normalize attribute Ai’s value into the region [0, 1] by calculating: V ′ i = Vi − min(Vi) max(Vi) − min(Vi) (1) And we represent the data distribution over attribute Ai as pi(x), which is exactly a probability density function (pdf) over the region x ∈ [0, 1], thus we have Z 1 0 pi(x)dx = 1 (2) 167 Authorized licensed use limited to: Nanjing University. Downloaded on July 11,2010 at 07:42:17 UTC from IEEE Xplore. Restrictions apply