Optimized Storage Placement over Large Scale Sensor Networks Lei Xie,Sanglu Lu,Yingchun Cao,and Daoxu Chen State Key Laboratory of Novel Software Technology Nanjing University.Nanjing,China Ixie@nju.edu.cn,sanglu@nju.edu.cn,yccao@nju.edu.cn,cdx@nju.edu.cn ABSTRACT they can process these queries,filter locally stored raw sen- Data storage has become an important issue for energy effi- sor data,and send out query replies to users[2].In this way cient data management over sensor networks.In this paper, a large amount of raw sensor data can be avoided for trans- we investigate into the optimized storage placement prob- mission such that the overall energy cost for data forwarding lem over large scale sensor networks,aiming to achieve min- is greatly reduced.Based on the above understanding,var- imized energy cost.In order to efficiently deal with the ious data-centric storage schemes [3]have been proposed to large scale deployment area with irregular shape,we pro- sufficiently leverage the sensors'local storage capability to pose to utilize the "hop"as the computation unit instead of achieve energy efficiency. the "node",such that the computation complexity can be greatly reduced.We propose methodologies to solve the op- However,on the other side,it has also been demonstrated timization problem respectively in situations for unlimited not energy-efficient to locally store the collected raw sensor number of storages and limited number of storages.The ul- data in all sensor nodes.This will introduce a large amount timate goal of this paper is to give a fundamental guidance of query diffusion cost,since the queries should be broad- of optimized storage placement for large scale sensor net- casted from the sink to each of the storage nodes.We note works.Simulation results confirm the performance of our that by appropriately deploying storage nodes over sensor methodologies. networks,the heavy load of query diffusion and raw sensor data forwarding can be alleviated,through making appro- Keywords priate tradeoffs between the above“pure push”and“pure pull”schemes[3.Therefore,a“push and pull'”based stor- Data Storage,Optimization,Storage Placement,Sensor Net- age scheme is essential to extract information from sensor work,Large Scale network,while achieving the overall energy efficiency 4 In this scheme,some intermediate "storage nodes"are de- 1 INTRODUCTION ployed over the sensor network,other ordinary nodes called Sensor networks in pervasive computing applications,such "forwarding nodes"just send their raw data upward to these as environment monitoring,health caring and target track- storages nodes along the routing tree,and queries are dif- ing,generate a large amount of data.Generally these data fused to these storage nodes to fetch those filtered sensor are collected from sensor nodes over the sensor network and data as query replies.Based on the above scenario,a chal- the end users retrieve them through diffusing specific queries lenging problem appears,that is how to place the storage from the sink into the network [1.Conventionally both raw nodes over the sensor network such that the overall energy sensor data and queries are continuously generated over the efficiency can be achieved. sensor network.Due to the limited battery power in these sensor nodes,it will greatly increase the overall energy cost To deal with this problem,Sheng et al.have proposed op- by simply forwarding all raw sensor data to the sink,more- timized algorithms based on dynamic programming to solve over,this will make sensor nodes around the sink heavily the storage placement problem in the tree based model 5 used and quickly exhausted in energy.We note that cur- However,in some industrial or research applications where rently some specially designed sensor nodes are equipped sensor networks are widely and densely deployed in a large with larger storage capability than normal sensors,thus they scale approach,it is laborious and time-consuming to com- can store a certain amount of raw sensor data in their local pute the optimized storage placement one by one based on storages.Hence when queries are diffused into these nodes the above algorithm.As a matter of fact,suppose the sensor nodes are uniformly distributed,it is not essential to com- pute the exact optimized storage placement due to the large scale deployment.Therefore,novel approaches should be proposed to simplify the computation of storage placement over the large scale sensor network,which is in irregular shape for general cases.In this paper,we investigate into this problem and propose optimized storage placement in the situations respectively with unlimited number of stor- ages and limited number of storages.Due to the large scale
Optimized Storage Placement over Large Scale Sensor Networks Lei Xie, Sanglu Lu, Yingchun Cao, and Daoxu Chen State Key Laboratory of Novel Software Technology Nanjing University, Nanjing, China lxie@nju.edu.cn, sanglu@nju.edu.cn, yccao@nju.edu.cn, cdx@nju.edu.cn ABSTRACT Data storage has become an important issue for energy effi- cient data management over sensor networks. In this paper, we investigate into the optimized storage placement problem over large scale sensor networks, aiming to achieve minimized energy cost. In order to efficiently deal with the large scale deployment area with irregular shape, we propose to utilize the “hop” as the computation unit instead of the “node”, such that the computation complexity can be greatly reduced. We propose methodologies to solve the optimization problem respectively in situations for unlimited number of storages and limited number of storages. The ultimate goal of this paper is to give a fundamental guidance of optimized storage placement for large scale sensor networks. Simulation results confirm the performance of our methodologies. Keywords Data Storage, Optimization, Storage Placement, Sensor Network, Large Scale 1. INTRODUCTION Sensor networks in pervasive computing applications, such as environment monitoring, health caring and target tracking, generate a large amount of data. Generally these data are collected from sensor nodes over the sensor network and the end users retrieve them through diffusing specific queries from the sink into the network [1]. Conventionally both raw sensor data and queries are continuously generated over the sensor network. Due to the limited battery power in these sensor nodes, it will greatly increase the overall energy cost by simply forwarding all raw sensor data to the sink, moreover, this will make sensor nodes around the sink heavily used and quickly exhausted in energy. We note that currently some specially designed sensor nodes are equipped with larger storage capability than normal sensors, thus they can store a certain amount of raw sensor data in their local storages. Hence when queries are diffused into these nodes, they can process these queries, filter locally stored raw sensor data, and send out query replies to users[2]. In this way, a large amount of raw sensor data can be avoided for transmission such that the overall energy cost for data forwarding is greatly reduced. Based on the above understanding, various data-centric storage schemes [3] have been proposed to sufficiently leverage the sensors’ local storage capability to achieve energy efficiency. However, on the other side, it has also been demonstrated not energy-efficient to locally store the collected raw sensor data in all sensor nodes. This will introduce a large amount of query diffusion cost, since the queries should be broadcasted from the sink to each of the storage nodes. We note that by appropriately deploying storage nodes over sensor networks, the heavy load of query diffusion and raw sensor data forwarding can be alleviated, through making appropriate tradeoffs between the above “pure push” and “pure pull” schemes[3]. Therefore, a “push and pull” based storage scheme is essential to extract information from sensor network, while achieving the overall energy efficiency [4]. In this scheme, some intermediate “storage nodes” are deployed over the sensor network, other ordinary nodes called “forwarding nodes” just send their raw data upward to these storages nodes along the routing tree, and queries are diffused to these storage nodes to fetch those filtered sensor data as query replies. Based on the above scenario, a challenging problem appears, that is how to place the storage nodes over the sensor network such that the overall energy efficiency can be achieved. To deal with this problem, Sheng et al. have proposed optimized algorithms based on dynamic programming to solve the storage placement problem in the tree based model [5]. However, in some industrial or research applications where sensor networks are widely and densely deployed in a large scale approach, it is laborious and time-consuming to compute the optimized storage placement one by one based on the above algorithm. As a matter of fact, suppose the sensor nodes are uniformly distributed, it is not essential to compute the exact optimized storage placement due to the large scale deployment. Therefore, novel approaches should be proposed to simplify the computation of storage placement over the large scale sensor network, which is in irregular shape for general cases. In this paper, we investigate into this problem and propose optimized storage placement in the situations respectively with unlimited number of storages and limited number of storages. Due to the large scale
deployment,we divide the deployment area into separate over large scale sensor networks,due to the huge computa- partitions and leverage the "hop"as the computation unit tion complexity for large scale deployment instead of the "node",which greatly reduces the computing complexity.By utilizing the characteristics of large scale 3. PROBLEM FORMULATION sensor network,our solution greatly simplifies the computa- Without loss of generality,we consider a data collecting sce- tion complexities. nario in a large-scale sensor network as follows:all sensor nodes are densely and uniformly distributed over a deploy- We aim to minimize the overall energy cost in the data accu- ment area,while the sink is set at some specified position mulation and query by appropriately placing storage nodes within the region in advance.The deployment area is re- over the sensor network.With unlimited number of stor- quired to satisfy the following convex property:for any de- ages,we analyze the optimized storage structure over the ployed sensor node.each point on the straight line segment deployment area,and further provide the optimized parame- that joins the sensor node with the sink also lies within the ters for storage deployment according to the irregular shapes deployment area.The exact shape of the deployment area is of the deployment area.With limited number of storages not restricted,either regular or irregular shape is included. we reduce the problem into an unbounded knapsack prob- We classify these sensor nodes into forwarding nodes and lem.Furthermore,we respectively propose dynamic pro- storage nodes,where storage nodes store sensor data and gramming based algorithm and greedy approximation algo- forwarding nodes simply forward their data to those storage rithm to effectively solve this problem. nodes.For ease of presentation,we can also regard the sink node as a storage node.Raw sensor data collected from The rest of this paper is organized as follows,Section II forwarding nodes are continuously sent upward along the reviews the related work.Section III defines problems for routing tree to the first parent storage node they meet,and storage node placement in the aforementioned scenario.Sec- queries are diffused from the sink to all storage nodes to tion IV and Section V respectively present optimized storage fetch the most recent sensed data,query replies will be sent placement methodologies for unlimited number of storages back from storage nodes to the sink along the routing tree and limited number of storages.Section VI conducts per- structure.Fig.1 depicts the data access model with storage formance analysis.Section VII presents simulation results. nodes.Due to the dense and uniform distribution of sensor Section VIII concludes the paper. nodes,for ease of analysis,we assume to deal with a uniform topology structure for the large-scale sensor network,hence 2.RELATED WORK the network is supposed to have a near-symmetric routing There have been a lot of research works for data-centric tree topology over the deployment area,which implies nodes storage schemes over sensor network,which mainly include at the same hop from the sink have similar subtree topolo- structured storage scheme and unstructured storage scheme gies. Structured storage scheme mainly utilizes a mapping func- tion such as geographic hash function to map event infor- mation to specified geographic points,and leverages routing algorithms like GPSR 6 to route messages to corresponding storage nodes,research works including GHT 6,DIMEN- SIONS [7]and DIM [8]belong to this scheme. Unstruc- tured storage scheme mainly utilizes the "push-pull"scheme to disseminate and gather information,research works in- cluding Double Ruling [9.Comb-Needles 10 belong to this scheme.Fang et.al.[11 propose a hybrid storage scheme to implement the event storage and query.Trigoni et.al. Forwarding Nodes Reply 12 presents a hybrid push and pull scheme for tree based Raw Dat Storage Nedes● structure where queries are injected from the sink.Ahn et.al.[13]conduct fundamental analysis over the scalability Figure 1:Data access model with storage nodes, of these two kinds of data-centric storage schemes. the black nodes denote the storage nodes while the white nodes denote the forwarding nodes. Recent research works focus on the storage placement prob- lem to enhance energy efficiency and load balance.Zhang We define the following parameters for the above scenario: et.al.14 propose a ring based index structure to help re- for data generation,we define rd as data generate rate and sd duce energy consumption of storage schemes.KDDCS [15] as the raw data size;for query diffusion,we use re to denote presents a load-balanced storage scheme for sensor networks query rate and sa to denote query size;for query reply,we based on K-D tree.Sheng et.al.present optimized algo- use a to denote reply reduction rate,for input t,which is rithms based on dynamic programming to solve the storage the size of raw data generated by a set of nodes,function placement problem over fixed tree model 516.moreover f(r)=ox for a E (0,1 returns the size of processed data they propose an approximation algorithm to solve the stor- as query replies. age placement problem over randomized dynamic tree model 17.Their dynamic programming methodology must save a For the large scale deployed sensor network,the objective is large amount of intermediate state parameters,and compute to place storage sensor nodes (and hence forwarding sensor from bottom to top along the tree based routing structure nodes)over the deployment area with irregular shape such While achieving a fair degree of accuracy,their optimized that the overall energy cost is minimized.Then the prob- solutions.however,is not suitable for the storage planning lems to solve are as follows:(1)Given unlimited storage
deployment, we divide the deployment area into separate partitions and leverage the “hop” as the computation unit instead of the “node”, which greatly reduces the computing complexity. By utilizing the characteristics of large scale sensor network, our solution greatly simplifies the computation complexities. We aim to minimize the overall energy cost in the data accumulation and query by appropriately placing storage nodes over the sensor network. With unlimited number of storages, we analyze the optimized storage structure over the deployment area, and further provide the optimized parameters for storage deployment according to the irregular shapes of the deployment area. With limited number of storages, we reduce the problem into an unbounded knapsack problem. Furthermore, we respectively propose dynamic programming based algorithm and greedy approximation algorithm to effectively solve this problem. The rest of this paper is organized as follows, Section II reviews the related work. Section III defines problems for storage node placement in the aforementioned scenario. Section IV and Section V respectively present optimized storage placement methodologies for unlimited number of storages and limited number of storages. Section VI conducts performance analysis. Section VII presents simulation results. Section VIII concludes the paper. 2. RELATED WORK There have been a lot of research works for data-centric storage schemes over sensor network, which mainly include structured storage scheme and unstructured storage scheme. Structured storage scheme mainly utilizes a mapping function such as geographic hash function to map event information to specified geographic points, and leverages routing algorithms like GPSR [6] to route messages to corresponding storage nodes, research works including GHT [6], DIMENSIONS [7] and DIM [8] belong to this scheme. Unstructured storage scheme mainly utilizes the “push-pull” scheme to disseminate and gather information, research works including Double Ruling [9], Comb-Needles [10] belong to this scheme. Fang et.al. [11] propose a hybrid storage scheme to implement the event storage and query. Trigoni et.al. [12] presents a hybrid push and pull scheme for tree based structure where queries are injected from the sink. Ahn et.al. [13] conduct fundamental analysis over the scalability of these two kinds of data-centric storage schemes. Recent research works focus on the storage placement problem to enhance energy efficiency and load balance. Zhang et. al. [14] propose a ring based index structure to help reduce energy consumption of storage schemes. KDDCS [15] presents a load-balanced storage scheme for sensor networks based on K-D tree. Sheng et. al. present optimized algorithms based on dynamic programming to solve the storage placement problem over fixed tree model [5][16], moreover, they propose an approximation algorithm to solve the storage placement problem over randomized dynamic tree model [17]. Their dynamic programming methodology must save a large amount of intermediate state parameters, and compute from bottom to top along the tree based routing structure. While achieving a fair degree of accuracy, their optimized solutions, however, is not suitable for the storage planning over large scale sensor networks, due to the huge computation complexity for large scale deployment. 3. PROBLEM FORMULATION Without loss of generality, we consider a data collecting scenario in a large-scale sensor network as follows: all sensor nodes are densely and uniformly distributed over a deployment area, while the sink is set at some specified position within the region in advance. The deployment area is required to satisfy the following convex property: for any deployed sensor node, each point on the straight line segment that joins the sensor node with the sink also lies within the deployment area. The exact shape of the deployment area is not restricted, either regular or irregular shape is included. We classify these sensor nodes into forwarding nodes and storage nodes, where storage nodes store sensor data and forwarding nodes simply forward their data to those storage nodes. For ease of presentation, we can also regard the sink node as a storage node. Raw sensor data collected from forwarding nodes are continuously sent upward along the routing tree to the first parent storage node they meet, and queries are diffused from the sink to all storage nodes to fetch the most recent sensed data, query replies will be sent back from storage nodes to the sink along the routing tree structure. Fig.1 depicts the data access model with storage nodes. Due to the dense and uniform distribution of sensor nodes, for ease of analysis, we assume to deal with a uniform topology structure for the large-scale sensor network, hence the network is supposed to have a near-symmetric routing tree topology over the deployment area, which implies nodes at the same hop from the sink have similar subtree topologies. Figure 1: Data access model with storage nodes, the black nodes denote the storage nodes while the white nodes denote the forwarding nodes. We define the following parameters for the above scenario: for data generation, we define rd as data generate rate and sd as the raw data size; for query diffusion, we use rq to denote query rate and sq to denote query size; for query reply, we use α to denote reply reduction rate, for input x, which is the size of raw data generated by a set of nodes, function f(x) = αx for α ∈ (0, 1] returns the size of processed data as query replies. For the large scale deployed sensor network, the objective is to place storage sensor nodes (and hence forwarding sensor nodes) over the deployment area with irregular shape such that the overall energy cost is minimized. Then the problems to solve are as follows: (1) Given unlimited storage
nodes,how to deploy storage nodes so as to attain the min PROOF.According to conclusion from [5],all storage nodes imum overall energy cost?(2)Given limited storage nodes have storage ancestors over the tree in the optimal solution (number of storage nodes=K),how to deploy storage nodes thus we can deduce the optimized storage placement struc- so as to attain the minimum overall energy cost?Most im- ture is a storage area surrounding the sink node.As the portant of all,the corresponding solutions are required to be network topology in the unit zone is uniform and symmet- scalable to the large scale deployment in computation com- ric,we can prove the storage area is a fan-shaped area by plexity,such that the optimized storage placement can be contradiction.As Fig.3 shows,suppose the storage area in figured out in an efficient approach. a unit zone is in an irregular shape instead of a regular fan shape,as the gray area depicts.Inside the storage area,we 4. UNLIMITED NUMBER OF STORAGES can find a fan-shaped region R with the maximum radius When the storage capability over each sensor node is large then we randomly choose a storage node outside R,we de- enough to hold some raw sensor data for further usage,then note it as node A.Due to the symmetry of the topology any sensor node has the opportunity to become a storage any node with the same hop of node A,e.g.,node B,must node,thus we can suppose to have unlimited number of have the same opportunity to be storage nodes in the op- timal solution.However,node B is set to be a forwarding storages,every sensor node over the routing tree structure can switch to the role of storage node if necessary node in the former hypothesis.Therefore we get a contradic- tion against the former hypothesis,hence the theorem gets proved.☐ 4.1 Optimized Storage Placement Structure Before analyzing on the optimized storage placement,we give the definition of unit zone.A unit zone is a fan-shaped Sink region rooted from the sink within the deployment area,it has a complete tree based topology with the minimum gran- ularity,which we called unit topology.Fig.2 illustrates an example of the unit zones within the gray deployment area. Due to the irregular shape of the deployment area,the max- imum hop varies among the unit zones,but the degree 0 is equivalent for all unit zones.Therefore,the overall deploy- ment area can be divided into a certain number of unit zones and the whole topology is composed of these corresponding unit topologies.The optimal storage placement inside each Forwarding Area unit zone is independent with each other among the unit zones. Figure 3:Proof of Theorem 1,we prove it by con- tradiction. Unit Zone ●、 5 4.2 Calculating Optimized Parameters According to the conclusion from [5]:If ara rd,then the optimal tree with the minimum energy cost contains only forwarding nodes (except for the root).If ara <rd,there erits a storage area for the optimal tree.,we know that if ara 2 rd,the optimized solution includes no storage nodes. Therefore in the remaining of this paper we only consider the are<rd situation.According to Theorem 1,it is proved that the optimized storage structure inside each unit zone is a fan-shaped area around the sink,there must exist an opti- Figure 2:Unit zone mized hop-count kopt for the fan-shaped area,where sensors within the kopt hops are storage nodes,and sensors outside We rely on the following conclusion to obtain the optimized the kopt hops are forwarding nodes.Hence kopt actually de- storage structure,which is first proposed by 5:In the op- notes the boundary of the storage area.Next we will calcu- timal tree,if si is a forwarding node,all of its descendants late the optimized parameter kop for the optimized storage are forwarding nodes as well.If si is a storage node,all area.Without loss of generality,in the rest of this paper its ancestors are storage nodes as well.According to this we utilize the equal sign "="to denote the expected value conclusion,we have Theorem 1. of specified variables. Assume the degree of a unit zone is 0(0<6<),the THEOREM 1.With unlimited number of storages,the op- maximum hop of a specified unit topology is n,the average timized placement structure for storage nodes in the unit hop distance is d,and the network node density is p,thus the zone is a fan-shaped area,where sensors inside the fan- number of sensor nodes for the ith hop is N =0(2i-1)d2p. shaped area are storage nodes,and others are forwarding Now we analyze the energy consumption model.To transmit nodes. s data units,the energy costs of the sender and receiver are
nodes, how to deploy storage nodes so as to attain the minimum overall energy cost? (2) Given limited storage nodes (number of storage nodes= K), how to deploy storage nodes so as to attain the minimum overall energy cost? Most important of all, the corresponding solutions are required to be scalable to the large scale deployment in computation complexity, such that the optimized storage placement can be figured out in an efficient approach. 4. UNLIMITED NUMBER OF STORAGES When the storage capability over each sensor node is large enough to hold some raw sensor data for further usage, then any sensor node has the opportunity to become a storage node, thus we can suppose to have unlimited number of storages, every sensor node over the routing tree structure can switch to the role of storage node if necessary. 4.1 Optimized Storage Placement Structure Before analyzing on the optimized storage placement, we give the definition of unit zone. A unit zone is a fan-shaped region rooted from the sink within the deployment area, it has a complete tree based topology with the minimum granularity, which we called unit topology. Fig.2 illustrates an example of the unit zones within the gray deployment area. Due to the irregular shape of the deployment area, the maximum hop varies among the unit zones, but the degree θ is equivalent for all unit zones. Therefore, the overall deployment area can be divided into a certain number of unit zones and the whole topology is composed of these corresponding unit topologies. The optimal storage placement inside each unit zone is independent with each other among the unit zones. Unit Zone Figure 2: Unit zone We rely on the following conclusion to obtain the optimized storage structure, which is first proposed by [5]: In the optimal tree, if si is a forwarding node, all of its descendants are forwarding nodes as well. If si is a storage node, all its ancestors are storage nodes as well. According to this conclusion, we have Theorem 1. Theorem 1. With unlimited number of storages, the optimized placement structure for storage nodes in the unit zone is a fan-shaped area, where sensors inside the fanshaped area are storage nodes, and others are forwarding nodes. Proof. According to conclusion from [5], all storage nodes have storage ancestors over the tree in the optimal solution, thus we can deduce the optimized storage placement structure is a storage area surrounding the sink node. As the network topology in the unit zone is uniform and symmetric, we can prove the storage area is a fan-shaped area by contradiction. As Fig.3 shows, suppose the storage area in a unit zone is in an irregular shape instead of a regular fan shape, as the gray area depicts. Inside the storage area, we can find a fan-shaped region R with the maximum radius, then we randomly choose a storage node outside R, we denote it as node A. Due to the symmetry of the topology, any node with the same hop of node A, e.g., node B, must have the same opportunity to be storage nodes in the optimal solution. However, node B is set to be a forwarding node in the former hypothesis. Therefore we get a contradiction against the former hypothesis, hence the theorem gets proved. Figure 3: Proof of Theorem 1, we prove it by contradiction. 4.2 Calculating Optimized Parameters According to the conclusion from [5]: If αrq ≥ rd, then the optimal tree with the minimum energy cost contains only forwarding nodes (except for the root). If αrq < rd, there exits a storage area for the optimal tree., we know that if αrq ≥ rd, the optimized solution includes no storage nodes. Therefore in the remaining of this paper we only consider the αrq < rd situation. According to Theorem 1, it is proved that the optimized storage structure inside each unit zone is a fan-shaped area around the sink, there must exist an optimized hop-count kopt for the fan-shaped area, where sensors within the kopt hops are storage nodes, and sensors outside the kopt hops are forwarding nodes. Hence kopt actually denotes the boundary of the storage area. Next we will calculate the optimized parameter kopt for the optimized storage area. Without loss of generality, in the rest of this paper we utilize the equal sign “=” to denote the expected value of specified variables. Assume the degree of a unit zone is θ(0 < θ < π), the maximum hop of a specified unit topology is n, the average hop distance is d, and the network node density is ρ, thus the number of sensor nodes for the ith hop is Ni = θ(2i−1)d 2 ρ. Now we analyze the energy consumption model. To transmit s data units, the energy costs of the sender and receiver are
etrs and ere.s,where etr and ere are the energy costs Therefore the overall energy consumption can be depicted for transmitting and receiving a unit data respectively,and as: etr is also relevant to the distance between the sender and receiver.For simplicity of presentation,the receiving energy Etotal(k)= C[(E,+Eg)·N+E,·Nk+ (Ea·N) cost is assigned to the transmitter without changing the total energy cost.Here we use Si to denote an arbitrary sensor (5) node in the ith hop,when sensor Si in the ith hop sends s And we let data units to Si in the jth hop,the energy cost of S;is 0, Rr=rqasd and the energy consumed by Si is: Rd rasd (etr +ere)s if Si is parent of Si Ro =Tqsa l(etr+ere·c),s if Sj is one of S.'s children· (1) To calculate the optimal hop count kopt for minimized over- In the above equation ci is the number of Si's children.For all energy consumption,we obtain the following discussion,we normalize the energy costs by OEtotal(k) =0→ (etr +ere)for ease of presentation.Thus,to transfer s data 8k units from Si to its parent,the transmitting energy will be (Ra-Rr).k2+(Rr-Ra+2Rg).k+C=0, s and to broadcast s data units to its children,sensor S will consume bis energy,where bi=Based on where tr十er the assumption of near-symmetric topology we can denote ci as the average number of children for sensor nodes in the C=1-n(R-R)+。rRg 6 ith hop,where c.And we set Ti as the Thus kopt has two possible solutions average number of sensor nodes in the subtree for any node in the ith hop (including the ith hop node itself),thus we =BB-2)+d=B-B-2)-4 have: 2(Ra-Rr) 2(Ra-R,】 =1+=1+2- where 2i-1 6=V(R,-Ra+2Rg)2-4C(Ra-R,) As we deduce from arRr,therefore As we know the energy cost Ei for a sensor node in the ith we get: hop consists of three types of energy costs: Ra 6 .1 Ei=Eq+Er+Ed. =互--R-2Ra-R1 we obsolete k2 and finally obtain the diffusion,Er denotes energy cost for query reply,and Ea optimal hop count: denotes energy cost for raw data forwarding.Hence based on the above energy model,we calculate the energy con- kopt (Ra-R-2Ra)+8 (6) 2(Ra-R) sumption for each ith hop,here we set the sink at the Oth hop. According to Eq.(6),it is known that the optimal hop We use k to denote the last hop of the storage node area. count for the storage area inside the unit zone,i.e.kopt, depends on the maximum hop count of the corresponding For storage nodes within the kth hop (1<i<k )we have: unit topology,i.e.n.Furthermore,it can be inferred that Er rqa Tisd the larger value of n is,the larger value of kopt can be ob- Ed=0 (2) tained.Based on the above analysis,if there are unlimited Ea birasa number of storages,we can compute the optimized storage placement as follows:For a specified deployment area with Here we set the reply data size as aTisd for the snapshot irregular shape,it can be approximately divided into a cer- query only fetches the most recent sensing data,and we set tain number of fan-shaped partitions according to the irreg- E=0 because although the storage nodes generate raw ular shape.Each fan-shaped partition is rooted from the data locally,they don't need to forward these data until sink and must contain no less than one unit zones.More- they're sent out as query replies.For storage nodes right in over,the fan-shaped partition is relatively regular in shape the kth hop,(i=k),we have: as compared to the overall deployment area,thus we can av- erage thethe hop counts of leaf nodes inside the fan-shaped Er Tga Tk sd partition and approximate the maximum hop count as n. Ed=0 (3) Therefore,we can approximately compute the optimal hop E=0 count Kopt for the storage area respectively for each fan- Here we set Eg =0 because the last hop of storage nodes shaped partition.Then,by putting the pieces together,the need not to diffuse queries to their forwarding node children. whole storage area can be obtained.Fig.4 illustrates the ba- For forwarding nodes outside the kth hop (k+1si<n): sic idea of the method.Theoretically,when the deployment area is divided into more refined partitions,the solution is Er=0 closer to the ideal optimal storage structure.In extreme Ed Tirasd. (4) case,when each of the fan-shaped partitions is exactly a E=0 unit zone,the optimal storage structure can be obtained
etr · s and ere · s, where etr and ere are the energy costs for transmitting and receiving a unit data respectively, and etr is also relevant to the distance between the sender and receiver. For simplicity of presentation, the receiving energy cost is assigned to the transmitter without changing the total energy cost. Here we use Si to denote an arbitrary sensor node in the ith hop, when sensor Si in the ith hop sends s data units to Sj in the jth hop, the energy cost of Sj is 0, and the energy consumed by Si is: (etr + ere) · s if Sj is parent of Si (etr + ere · ci) · s if Sj is one of Si’s children . (1) In the above equation ci is the number of Si’s children. For the following discussion, we normalize the energy costs by (etr + ere) for ease of presentation. Thus, to transfer s data units from Si to its parent, the transmitting energy will be s and to broadcast s data units to its children, sensor Si will consume bi · s energy, where bi = etr+ere·ci etr+ere . Based on the assumption of near-symmetric topology we can denote ci as the average number of children for sensor nodes in the ith hop, where ci = Ni+1 Ni = 2i+1 2i−1 . And we set |Ti| as the average number of sensor nodes in the subtree for any node in the ith hop (including the ith hop node itself), thus we have: |Ti| = 1 + Pn i+1 Nj Ni = 1 + n 2 − i 2 2i − 1 . As we know the energy cost Ei for a sensor node in the ith hop consists of three types of energy costs: Ei = Eq + Er + Ed. In the above equation, Eq denotes energy cost for query diffusion, Er denotes energy cost for query reply, and Ed denotes energy cost for raw data forwarding. Hence based on the above energy model, we calculate the energy consumption for each ith hop, here we set the sink at the 0th hop. We use k to denote the last hop of the storage node area. For storage nodes within the kth hop (1 Rr, therefore we get: k2 = 1 2 − Rq Rd − Rr − δ 2(Rd − Rr) < 1 2 . As we have kopt ≥ 1 we obsolete k2 and finally obtain the optimal hop count: kopt = (Rd − Rr − 2Rq) + δ 2(Rd − Rr) . (6) According to Eq. (6), it is known that the optimal hop count for the storage area inside the unit zone, i.e. kopt, depends on the maximum hop count of the corresponding unit topology, i.e. n. Furthermore, it can be inferred that the larger value of n is, the larger value of kopt can be obtained. Based on the above analysis, if there are unlimited number of storages, we can compute the optimized storage placement as follows: For a specified deployment area with irregular shape, it can be approximately divided into a certain number of fan-shaped partitions according to the irregular shape. Each fan-shaped partition is rooted from the sink and must contain no less than one unit zones. Moreover, the fan-shaped partition is relatively regular in shape as compared to the overall deployment area, thus we can average the the hop counts of leaf nodes inside the fan-shaped partition and approximate the maximum hop count as nb. Therefore, we can approximately compute the optimal hop count kopt for the storage area respectively for each fanshaped partition. Then, by putting the pieces together, the whole storage area can be obtained. Fig.4 illustrates the basic idea of the method. Theoretically, when the deployment area is divided into more refined partitions, the solution is closer to the ideal optimal storage structure. In extreme case, when each of the fan-shaped partitions is exactly a unit zone, the optimal storage structure can be obtained
7000 6000 5000 4000 3000 2000 1000 10 15 The value of maximum hops Figure 5:Compute coverage area as the maximum hop count increases (d=3m) Figure 4:Compute optimized storage structure with unlimited storage case.We denote these nodes as S.Then as we increase the number of storages from K to K*,the nodes in S will obvi- 5.LIMITED NUMBER OF STORAGES ously switch from storage nodes to forwarding nodes,being In some situations we observe that not all sensor nodes have replaced by other storage nodes within the koptth hop inside each unit zone.During the process,we denote those nodes enough local storages because more storage will come along which switch from forwarding nodes to storage nodes as R. with increased hardware cost,we are only allowed to deploy Based on the above analysis.due to the optimization effect a limited number of specially designed storage nodes for en- ergy efficiency.When dealing with the storage placement of the new storage structure,we obtain the energy reduction AE(R)<AE(S).Due to the increment of K,we know that problem with limited number of storages,it means that the the size of R is larger than the size of S.According to the number of storages,K,is smaller than the optimal num- above inequality,with K limited number of storages,we can ber of storages,K*,for unlimited storages,otherwise it can propose a better optimized solution as compared to the cur- be reduced to the unlimited storage problem.As described rent solution:switch part (or:all)of the storage nodes in S above we only consider the arg<rd situation. to forwarding nodes,and switch equal number of nodes in the R to be new storage nodes,we can rebuild part of the 5.1 Optimized Storage Placement Structure new storage structure and thus achieve better performance. We first give the definition of energy reduction.The energy Hence we get the contradiction with the former assumption, reduction of a node Si means the changed energy cost for S the theorem gets proved. to switch from a forwarding node to a storage node,while the roles of other nodes remain unchanged,we denote it as AE(S).Therefore,when a node Si is selected as a storage As indicated in Theorem 2,it can be inferred that in the node in the optimal solution,it will always satisfy AE(Si)< optimal storage placement with limited number of storages. 0,and its absolute value AE(Si)denotes the amount of the storage node set is a subset of the storage nodes in the energy cost reduction.For ease of representation,in the optimal storage placement with unlimited number of stor- following part we also use the energy reduction AE(S)to ages.Therefore,the solution space for the optimized storage express the changed energy cost for a bunch of nodes S to placement is greatly reduced.Moreover,as the deployment switch from forwarding nodes to storage nodes.Similar to area is divided into a certain number of fan-shaped parti- the strategy proposed in unlimited number of storages,as tions,each fan-shaped partition is fairly regular in shape, Fig.4 illustrates,we divide the deployment area in irregular then nodes in the same hop should have the same opportu- shape into a certain number of fan-shaped partitions.Then nity to be storage nodes or forwarding nodes.Hence we can we compute the optimized storage placement inside each utilize the hop as the computation unit instead of the node partition.The optimized storage placement relies on the then the compute complexity is further reduced.As a matter following conclusion: of fact,even for a large scale sensor network,the maximum hop is very limited.Fig.5 illustrates the coverage area of THEOREM 2.In the optimal storage placement with lim- a conventional sensor network as the maximum hop count ited number of storages,the storage nodes are all within the increases.Here we set the average hop distance d 3m. optimal storage area with unlimited number of storages. We note that as the maximum hop count reaches 15,the coverage area is already near 6500 m2,which is a rather large scale deployment for conventional sensor network ap- PROOF.We prove this theorem by contradiction.Sup- plications.Thus the maximum hop count is smaller than pose in the optimal storage placement with unlimited num- 15 for conventional cases.Therefore,with limited number ber of storages,the essential number of storages is K.As- of storages,it is practical to enumerate all possible storage sume in the optimal solution with K limited number of stor- placement solutions for any unit zone and compare their ages,there exist some storage nodes outside the boundary of performance.The size of solution space is s=2opt,as the optimized storage area calculated in unlimited storage conventionally kopt 15,s is smaller than 215 =32768
Figure 4: Compute optimized storage structure with unlimited storage 5. LIMITED NUMBER OF STORAGES In some situations we observe that not all sensor nodes have enough local storages because more storage will come along with increased hardware cost, we are only allowed to deploy a limited number of specially designed storage nodes for energy efficiency. When dealing with the storage placement problem with limited number of storages, it means that the number of storages, K, is smaller than the optimal number of storages, K∗ , for unlimited storages, otherwise it can be reduced to the unlimited storage problem. As described above we only consider the αrq < rd situation. 5.1 Optimized Storage Placement Structure We first give the definition of energy reduction. The energy reduction of a node Si means the changed energy cost for Si to switch from a forwarding node to a storage node, while the roles of other nodes remain unchanged, we denote it as ∆E(Si). Therefore, when a node Si is selected as a storage node in the optimal solution, it will always satisfy ∆E(Si) < 0, and its absolute value |∆E(Si)| denotes the amount of energy cost reduction. For ease of representation, in the following part we also use the energy reduction ∆E(S) to express the changed energy cost for a bunch of nodes S to switch from forwarding nodes to storage nodes. Similar to the strategy proposed in unlimited number of storages, as Fig.4 illustrates, we divide the deployment area in irregular shape into a certain number of fan-shaped partitions. Then we compute the optimized storage placement inside each partition. The optimized storage placement relies on the following conclusion: Theorem 2. In the optimal storage placement with limited number of storages, the storage nodes are all within the optimal storage area with unlimited number of storages. Proof. We prove this theorem by contradiction. Suppose in the optimal storage placement with unlimited number of storages, the essential number of storages is K∗ . Assume in the optimal solution with K limited number of storages, there exist some storage nodes outside the boundary of the optimized storage area calculated in unlimited storage 0 5 10 15 0 1000 2000 3000 4000 5000 6000 7000 The value of maximum hops The coverage area (m 2 ) Figure 5: Compute coverage area as the maximum hop count increases (d=3m) case. We denote these nodes as S. Then as we increase the number of storages from K to K∗ , the nodes in S will obviously switch from storage nodes to forwarding nodes, being replaced by other storage nodes within the koptth hop inside each unit zone. During the process, we denote those nodes which switch from forwarding nodes to storage nodes as R. Based on the above analysis, due to the optimization effect of the new storage structure, we obtain the energy reduction ∆E(R) < ∆E(S). Due to the increment of K, we know that the size of R is larger than the size of S. According to the above inequality, with K limited number of storages, we can propose a better optimized solution as compared to the current solution: switch part (or: all) of the storage nodes in S to forwarding nodes, and switch equal number of nodes in the R to be new storage nodes, we can rebuild part of the new storage structure and thus achieve better performance. Hence we get the contradiction with the former assumption, the theorem gets proved. As indicated in Theorem 2, it can be inferred that in the optimal storage placement with limited number of storages, the storage node set is a subset of the storage nodes in the optimal storage placement with unlimited number of storages. Therefore, the solution space for the optimized storage placement is greatly reduced. Moreover, as the deployment area is divided into a certain number of fan-shaped partitions, each fan-shaped partition is fairly regular in shape, then nodes in the same hop should have the same opportunity to be storage nodes or forwarding nodes. Hence we can utilize the hop as the computation unit instead of the node, then the compute complexity is further reduced. As a matter of fact, even for a large scale sensor network, the maximum hop is very limited. Fig.5 illustrates the coverage area of a conventional sensor network as the maximum hop count increases. Here we set the average hop distance d = 3m. We note that as the maximum hop count reaches 15, the coverage area is already near 6500 m2 , which is a rather large scale deployment for conventional sensor network applications. Thus the maximum hop count is smaller than 15 for conventional cases. Therefore, with limited number of storages, it is practical to enumerate all possible storage placement solutions for any unit zone and compare their performance. The size of solution space is s = 2kopt , as conventionally kopt ≤ 15, s is smaller than 215 = 32768
Based on the above analysis,if the deployment area is a cir- follows: cular region with the sink in the center,then due to the sym- m 2opt, metry of the topology,there must exist a consistent value for the parameter kopt to denote the boundary of the stor- maximize E[j·x, (8) age area.Then we enumerate all 2pt solutions for the i=1j=1 storage placement in the unit zone and maintain two tables: subject to E1..2pt and N1..2"pt.If we use the notion item to de- note a solution in a specified unit zone,then as compared to the all forwarding node solution,E[j](1<j<2kopt) N[·x,≤K =1= is used to hold the reduced energy for the jth item and Nj(1<j<2"P)is used to hold the essential number of 2kopti storages for the jth item.Hence the problem is formulated r,·0≤:for all1≤i≤m as follows: 0=1 Here ri;denotes the number of used copies of the jth kind of item in partition Pi.The objective is to maximize the overall reduced energy cost.The first constraint means that the overall number of storage nodes should be no more than kopt K.The second constraint means that,for each partition maximize Ej· (7) P,the overall number of items should be no more than 三1 which is the number of unit zones in Pi. subject to Obviously the above two unbounded multiple knapsack prob- 2kopt lems are all NP-hard.Therefore there exist no algorithms ∑N[j·x≤K to correctly solve the problem in polynomial-time.Hence i1 in the following part we propose algorithms to solve this problem in pseudo-polynomial time. 工50≤π 5.2 Dominance Relations Based on the formulations in the unbounded knapsack prob- lem,we observe that there exists such a property which can be called Dominance Relations:For a given item i in a spec- Here rj(1<j<kopt)denotes the number of used copies ified partition,suppose we could find another item j such that for the jth kind of item,which actually means the num- ber of unit zones applied with the jth solution for storage N[]≤N[ (9) placement.The objective is to maximize the overall reduced 1E≤E, energy cost.The first constraint means that the overall num- ber of storage nodes should be no more than K.The second and the above inequalities are satisfied with at least one less- than relationship.Then item i cannot appear in the optimal constraint means that the overall number of items should be solution,since we could always improve any potential solu- no more than,which is the number of unit zones.Hence the problem is reduced to the two-dimensional unbounded tion containing i by replacing i with j.Therefore we can knapsack problem:given the 2pt kinds of items.each with disregard the item i altogether.In such cases,item j is said a value ELi],a weight N[i]and a space 0,determine the to dominate i.Therefore,solving the unbounded knapsack problem can be made easier by throwing away items that number of each item to put into the knapsack,so that the total value is maximized and the total weight and space are will never be needed.Hence,we can sufficiently reduce the respectively less than or equal to a given limit,while the size of search space from 2to a certain level by leveraging the Dominance Relations. number of copies of each kind of item is not limited Furthermore,if the deployment area is in an irregular shape Therefore,in order to find the inherent dominance relations, we propose an algorithm as illustrated in Algorithm1.Ac- it can be divided into a certain number of fan-shaped par- titions.Suppose the number of the fan-shaped partitions cording to the algorithm,let n=2pt,the compute com- is m and each partition Pi has a degree of Bi(1 <i<m) plexities of the sorting and the following iterations are re- We first estimate the parameter kopt.;for each fan-shaped spectively O(n log n)and O(n),hence the overall compute complexity is O(n).In fact,due to the remove of the domi- partition Pi according to Eq.(6).For each partition Pi nated items in time,the number of iterations can be greatly we further divide it into a certain number of unit zones and enumerate all 2opt.solutions for the storage placement in reduced.Therefore,the average compute complexity can be the unit zone.Table Efi][1..2opt.]and N[i][1..2pt]are much less than O(n).Moreover,if the deployment area is divided into multiple partitions,the above algorithm needs maintained for each partition P.As compared to the all to find the dominance relations for each particular parti- forwarding node solution,Efi is used to hold the reduced tion Pi one by one,then the overall compute complexity is energy in partition Pi for the jth item,Nij is used to O(m.n2),here m is the number of partitions. hold the essential number of storages in partition Pi for the jth item.Then,similarly,the problem is reduced to an un- bounded multiple knapsack problem,which is formulated as 5.3 Dynamic Programming based Algorithm
Based on the above analysis, if the deployment area is a circular region with the sink in the center, then due to the symmetry of the topology, there must exist a consistent value for the parameter kopt to denote the boundary of the storage area. Then we enumerate all 2kopt solutions for the storage placement in the unit zone and maintain two tables: E[1..2 kopt ] and N[1..2 kopt ]. If we use the notion item to denote a solution in a specified unit zone, then as compared to the all forwarding node solution, E[j](1 ≤ j ≤ 2 kopt ) is used to hold the reduced energy for the jth item and N[j](1 ≤ j ≤ 2 kopt ) is used to hold the essential number of storages for the jth item. Hence the problem is formulated as follows: maximize 2Xkopt j=1 E[j] · xj (7) subject to 2Xkopt i=1 N[j] · xj ≤ K 2Xkopt i=1 xj · θ ≤ π Here xj (1 ≤ j ≤ kopt) denotes the number of used copies for the jth kind of item, which actually means the number of unit zones applied with the jth solution for storage placement. The objective is to maximize the overall reduced energy cost. The first constraint means that the overall number of storage nodes should be no more than K. The second constraint means that the overall number of items should be no more than π θ , which is the number of unit zones. Hence the problem is reduced to the two-dimensional unbounded knapsack problem: given the 2kopt kinds of items, each with a value E[j], a weight N[j] and a space θ, determine the number of each item to put into the knapsack, so that the total value is maximized and the total weight and space are respectively less than or equal to a given limit, while the number of copies of each kind of item is not limited. Furthermore, if the deployment area is in an irregular shape, it can be divided into a certain number of fan-shaped partitions. Suppose the number of the fan-shaped partitions is m and each partition Pi has a degree of βi(1 ≤ i ≤ m). We first estimate the parameter kopt,i for each fan-shaped partition Pi according to Eq. (6). For each partition Pi, we further divide it into a certain number of unit zones and enumerate all 2kopt,i solutions for the storage placement in the unit zone. Table E[i][1..2 kopt,i ] and N[i][1..2 kopt,i ] are maintained for each partition Pi. As compared to the all forwarding node solution, E[i][j] is used to hold the reduced energy in partition Pi for the jth item, N[i][j] is used to hold the essential number of storages in partition Pi for the jth item. Then, similarly, the problem is reduced to an unbounded multiple knapsack problem, which is formulated as follows: maximizeXm i=1 2 kXopt,i j=1 E[i][j] · xi,j (8) subject to Xm i=1 2 kopt Xi i=1 N[i][j] · xi,j ≤ K 2 kopt Xi j=1 xi,j · θ ≤ βi for all 1 ≤ i ≤ m Here xi,j denotes the number of used copies of the jth kind of item in partition Pi. The objective is to maximize the overall reduced energy cost. The first constraint means that the overall number of storage nodes should be no more than K. The second constraint means that, for each partition Pi, the overall number of items should be no more than βi θ , which is the number of unit zones in Pi. Obviously the above two unbounded multiple knapsack problems are all NP-hard. Therefore there exist no algorithms to correctly solve the problem in polynomial-time. Hence in the following part we propose algorithms to solve this problem in pseudo-polynomial time. 5.2 Dominance Relations Based on the formulations in the unbounded knapsack problem, we observe that there exists such a property which can be called Dominance Relations: For a given item i in a specified partition, suppose we could find another item j such that N[j] ≤ N[i], E[i] ≤ E[j], (9) and the above inequalities are satisfied with at least one lessthan relationship. Then item i cannot appear in the optimal solution, since we could always improve any potential solution containing i by replacing i with j. Therefore we can disregard the item i altogether. In such cases, item j is said to dominate i. Therefore, solving the unbounded knapsack problem can be made easier by throwing away items that will never be needed. Hence, we can sufficiently reduce the size of search space from 2kopt to a certain level by leveraging the Dominance Relations. Therefore, in order to find the inherent dominance relations, we propose an algorithm as illustrated in Algorithm1. According to the algorithm, let n = 2kopt , the compute complexities of the sorting and the following iterations are respectively O(n log n) and O(n 2 ), hence the overall compute complexity is O(n 2 ). In fact, due to the remove of the dominated items in time, the number of iterations can be greatly reduced. Therefore, the average compute complexity can be much less than O(n 2 ). Moreover, if the deployment area is divided into multiple partitions, the above algorithm needs to find the dominance relations for each particular partition Pi one by one, then the overall compute complexity is O(m · n 2 ), here m is the number of partitions. 5.3 Dynamic Programming based Algorithm
Algorithm 1 Identify the dominance relations among the into.Then similarly we can recursively calculate the optimal items solution step by step.Hence the compute complexity of the 1:Put all 2kpt items into the search space S. dynamic programming is O(m·M·K·于) 2:Sort the items in decreasing order of reduced energy Ei rename the items as i(1 0 andli>0 for at least one sack i do Therefore,if we maintain a three-dimensional table Filk] 7: Pop one sack-item pair (i,j)from Q. of size M×Nx言,we can calculate each value of F[lkjg 8: if li>0 then recursively according to the formulation (10).then finally Put ri.j copies of item j into sack i,here ij= we can obtain the value of f*=FM,which is min{,间 the maximum reduced energy cost.By tracing back each 10: end if j(0<j<M)used to achieve f",we can obtain the opti- 11: k=k-x,·N[,li=l2-x,·0(i∈[1,m) mal parameters for the optimal storage placement.Since the 12:end while computation of f involves examining M items,and there are Kx values of each Fj to calculate by using items The above greedy approximation algorithm has the following up to j,the compute complexity of the dynamic program- properties in term of performance,as shown in Theorem 3. ming is O(M.K·) If the deployment area is in irregular shape,then according THEOREM 3.Assume Emar is the marimum value of items to the problem formulation in (8),we can add an additional that fit into the sack in the unbounded knapsack problem, dimension to denote which partition the current item is put then with limited number of storages,the greedy approrima-
Algorithm 1 Identify the dominance relations among the items 1: Put all 2kopt items into the search space S. 2: Sort the items in decreasing order of reduced energy E[i], rename the items as i(1 ≤ i ≤ 2 kopt ) according to the rank. 3: Sort the items in increasing order of required number of storage N[i]. 4: for i ∈ [1, 2 kopt ] and i ∈ S do 5: for j ∈ [i + 1, 2 kopt ] and j ∈ S do 6: if N[i] 0 and li > 0 for at least one sack i do 7: Pop one sack-item pair (i, j) from Q. 8: if li > 0 then 9: Put xi,j copies of item j into sack i, here xi,j = min{li, k N[j] }. 10: end if 11: k = k − xi,j · N[j], li = li − xi,j · θ(i ∈ [1, m]). 12: end while The above greedy approximation algorithm has the following properties in term of performance, as shown in Theorem 3. Theorem 3. Assume Emax is the maximum value of items that fit into the sack in the unbounded knapsack problem, then with limited number of storages, the greedy approxima-
tion algorithm is guaranteed to achieve at least a value of Kopt through Eq.(6)with constant time and space complex- ity,which we can denote as O(1).For the limited stor- age problem,we reduce it into the unbounded knapsack PROOF.In the greedy approximation algorithm,we ob- problem,and respectively propose the dynamic program- serve that the selected item(s)must occupy at least half of ming based algorithm and the greedy approximation algo- the resource,i.e.,the number of storages,in the sacks,other- rithm.The dynamic programming based algorithm main- wise,we can continue to insert the items into corresponding tains a four-dimensional table and the compute complexity sacks and attain a better solution.Suppose in the worst isO(m·M·K·a),here m is the number of fan-shaped case,the first selected item occupies more than half of the partitions,M is the size of reduced search space,K is the number of storage used,and 0 is the degree of the unit zone resource in any sack,then the rest part of the sacks can- not hold any other kind of items.As the reduced energy per Moreover,the greedy approximation algorithm has a com- storage unit,is maximized for any sack in the approxi- pute complexity of O(mM log(mM)),which is irrelevant to the number of storages,i.e.,K.As K can be a rather large mate solution,then the approximate solution can achieve the number,the compute complexity can be greatly reduced. total reduced energy as Emax.It is apparent N that in the optimal solution the total reduced energy Emax should be less than max El 7.PERFORMANCE EVALUATION K,therefore E*Emaz, the theorem gets proved. We have implemented a simulator to evaluate the perfor- mance of our solutions.In the simulation,we consider a situation where sensors are uniformly deployed in a region with randomly generated shape.We randomly select a sen- Now consider the compute complexity of Algorithm 2,we sor node inside the deployment area as the sink.The overall know that the number of the sack-item pairs (i,j)is m.M number of sensor nodes is N 1000.The deployment re- the compute complexity of the sorting is O(mMlog(mM)) moreover,the compute complexity of the following iteration gion covers an area of 6000 m.Hence on average each sensor node covers an area of 6m2.We set the average hop is O(mM).Therefore,the overall compute complexity of the greedy approximation algorithm is O(mM log(mM)),which distance d =3m.Based on the above settings,statistically in the simulations we observe that the maximum number is irrelevant to the number of storages K.As K can be a of hops n is within the range 10,15 with 95%confidence rather large number,the compute complexity can be greatly interval.Unless otherwise stated,we set the following pa- reduced,as compared to the compute complexity O(m.M rameters in our simulations:query rate re=1,query size K.)in the dynamic programming based algorithm. sa=1,data generate rate rd =1 and data size sd =1, we set the reply reduction rate a =0.5,for ease of com 6.PERFORMANCE ANALYSIS puting we set energy costs for transmitting and receiving a In this section we compare the compute complexity of our unit data as etr ere 1.In addition,we use relative optimization methodology with former research work in [5] energy cost as performance metrics.We use the case that in respect of compute complexity. no storage node except the sink is deployed as the baseline Let the energy cost for this no storage scenario be Ef.And According to the dynamic programming based algorithms let the energy cost after the storage nodes are deployed be depicted in 5,for the unlimited storage problem,they main- Es. The relative energy cost is defined as Es/Ef.In the tain a table E1..N according to all N nodes,which is following part we get the corresponding results by averaging used to hold the minimum energy cost of all subtrees rooted the results of 1000 independent experiments at node i=1,..,N.They also maintain a second table E[1..N which records the energy cost of all subtrees when 7.1 all nodes in each subtree are forwarding nodes.According Effect of the Optimized Placement with Un- to their analysis the time complexity of their dynamic pro- limited Number of Storages gramming based algorithm is O(N),where N is the number Fig.6(a)shows the energy cost with varying values of storage of nodes.For the K limited storage problem,they maintain area bound,we set ra=1.6,rd =1.Here we select a a two-dimensional (K+1).(N-1)table,Eim,l,at each specified fan-shaped partition divided from the deployment node i.According to their analysis,given a communication area as the test case,in which the topology is fairly regular tree with N nodes and at most C children for each par- As the maximum number of hops in this partition is n=10, ent,the compute complexity of their dynamic programming we vary the storage area bound from the Ist hop to the 10th based algorithm is O(K.N2(maxK,C)c-1). hop.We note that the energy cost starts to decrease from 0.97 as the storage area bound increases,when it reaches As the overall number of sensor nodes,N,can be a huge the lowest value 0.91,it starts to increase as the storage number for the large scale sensor network,thus the comput- area bound increases,finally it reaches the value 1.01 for ing complexity of the above algorithms will be extremely the full storage case.As we can see the lowest point of the large.Moreover,maintaining a two-dimensional table at energy cost curve lies in the bound equal to 5,which right each node is conventionally intolerable for sensors with lim- fits with our optimized parameter opt =5.3=5,hence it ited resources.Since in large scale sensor networks the num- demonstrates the effectiveness of the optimized parameter ber of hops,n,is greatly smaller than the number of sensor Kopt calculated by our methodologies. nodes,i.e.,nN.Therefore,in this paper,we utilize the hop as the computing unit instead of the node.Ac- In Fig.6(b)we compare the optimized storage placement cording to our optimized methodologies,for the unlimited with the random storage placement in term of energy cost storage problem,we can calculate the optimized hop-count as the reply reduction rate a varies.In the random storage
tion algorithm is guaranteed to achieve at least a value of Emax 2 . Proof. In the greedy approximation algorithm, we observe that the selected item(s) must occupy at least half of the resource, i.e., the number of storages, in the sacks, otherwise, we can continue to insert the items into corresponding sacks and attain a better solution. Suppose in the worst case, the first selected item occupies more than half of the resource in any sack, then the rest part of the sacks cannot hold any other kind of items. As the reduced energy per storage unit, E[i][j] N[j] , is maximized for any sack in the approximate solution, then the approximate solution can achieve the total reduced energy as E ∗ ≥ 1 2 max E[i][j]·K N[j] . It is apparent that in the optimal solution the total reduced energy Emax should be less than max E[i][j]·K N[j] , therefore E ∗ ≥ 1 2Emax, the theorem gets proved. Now consider the compute complexity of Algorithm 2, we know that the number of the sack-item pairs (i, j) is m · M, the compute complexity of the sorting is O(mM log(mM)), moreover, the compute complexity of the following iteration is O(mM). Therefore, the overall compute complexity of the greedy approximation algorithm is O(mM log(mM)), which is irrelevant to the number of storages K. As K can be a rather large number, the compute complexity can be greatly reduced, as compared to the compute complexity O(m · M · K · π θ ) in the dynamic programming based algorithm. 6. PERFORMANCE ANALYSIS In this section we compare the compute complexity of our optimization methodology with former research work in [5] in respect of compute complexity. According to the dynamic programming based algorithms depicted in [5], for the unlimited storage problem, they maintain a table E ∗ [1..N] according to all N nodes, which is used to hold the minimum energy cost of all subtrees rooted at node i = 1, ..., N. They also maintain a second table Ef [1..N] which records the energy cost of all subtrees when all nodes in each subtree are forwarding nodes. According to their analysis the time complexity of their dynamic programming based algorithm is O(N), where N is the number of nodes. For the K limited storage problem, they maintain a two-dimensional (K + 1) · (N − 1) table, Ei[m, l], at each node i. According to their analysis, given a communication tree with N nodes and at most C children for each parent, the compute complexity of their dynamic programming based algorithm is O(K · N 2 (max{K, C}) C−1 ). As the overall number of sensor nodes, N, can be a huge number for the large scale sensor network, thus the computing complexity of the above algorithms will be extremely large. Moreover, maintaining a two-dimensional table at each node is conventionally intolerable for sensors with limited resources. Since in large scale sensor networks the number of hops, n, is greatly smaller than the number of sensor nodes, i.e., n ≪ N. Therefore, in this paper, we utilize the hop as the computing unit instead of the node. According to our optimized methodologies, for the unlimited storage problem, we can calculate the optimized hop-count kopt through Eq.(6) with constant time and space complexity, which we can denote as O(1). For the limited storage problem, we reduce it into the unbounded knapsack problem, and respectively propose the dynamic programming based algorithm and the greedy approximation algorithm. The dynamic programming based algorithm maintains a four-dimensional table and the compute complexity is O(m · M · K · π θ ), here m is the number of fan-shaped partitions, M is the size of reduced search space, K is the number of storage used, and θ is the degree of the unit zone. Moreover, the greedy approximation algorithm has a compute complexity of O(mM log(mM)), which is irrelevant to the number of storages, i.e., K. As K can be a rather large number, the compute complexity can be greatly reduced. 7. PERFORMANCE EVALUATION We have implemented a simulator to evaluate the performance of our solutions. In the simulation, we consider a situation where sensors are uniformly deployed in a region with randomly generated shape. We randomly select a sensor node inside the deployment area as the sink. The overall number of sensor nodes is N = 1000. The deployment region covers an area of 6000 m2 . Hence on average each sensor node covers an area of 6m2 . We set the average hop distance d = 3m. Based on the above settings, statistically in the simulations we observe that the maximum number of hops n is within the range [10, 15] with 95% confidence interval. Unless otherwise stated, we set the following parameters in our simulations: query rate rq = 1, query size sq = 1, data generate rate rd = 1 and data size sd = 1, we set the reply reduction rate α = 0.5, for ease of computing we set energy costs for transmitting and receiving a unit data as etr = ere = 1. In addition, we use relative energy cost as performance metrics. We use the case that no storage node except the sink is deployed as the baseline. Let the energy cost for this no storage scenario be Ef . And let the energy cost after the storage nodes are deployed be Es. The relative energy cost is defined as Es/Ef . In the following part we get the corresponding results by averaging the results of 1000 independent experiments. 7.1 Effect of the Optimized Placement with Unlimited Number of Storages Fig.6(a) shows the energy cost with varying values of storage area bound, we set rq = 1.6, rd = 1. Here we select a specified fan-shaped partition divided from the deployment area as the test case, in which the topology is fairly regular. As the maximum number of hops in this partition is n = 10, we vary the storage area bound from the 1st hop to the 10th hop. We note that the energy cost starts to decrease from 0.97 as the storage area bound increases, when it reaches the lowest value 0.91, it starts to increase as the storage area bound increases, finally it reaches the value 1.01 for the full storage case. As we can see the lowest point of the energy cost curve lies in the bound equal to 5, which right fits with our optimized parameter kopt = ⌊5.3⌋ = 5, hence it demonstrates the effectiveness of the optimized parameter kopt calculated by our methodologies. In Fig.6(b) we compare the optimized storage placement with the random storage placement in term of energy cost, as the reply reduction rate α varies. In the random storage
Optimized Storage Placement 12- Random Storage Placement o-Random Storage Placemen 12 08 8 o. 10 04 0.5 04 090 02 n 000204060a10214102022 The Bound of Storage Area Value of reduction rate Value of query rate rq (a)Varying values of storage area bound(b)Varying values of reply reduction rate (c)Varying values of query rate ra Figure 6:Energy cost comparison with unlimited number of storages placement we randomly assign the roles (storage or forward- ing)to each sensor node.We note that as a increases from 1.04 0.1 to 1,the energy cost of random storage placement in- 106 creases from 0.33 to 1.10,and the energy cost of optimized 0.9 0.96 Random Storage Placement (Maximum) storage placement increases from 0.22 to 1.When a =0.1. 0.4 Random Stora ent(Average our optimized solution reduces 33%of the energy cost than An ate Al ic Programming based Algorithm the random storage placement;when a =1,our optimized storage solution can only reduce 9.1%of the energy cost than 0.86 the random placement.The reason is that as the reply re- duction rate increases,the transmission cost of query replies from storage nodes are increased,those storage nodes'capa- 0.74 bility of"in network processing"cannot be sufficiently lever- 0.72 人-t aged to filter enough raw sensor data to achieve the energy 0 0203004050060700 efficiency. The Number of Storages:K In Fig.6(c)we compare the optimized storage placement with the random storage placement in term of the energy Figure 7:Energy cost with varying values of K cost,as query rate ra varies.Note that as rg increases from 0.2 to 2,the energy cost of random storage placement in- creases from 0.25 to 1.2,and the energy cost of the opti- deployment area,we obtain the optimal number of storages mized storage placement increases from 0.125 to 1.When K*700 for the situation with unlimited number of stor- r=0.2,our optimized solution reduces 50%of the energy ages.We note that as the number of storages increases, cost than the random storage placement;when ra =2,our all energy costs decreases except the maximum energy cost optimized storage solution can only reduce 16.7%of the en- for the random storage placement.The dynamic program- ergy cost than the random placement.The reason is that as ming based algorithm always achieves the best performance query rate increases,the query diffusion cost will become a in term of energy cost.When K=50,it can reduce 169 quite large number as compared to the raw sensor data for- energy cost than the average energy cost of random stor- warding cost,thus the storage nodes'local filtering effect to age placement.As K increases to 700,it can further reduce reduce energy cost will be counteracted by the large amount 31%energy cost than the average energy cost of random of query diffusion cost,hence the energy cost continue to in storage placement.The greedy approximate algorithm also crease as query rate increases,and more storage nodes will achieves a good performance in term of energy cost,which be replaced by forwarding nodes in the optimized solution is just next to the dynamic programming based algorithm Moreover,among the random storage placement solutions, 7.2 Effect of the Optimized Placement with Lim- we observe that when K<150,the maximum energy cost ited Number of Storages is contributed by the all forwarding node solution,and as K In Fig.7,we compare various solutions in term of en- increases from 150,the maximum energy cost is contributed ergy cost with varying values of K(K<K*).These so- by the solution which places all storage nodes over the last lutions include the dynamic programming based algorithm, hop,which brings a large amount of query diffusion cost and reply cost. the greedy approximate algorithm and the random storage placement.In the random storage placement we randomly select K nodes as the storage nodes.Among the random 8.CONCLUSION storage placement solutions we respectively calculate the av- In this paper,we investigate into the optimized storage place- erage energy cost and the maximum energy cost for compar- ment problem over large scale sensor networks.We consider ison.Here we set rg 1.2,rd =1.Based on the specified the situation with unlimited number of storages and K lim-
0 2 4 6 8 10 0.90 0.92 0.94 0.96 0.98 1.00 1.02 1.04 Energy Cost rate Es/Ef The Bound of Storage Area (a) Varying values of storage area bound 0.0 0.2 0.4 0.6 0.8 1.0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 Energy cost Rate Es/Ef Value of reduction rate Optimized Storage Placement Random Storage Placement (b) Varying values of reply reduction rate α 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 Optimized Storage Placement Random Storage Placement Energy cost Rate Es/Ef Value of query rate rq (c) Varying values of query rate rq Figure 6: Energy cost comparison with unlimited number of storages placement we randomly assign the roles (storage or forwarding) to each sensor node. We note that as α increases from 0.1 to 1, the energy cost of random storage placement increases from 0.33 to 1.10, and the energy cost of optimized storage placement increases from 0.22 to 1. When α = 0.1, our optimized solution reduces 33% of the energy cost than the random storage placement; when α = 1, our optimized storage solution can only reduce 9.1% of the energy cost than the random placement. The reason is that as the reply reduction rate increases, the transmission cost of query replies from storage nodes are increased, those storage nodes’ capability of “in network processing” cannot be sufficiently leveraged to filter enough raw sensor data to achieve the energy efficiency. In Fig.6(c) we compare the optimized storage placement with the random storage placement in term of the energy cost, as query rate rq varies. Note that as rq increases from 0.2 to 2, the energy cost of random storage placement increases from 0.25 to 1.2, and the energy cost of the optimized storage placement increases from 0.125 to 1. When rq = 0.2, our optimized solution reduces 50% of the energy cost than the random storage placement; when rq = 2, our optimized storage solution can only reduce 16.7% of the energy cost than the random placement. The reason is that as query rate increases, the query diffusion cost will become a quite large number as compared to the raw sensor data forwarding cost, thus the storage nodes’ local filtering effect to reduce energy cost will be counteracted by the large amount of query diffusion cost, hence the energy cost continue to increase as query rate increases, and more storage nodes will be replaced by forwarding nodes in the optimized solution. 7.2 Effect of the Optimized Placement with Limited Number of Storages In Fig. 7, we compare various solutions in term of energy cost with varying values of K(K < K∗ ). These solutions include the dynamic programming based algorithm, the greedy approximate algorithm and the random storage placement. In the random storage placement we randomly select K nodes as the storage nodes. Among the random storage placement solutions we respectively calculate the average energy cost and the maximum energy cost for comparison. Here we set rq = 1.2, rd = 1. Based on the specified 0 100 200 300 400 500 600 700 800 0.72 0.74 0.76 0.78 0.80 0.82 0.84 0.86 0.88 0.90 0.92 0.94 0.96 0.98 1.00 1.02 1.04 Energy cost Rate Es/Ef The Number of Storages: K Random Storage Placement (Maximum) Random Storage Placement (Average) Greedy Approximate Algorithm Dynamic Programming based Algorithm Figure 7: Energy cost with varying values of K deployment area, we obtain the optimal number of storages K∗ ≈ 700 for the situation with unlimited number of storages. We note that as the number of storages increases, all energy costs decreases except the maximum energy cost for the random storage placement. The dynamic programming based algorithm always achieves the best performance in term of energy cost. When K = 50,it can reduce 16% energy cost than the average energy cost of random storage placement. As K increases to 700, it can further reduce 31% energy cost than the average energy cost of random storage placement. The greedy approximate algorithm also achieves a good performance in term of energy cost, which is just next to the dynamic programming based algorithm. Moreover, among the random storage placement solutions, we observe that when K < 150, the maximum energy cost is contributed by the all forwarding node solution, and as K increases from 150, the maximum energy cost is contributed by the solution which places all storage nodes over the last hop, which brings a large amount of query diffusion cost and reply cost. 8. CONCLUSION In this paper, we investigate into the optimized storage placement problem over large scale sensor networks. We consider the situation with unlimited number of storages and K lim-
ited number of storages,and we leverage the characteristics for sensor networks,"in CIKM,2006. of large scale sensor network to sufficiently reduce the com- [16 B.Sheng,Q.Li,and W.Mao.,"Optimize storage puting complexity of our optimization methodologies.Simu- placement in sensor networks,"IEEE Transactions on lation results confirm the performance of our methodologies Mobile Computing,vol.9,no.10,pp.1437-1450,2010 for storage placement over large scale sensor networks [17 B.Sheng,Q.Li,and W.Mao,"An approximation algorithm for data storage placement in sensor ACKNOWLEDGEMENT networks,"in WASA,2007. This work is partially supported by the National Natural Science Foundation of China under Grant No.61100196. 61073028.61021062:the National Basic Research Program of China (973)under Grant No.2009CB320705:the JiangSu Natural Science Foundation under Grant No.BK2011559. 10.REFERENCES 1 Y.Yao and J.Gehrke,"Query processing in sensor networks,"in CIDR,2003. [2]S.Madden,M.J.Franklin,J.M.Hellerstein,and W.Hong,"Tag:A tiny aggregation service for ad-hoc sensor networks."in Proceedings of OSDI.2002. [3]S.Scott,R.Sylvia,K.Brad,G.Ramesh,and E.Deborah,"Data-centric storage in sensornets." ACM SIGCOMM Computer Communication Review. vol.33.no.1,pp.137-142,2003. 4 S.Kapadia and B.Krishnamachari,"Comparative analysis of push-pull query strategies for wireless sensor networks,"in DCOSS,2006. 5 B.Sheng,Q.Li,and W.Mao,"Data storage placement in sensor networks,"in MobiHoc,2006 [6]R.Sylvia,K.Brad,S.Scott,E.Deborah,G.Ramesh, Y.Li,and Y.Fang,"Data-centric storage in sensornets with ght,a geographic hash table,"Mobile Net-works and Applications,vol.8,no.4,pp.427-442,2003. [7]G.Deepak,E.Deborah,and H.John,"Dimensions: Why do we need a new data handling architecture for sensor networks,"ACM SIGCOMM Computer Communication Review,vol.33,no.1,pp.143-148, 2003. (8]X.Li,Y.J.Kim,R.Govindan,and W.Hong, "Multi-dimensional range queries in sensor networks," in Sensys,2003. 9 R.Sarkar,X.J.Zhu,and J.Gao,"Double rulings for information brokerage in sensor networks,"in MOBICOM.2006. 10 X.Liu,Q.Huang,and Y.Zhang,"Combs, needles,haystacks:balancing push and pull for discovery in large-scale sensor networks,"in Sensys, 2004. [11]Q.Fang,J.Gao,and L.J.Guibas,"Landmark-based information storage and retrieval in sensor networks," in INFOCOM.2006. [12]N.Trigoni,Y.Yao,A.Demers,J.Gehrke,and R.Rajaraman."Hybrid push-pull query processing for sensor networks,"GI Jahrestagung,no.2,pp.370-374 13 J.Ahn and B.Krishnamachari,"Fundamental scaling laws for energy-efficient storage and querying in wireless sensor networks,"in MobiHoc,2006. 14 W.Zhang,G.Cao,and T.F.LaPorta,"Data dissemination with ring-based index for wireless sensor networks,"in ICNP,2003. [15]M.Aly,K.Pruhs,and P.K.Chrysanthis,"Kddcs:a load-balanced in-network data-centric storage scheme
ited number of storages, and we leverage the characteristics of large scale sensor network to sufficiently reduce the computing complexity of our optimization methodologies. Simulation results confirm the performance of our methodologies for storage placement over large scale sensor networks. 9. ACKNOWLEDGEMENT This work is partially supported by the National Natural Science Foundation of China under Grant No. 61100196, 61073028, 61021062; the National Basic Research Program of China (973) under Grant No. 2009CB320705; the JiangSu Natural Science Foundation under Grant No. BK2011559. 10. REFERENCES [1] Y. Yao and J. Gehrke, “Query processing in sensor networks,” in CIDR, 2003. [2] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, “Tag: A tiny aggregation service for ad-hoc sensor networks,” in Proceedings of OSDI, 2002. [3] S. Scott, R. Sylvia, K. Brad, G. Ramesh, and E. Deborah, “Data-centric storage in sensornets,” ACM SIGCOMM Computer Communication Review, vol. 33, no. 1, pp. 137–142, 2003. [4] S. Kapadia and B. Krishnamachari, “Comparative analysis of push-pull query strategies for wireless sensor networks,” in DCOSS, 2006. [5] B. Sheng, Q. Li, and W. Mao, “Data storage placement in sensor networks,” in MobiHoc, 2006. [6] R. Sylvia, K. Brad, S. Scott, E. Deborah, G. Ramesh, Y. Li, and Y. Fang, “Data-centric storage in sensornets with ght, a geographic hash table,” Mobile Net-works and Applications, vol. 8, no. 4, pp. 427–442, 2003. [7] G. Deepak, E. Deborah, and H. John, “Dimensions: Why do we need a new data handling architecture for sensor networks,” ACM SIGCOMM Computer Communication Review, vol. 33, no. 1, pp. 143–148, 2003. [8] X. Li, Y. J. Kim, R. Govindan, and W. Hong, “Multi-dimensional range queries in sensor networks,” in Sensys, 2003. [9] R. Sarkar, X. J. Zhu, and J. Gao, “Double rulings for information brokerage in sensor networks,” in MOBICOM, 2006. [10] X. Liu, Q. Huang, and Y. Zhang, “Combs, needles,haystacks: balancing push and pull for discovery in large-scale sensor networks,” in Sensys, 2004. [11] Q. Fang, J. Gao, and L. J. Guibas, “Landmark-based information storage and retrieval in sensor networks,” in INFOCOM, 2006. [12] N. Trigoni, Y. Yao, A. Demers, J. Gehrke, and R. Rajaraman, “Hybrid push-pull query processing for sensor networks,” GI Jahrestagung, no. 2, pp. 370–374. [13] J. Ahn and B. Krishnamachari, “Fundamental scaling laws for energy-efficient storage and querying in wireless sensor networks,” in MobiHoc, 2006. [14] W. Zhang, G. Cao, and T. F. LaPorta, “Data dissemination with ring-based index for wireless sensor networks,” in ICNP, 2003. [15] M. Aly, K. Pruhs, and P. K. Chrysanthis, “Kddcs: a load-balanced in-network data-centric storage scheme for sensor networks,” in CIKM, 2006. [16] B. Sheng, Q. Li, and W. Mao., “Optimize storage placement in sensor networks,” IEEE Transactions on Mobile Computing, vol. 9, no. 10, pp. 1437–1450, 2010. [17] B. Sheng, Q. Li, and W. Mao, “An approximation algorithm for data storage placement in sensor networks,” in WASA, 2007