Ranged Hash Functions and the Price of Churn

James Aspnes∗† Muli Safra‡ Yitong Yin∗§

Abstract

Ranged hash functions generalize hash tables to the setting where hash buckets may come and go over time, a typical case in distributed settings where hash buckets may correspond to unreliable servers or network connections. Monotone ranged hash functions are a particular class of ranged hash functions that minimize item reassignments in response to churn: changes in the set of available buckets. The canonical example of a monotone ranged hash function is the ring-based consistent hashing mechanism of Karger et al. [13]. These hash functions give a maximum load of Θ((n/m) log m) when n is the number of items and m is the number of buckets. The question of whether some better bound could be obtained using a more sophisticated hash function has remained open. We resolve this question by showing two lower bounds. First, the maximum load of any randomized monotone ranged hash function is Ω(√((n/m) ln m)) when n = o(m log m). This bound covers almost all of the nontrivial case, because when n = Ω(m log m) simple random assignment matches the trivial lower bound of Ω(n/m). We give a matching (though impractical) upper bound that shows that our lower bound is tight over almost all of its range. Second, for randomized monotone ranged hash functions derived from metric spaces, there is a further trade-off between the expansion factor of the metric and the load balance, which for the special case of growth-restricted metrics gives a bound of Ω((n/m) log m), asymptotically equal to that of consistent hashing. These are the first known non-trivial lower bounds for ranged hash functions. They also explain why in ten years no better ranged hash functions have arisen to replace consistent hashing.

∗Department of Computer Science, Yale University.
†Supported in part by NSF grant CNS-0435201. Email: aspnes@cs.yale.edu.
‡School of Mathematical Sciences, Tel-Aviv University.
Email: safra@post.tau.ac.il.
§Supported by a Yale University Fellowship. Email: yitong.yin@yale.edu.

1 Introduction

Hash tables are one of the oldest tools in computer science. In the classic model of a hash table, an unknown data set from a large domain is assigned to a fixed number of buckets, through a consistent mapping (the hash function) from the data domain to the buckets. However, in many applications in today's computing systems, such as peer-to-peer systems [22] or Internet routers [3], not only is the current data set (keys, network flows) unknown, but the availability of buckets (peers, peer links) also changes over time. This change in the set of participants of the system is called churn [5], and it is a serious concern for any large distributed system.

Hashing has been applied to distributed systems via ranged hash functions, introduced by Karger et al. [13]. Ranged hash functions are hash functions that depend on the set of available buckets. A typical ranged hash function hashes items to positions in some space, and then assigns each item to the nearest available bucket; as the set of buckets changes, an item may move to a new nearest available bucket. Such hash functions have the property of being monotone [13], meaning that each data item has its own preference list and hashes to the first available bucket in this list. This property minimizes reassignment costs [3], and perhaps because of this, all practical ranged hash functions have this monotonicity property. This includes systems that already do load balancing by, for example, creating multiple virtual locations for each bucket [13, 22]. We are interested in obtaining lower bounds on the maximum load for any of these monotone ranged hash functions, parameterized in terms of the number of data items n and the number of available buckets m.
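As a concrete illustration of this nearest-available-bucket scheme, here is a minimal Python sketch of ring-based consistent hashing (the names ring_pos and assign are ours, not from [13]). It also exhibits the monotonicity just described: when a bucket departs, only the items that bucket held are reassigned.

```python
import hashlib

def ring_pos(key: str) -> float:
    """Map a string to a pseudo-random point on the unit circle [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def assign(items, buckets):
    """Assign each item to the first available bucket clockwise from its position."""
    ring = sorted((ring_pos(f"bucket:{b}"), b) for b in buckets)
    placement = {}
    for item in items:
        p = ring_pos(f"item:{item}")
        # first bucket at or after p, wrapping around the circle
        placement[item] = next((b for q, b in ring if q >= p), ring[0][1])
    return placement

items = [f"x{i}" for i in range(32)]
before = assign(items, {0, 1, 2, 3})
after = assign(items, {0, 1, 2})           # bucket 3 departs (churn)
moved = [i for i in items if before[i] != after[i]]
assert all(before[i] == 3 for i in moved)  # only bucket 3's former items move
```

The virtual locations of [13, 22] would be modeled by inserting several ring positions per bucket; the monotonicity argument is unchanged.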
Our lower bounds hold even if: (a) the hash function is optimal for the given n and m; (b) the data set is optimal for the given hash function; and (c) the only power given to the adversary is the worst-case choice of available buckets. Furthermore, though our lower bounds are stated for a worst-case choice of buckets, the proof technique used means that they continue to hold with 1 − o(1) probability for a uniform random choice of buckets.

Randomization is fundamental to our problem, because no lower bounds have been shown for randomized ranged hash functions. Randomization is also important for practical algorithms. For the deterministic case, it is easy to obtain terrible load balancing. In the full paper, we show that all deterministic monotone ranged hash functions experience a load of Ω(√n) in the worst case when n = m. A naive deterministic implementation of consistent hashing would be even worse: an adversary deleting all buckets outside of some very narrow interval can get nearly all n items in the first remaining bucket. With randomization, the hash function can do much better. Nonetheless, we give two lower bounds. For a general monotone ranged hash function, we show a lower bound of Ω(√((n/m) log m)) when n = o(m log m); when n = Ω(m log m) the trivial lower bound of Ω(n/m) matches the upper bound obtained from random assignment. We give a tight matching upper bound for the small-n case based on finding nearest neighbors in a hypercube; unfortunately, because of the difficulty of nearest-neighbor search in a hypercube, this upper bound does not give a practical implementation of a distributed hash function.

Our second lower bound applies to the more practical setting of ranged hash functions based on assigning each item to the nearest bucket in a metric space, where the choice of how to embed both items and buckets in the space may be based on an arbitrary joint distribution. We show a trade-off between KR-dimension [6, 12, 16], a counting measure of doubling dimension, and load balance: in a space with KR-dimension K, no randomized monotone ranged hash function—even one where the embedding of items and buckets is chosen according to a non-uniform, non-independent distribution—can do better than Ω(2^{−2K} · (n/m) ln m). As with our general lower bound, the result continues to hold with a uniform choice of buckets with probability 1 − o(1).
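The KR-dimension appearing in this trade-off can be checked mechanically on a toy metric. The following sketch (ours, purely illustrative) verifies that on a discrete line under the counting metric, doubling the radius of any ball at most doubles its size, i.e., the KR-dimension K is 1:

```python
def ball(points, x, r):
    """Closed ball of radius r around x in the 1-D counting metric."""
    return [y for y in points if abs(x - y) <= r]

line = list(range(64))   # a discrete line: a canonical growth-restricted metric
K = 0
for x in line:
    for r in range(1, 32):
        growth = len(ball(line, x, 2 * r)) / len(ball(line, x, r))
        while growth > 2 ** K:   # smallest K with |B(x, 2r)| <= 2^K |B(x, r)|
            K += 1
assert K == 1   # doubling the radius at most doubles the ball
```

For higher-dimensional grids the same computation yields a larger K, which is exactly the regime where the Ω(2^{−2K} · (n/m) ln m) bound weakens.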
The interesting case is when the KR-dimension is constant, since such a growth-restricted metric allows finding nearest neighbors quickly using known techniques [12]. Our lower bound shows that in this case, even a hash function based on a very cleverly chosen metric space, embedding, and probability distribution cannot beat the O((n/m) ln m) upper bound of simple consistent hashing using a uniform independent distribution on a one-dimensional ring. Our lower bound thus hints at the reason for the continued practical use of this technique.

Organization. We give a formal description of our model in Section 2. Previous work is described in Section 3. Our results are described formally in Section 4, and the details are given in the following sections.

2 Model

Items and buckets. Given a domain of items I = [N] and a domain of buckets U = [M], where the buckets may become available and unavailable, we refer to the set S of available buckets as the state. A ranged hash function is a function of the form h : 2^U × I → U. We denote by h_S the assignment of items to buckets imposed by h for a specific state S. Naturally, we require that h_S(I) ⊆ S for any S ⊆ U. For any data set D ⊆ I and any state S, the load of bucket b ∈ U is denoted ℓ^D_S(b) = |h_S^{−1}(b) ∩ D|, and we let ℓ^D_S = max_{b∈S} ℓ^D_S(b) be the maximum load of state S. If the data set D is the entire domain I, we omit D and write ℓ_S(b) for ℓ^D_S(b) and ℓ_S for ℓ^D_S.

Besides load balance, another critical performance parameter of ranged hash functions is smoothness, which is characterized by the number of items reassigned in going from one state to another. Smoothness captures the natural requirement that the maintenance cost of a data structure should be small when its state changes. In [13], the property of being monotone is introduced to characterize those ranged hash functions with optimal smoothness.
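The monotone property can be made concrete via the preference-list view from the introduction: each item ranks all buckets once and for all, and is always served by its highest-ranked available bucket. The following toy Python sketch (our illustration; the matrix pref and the sizes N, M are arbitrary examples) checks that h_S(i) = h_T(i) whenever h_T(i) ∈ S:

```python
import random

random.seed(7)
N, M = 8, 5   # toy item and bucket domain sizes, for illustration only
# preference matrix: row i is item i's ranking of all M buckets
pref = [random.sample(range(M), M) for _ in range(N)]

def h(S, i):
    """Monotone ranged hash: item i goes to its highest-ranked available bucket."""
    return next(b for b in pref[i] if b in S)

T = {0, 1, 2, 3, 4}
S = {0, 2, 4}                      # some buckets have departed: S is a subset of T
for i in range(N):
    if h(T, i) in S:               # h_S(i) = h_T(i) whenever h_T(i) survives
        assert h(S, i) == h(T, i)
```

Only items whose bucket departed are reassigned, which is exactly the optimal-smoothness behavior the text describes.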
This property says that removing buckets only changes the position of items in the removed buckets: formally, a ranged hash function h is monotone if for all S ⊆ T ⊆ U, h_S(i) = h_T(i) whenever h_T(i) ∈ S. For a monotone ranged hash function, items are reassigned only if necessary.

Monotone ranged hash functions can always be described by a preference matrix. A preference matrix π is an N × M matrix where each row π_i is a permutation of U. In [13], it is shown that a ranged hash function h is monotone if and only if there is a preference matrix π such that π_i^{−1}(h_S(i)) = min_{b∈S} π_i^{−1}(b) for every S and i, i.e., if and only if every item possesses an ordering of buckets and is always assigned to the first available bucket according to its order. Throughout this paper, we use the notation of a monotone ranged hash function h and its corresponding preference matrix π interchangeably.

In many applications of ranged hash functions, it is impractical to store a complete list of M possible buckets for every item. A general and natural methodology to efficiently represent a monotone ranged hash function is to embed I and U in a metric space and assign each item i to the closest available bucket in the current state. An implementation of a ranged hash function then involves a specific embedding of items and buckets in a metric space, and a mechanism supporting nearest neighbor search (NNS) [9, 12] in that metric.

Performance measures. We consider the effect on load balance of two important properties of a ranged hash function: (a) the requirement of optimal smoothness (i.e. being monotone); and (b) the expansion properties of the underlying metric space (which may dramatically affect the hardness of searching nearest neighbors).

For this purpose, we define a measure called the price of churn. We define this in terms of a class of ranged hash functions sharing a particular property. For a class H of ranged hash functions, let P : H denote an arbitrary probability distribution over hash functions in H. The price of churn for H is defined formally as

ℓ_H(N, M, n, m) = min_{P:H} min_{D ∈ (I choose n)} max_{S ∈ (U choose m)} sup{ L : Pr_P[ℓ^D_S ≥ L] = 1 − o(1) }.

This represents the lower bound on the maximum load of all randomized ranged hash functions with property H. With I = [N] and U = [M], for any randomized ranged hash function with some property H, for any data set D ∈ (I choose n), there exists a state S ∈ (U choose m) such that the maximum load ℓ^D_S is at least ℓ_H(N, M, n, m) with high probability.

Note that in this notion of lower bound, the data set D is fixed by us, and then the state S is chosen by an adversary. This is different from a lower bound against both a worst-case S and a worst-case D. We use this alternative formulation because our purpose is to understand specifically how a changing set of participants may affect a system: thus we focus on the costs caused solely by the unknown state, rather than other issues such as an unknown data set (which has been covered by the study of hash tables). Our lower bound states that even if the system is fully optimized towards the current data set, there is still some price we have to pay for unpredictable participation in the system. Giving this additional power to the ranged hash function only strengthens our lower bounds.

3 Previous work

So far the only practical construction of a ranged hash function is consistent hashing, which was introduced by Karger et al. [13] along with the notion of a ranged hash function.
Here items and buckets are mapped to a uniformly random place in the continuous unit circle [0, 1), and each item is assigned to the closest available bucket. For any data set |D| = n and any state |S| = m, the maximum load ℓ^D_S is with high probability Θ((n/m) ln m).

Consistent hashing was originally designed for web caching; however, through its utility in a wide variety of distributed settings, it has become the foundation of many modern distributed systems [2, 8, 11, 21, 22]. Many systems that implement consistent hashing have tried different ways to improve load balance [1, 4, 14, 15, 17, 19]. All of these methods involve evenly cutting the unit circle by enforcing a certain pattern by which points in the unit circle are occupied. With this method, the systems sacrifice the “off-line” property of a hash function; the location of buckets now depends on the order in which they arrive and depart, requiring coordination mechanisms to manage the assignment.

The question of whether such coordination is necessary has remained open. No new ranged hash function with better performance has been proposed since consistent hashing. No non-trivial lower bound is known to us for ranged hash functions either. Although ranged hash functions as a class of meaningful mathematical objects have been known for a decade, we have little knowledge about their structure, and the relation between their parameters.

4 Our results

We prove that for the class of monotone ranged hash functions, ℓ_monotone(N, M, n, m) = Ω(√((n/m) ln m)) for the case that n = o(m log m) and ℓ_monotone(N, M, n, m) = Ω(n/m) for the case that n = Ω(m log m). This bound is tight for almost all settings of n and m. This shows a property similar to the price of an unknown data set (in general we can do no better than random balls-into-bins [18] against a worst-case data set): they both approach the asymptotic optimum O(n/m) as n grows to Ω(m log m).
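The Θ((n/m) ln m) behavior of consistent hashing is easy to observe empirically. The following Monte Carlo sketch (ours, not from the paper) places n items and m buckets uniformly on the unit circle and reports the maximum load; for simplicity it uses the clockwise-successor rule rather than the closest-bucket rule, which has the same asymptotics. The maximum concentrates far above the average n/m because the largest arc between consecutive buckets has length about (ln m)/m:

```python
import bisect
import random

random.seed(0)
n = m = 2000
# m buckets at uniformly random positions on the unit circle [0, 1)
positions = sorted(random.random() for _ in range(m))

def owner(p):
    """Index of the first bucket clockwise from point p (wrapping around)."""
    return bisect.bisect_left(positions, p) % m

load = [0] * m
for _ in range(n):
    load[owner(random.random())] += 1

# average load is n/m = 1, but the maximum load concentrates
# around (n/m) ln m, roughly ln 2000 ≈ 7.6 in this run
print(max(load))
```

Replacing the uniform bucket positions with evenly spaced ones (as the load-balancing variants above effectively do) brings the maximum back down toward n/m, at the cost of the coordination described in the text.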
However, the price of churn is much smaller than the skew induced by a random balls-into-bins assignment.

We then look at ranged hash functions arising from metric spaces. We explore the relation between the load balance and the expansion properties of the metric. We adopt a counting measure of doubling dimension introduced in [12], which is called KR-dimension in the literature [6, 16]: we say the embedding of U has KR-dimension K if doubling the radius may increase the size of the balls in U by at most a factor of 2^K. We prove that for the class of ranged hash functions implemented via metric-space embedding where U has KR-dimension K, ℓ_{K-dim_KR}(N, M, n, m) = Ω(2^{−2K} · (n/m) ln m) if K ≤ (1/4) log_2((n/m) ln m). Combined with a widely believed conjecture for nearest neighbor search, “the curse of dimensionality” [9], this trade-off provides us a very interesting insight: although dimensionality curses search, it blesses load balance.

When the KR-dimension is O(1), the metric is called growth-restricted. In this case, the trade-off becomes ℓ_{O(1)-dim_KR}(N, M, n, m) = Ω((n/m) ln m), which is exactly the bound for the maximum load of consistent hashing.

Despite the fact that the original [0, 1) metric used in consistent hashing is not in general growth-restricted
for uniformly distributed buckets,our lower bound ap-and any n min(M,N)and m n, plies to consistent hashing because we can-without changing the structure of the algorithm-replace this 1.if H is I-hereditary,then lH(N,M,n,m)> metric with a (growth-restricted)discrete counting met- EH(n,M,n,m); ric that simply counts the number of bucket locations 2.if H is probabilistically U-hereditary,then between any two buckets.Our result thus shows that eH(N,M,n,m) ≥CH(N,,n,m)for any consistent hashing is optimal for all constructions aris- m≤x≤M. ing from growth-restricted metrics.This is particularly interesting for distributed computing for two reasons. Proof.1.Observe that for any randomized ranged First,growth-restricted metrics are among the strongest hash function h with I-hereditary property H,if metrics for which efficient exact nearest-neighbor search for some data set D∈(月)that Pr[唱L]=1-o(1)for such as constructions arising from 1-dimensional metric some state S∈(),according to the converse- space,but does not appear to generalize to more gen- negative proposition to the above observation,for eral classes of(randomized)ranged hash functions.Our all randomized ranged hash functions defined on method is to explore directly the structure of the pref- I=[N]and u [M]with property H,for erence matrices.By bounding certain measures of the all data sets D ()there must exist some richness of this structure,we can prove lower bounds for state S∈((such that Pr[eg≥)=1 general monotone ranged hash functions.For ranged o(1),i.e.(N,M,n,m)>L,which implies that hash function arising from metric spaces,we consider the connection between load balance and the density en(N,M,n,m)zEn(n,M,n,m). of balls in the metric space;using this we can prove a 2.We assume that (N,z,n,m)=L,i.e.for any trade-off between load balance and the expansion rate. 
randomized ranged hash functions defined on Z'= [N]and '=[x]with property H,for all data set 5 Domain reduction D∈(),it holds that Pre≥)=1-o(1) In order to avoid considering too many possibilities,it for some state S).For any randomized is helpful to be able to reduce the problem of churn to ranged hash functions h defined on I =[N]and a few representative special cases.In this section,we =[M]with property H,if H is probabilistically describe how to do so. u-hereditary,we can randomly pick x buckets, We say a property H of ranged hash functions is say []such that the restriction of h on the new Z-hereditary,if for any h e H,and any T'I, bucket domain '=[z]has property H with high the restriction of h on the domain I'still belongs to probability,therefore according to the previous H.We say a property H of ranged hash functions is assumption,for the new ranged hash function, probabilistically W-hereditary,if for any h E H,and with high probability,for all data set D ()it any ML]=1-o(1)for some state such that the restriction of h on the domain belongs to H with high probability. ()()ie.for h,for all data set De(, it holds that Pr[e >L]=1-o(1)for some state It is easy to see that the property of being mono- S∈(),which means that er(N,M,n,m)≥L,so tone is both Z-hereditary and u-hereditary,since any we prove that H(N,M,n,m)zH(N,n,m). restriction of a preference matrix is still a preference matrix. 6 Load balance vs.monotonicity The following lemma allows reducing the price of churn to a base case with small parameters. In this section,we prove the following theorem. THEOREM 6.1.For any randomized monotone ranged LEMMA 5.1.For any class H of ranged hash functions hash function with domain I =N and U =M,for
for uniformly distributed buckets, our lower bound applies to consistent hashing because we can—without changing the structure of the algorithm—replace this metric with a (growth-restricted) discrete counting metric that simply counts the number of bucket locations between any two buckets. Our result thus shows that consistent hashing is optimal for all constructions arising from growth-restricted metrics. This is particularly interesting for distributed computing for two reasons. First, growth-restricted metrics are among the strongest metrics for which efficient exact nearest-neighbor search is currently possible. Second, as argued in [12], a growth-restricted metric holds in the case of many typical Internet applications. Our results thus demonstrate that consistent hashing is optimal for such applications.

Besides these specific results, we contribute new techniques for analyzing ranged hash functions. Previous approaches [13, 19] can be seen as bounding the maximum volume of Voronoi cells governed by random centers. This technique works fine for restricted cases such as constructions arising from a 1-dimensional metric space, but does not appear to generalize to more general classes of (randomized) ranged hash functions. Our method is to explore directly the structure of the preference matrices. By bounding certain measures of the richness of this structure, we can prove lower bounds for general monotone ranged hash functions. For ranged hash functions arising from metric spaces, we consider the connection between load balance and the density of balls in the metric space; using this we can prove a trade-off between load balance and the expansion rate.

5 Domain reduction

In order to avoid considering too many possibilities, it is helpful to be able to reduce the problem of churn to a few representative special cases. In this section, we describe how to do so.
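As a concrete illustration of the preference-matrix view used throughout (our own sketch, not part of the paper's formal development), the following fragment builds a monotone ranged hash from a preference matrix and checks that restricting the bucket domain yields another preference matrix — the fact behind the hereditary properties defined next. All names and the tiny instance are ours.

```python
import random

def assign(pref_row, S):
    """Monotone ranged hash: an item goes to the first available bucket
    in its preference list (its row of the preference matrix)."""
    for b in pref_row:
        if b in S:
            return b
    raise ValueError("state S is empty")

def restrict(pref, keep_buckets):
    """Restriction of a preference matrix to a sub-domain of buckets:
    drop the missing buckets from every row. The result is again a
    preference matrix, which is why monotonicity is hereditary."""
    return [[b for b in row if b in keep_buckets] for row in pref]

random.seed(0)
m = 6
pref = [random.sample(range(m), m) for _ in range(4)]  # 4 items, 6 buckets
S = {1, 3, 5}
full = [assign(row, S) for row in pref]
sub = [assign(row, S) for row in restrict(pref, {1, 2, 3, 5})]
# Since every bucket of S survives the restriction, assignments under
# the restricted matrix agree with the original ones.
assert full == sub
```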
We say a property H of ranged hash functions is I-hereditary if for any h ∈ H and any I' ⊆ I, the restriction of h to the item domain I' still belongs to H. We say a property H of ranged hash functions is probabilistically U-hereditary if for any h ∈ H and any M' < M, there is some distribution over U' ∈ (U choose M') such that the restriction of h to the bucket domain U' belongs to H with high probability.

It is easy to see that the property of being monotone is both I-hereditary and U-hereditary, since any restriction of a preference matrix is still a preference matrix.

The following lemma allows reducing the price of churn to a base case with small parameters.

Lemma 5.1. For any class H of ranged hash functions and any n ≤ min(M, N) and m ≤ n,

1. if H is I-hereditary, then ℓ_H(N, M, n, m) ≥ ℓ_H(n, M, n, m);

2. if H is probabilistically U-hereditary, then ℓ_H(N, M, n, m) ≥ ℓ_H(N, x, n, m) for any m ≤ x ≤ M.

Proof. 1. Observe that for any randomized ranged hash function h with I-hereditary property H, if for some data set D ∈ (I choose n) we have Pr[ℓ^D_S < L] = Ω(1) for all S ∈ (U choose m), then the restriction of h to the new item domain I' = D is still a randomized ranged hash function with property H (since H is I-hereditary), and it still holds for the new ranged hash function that Pr[ℓ_S < L] = Ω(1) for all S ∈ (U choose m). Therefore, assume that ℓ_H(n, M, n, m) = L, i.e. for any randomized ranged hash function defined on I' = [n] and U' = [M] with property H, it holds that Pr[ℓ_S ≥ L] = 1 − o(1) for some state S ∈ (U' choose m). By the contrapositive of the above observation, for all randomized ranged hash functions defined on I = [N] and U = [M] with property H and all data sets D ∈ (I choose n), there must exist some state S ∈ (U choose m) such that Pr[ℓ^D_S ≥ L] = 1 − o(1), i.e. ℓ_H(N, M, n, m) ≥ L, which implies that ℓ_H(N, M, n, m) ≥ ℓ_H(n, M, n, m).

2. We assume that ℓ_H(N, x, n, m) = L, i.e.
for any randomized ranged hash function defined on I' = [N] and U' = [x] with property H and all data sets D ∈ (I' choose n), it holds that Pr[ℓ^D_S ≥ L] = 1 − o(1) for some state S ∈ (U' choose m). For any randomized ranged hash function h defined on I = [N] and U = [M] with property H, if H is probabilistically U-hereditary, we can randomly pick x buckets, say [x], such that the restriction of h to the new bucket domain U' = [x] has property H with high probability. Therefore, by the previous assumption, for the new ranged hash function, with high probability, for all data sets D ∈ (I choose n) it holds that Pr[ℓ^D_S ≥ L] = 1 − o(1) for some state S ∈ (U' choose m) ⊆ (U choose m). Hence for h itself, for all data sets D ∈ (I choose n), it holds that Pr[ℓ^D_S ≥ L] = 1 − o(1) for some state S ∈ (U choose m), which means that ℓ_H(N, M, n, m) ≥ L. This proves that ℓ_H(N, M, n, m) ≥ ℓ_H(N, x, n, m).

6 Load balance vs. monotonicity

In this section, we prove the following theorem.

Theorem 6.1. For any randomized monotone ranged hash function with domain I = [N] and U = [M], for
any sufficiently large n and m with N > 2n, M > 2m, and n ≥ m, for any data set D ∈ (I choose n), there exists a state S ∈ (U choose m) such that the maximum load ℓ^D_S = Ω(L) with high probability, where

    L = sqrt((n/m)·ln m)   if m ≤ n < o(m log m)
    L = n/m                if n = Ω(m log m).

An equivalent statement is that ℓ_monotone(N, M, n, m) = Ω(L), where L is defined as above.

According to Lemma 5.1, it is sufficient to consider ℓ_monotone(n, m', n, m) for m' = max(n, 2m). For the rest of this section, we assume that I = [n] and U = [m'], that is, we only consider the n × m' preference matrices. Since the Ω(n/m) bound is trivial, we are only interested in the case where m ≤ n < o(m log m).

Instead of directly proving the lower bound for randomized ranged hash functions, we look at an arbitrary deterministic π and show a probabilistic bound on the maximum load for a uniformly random state S ∈ (U choose m). The same bound for randomized hash functions with a worst-case S follows from Yao's min-max principle.

Given a monotone ranged hash function π, for any bucket b ∈ U and any set of items A ⊆ I, we define the A-neighborhood ∆_A(b) = {π_ij | i ∈ A and j ≤ π_i^{-1}(b)}, i.e. ∆_A(b) is the union of the sets of buckets preferred by each item of A over b, together with b itself. We call |A| the width of the neighborhood. It is easy to see that the concept of neighborhood characterizes the load of a bucket.

Lemma 6.1. For any state S ⊆ U, the load ℓ_S(b) of bucket b at state S is at least L if and only if there is some A ⊆ I such that |A| = L and ∆_A(b) ∩ S = {b}.

It is thus sufficient to look at the collection of L-wide neighborhoods ∆_L = {∆_A(b) | b ∈ U, |A| = L}, which is a set system with ground set U, and show that for a uniformly random S ∈ (U choose m), with high probability there exists some ∆_A(b) ∈ ∆_L such that ∆_A(b) ∩ S = {b}. However, the intersection pattern of the neighborhoods can be extremely complicated for an arbitrary π, which makes the deviation hard to analyze.
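The neighborhood characterization of load (Lemma 6.1) can be checked mechanically on a toy instance. The sketch below is illustrative only; the 4×4 preference matrix and all names are hypothetical choices of ours.

```python
from itertools import combinations

def load(pref, S, b):
    """Number of items assigned to bucket b at state S."""
    return sum(1 for row in pref
               if next((x for x in row if x in S), None) == b)

def neighborhood(pref, A, b):
    """Delta_A(b): the buckets each item i in A prefers to b, plus b."""
    out = {b}
    for i in A:
        out.update(pref[i][: pref[i].index(b) + 1])
    return out

# Tiny 4-item, 4-bucket example (hypothetical data).
pref = [[0, 1, 2, 3],
        [0, 2, 1, 3],
        [1, 0, 3, 2],
        [2, 0, 3, 1]]
S = {0, 3}
b, L = 0, 3
# Lemma 6.1: load_S(b) >= L  iff  some L-subset A has Delta_A(b) & S == {b}.
witness = any(neighborhood(pref, A, b) & S == {b}
              for A in combinations(range(4), L))
assert (load(pref, S, b) >= L) == witness
```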
To overcome this, we extend the combinatorial deletion method introduced in [10, 20]: we sequentially delete those buckets which cause the correlations, until the remaining family of neighborhoods has a sufficiently large disjoint subfamily. We do this in such a way that, conditioning on the previous deletions, the presence of each deleted bucket can only increase the maximum load.

We first investigate the case where the entries in each column of π do not include many duplicates. We show that in this case we can get a large disjoint subfamily of neighborhoods by applying the Hajnal-Szemeredi Theorem [7]. After that, a standard analysis of deviation can be applied.

Let λ_j(b) denote the number of b-entries in column j of π, i.e. λ_j(b) = |{i | π_ij = b}|.

Lemma 6.2. Let L = sqrt((m'/m)·ln m). For any constants 0 < β < 1/2 and 0 < α < sqrt(1/2 − β), if for an n × m' preference matrix π it holds that λ_j(b) = Õ(n^β) for every b ∈ U and every column j ≤ αL, then

    Pr_{S ∈ ([m'] choose m)}[ℓ_S ≥ αL] = 1 − o(1).

Proof. We first construct a family of disjoint αL-wide neighborhoods {∆_{A_b}(b)}_{b ∈ W}, where |W| = Ω̃(n^{1−2β}) and |∆_{A_b}(b)| ≤ α²L² for every b ∈ W. Then we apply the second moment method to argue that with high probability at least one of these neighborhoods has ∆_{A_b}(b) ∩ S = {b}.

We only consider the first αL columns of π. Since λ_j(b) = Õ(n^β) for every b, there are at least Ω̃(n^{1−β}) buckets b, each of which appears at least αL times in the first αL columns; it follows that there is a set V of such b's with |V| = Ω̃(n^{1−β}), and for every b ∈ V there is an A_b with |A_b| = αL such that the whole neighborhood ∆_{A_b}(b) lies in the first αL columns. This guarantees that |∆_{A_b}(b)| ≤ αL·|A_b| = α²L².

We define a graph G(V, E) with the above V as the vertex set, and (a, b) ∈ E if ∆_{A_a}(a) intersects ∆_{A_b}(b). Since λ_j(b) = Õ(n^β) and the size of each ∆_{A_b}(b) is bounded by α²L² = O(ln n), the degree of G(V, E) is at most Õ(n^β). According to the Hajnal-Szemeredi Theorem [7], there is an independent set W of size Ω̃(n^{1−2β}). Translating back from the graph to the neighborhood structure, this says that there is a family of disjoint neighborhoods {∆_{A_b}(b)}_{b ∈ W} of cardinality Ω̃(n^{1−2β}), such that |∆_{A_b}(b)| ≤ α²L² for every b ∈ W.

For any bucket b ∈ W, let X_b be the indicator with X_b = 1 if ∆_{A_b}(b) ∩ S = {b}, and X_b = 0 otherwise, and let X = Σ_{b ∈ W} X_b. By Lemma 6.1,

    Pr[ℓ_S ≥ αL] ≥ Pr[∃b ∈ W, ∆_{A_b}(b) ∩ S = {b}] = 1 − Pr[X = 0].

For any constant 0 < α < sqrt(1/2 − β),

    E(X) = |W| · Pr[∆_{A_b}(b) ∩ S = {b}] ≥ Ω̃(n^{1−2β}) · (m'−α²L² choose m−1)/(m' choose m) ≥ Ω̃(n^{1−2β−2α²}) = ω(1).

For any a ≠ b ∈ W, ∆_{A_a}(a) and ∆_{A_b}(b) are disjoint, thus cov(X_a, X_b) ≤ 0, therefore var(X) = Σ_{b∈W} var(X_b) +
Σ_{a≠b} cov(X_a, X_b) ≤ Σ_b var(X_b) ≤ E(X). By Chebyshev's inequality,

    Pr[X = 0] ≤ Pr[|X − E(X)| ≥ E(X)] ≤ var(X)/E²(X) ≤ 1/E(X) = o(1).

Therefore Pr[ℓ_S ≥ αL] ≥ 1 − Pr[X = 0] ≥ 1 − o(1).

Next we deal with the case where some bucket appears many times in some column, but in all the preceding columns the number of appearances of any single bucket is small.

Lemma 6.3. Let L = sqrt((m'/m)·ln m). For any constants β > 0 and 0 < α < √β/2, if for an n × m' preference matrix π there are a bucket b ∈ U and a column k ≤ αL such that λ_k(b) ≥ αL·n^{kβ/(αL)}, while λ_j(a) < αL·n^{jβ/(αL)} for every a ∈ U and every column j < k, then

    Pr_{S ∈ ([m'] choose m)}[ℓ_S ≥ αL | b ∈ S] = 1 − o(1).

Proof. Let t = λ_k(b). Without loss of generality, we assume that π_ik = b for i = 1, 2, . . . , t. Let X_i be the indicator with X_i = 1 if π_ij ∉ S for all j < k, and X_i = 0 otherwise, and let X = Σ_{i=1}^t X_i. Conditioning on b ∈ S, every item i ≤ t with X_i = 1 is assigned to b, so ℓ_S(b) ≥ X; a direct calculation using m' ≥ 2m and k ≤ αL ≤ β·log₂ n shows E(X) ≥ αL·n^{β/(αL)} = ω(αL).

For the variance, define a dependency graph G(V, E) on V = {1, . . . , t} such that cov(X_{i1}, X_{i2}) ≠ 0 only if (i1, i2) ∈ E. It is easy to see that (i1, i2) ∈ E only if the rows i1 and i2 of π share some entries in the first k − 1 columns. Since λ_j(a) < αL·n^{jβ/(αL)} for all j < k and a ∈ U, the degree of the dependency graph is at most αkL·n^{(k−1)β/(αL)}; thus 2|E| ≤ αtkL·n^{(k−1)β/(αL)}, and a calculation shows Σ_{i1≠i2} cov(X_{i1}, X_{i2}) ≤ 2|E| = o(E²(X)) for any constants β > 0 and 0 < α < √β/2. Hence var(X) = o(E²(X)). From Chebyshev's inequality, Pr[X < αL] ≤ var(X)/(E(X) − αL)² = o(1), therefore Pr[ℓ_S ≥ αL | b ∈ S] ≥ Pr[X ≥ αL] = 1 − o(1).

Combining the two lemmas, we have the following theorem.

Theorem 6.2. For any monotone ranged hash function π with I = [n] and U = [m'], where m' = max(n, 2m) and n = o(m log m), it holds that

    Pr_{S ∈ ([m'] choose m)}[ℓ_S ≥ Ω(sqrt((n/m)·ln m))] = 1 − o(1).

Proof. We denote L(x, y) = sqrt((x/y)·ln y). Note that L(m', m) ≥ sqrt((n/m)·ln m). For an arbitrary π, we construct a sequence of buckets b_1, b_2, . . . , b_t by the following procedure. Initially t = 1, and the constants are set to β = 1/3 and α = 1/4.

1. If in the current π, for every 1 ≤ j ≤ αL(m', m) and every b, λ_j(b) < αL(m', m)·n^{jβ/(αL(m', m))}, then terminate.

2. Pick one b with minimum j such that λ_j(b) ≥ αL(m', m)·n^{jβ/(αL(m', m))}. Let b_t ← b and t ← t + 1. Delete b from π, that is, let π become the restriction of π on U ∖ {b}.

3. If t < log² n go to Step 1, otherwise terminate.
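The deletion procedure can be rendered as runnable pseudocode. The following sketch is our illustration only (0-based column indexing, a fixed L computed from the initial matrix, and the small random instance are our assumptions, not part of the paper).

```python
import math, random

def lam(pref, j, b):
    """lambda_j(b): the number of b-entries in column j (0-indexed here)."""
    return sum(1 for row in pref if row[j] == b)

def deletion_procedure(pref, m, alpha=0.25, beta=1 / 3):
    """Sketch of the bucket-deletion loop from the proof of Theorem 6.2."""
    n, mp = len(pref), len(pref[0])
    L = math.sqrt(mp / m * math.log(m))
    cols = max(1, int(alpha * L))
    deleted = []
    while len(deleted) < math.log2(n) ** 2:
        hit = None
        for j in range(min(cols, len(pref[0]))):   # scan columns, minimum j first
            heavy = [b for b in set(r[j] for r in pref)
                     if lam(pref, j, b)
                        >= alpha * L * n ** ((j + 1) * beta / (alpha * L))]
            if heavy:
                hit = heavy[0]
                break
        if hit is None:          # Step 1: every column's duplication is small
            break
        deleted.append(hit)      # Step 2: record and delete the bucket
        pref = [[x for x in row if x != hit] for row in pref]
    return deleted

random.seed(1)
n, m = 16, 8
mp = max(n, 2 * m)
pref = [random.sample(range(mp), mp) for _ in range(n)]
out = deletion_procedure(pref, m)
assert len(out) <= math.log2(n) ** 2   # the loop deletes at most log^2(n) buckets
```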
By the law of total probability, we have

    Pr_{S∈([m'] choose m)}[ℓ_S ≥ (1/2)·αL(m', m)]
      = Σ_{k=1}^{t} Pr[ℓ_S ≥ (1/2)·αL(m', m) | {b_i}_{i<k} ∩ S = ∅, b_k ∈ S] · Pr[{b_i}_{i<k} ∩ S = ∅, b_k ∈ S]
        + Pr[ℓ_S ≥ (1/2)·αL(m', m) | {b_i}_{i≤t} ∩ S = ∅] · Pr[{b_i}_{i≤t} ∩ S = ∅]
      ≥ Σ_{k=1}^{t} Pr_{S∈([m'−k+1] choose m)}[ℓ_S ≥ αL(m'−k+1, m) | b_k ∈ S] · Pr[{b_i}_{i<k} ∩ S = ∅, b_k ∈ S]
        + Pr_{S∈([m'−t] choose m)}[ℓ_S ≥ αL(m'−t, m)] · Pr[{b_i}_{i≤t} ∩ S = ∅].

The second inequality is due to the fact that L(m' − O(log² n), m) ≥ (1/2)·L(m', m) and the observation that conditioning on T ∩ S = ∅ is equivalent to letting π become the restriction of π on U ∖ T, with the probability taken over a uniformly random S from ([m'] ∖ T choose m).

Due to Lemma 6.3, for the first t terms Pr[ℓ_S ≥ αL(m' − k + 1, m) | b_k ∈ S] = 1 − o(1) for every k. If t < log² n, the procedure terminated at Step 1, so by Lemma 6.2, Pr_{S∈([m'−t] choose m)}[ℓ_S ≥ αL(m' − t, m)] = 1 − o(1), and therefore the total is 1 − o(1). Alternatively, if t = log² n, then Pr[{b_i}_{i≤t} ∩ S = ∅] = o(1), which means that the total is at least (1 − o(1))(1 − o(1)), which is also 1 − o(1).

7 Tightness of the lower bound

In this section we address the tightness of the lower bound on ℓ_monotone(N, M, n, m), and show evidence that the lower bound is tight in almost all settings.

We first consider a simple construction of a (randomized) monotone ranged hash function. For I = [N] and U = [M], define a randomized N × M preference matrix Π as follows: for each i ∈ I, row Π_i is a uniformly and independently random permutation of U. It is easy to see that for any data set D ∈ (I choose n) and any state S ∈ (U choose m), the ranged hash function Π assigns each of the n items independently to a uniformly random bucket in S. According to the well-known balls-into-bins result [18], when n = Ω(m log m) the maximum load is with high probability Θ(n/m), i.e. the lower bound is tight for the case that n = Ω(m log m). However, for the case that n is close to m, there is a gap between the maximum load of Π and the lower bound. The gap is maximized when n = Θ(m), where the maximum load of Π is Θ(ln n / ln ln n) but the lower bound is Ω(√ln n).

We then introduce a construction that approaches the lower bound when n is close to m. We first deal with the base case where I = U = [n]. For convenience, we assume that n is a power of 2 and n ≥ cm for some constant c > 1.

The Perturbed Cube. Intuitively, this construction is a Hamming cube with perturbations. Each bucket is mapped to a vertex of the cube, and each item is mapped to a vertex of the cube as well. The preference list is determined by Hamming distance plus some perturbation to break ties.¹

More formally, let φ : [n] → {0, 1}^{log n} × [0, 1) be a mapping from [n] to a (1 + log n)-dimensional vector space X as follows: φ(k) = x where x_i = ⌊k/2^{i−1}⌋ mod 2 for i = 1, 2, . . . , log n, and x_{1+log n} = k/n; i.e. the first log n entries of φ(k) comprise the binary representation of k, and the last entry is k/n.

For I = U = [n], let h be a monotone ranged hash function defined as follows. First, embed I and U in the same metric space (X, d) by the mapping φ defined above, where d is the ℓ1 distance. Next, for any item i ∈ I and any state S ⊆ U, let h_S(i) be some b with the property that ∀a ∈ S, d(φ(i), φ(b)) ≤ d(φ(i), φ(a)). Note that there are at most two such points b for each i; in case of a tie, we pick an arbitrary one. We denote by π the resulting preference matrix.

We now show that the maximum load for the perturbed cube is, with high probability, O(sqrt((n/m)·ln m)) for the case that n is close to m:

Theorem 7.1. Let π be as constructed above. If n = o(m·ln m/(ln ln m)²) and n ≥ cm for some constant c > 1, then there is a constant α > 0 such that

    Pr_{S ∈ (U choose m)}[ℓ_S ≥ α·sqrt((n/m)·ln m)] = o(1).

Proof. Let L = α·sqrt((n/m)·ln m). Considering the collection of all L-wide neighborhoods ∆_L = {∆_A(b) | b ∈ U and A ∈ (I choose L)} defined on π, it is sufficient to show that with high probability none of the neighborhoods ∆_A(b) ∈ ∆_L has ∆_A(b) ∩ S = {b}. Two key observations are: (1) the chance that any entry after the first L² columns is ever reached is negligible, thus we only need to consider the neighborhoods contained in the first L² columns of π; and (2) every L-wide neighborhood has |∆_A(b)| = Ω(L²). Applying these two facts, the theorem is proved. The detailed proof will be given in the full version of the paper.

¹This perturbed cube construction of a ranged hash function does not imply a practical distributed hash table as consistent hashing does, because nearest-neighbor search in high-dimensional Hamming space is believed to be hard. We propose this construction only to show the tightness of the bound on the price of churn with monotone ranged hash functions. However, it might turn out to be useful for more centralized applications of ranged hash functions, as in Internet routers [3], where nearest-neighbor search is unnecessary and it may be feasible to store the whole preference matrix in the routing table.
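The mapping φ and the nearest-neighbor rule are simple to implement. The sketch below is our illustration (LSB-first bit order and the small instance are our choices); it also spot-checks monotonicity: removing a bucket never moves an item whose assigned bucket survives.

```python
import math

def phi(k, n):
    """Perturbed-cube embedding: the binary digits of k, then k/n."""
    bits = [(k >> i) & 1 for i in range(int(math.log2(n)))]
    return bits + [k / n]

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def assign(i, S, n):
    """h_S(i): a nearest available bucket under the l1 distance."""
    return min(S, key=lambda b: l1(phi(i, n), phi(b, n)))

n = 8
S = {1, 3, 6, 7}
T = {1, 6, 7}          # T obtained from S by removing bucket 3
for i in range(n):
    a, b = assign(i, S, n), assign(i, T, n)
    # Monotonicity: only items whose bucket disappeared may move.
    if a in T:
        assert a == b
```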
To construct a randomized ranged hash function against a worst-case S, we randomly rename the buckets by applying a uniformly random permutation to U. With I = [n] and U = [n], for any S ∈ (U choose m), when n = o(m·ln m/(ln ln m)²), the maximum load ℓ_S is with high probability O(sqrt((n/m)·ln m)). We thus show that the lower bound on ℓ_monotone(n, n, n, m) is tight when n = o(m·ln m/(ln ln m)²). This statement can also be easily extended to ℓ_monotone(N, M, n, m) for N = O(n) and M = O(n).

For general domain sizes N and M, we have a non-uniform construction through reduction to the base case I = U = [n]. Given I = [N] and U = [M], let r : [M] → [n] be a uniformly random projection from [M] to [n]. We construct a monotone ranged hash function h' by the rule that for each i ∈ [n] ⊂ I and any S ⊆ U, h'_S(i) is chosen to be some b ∈ r^{-1}(h_{r(S)}(i)) ∩ S, with arbitrary tie breaking. Intuitively, the first n items of I are assigned to the buckets in state S according to the perturbed cube h at the state r(S). For any S ∈ (U choose m), with n = o(m·ln m/(ln ln m)²), we have that with high probability |r(S)| = Θ(m). Thus the maximum load contributed by these n items is asymptotically the same as the maximum load in the n-vertex perturbed cube, which is O(sqrt((n/m)·ln m)). (Note that the concentration of r may only decrease the maximum load.) For h', there exists a data set D ∈ (I choose n) such that for any state S ∈ (U choose m), where n = o(m·ln m/(ln ln m)²), the maximum load ℓ^D_S is O(sqrt((n/m)·ln m)) with high probability. It follows that the lower bound on ℓ_monotone(N, M, n, m) is tight whenever n = o(m·ln m/(ln ln m)²) or n = Ω(m log m).

8 Load balance vs. expansion rate

General preference matrices create scalability problems, since they require storing Ω(M log M) bits of information for each item. Instead, we would prefer a more compact representation that allows each node (bucket) in the system to store at most m^{o(1)} bits of information.
A general way to do so is to embed items and buckets into a metric space, and then assign each item to the closest bucket in the current state. Because of this scalability constraint, we are restricted to the cases where n = m^{1+o(1)}: if a node cannot handle more than m^{o(1)} bits, we surely do not expect it to store more than that many data items.

In this section, we consider monotone ranged hash functions based on this approach, and show a trade-off between load balance and the expansion rate of the underlying metric. Our results demonstrate that consistent hashing on a one-dimensional ring gives optimal load balance among all growth-restricted metrics.

Formally, we start with a metric space (X, d), and assume I ⊆ X and U ⊆ X. A ranged hash function h can then be defined as follows. For every S ⊆ U and i ∈ I, h_S(i) is the nearest neighbor of i in S; specifically, it satisfies the constraint that for any b ∈ S, d(i, h_S(i)) ≤ d(i, b). In order to make this definition precise, we further require that the embedding is "sparse", that is, the chance that there is more than one nearest neighbor for any given point is negligible. It is obvious that such a ranged hash function is monotone.

The expansion rate of the underlying metric is important because it affects the hardness of finding nearest neighbors. Given any x ∈ X, Y ⊆ X, and r > 0, we let B_Y(x, r) = {b ∈ Y | d(x, b) ≤ r} be the ball of radius r around x in Y. The KR-dimension [12] of Y, denoted dim_KR(Y), is the smallest K such that |B_Y(x, 2r)| ≤ 2^K·|B_Y(x, r)| for all x ∈ X and r ≥ 0. We say Y is growth-restricted if it has constant KR-dimension, i.e. dim_KR(Y) = O(1).²

For each x ∈ X, let r_x(t) be the smallest value such that |B_Y(x, r_x(t))| = t. The following lemma captures the connection between the expansion rate of the metric and the density of overlapping balls, which is central to proving the trade-off between expansion rate and load balance.

Lemma 8.1. Let Y ⊆ X with dim_KR(Y) = K.
For any A ⊆ X, if ⋂_{x∈A} B_Y(x, r_x(t)) ≠ ∅, then |⋃_{x∈A} B_Y(x, r_x(t))| ≤ t·2^{2K}.

Proof. Consider any x, y ∈ X, and suppose that r_x(t) ≥ r_y(t). If there exists some b ∈ B(x, r_x(t)) ∩ B(y, r_y(t)), then B(y, r_y(t)) ⊆ B(x, 3r_x(t)), since for any a ∈ B(y, r_y(t)), d(a, x) ≤ d(a, y) + d(y, b) + d(b, x) ≤ 3r_x(t). For the same reason, if ⋂_{x∈A} B(x, r_x(t)) ≠ ∅, we can take the largest r_x(t) and obtain ⋃_{x∈A} B(x, r_x(t)) ⊆ B(x, 3r_x(t)). Because dim_KR(Y) = K, |B(x, 3r_x(t))| ≤ 2^{2K}·|B(x, r_x(t))| = t·2^{2K}.

For any monotone ranged hash function π arising from a metric space (X, d), the preference list π_i for each item i ∈ I is a list of buckets in U sorted by increasing distance from i. It is not hard to see that {π_{i1}, π_{i2}, . . . , π_{it}} = B_U(i, r_i(t)) for any t; i.e., for each row of π, its first t entries form a t-point ball around i in

²In [12], the definition also includes a minimum threshold on the size of the balls. For simplicity, we drop this assumption. We can do so because for any size of state S, the threshold has to be sufficiently small to enforce the nearest-neighbor search, which means that the threshold is independent of |U|.
U. Combining this fact with Lemma 8.1, we have the following proposition.

Proposition 8.1. If dim_KR(U) = K, then for any b ∈ U and any A ⊆ I, |∆_A(b)| ≤ t·2^{2K}, where t = max_{i∈A} π_i^{-1}(b).

This implies that if a neighborhood ∆_A(b) is contained in the first t columns of π, the size of the neighborhood is at most t·2^{2K}, no matter how large A is.

All the above facts intuitively suggest that for ranged hash functions arising from a metric, the KR-dimension of U strongly affects the structure of the preference matrices, and thus controls load balancing. This intuition is justified by the following lemma.

Lemma 8.2. Let π be a monotone ranged hash function with |I| = n, |U| = m', and dim_KR(U) = K, where m' = max(n, 2m), n = m^{1+o(1)}, and K ≤ (1/4)·log₂((n/m)·ln m). Then

    Pr_{S ∈ ([m'] choose m)}[ℓ_S ≥ Ω(2^{−2K}·(n/m)·ln m)] = 1 − o(1).

Proof. Let L = sqrt((n/m)·ln m), let α > 0 be a suitable constant, and recall that λ_j(b) denotes the number of copies of b in column j of π. We distinguish two cases.

Case 1: if λ_j(b) < L⁴/α for every column j ≤ L² and every b ∈ U, then no bucket is duplicated too often within the first L² columns, and an argument along the lines of Lemma 6.2 yields the bound.

Case 2: otherwise, some column k ≤ L² and bucket b have λ_k(b) ≥ L⁴/α. Without loss of generality, we assume that π_ik = b for i = 1, 2, . . . , t, where t = L⁴/α. We only consider the first L² columns in these t rows. Because these rows share the same entry b, according to Proposition 8.1, |{π_ij | 1 ≤ i ≤ t, 1 ≤ j ≤ L²}| ≤ L²·2^{2K}; i.e., there are at most L²·2^{2K} distinct entries in the first L² columns of π in these t rows. Effectively, π assigns t = L⁴/α items to only L²·2^{2K} buckets. The maximum load on these buckets is thus at least t/(L²·2^{2K}) = L²·2^{−2K}/α ≥ L/α (since K ≤ (1/4)·log₂(L²) gives 2^{2K} ≤ L), as long as none of these rows is totally absent from S. The probability that there is at least one such bad row (containing no element of S in its first L² entries) is at most t·(1 − m/m')^{L²} = o(1). Therefore, with high probability, the maximum load ℓ_S ≥ L²·2^{−2K}/α = Ω(2^{−2K}·(n/m)·ln m).

Naturally, the same bound holds for randomized ranged hash functions against a worst-case S. Specifically, for any family of ranged hash functions with dim_KR(U) = K, ℓ_{K-dim_KR}(n, m', n, m) = Ω(2^{−2K}·(n/m)·ln m), where K ≤ (1/4)·log₂((n/m)·ln m).

In [12], Karger and Ruhl show that a uniformly selected subset of U has essentially the same KR-dimension with high probability,³ i.e. the property of having KR-dimension K is probabilistically U-hereditary. It is easy to see that this property is also I-hereditary, since the KR-dimension of U does not depend on I. By Lemma 5.1, for all ranged hash functions with the same KR-dimension of U it holds that ℓ_{K-dim_KR}(N, M, n, m) ≥ ℓ_{K-dim_KR}(n, m', n, m). This gives us the following theorem.

Theorem 8.1. For any distribution of ranged hash functions with |I| = N, |U| = M, and dim_KR(U) = K, for any sufficiently large n and m with N > 2n, M > 2m, n ≥ m, n = m^{1+o(1)}, and K ≤ (1/4)·log₂((n/m)·ln m), it holds that for any data set D ∈ (I choose n), there exists a state S ∈ (U choose m) such that ℓ^D_S = Ω(2^{−2K}·(n/m)·ln m) with high probability. Equivalently, under these conditions, ℓ_{K-dim_KR}(N, M, n, m) = Ω(2^{−2K}·(n/m)·ln m).

This justifies our previous observation that dimensionality helps load balance even though it hurts searching. In particular, if U is growth-restricted, ℓ_{O(1)-dim_KR}(N, M, n, m) = Ω((n/m)·ln m) for N > 2n, M > 2m, n ≥ m, and n = m^{1+o(1)}. This bound is tight because it is achieved by standard consistent hashing, which (as discussed in Section 4) can be modeled using a discrete growth-restricted metric in place of the usual continuous [0, 1) metric. This lower bound explains where the Θ((n/m)·ln m) bound for consistent hashing comes from: it is the best we can get from any growth-restricted metric. It also explains why no more sophisticated growth-restricted metric has arisen to displace consistent hashing.
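The discrete counting metric mentioned above is easy to check for growth-restriction numerically. In the sketch below (our illustration; the ring size is arbitrary), buckets sit on a ring of M = 64 positions, the counting metric measures the number of bucket positions separating two buckets, and the doubling condition holds with K = 1, so the metric is growth-restricted.

```python
# Buckets on a ring; the "counting metric" between two buckets is the
# number of bucket positions separating them along the ring.
M = 64

def dist(a, b):
    d = abs(a - b) % M
    return min(d, M - d)

def ball(x, r):
    return {b for b in range(M) if dist(x, b) <= r}

# Doubling condition |B(x, 2r)| <= 2^K |B(x, r)| with K = 1:
for x in range(M):
    for r in range(M // 4):
        assert len(ball(x, 2 * r)) <= 2 * len(ball(x, r))
```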
Unlike the case of general monotone ranged hash functions, in a growth-restricted metric the price of churn dominates the balls-into-bins skew, and unlike the balls-into-bins bound, the price of churn does not approach O(n/m) as n grows to Ω(m log m). The price of churn in this setting is also robust for all values of n and m satisfying the scalability condition n = m^{1+o(1)}.

³In fact, they show that a set S of uniformly random m points from U with dim_KR(U) = K has dim_KR(S) ≤ K + 1 with probability 1 − exp(−Ω(t)) if we only care about balls larger than a threshold t. Because all balls in our proofs have size L = Ω(√ln m), we can easily make the probability 1 − o(1) without affecting our argument. We also ignore the +1 on the KR-dimension, because it can only affect the maximum load by a constant factor.
These facts suggest that in systems that yield a growth-restricted metric, which is the typical case for Internet applications, the unreliability of nodes is more critical than the uncertainty of data.

References

[1] M. Adler, E. Halperin, R. M. Karp, and V. V. Vazirani. A stochastic process on the hypercube with applications to peer-to-peer networks. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC 2003), pages 575–584, 2003.
[2] J. Aspnes and G. Shah. Skip graphs. In Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pages 384–393, Jan. 2003.
[3] J. Aspnes, Y. R. Yang, and Y. Yin. Path-independent load balancing with unreliable machines. In Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 814–823, Jan. 2007.
[4] G. Giakkoupis and V. Hadzilacos. A scheme for load balancing in heterogenous distributed hash tables. In Proceedings of the 24th Annual ACM Symposium on Principles of Distributed Computing (PODC 2005), pages 302–311, 2005.
[5] P. Godfrey, S. Shenker, and I. Stoica. Minimizing churn in distributed systems. In Proceedings of ACM SIGCOMM 2006, pages 147–158, 2006.
[6] A. Gupta, R. Krauthgamer, and J. Lee. Bounded geometries, fractals, and low-distortion embeddings. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2003), pages 534–543, 2003.
[7] A. Hajnal and E. Szemeredi. Proof of a conjecture of Erdos. Combinatorial Theory and its Applications, 2:601–623, 1970.
[8] K. Hildrum, J. Kubiatowicz, S. Rao, and B. Y. Zhao. Distributed object location in a dynamic network. In Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA 2002), pages 41–52, 2002.
[9] P. Indyk, J. Goodman, and J. O'Rourke. Nearest neighbors in high-dimensional spaces. Handbook of Discrete and Computational Geometry, chapter 39, 2004.
[10] S. Janson and A. Rucinski. The infamous upper tail. Random Structures and Algorithms, 20(3):317–342, 2002.
[11] M. F. Kaashoek and D. R. Karger. Koorde: A simple degree-optimal distributed hash table. In The 2nd International Workshop on Peer-to-Peer Systems (IPTPS 2003), pages 98–107, 2003.
[12] D. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002), pages 741–750, 2002.
[13] D. R. Karger, E. Lehman, F. T. Leighton, R. Panigrahy, M. S. Levine, and D. Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the 29th Annual ACM Symposium on the Theory of Computing (STOC 1997), pages 654–663, 1997.
[14] D. R. Karger and M. Ruhl. Simple efficient load balancing algorithms for peer-to-peer systems. In Proceedings of the 16th Annual ACM Symposium on Parallel Algorithms (SPAA 2004), pages 36–43, 2004.
[15] K. Kenthapadi and G. S. Manku. Decentralized algorithms using both local and random probes for p2p load balancing. In Proceedings of the 17th Annual ACM Symposium on Parallel Algorithms (SPAA 2005), pages 135–144, 2005.
[16] R. Krauthgamer and J. Lee. Navigating nets: simple algorithms for proximity search. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2004), pages 798–807, 2004.
[17] G. S. Manku. Balanced binary trees for id management and load balance in distributed hash tables. In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing (PODC 2004), pages 197–205, 2004.
[18] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY, USA, 1995.
[19] M. Naor and U. Wieder. Novel architectures for p2p applications: the continuous-discrete approach. In Proceedings of the 15th Annual ACM Symposium on Parallel Algorithms (SPAA 2003), pages 50–59, 2003.
[20] V. Rodl and A. Rucinski. Threshold functions for Ramsey properties. Journal of the American Mathematical Society, 8(4):917–942, 1995.
[21] A. I. T. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Middleware 2001, IFIP/ACM International Conference on Distributed Systems Platforms, pages 329–350, 2001.
[22] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of ACM SIGCOMM 2001, pages 149–160, 2001.