To construct a randomized ranged hash_中国高校课件下载中心

正在加载图片...

To construct a randomized ranged hash function underlying metric.Our results demonstrate that con- against a worst-case S,we randomly rename the buck-sistent hashing on a one-dimensional ring gives optimal ets by applying a uniformly random permutation to load balance among all growth-restricted metrics. u.With I=问and4=m,for any S∈(%) Formally,we start with a metric space (X,d),and that no,the maximum load ts is with assume I C X and u X.A ranged hash function h can then be defined as follows.For every Scu high probability ovmInm).We show that the and i,hs(i)is the nearest neighbor of i in lower bound of lmonotone(n,n,n,m)is tight when n=S:specifically,it satisfies the constraint that for any () This statement can also be easily ex-b S,d(i,hs(i))<d(i,b).In order to make this tended to lmonotone(M,N,n,m)for N=O(n)and M=definition precise,we further require that the embedding O(n). is "sparse",that is,the chance that there is more than For general domain sizes N and M,we have a one nearest neighbor for any given point is negligible.It non-uniform construction through reduction to the base is obvious that such a ranged hash function is monotone. case N M=n.Given I =N andu=M, The expansion rate of the underlying metric is let r:[M][n]be a uniformly random projection important because it affects the hardness of finding from [M]to [n].We construct a monotone ranged hash nearest neighbors.Given any z E X,Y C X,and function h'by the rule that for each i [n]C T,and r >0,we let By(,r)=[bY I d(,b)<r}be the any scu,hs(i)is chosen to be some br-(h(s)(i)),ball of radius r around r in Y.The KR-dimension [12] with arbitrary tie breaking.Intuitively,the first n items of Y,denoted as dimkR(Y),is the smallest K such that in Z are assigned to the buckets in state S according to |By(x,2r川≤2 K By(x,r川for all x∈X,r≥0.We say the perturbed cube h at the state r(S). Y is growth-restricted if it has constant KR-dimension, For any se (with no),we i.e.dimKR(Y)=O(1).2 For each x E X.let r(t)be the smallest value have that with high probability r(S)=(m).Thus that By(,())=t.The following lemma captures the maximum load contributed by these n items is the connection between the expansion rate of the metric asymptotically the same as the maximum load in the and the density of overlapped balls,which is central to n-vertex perturbed cube,which is O(vn Inm).(Note proving the trade-off between expansion rate and load that the concentration of r may only decrease the balancing. maximum load.)For h',there exists a data set D E (份such that for any state S∈(%)where n- LEMMA 8.1.Let Y C X with dimKR(Y)=K. o),the maximum load isvm) For any A X,寸DTEA By(c,rz(t)≠0，then with high probability. IUEA By(x,r.(t)川≤t22 It follows that the lower bound of zonotone (M,N,n,m)is tight whenever n=o mInm Proof.Consider any z,yE X,and suppose that r(t)> (InInm)? or n =S(m log m). ry(t).If there exists some bE B(,rz(t))nB(r,rz(t)), then B(y,ry(t))B(r,3r(t)),since for any a E B(y,ry(t)),d(a,z)<d(a,y)+d(y,b)+d(b,x)<3rz(t). 8 Load balance vs.expansion rate For the same reason,if∩z∈AB(z,Tz(t)≠0，we General preference matrices create scalability problems,can take the largest(t)and so B((t)) since they require storing (Mlog M)bits of informa-B(,3r(t)).Because dimkR(Y)=K,B(,3r(t)) tion for each item.Instead,we would prefer a more 22K|B(x,Tx()川=t22K. compact representation that allows each node (bucket) in the system to store at most m(1)bits of information. For any monotone ranged hash function arising A general way to do so is embedding items and buck- from a metric space (X,d),the preference list Ti for ets into a metric space,and then assign each item to each item iI is a list of buckets in u sorted by the closest bucket in the current state.Because of this increasing distance from i.It is not hard to see that scalability constraint,we are restricted to the cases that (1,i2,...,Tit}=Bu(i,ri(t))for any t,i.e.,for each n=m1+(1)-if a node cannot handle more than m(1) row of a,its first t entries is a t-point ball around i in bits,we surely do not expect it to store more than that many data items. In [12,the definition also includes a minimum threshold of In this section,we consider monotone ranged hash the size of balls.For simplicity,we drop this assumption.We functions based on this approach,and show a trade-can do so because for any size state S,the threshold has to be off between load balance and the expansion rate of the sufficiently small to enforce the nearest neighbor search,which means that the threshold is independent of.To construct a randomized ranged hash function against a worst-case S, we randomly rename the buckets by applying a uniformly random permutation to U. With I = [n] and U = [n], for any S ∈ U m that n = o m ln m (ln ln m) 2 , the maximum load `S is with high probability O p n m ln m . We show that the lower bound of `monotone(n, n, n, m) is tight when n = o m ln m (ln ln m) 2 . This statement can also be easily extended to `monotone(M, N, n, m) for N = O(n) and M = O(n). For general domain sizes N and M, we have a non-uniform construction through reduction to the base case N = M = [n]. Given I = [N] and U = [M], let r : [M] → [n] be a uniformly random projection from [M] to [n]. We construct a monotone ranged hash function h 0 by the rule that for each i ∈ [n] ⊂ I, and any S ⊆ U, hS(i) is chosen to be some b ∈ r −1 (hr(S)(i)), with arbitrary tie breaking. Intuitively, the first n items in I are assigned to the buckets in state S according to the perturbed cube h at the state r(S). For any S ∈ U m with n = o m ln m (ln ln m) 2 , we have that with high probability |r(S)| = Θ(m). Thus the maximum load contributed by these n items is asymptotically the same as the maximum load in the n-vertex perturbed cube, which is O p n m ln m . (Note that the concentration of r may only decrease the maximum load.) For h 0 , there exists a data set D ∈ U n such that for any state S ∈ U m where n = o m ln m (ln ln m) 2 , the maximum load ` D S is O p n m ln m with high probability. It follows that the lower bound of `monotone(M, N, n, m) is tight whenever n = o m ln m (ln ln m) 2 or n = Ω(m log m). 8 Load balance vs. expansion rate General preference matrices create scalability problems, since they require storing Ω(M log M) bits of information for each item. Instead, we would prefer a more compact representation that allows each node (bucket) in the system to store at most mo(1) bits of information. A general way to do so is embedding items and buckets into a metric space, and then assign each item to the closest bucket in the current state. Because of this scalability constraint, we are restricted to the cases that n = m1+o(1)—if a node cannot handle more than mo(1) bits, we surely do not expect it to store more than that many data items. In this section, we consider monotone ranged hash functions based on this approach, and show a tradeoff between load balance and the expansion rate of the underlying metric. Our results demonstrate that consistent hashing on a one-dimensional ring gives optimal load balance among all growth-restricted metrics. Formally, we start with a metric space (X, d), and assume I ⊆ X and U ⊆ X. A ranged hash function h can then be defined as follows. For every S ⊆ U and i ∈ I, hS(i) is the nearest neighbor of i in S; specifically, it satisfies the constraint that for any b ∈ S, d(i, hS(i)) ≤ d(i, b). In order to make this definition precise, we further require that the embedding is “sparse”, that is, the chance that there is more than one nearest neighbor for any given point is negligible. It is obvious that such a ranged hash function is monotone. The expansion rate of the underlying metric is important because it affects the hardness of finding nearest neighbors. Given any x ∈ X, Y ⊆ X, and r > 0, we let BY (x, r) = {b ∈ Y | d(x, b) ≤ r} be the ball of radius r around x in Y . The KR-dimension [12] of Y , denoted as dimKR(Y ), is the smallest K such that |BY (x, 2r)| ≤ 2 K|BY (x, r)| for all x ∈ X, r ≥ 0. We say Y is growth-restricted if it has constant KR-dimension, i.e. dimKR(Y ) = O(1).2 For each x ∈ X, let rx(t) be the smallest value that |BY (x, rx(t))| = t. The following lemma captures the connection between the expansion rate of the metric and the density of overlapped balls, which is central to proving the trade-off between expansion rate and load balancing. Lemma 8.1. Let Y ⊆ X with dimKR(Y ) = K. For any A ⊆ X, if T x∈A BY (x, rx(t)) 6= ∅, then | S x∈A BY (x, rx(t))| ≤ t2 2K. Proof. Consider any x, y ∈ X, and suppose that rx(t) ≥ ry(t). If there exists some b ∈ B(x, rx(t)) ∩ B(x, rx(t)), then B(y, ry(t)) ⊆ B(x, 3rx(t)), since for any a ∈ B(y, ry(t)), d(a, x) ≤ d(a, y) + d(y, b) + d(b, x) ≤ 3rx(t). For the same reason, if T x∈A B(x, rx(t)) 6= ∅, we can take the largest rx(t) and so S x∈A B(x, rx(t)) ⊆ B(x, 3rx(t)). Because dimKR(Y ) = K, |B(x, 3rx(t))| ≤ 2 2K|B(x, rx(k))| = t2 2K. For any monotone ranged hash function π arising from a metric space (X, d), the preference list πi for each item i ∈ I is a list of buckets in U sorted by increasing distance from i. It is not hard to see that {πi1, πi2, . . . , πit} = BU (i, ri(t)) for any t, i.e., for each row of π, its first t entries is a t-point ball around i in 2 In [12], the definition also includes a minimum threshold of the size of balls. For simplicity, we drop this assumption. We can do so because for any size state S, the threshold has to be sufficiently small to enforce the nearest neighbor search, which means that the threshold is independent of |U|

<<向上翻页向下翻页>>