正在加载图片...
identities,and requires storing only O(d)identities per node leaves of the tree to communicate with other supern- Our algorithm is constructed from three components: odes,with internal nodes aggregating the incoming data for delivery to the root.This increases the tree's 1.A randomized pairing algorithm that,starting from effective bandwidth proportional to the size of the tree a degree-d weakly-connected graph,pairs off a con- while keeping the contention down to the minimum stant fraction of the nodes on average in O(1)time. O(d)necessitated by the degree of the original knowl- This algorithm is described in Section 3.1.The prob- edge graph.The cost of this strategy is that incoming lem solved is similar to the problem of constructing a messages are effectively delayed by the depth w of distributed matching 11,20,except that there is no the Patricia tree,adding a factor of W slowdown to requirement that paired nodes be joined by an edge in the algorithm.Using the entire tree also means that the original knowledge graph.A complication is that the simulated supernode must wait for a merge to com- since the original knowledge graph is directed,at any plete before starting a new iteration of the pairing al- time a node may learn of the existence of new neigh- gorithm,which prevents its from taking advantage of bors,and care needs to be taken to prevent deadlocks. pipelined merges.We discuss some ideas for how such The output of the pairing algorithm is used to join in- bottlenecks might be avoided in Section 5. dividual nodes into simulated supernodes that then participate in subsequent iterations of the pairing algo- We also consider how to use the tree to build a ring (Sec rithm.These supernodes are in turn joined into larger tion 3.4)and the effects of churn (Section 3.5). supernodes,until only a single supernode (consisting In an asynchronous model,the total time for the expected of all the nodes in the system)remains,after an ex- O(log n)iterations of pairing multiplied by the O(W)merg- pected O(log n)iterations.For this construction to ing and communications costs of each iteration is O(W log n) work,we need two additional mechanisms In a synchronous model,this can be improved by using the simple pairing algorithm described above to construct a bal- 2.A distributed merging algorithm for combining bal- anced tree of depth O(log n)in O((d+log n)log n)time (the anced trees of nodes.In a synchronous model,this d factor vanishes if nodes are allowed to send more than one algorithm can be very simple:because all the trees outgoing message per time unit),and the nodes can then be constructed after k rounds will have depth O(k),it is sorted using pipelined Patricia tree merges in an additional enough to recruit a new root node to join two trees O(W +logn)time for a total of O(W+(d+logn)logn) together into a tree of depth O(k+1).In an asyn- time.All of these algorithms have contention at most O(d). chronous model,an adversarial scheduler can arrange use messages of size O(W),and store O(dW)bits of state for particularly fast nodes to merge early followed by a per node. slower succession of singleton nodes,leading to a tree These limits compare favorably to previously known algo- of depth (n)using the simple algorithm.Instead. rithms in this model for resource discovery 2,10,13,14,16 we describe an algorithm obtained by parallelizing the or leader election [4,which also construct trees over nodes sequential Patricia tree merge procedure of Okasaki that initially form a weakly-connected graph.In these al- and Gill [18]:this algorithm,described in Section 3.2 gorithms,a single participant may receive messages from a assigns a single leaf node and at most one internal very large number of other participants in a short amount of node of the Patricia tree to each physical node in the time;messages are often impractically long,conveying in the system,and merges two Patricia trees of depth W in worst case the identities of every node in the system;and the time O(W).where W is the length of a node identity resulting trees have very high degree,which not only leads to Though we do not use this fact in our main result,our high contention in any algorithm that uses them but limits merging algorithm can be pipelined:a depth-k tree of performance if nodes are also limited in how many messages up to 2merge operations can be carried out in parallel they can send per time unit.We discuss these results further in O(k+W)time with O(1)contention. in Section 1.1. Note that because Patricia trees are a form of binary Though our algorithm is reasonably efficient,we do not search tree,a consequence of using Patricia trees to believe that it is optimal.The best lower bound we know represent supernodes is that the leaves are automati- how to prove for constructing a sorted list of nodes start- cally sorted.We use this fact to generate the sorted ing from a weakly-connected graph with maximum degree d ring of physical nodes that our main result promises, in the synchronous model is (d+log n);here the d term but the ability to rapidly generate a binary search tree comes from the assumption that a node can only send to one with low contention starting from a weakly-connected recipient per round,and the logn term comes from the time knowledge graph may also be useful for other applica- to communicate from one end of a length-n line to the other tions. using pointer jumping (see Section 4 for details).Our sus- picion is that this lower bound is in close to the true upper 3.A supernode simulation that allows trees of ordi- bound,and that an algorithm that interleaved pairing and nary physical nodes to simulate a single supernode in merging operations more tightly could achieve something the pairing algorithm (Section 3.3).Though the es- very close to it with high probability. sential idea of this simulation is simple-have the root Due to space limitations,most proofs in this extended of each tree simulate the supernode-some care needs abstract are omitted or only briefly sketched. to be taken to keep the root from being swamped with information.The actual simulation algorithm uses the 11 Related Work IIn a synchronous model,time is measured in rounds.In an The problem of organizing a weakly-connected set of nodes asynchronous model,time is measured by assuming a con- into a useful data structure combines aspects of both sorting stant maximum message delay,and assigning all events the and resource discovery.We discuss the extensive prior lit- latest possible time consistent with this assumption.Details erature on resource discovery first,and then consider some are given in Section 2. other algorithms that solve problems closer to ours.identities, and requires storing only O(d) identities per node. Our algorithm is constructed from three components: 1. A randomized pairing algorithm that, starting from a degree-d weakly-connected graph, pairs off a con￾stant fraction of the nodes on average in O(1) time.1 This algorithm is described in Section 3.1. The prob￾lem solved is similar to the problem of constructing a distributed matching [11, 20], except that there is no requirement that paired nodes be joined by an edge in the original knowledge graph. A complication is that since the original knowledge graph is directed, at any time a node may learn of the existence of new neigh￾bors, and care needs to be taken to prevent deadlocks. The output of the pairing algorithm is used to join in￾dividual nodes into simulated supernodes that then participate in subsequent iterations of the pairing algo￾rithm. These supernodes are in turn joined into larger supernodes, until only a single supernode (consisting of all the nodes in the system) remains, after an ex￾pected O(log n) iterations. For this construction to work, we need two additional mechanisms. 2. A distributed merging algorithm for combining bal￾anced trees of nodes. In a synchronous model, this algorithm can be very simple: because all the trees constructed after k rounds will have depth O(k), it is enough to recruit a new root node to join two trees together into a tree of depth O(k + 1). In an asyn￾chronous model, an adversarial scheduler can arrange for particularly fast nodes to merge early followed by a slower succession of singleton nodes, leading to a tree of depth Θ(n) using the simple algorithm. Instead, we describe an algorithm obtained by parallelizing the sequential Patricia tree merge procedure of Okasaki and Gill [18]; this algorithm, described in Section 3.2, assigns a single leaf node and at most one internal node of the Patricia tree to each physical node in the system, and merges two Patricia trees of depth W in time O(W), where W is the length of a node identity. Though we do not use this fact in our main result, our merging algorithm can be pipelined: a depth-k tree of up to 2k merge operations can be carried out in parallel in O(k + W) time with O(1) contention. Note that because Patricia trees are a form of binary search tree, a consequence of using Patricia trees to represent supernodes is that the leaves are automati￾cally sorted. We use this fact to generate the sorted ring of physical nodes that our main result promises, but the ability to rapidly generate a binary search tree with low contention starting from a weakly-connected knowledge graph may also be useful for other applica￾tions. 3. A supernode simulation that allows trees of ordi￾nary physical nodes to simulate a single supernode in the pairing algorithm (Section 3.3). Though the es￾sential idea of this simulation is simple—have the root of each tree simulate the supernode—some care needs to be taken to keep the root from being swamped with information. The actual simulation algorithm uses the 1 In a synchronous model, time is measured in rounds. In an asynchronous model, time is measured by assuming a con￾stant maximum message delay, and assigning all events the latest possible time consistent with this assumption. Details are given in Section 2. leaves of the tree to communicate with other supern￾odes, with internal nodes aggregating the incoming data for delivery to the root. This increases the tree’s effective bandwidth proportional to the size of the tree while keeping the contention down to the minimum O(d) necessitated by the degree of the original knowl￾edge graph. The cost of this strategy is that incoming messages are effectively delayed by the depth W of the Patricia tree, adding a factor of W slowdown to the algorithm. Using the entire tree also means that the simulated supernode must wait for a merge to com￾plete before starting a new iteration of the pairing al￾gorithm, which prevents its from taking advantage of pipelined merges. We discuss some ideas for how such bottlenecks might be avoided in Section 5. We also consider how to use the tree to build a ring (Sec￾tion 3.4) and the effects of churn (Section 3.5). In an asynchronous model, the total time for the expected O(log n) iterations of pairing multiplied by the O(W) merg￾ing and communications costs of each iteration is O(W log n). In a synchronous model, this can be improved by using the simple pairing algorithm described above to construct a bal￾anced tree of depth O(log n) in O((d+log n) log n) time (the d factor vanishes if nodes are allowed to send more than one outgoing message per time unit), and the nodes can then be sorted using pipelined Patricia tree merges in an additional O(W + log n) time for a total of O(W + (d + log n) log n) time. All of these algorithms have contention at most O(d), use messages of size O(W), and store O(dW) bits of state per node. These limits compare favorably to previously known algo￾rithms in this model for resource discovery [2,10,13,14,16] or leader election [4], which also construct trees over nodes that initially form a weakly-connected graph. In these al￾gorithms, a single participant may receive messages from a very large number of other participants in a short amount of time; messages are often impractically long, conveying in the worst case the identities of every node in the system; and the resulting trees have very high degree, which not only leads to high contention in any algorithm that uses them but limits performance if nodes are also limited in how many messages they can send per time unit. We discuss these results further in Section 1.1. Though our algorithm is reasonably efficient, we do not believe that it is optimal. The best lower bound we know how to prove for constructing a sorted list of nodes start￾ing from a weakly-connected graph with maximum degree d in the synchronous model is Ω(d + log n); here the d term comes from the assumption that a node can only send to one recipient per round, and the log n term comes from the time to communicate from one end of a length-n line to the other using pointer jumping (see Section 4 for details). Our sus￾picion is that this lower bound is in close to the true upper bound, and that an algorithm that interleaved pairing and merging operations more tightly could achieve something very close to it with high probability. Due to space limitations, most proofs in this extended abstract are omitted or only briefly sketched. 1.1 Related Work The problem of organizing a weakly-connected set of nodes into a useful data structure combines aspects of both sorting and resource discovery. We discuss the extensive prior lit￾erature on resource discovery first, and then consider some other algorithms that solve problems closer to ours
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有