正在加载图片...
PROPOSING:If chosen=v,then let THEOREM 1.The expected running time of the pairing state←-PAIRED and obj←-u,and protocol is O((M+D)logn) reply with (u,v,accept); if otherwise,reply with (u,v,reject-propose); For the Patricia trees described in Section 3.2.D and PROPOSED:Reply with (u,v,reject-propose); M are both proportional to the maximum depth W given PAIRED:Reply with (u,v,paired). by the length in bits of node identifiers,giving a cost of Upon receiving message (pred,u,pair,w)from node O(W log n).In the synchronous model,an additional delay pred do: of O(d)time units is imposed on each probing step,because each leaf node may have to probe up to d neighbors and is If state≠PAIRED then: limited to sending one message per time unit.This gives a Let state-PAIRED and obj-w; synchronous running time of O((d+W)logn). If state PROPOSED then: Send a message (u,waiting,paired)to node 3.2 Merging waiting. In this section,we describe a distributed implementation Upon receiving message (pred,u,no_pair)from node of a variant on big-endian Patricia trees [18.This im- pred do: plementation permits two trees to be merged in time O(W) If state=PROBED then: with O(1)contention and O(min(n+m.W min(n,m)))mes- Let state-PROPOSING. sages of size O(W)each,where W is the length in bits of else if state PROPOSED then: an identifier and n and m are the number of elements in the Let state+PAIRED and obj+waiting: two trees. A Patricia tree 17 is similar to a binary trie 5,6 with all Send a message (u,waiting,accept)to node keys stored in the leaves,except that paths with no branches waiting. are compressed to single edges.We assume that all keys The algorithm terminates when there is only one node are bit-strings with fixed width W:shorter strings can be remaining. padded with zeros.Each node in the tree stores a prefix that is common to all of the keys in its subtree.The two children 3.1.2 Analysis of a node with prefix z store prefixes that begin with r0 and The analysis of the pairing algorithm is quite involved, r1;it is possible that their prefixes will be longer if all nodes and can be found in the full paper.We give some highlights with prefix rb have additional prefix bits in common of the argument here.The basic idea is to analyze the se In our implementation,keys are identifiers of processes quence of graphs Gt derived from the neighbor lists N at and there is a natural one-to-one correspondence between each time t.We use M for the time to perform a merge keys,processes,and the leaves of the tree.To allow opera- operation and D for the maximum message delay;note that tions to be performed in parallel on internal nodes,we must since the nodes in the protocol may in fact be trees simu- also assign processes to these nodes.Because Patricia trees lating single supernodes,D can be as large as the depth of are binary trees,there are exactly n-1 internal nodes in the tree. a Patricia tree with n leaf nodes.Thus we can assign one First,we show that for any edge (u,v)in the initial knowl- internal node to each process,leaving one process left over. edge graph Go,at least one of (u,v)or (v,u)(taking into Each process is thus responsible for simulating at most two account new identities assumed by supernodes that absorb nodes in the tree.If we think of the nodes in the tree as them)appears in Ge until the nodes merge;this implies simulated processes,the contention on any real process is at that the operation of the algorithm does not disconnect the most twice the contention on any simulated process. graph. The unused "internal node"of the leftover process is kept Second.we define an iteration as an interval between in reserve as an extra node by the root of the tree.When times when the node enters the ISOLATED state,and show two trees merge,the extra node from one of them is used to by case analysis that during each iteration,a node remains supply the new internal node required for the merge,and the at most O(D)time in the PROBED,PROPOSED,and other is kept in reserve for a subsequent merge.Note that in PROPOSING states,including time waiting for a neighbor the initial state of any process,it is both leaf and root of a to respond in the PROPOSING state,and at most M+O(D) singleton tree,and thus acts as its own extra node.The use time in the PAIRED state.since this state leads immedi- of such extra nodes to avoid the need for a global allocation ately to a merge.Bounding the time in the ISOLATED mechanism for internal nodes is the main technical trick that state requires a more detailed analysis.but by considering distinguishes our merge algorithm from Okasaki-Gill. all possible states of the node's neighbors we can show that Let us now describe the merge procedure.Intuitively T(ISOLATED)<T(PROBED)+T(PROPOSING) when two Patricia trees merge,either their roots share a +T(PROPOSED)+T(PAIRED)+O(D)=O(M+D).Since common prefix,in which case the roots are combined and each state can be entered at most once during a single iter- the matching subtrees of each tree are merged in parallel; ation,the maximum time Tr for an iteration is O(M+D) or one root's prefix is a prefix of another,in which case the Finally,we show by an argument similar to that sketched root with the longer prefix is merged into the appropriate out for the simple pairing algorithm that during an iteration, child of the other;or the prefixes are incomparable,in which for each node there is a probability of at least 1/2 that the case the two old roots become children of a new root.What node either is paired or is the unique unpaired successor makes it possible to pipeline this procedure is that wave of of a predecessor that is paired.The situation is slightly merging proceeds down the trees one layer at a time,and as complicated by the fact that iterations are not synchronized soon as the roots of merging subtrees have communicated across nodes,but with some additional work it is possible they can determine immediately which node becomes the to show that the expected number of surviving supernodes root of the new subtree and which nodes are denoted to at the end of any interval of 2T time units is at most 8/9 extra nodes in inferior merges,merged with children,etc of the number at the start.This suffices to prove: This immediate determination of the new root means thatPROPOSING: If chosen=v, then let state←PAIRED and obj←v, and reply with (u, v, accept); if otherwise, reply with (u, v, reject propose); PROPOSED: Reply with (u, v, reject propose); PAIRED: Reply with (u, v, paired). Upon receiving message (pred, u, pair, w) from node pred do: If state6=PAIRED then: Let state←PAIRED and obj←w; If state = PROPOSED then: Send a message (u, waiting, paired) to node waiting. Upon receiving message (pred, u, no pair) from node pred do: If state=PROBED then: Let state←PROPOSING; else if state = PROPOSED then: Let state←PAIRED and obj←waiting; Send a message (u, waiting, accept) to node waiting. The algorithm terminates when there is only one node remaining. 3.1.2 Analysis The analysis of the pairing algorithm is quite involved, and can be found in the full paper. We give some highlights of the argument here. The basic idea is to analyze the se￾quence of graphs Gt derived from the neighbor lists Nu at each time t. We use M for the time to perform a merge operation and D for the maximum message delay; note that since the nodes in the protocol may in fact be trees simu￾lating single supernodes, D can be as large as the depth of the tree. First, we show that for any edge (u, v) in the initial knowl￾edge graph G0, at least one of (u, v) or (v, u) (taking into account new identities assumed by supernodes that absorb them) appears in Gt until the nodes merge; this implies that the operation of the algorithm does not disconnect the graph. Second, we define an iteration as an interval between times when the node enters the ISOLATED state, and show by case analysis that during each iteration, a node remains at most O(D) time in the PROBED, PROPOSED, and PROPOSING states, including time waiting for a neighbor to respond in the PROPOSING state, and at most M+O(D) time in the PAIRED state, since this state leads immedi￾ately to a merge. Bounding the time in the ISOLATED state requires a more detailed analysis, but by considering all possible states of the node’s neighbors we can show that T(ISOLATED) ≤ T(PROBED) + T(PROPOSING) +T(PROPOSED)+T(PAIRED)+O(D) = O(M+D). Since each state can be entered at most once during a single iter￾ation, the maximum time TI for an iteration is O(M + D). Finally, we show by an argument similar to that sketched out for the simple pairing algorithm that during an iteration, for each node there is a probability of at least 1/2 that the node either is paired or is the unique unpaired successor of a predecessor that is paired. The situation is slightly complicated by the fact that iterations are not synchronized across nodes, but with some additional work it is possible to show that the expected number of surviving supernodes at the end of any interval of 2TI time units is at most 8/9 of the number at the start. This suffices to prove: Theorem 1. The expected running time of the pairing protocol is O((M + D) log n). For the Patricia trees described in Section 3.2, D and M are both proportional to the maximum depth W given by the length in bits of node identifiers, giving a cost of O(W log n). In the synchronous model, an additional delay of O(d) time units is imposed on each probing step, because each leaf node may have to probe up to d neighbors and is limited to sending one message per time unit. This gives a synchronous running time of O((d + W) log n). 3.2 Merging In this section, we describe a distributed implementation of a variant on big-endian Patricia trees [18]. This im￾plementation permits two trees to be merged in time O(W) with O(1) contention and O(min(n+m, W min(n, m))) mes￾sages of size O(W) each, where W is the length in bits of an identifier and n and m are the number of elements in the two trees. A Patricia tree [17] is similar to a binary trie [5,6] with all keys stored in the leaves, except that paths with no branches are compressed to single edges. We assume that all keys are bit-strings with fixed width W: shorter strings can be padded with zeros. Each node in the tree stores a prefix that is common to all of the keys in its subtree. The two children of a node with prefix x store prefixes that begin with x0 and x1; it is possible that their prefixes will be longer if all nodes with prefix xb have additional prefix bits in common. In our implementation, keys are identifiers of processes, and there is a natural one-to-one correspondence between keys, processes, and the leaves of the tree. To allow opera￾tions to be performed in parallel on internal nodes, we must also assign processes to these nodes. Because Patricia trees are binary trees, there are exactly n − 1 internal nodes in a Patricia tree with n leaf nodes. Thus we can assign one internal node to each process, leaving one process left over. Each process is thus responsible for simulating at most two nodes in the tree. If we think of the nodes in the tree as simulated processes, the contention on any real process is at most twice the contention on any simulated process. The unused “internal node” of the leftover process is kept in reserve as an extra node by the root of the tree. When two trees merge, the extra node from one of them is used to supply the new internal node required for the merge, and the other is kept in reserve for a subsequent merge. Note that in the initial state of any process, it is both leaf and root of a singleton tree, and thus acts as its own extra node. The use of such extra nodes to avoid the need for a global allocation mechanism for internal nodes is the main technical trick that distinguishes our merge algorithm from Okasaki-Gill. Let us now describe the merge procedure. Intuitively, when two Patricia trees merge, either their roots share a common prefix, in which case the roots are combined and the matching subtrees of each tree are merged in parallel; or one root’s prefix is a prefix of another, in which case the root with the longer prefix is merged into the appropriate child of the other; or the prefixes are incomparable, in which case the two old roots become children of a new root. What makes it possible to pipeline this procedure is that wave of merging proceeds down the trees one layer at a time, and as soon as the roots of merging subtrees have communicated they can determine immediately which node becomes the root of the new subtree and which nodes are denoted to extra nodes in inferior merges, merged with children, etc. This immediate determination of the new root means that
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有