The code at the root is similar to the case for an inter- from the parent call,for the right subtrees,or by taking the nal node,except that a singleton surviving element of V minimum of the leftmost values of the right subtrees,for the becomes the chosen successor as in the original pairing al- left subtrees.Details are given in the full paper. gorithm. Note the addition of the root variable to the probe mes- 3.5 Failures,Latecomers,and Churn sage,which tracks the root of the component that u belongs An important issue in building peer-to-peer systems is to.We assume that this variable is updated as part of the churn,the rapid arrival and departure of component nodes. merge protocol. Arrival of new nodes is not a problem:we can simply treat Incoming probes are similarly handled by a convergecast them as very slow nodes in the asynchronous pairing al- operation.The idea is that any node receiving one or more gorithm.But our algorithms do not deal well with node probes forwards the first to its parent and rejects any others. departures;indeed,the failure of any node during the tree This allows the root to accept exactly one incoming probe construction procedure could in principle lead to deadlock on behalf of the simulated supernode.Enforcement of the of the entire system.We believe that a judicious use of time reject-all-but-one strategy is handled using a flag probed that outs combined with restarting parts of the algorithm could is reset during the merge procedure.Note that this flag handle such difficulties,but avoiding deadlocks or inconsis- appears in both leaf and internal nodes;a process simulating tencies will require substantial further work. two such nodes must maintain two separate flags. Upon receiving message (v,u,probe,root')from node v do: 4.LOWER BOUNDS Let neighbors-neighborsuv. The upper bound of O((d+W)log n)on time to sort nodes in a weakly-connected graph contrasts with our best current If root'=root send (u,v,same_component)to v. lower bound of (d+log n),which is proved in this section. Else If probed =0:send (u,parent,probed,u,root') We suspect that the lower bound is closer to the optimal to parent and set probed-1. time;this issue is discussed further in Section 5. Else Send (u,v.reject)to v. The model we use here is a simplification of the one de- fined in Section 2.During the algorithm,each vertex in Upon receiving message (v,u,probed,leaf,root')from a weakly connected directed graph G represents a process node u do: with some unique identifier u,maintaining a knowledge set Ku which contains the identifiers of endpoints of all its If probed 0:send (u,parent,probed,leaf,root') outgoing edges.A communication is denoted as a triple to parent and set probed -1. (u,v,w),where u,v and w are processes identifiers as well Else Send (leaf,v,reject)to v. as v,w E Ku,and after the communication,K becomes K.Uw,as well as a new directed edge from v to w is The root responds to probed messages as if they were added to G.A procedure is defined as a sequence of such probes:accepting and switching to a PROBED state if in triples,which are arranged into different time units,where an ISOLATED state,and rejecting otherwise. in each time unit there is at most one triple starting with It is not difficult to see that these procedures correctly u for any u in G.Besides,a total order of all identifiers is simulate the behavior of the pairing algorithm in handling given and for each identifier u,there is a unique successor, probe messages.The only tricky part of this analysis is to denoted as succ(u).A procedure is said to yield a sorted argue that message arrivals are serialized properly:in par- list from G,if after the procedure,for any u in G,it holds ticular,responses to probes arrive at times that are consis- that succ(u)∈Ku tent with the behavior of a single node running the pairing One may easily notice that this model only captures the algorithm.But here the assumption that the system is asyn- aspect of exchanges of knowledge of identifiers.However, chronous works for us,as the algorithm guarantees that a since the knowledge availability is a necessary condition for probe is rejected only if some other probe can be accepted, the sorting problem,this model is sufficient to build the and the delay in propagating any probe that is ultimately lower bounds. accepted up the tree can be hidden behind asynchronous In the full paper,we show that (a)any graph with diam- message delays.For a synchronous system,we can instead eter A requires (log A)time to sort in the worst case,and explicitly delay responding to any probes until the converge- (b)any graph that can be separated into d components by cast has had time to terminate.We omit the details. the removal of a single vertex requires (d)time to sort in 3.4 Building a Ring the worst case.Considering a graph consisting of a degree-d star with a chain of n-d-I nodes attached to one of its The preceding sections allow us to build a tree of nodes outer vertices then gives: quickly,but do not quite achieve our original goal of building the sorted base ring of a ring-structured distributed data THEOREM 3.For any n >d >0,there erists a degree- structure like Chord or a skip graph.Building such a ring d weakly-connected graph of n nodes with some identifier is,fortunately,an easy extension of building a binary tree, permutation,such that,for any procedure,yielding a sorted as it is enough for each leaf node to know its successor, list from this graph requires (d+log(n-d+1))time. which can be computed quickly from the tree structure.A natural way to do this is to have each node in the tree keep PROOF.Let G be a graph whose underlying graph is a track of its minimum and maximum leaf,values which can d-degree star with a chain of n-d-1 vertices connected easily be updated during the merge procedure.To inform to one of its outer vertices.Notice that the diameter of G's a leaf node of its successor,we pass with each recursive underlying graph is n-d+1,and after removing the central call to Merge the identity of the first node to the right of vertex of star,G is separated into d components.Then by the trees being merged (the minimum-key leaf node in the the above two lemmas,it is easy to obtain the lower bound case of the rightmost trees).This value is either obtained on the running time (d+log(n-d+1)).The code at the root is similar to the case for an internal node, except that a singleton surviving element of V becomes the chosen successor as in the original pairing algorithm. Note the addition of the root variable to the probe message, which tracks the root of the component that u belongs to. We assume that this variable is updated as part of the merge protocol. Incoming probes are similarly handled by a convergecast operation. The idea is that any node receiving one or more probes forwards the first to its parent and rejects any others. This allows the root to accept exactly one incoming probe on behalf of the simulated supernode. Enforcement of the reject-all-but-one strategy is handled using a flag probed that is reset during the merge procedure. Note that this flag appears in both leaf and internal nodes; a process simulating two such nodes must maintain two separate flags. Upon receiving message (v, u, probe, root′ ) from node v do: Let neighbors←neighbors∪{v}. If root′ = root send (u, v, same component) to v. Else If probed = 0: send (u, parent, probed, u, root′ ) to parent and set probed ← 1. Else Send (u, v, reject) to v. Upon receiving message (v, u, probed, leaf, root′ ) from node v do: If probed = 0: send (u, parent, probed, leaf, root′ ) to parent and set probed ← 1. Else Send (leaf, v, reject) to v. The root responds to probed messages as if they were probes: accepting and switching to a PROBED state if in an ISOLATED state, and rejecting otherwise. It is not difficult to see that these procedures correctly simulate the behavior of the pairing algorithm in handling probe messages. The only tricky part of this analysis is to argue that message arrivals are serialized properly: in particular, responses to probes arrive at times that are consistent with the behavior of a single node running the pairing algorithm. But here the assumption that the system is asynchronous works for us, as the algorithm guarantees that a probe is rejected only if some other probe can be accepted, and the delay in propagating any probe that is ultimately accepted up the tree can be hidden behind asynchronous message delays. For a synchronous system, we can instead explicitly delay responding to any probes until the convergecast has had time to terminate. We omit the details. 3.4 Building a Ring The preceding sections allow us to build a tree of nodes quickly, but do not quite achieve our original goal of building the sorted base ring of a ring-structured distributed data structure like Chord or a skip graph. Building such a ring is, fortunately, an easy extension of building a binary tree, as it is enough for each leaf node to know its successor, which can be computed quickly from the tree structure. A natural way to do this is to have each node in the tree keep track of its minimum and maximum leaf, values which can easily be updated during the merge procedure. To inform a leaf node of its successor, we pass with each recursive call to Merge the identity of the first node to the right of the trees being merged (the minimum-key leaf node in the case of the rightmost trees). This value is either obtained from the parent call, for the right subtrees, or by taking the minimum of the leftmost values of the right subtrees, for the left subtrees. Details are given in the full paper. 3.5 Failures, Latecomers, and Churn An important issue in building peer-to-peer systems is churn, the rapid arrival and departure of component nodes. Arrival of new nodes is not a problem: we can simply treat them as very slow nodes in the asynchronous pairing algorithm. But our algorithms do not deal well with node departures; indeed, the failure of any node during the treeconstruction procedure could in principle lead to deadlock of the entire system. We believe that a judicious use of timeouts combined with restarting parts of the algorithm could handle such difficulties, but avoiding deadlocks or inconsistencies will require substantial further work. 4. LOWER BOUNDS The upper bound of O((d+W) log n) on time to sort nodes in a weakly-connected graph contrasts with our best current lower bound of Ω(d + log n), which is proved in this section. We suspect that the lower bound is closer to the optimal time; this issue is discussed further in Section 5. The model we use here is a simplification of the one de- fined in Section 2. During the algorithm, each vertex in a weakly connected directed graph G represents a process with some unique identifier u, maintaining a knowledge set Ku which contains the identifiers of endpoints of all its outgoing edges. A communication is denoted as a triple (u, v, w), where u, v and w are processes identifiers as well as v, w ∈ Ku, and after the communication, Kv becomes Kv ∪ {w}, as well as a new directed edge from v to w is added to G. A procedure is defined as a sequence of such triples, which are arranged into different time units, where in each time unit there is at most one triple starting with u for any u in G. Besides, a total order of all identifiers is given and for each identifier u, there is a unique successor, denoted as succ(u). A procedure is said to yield a sorted list from G, if after the procedure, for any u in G, it holds that succ(u) ∈ Ku. One may easily notice that this model only captures the aspect of exchanges of knowledge of identifiers. However, since the knowledge availability is a necessary condition for the sorting problem, this model is sufficient to build the lower bounds. In the full paper, we show that (a) any graph with diameter ∆ requires Ω(log ∆) time to sort in the worst case, and (b) any graph that can be separated into d components by the removal of a single vertex requires Ω(d) time to sort in the worst case. Considering a graph consisting of a degree-d star with a chain of n − d − 1 nodes attached to one of its outer vertices then gives: Theorem 3. For any n > d > 0, there exists a degreed weakly-connected graph of n nodes with some identifier permutation, such that, for any procedure, yielding a sorted list from this graph requires Ω(d + log(n − d + 1)) time. Proof. Let G be a graph whose underlying graph is a d-degree star with a chain of n − d − 1 vertices connected to one of its outer vertices. Notice that the diameter of G’s underlying graph is n−d+ 1, and after removing the central vertex of star, G is separated into d components. Then by the above two lemmas, it is easy to obtain the lower bound on the running time Ω(d + log(n − d + 1))