let p be the longest common prefix of r.prefix and y.prefix let b be the first bit of r.prefix that differs from y.prefix 010 allocate a new node z=extra with: z.prefix p z.child b=x 000111 001010 010011 0101 z.child 1]=y return z 010100 010110 3.2.2 Merging Two Patricia Trees:Local View As described above,we think of each node in the tree Figure 1:A Patricia tree storing the strings 000111, as represented by a "virtual process"whose operations are 001010,010011,010100,and010110, carried out by one of the actual processes in the system. Allocating a new node consists of allocating a new virtual process at some physical process.We extend our message- it can begin a second merge in O(1)time,waiting only to passing model to allow virtual processes to communicate fix the identities of its new children.It follows that the first with each other:this involves the corresponding physical merge takes O(W)time,but that subsequent merges add processes sending messages on the virtual processes'behalf. only 0(1)additional time,allowing up to 2 merge opera- with appropriate tags to distinguish between a physical pro- tions to be completed in O(k+W)time if the tree of merges cess's multiple virtual processes. is perfectly balanced.Unfortunately,our pairing algorithm Each call to Merge is handled by the process is not sophisticated enough to construct a balanced tree of that holds node r,and is triggered by a message merges in parallel with the merges occurring,but we can (v,merge,y)from some initiator v,typically r's par- imagine situations where such pipelining may be useful. ent. In response,x first queries y for its state Formally,a non-root node z in the tree stores a bit string with a message (r,y,getState)and y responds with z.prefix and two child pointers r.child[0]and r.child 1].For (y,returnState,(y.prefix,y.child[o],y.child(1)).3 Upon leaf nodes,r.prefix is a key of width W and both child point- receiving y's state,r applies one of the four cases of the ers are null (L).For internal nodes,prefix is the common Merge procedure,issuing up to two merge messages and prefix of all keys in the subtree rooted at z and both child waiting for corresponding return messages before sending pointers are non-null.A root node stores in addition to these its own return back to the initiator v. values a pointer z.extra to the extra node for the tree. Some additional machinery is needed to avoid sending We write x y if z is a prefix of y.The invariant for merge operations to children before they are ready;details the tree is:For each b∈{O,1},ifx.child[bl≠⊥,then are given in the full paper r.child[1-and z.prefix+b z.child[ol.prefix.See Figure 1 for an example THEOREM 2.Merging two Patricia trees of size n and m with W-bit keys requires O(W)time.O(1)contention.and 3.2.1 Merging Two Patricia Trees:Global View O(min(n +m,W min(n,m)))messages of O(W)bits each. Below is a global description of the merging algorithm PROOF.Start with the time bound.Consider a single call Though this is given as pseudocode for a centralized con- to Merge as executed by some process r.The time for x to troller,the reader should not be misled into thinking that complete this merge operation is the maximum time of any a centralized controller is required;instead,all steps of the recursive merges,plus O(1)time to send and receive all the merge can be carried out by direct communication between messages at the top level.Note that except for possibly re- nodes in the trees,as we will describe later. ceiving a return message,r is idle during the recursive calls, /Merge two Patricia trees in parallel,returning the new so its physical process can simulate other virtual processes root with at most one time unit of delay (to receive the return) /extra is the unused node needed for this merge procedure during this time without increasing the contention beyond procedure Merge(x,y,extra) O(1)as long as it simulates only one per branch of the tree if x.prefix =y.prehx (Case1)】 The running time increases by O(1)for each level of the tree, /combine roots and merge matching subtrees in par- giving a total time of O(W). allel: Now let us consider the total number of messages.Each x.child 0-Merge(r.child 0,y.child 0,extra) recursive call to Merge generates O(1)additional messages. //y is freed and used as the extra node for submerge so we just need to bound the number of such calls.Each call x.child 1-Merge(r.child 1,y.child 1,y) returns a distinct node in the combined tree,and the number return z of internal nodes in the combined tree is bounded by n+m else if r.prefix y.prefix (Case 2) the number of leaves;this gives a bound of n+m.To get /merge y with appropriate subtree of the other side of the bound,let T(w,n,m)be the number let b be the first bit in y.prefix that is not in z.prefix of calls needed to merge two trees where w is the number x.child b-Merge(z.child b,y,extra) of bits that are not in the common prefix of both trees,and return z n and m are the number of elements in the two trees.Let else if y.prefix Cr.prefix (Case 3) no and nI be the number of elements in the left and right /merge x with appropriate subtree of y subtrees of the first tree,and mo and mI be the number of let b be the first bit in r.prefix that is not in y.prefix elements in the corresponding subtrees of the second tree y.child]-Merge(y.child[o],r,extra) Then we have return y else if r.prefix and y.prefix are incomparable (Case 4) T(w,n,m)<1+T(w-1,no,mo)+T(w-1,n1,m1), /use the extra node to create a new node that holds 3To escape the one-identifier-per-message restriction,this the common prefix can be sent as three consecutive messages0 00 010 001010 010011 0101 010110 000111 010100 Figure 1: A Patricia tree storing the strings 000111, 001010, 010011, 010100, and 010110. it can begin a second merge in O(1) time, waiting only to fix the identities of its new children. It follows that the first merge takes O(W) time, but that subsequent merges add only O(1) additional time, allowing up to 2k merge operations to be completed in O(k+W) time if the tree of merges is perfectly balanced. Unfortunately, our pairing algorithm is not sophisticated enough to construct a balanced tree of merges in parallel with the merges occurring, but we can imagine situations where such pipelining may be useful. Formally, a non-root node x in the tree stores a bit string x.prefix and two child pointers x.child[0] and x.child[1]. For leaf nodes, x.prefix is a key of width W and both child pointers are null (⊥). For internal nodes, x.prefix is the common prefix of all keys in the subtree rooted at x and both child pointers are non-null. A root node stores in addition to these values a pointer x.extra to the extra node for the tree. We write x ⊑ y if x is a prefix of y. The invariant for the tree is: For each b ∈ {0, 1}, if x.child[b] 6= ⊥, then x.child[1 − b] 6= ⊥ and x.prefix + b ⊑ x.child[b].prefix. See Figure 1 for an example. 3.2.1 Merging Two Patricia Trees: Global View Below is a global description of the merging algorithm. Though this is given as pseudocode for a centralized controller, the reader should not be misled into thinking that a centralized controller is required; instead, all steps of the merge can be carried out by direct communication between nodes in the trees, as we will describe later. // Merge two Patricia trees in parallel, returning the new root // extra is the unused node needed for this merge procedure procedure Merge(x, y, extra) if x.prefix = y.prefix (Case 1) // combine roots and merge matching subtrees in parallel: x.child[0] ←Merge(x.child[0], y.child[0], extra) // y is freed and used as the extra node for submerge x.child[1] ←Merge(x.child[1], y.child[1], y) return x else if x.prefix ⊑ y.prefix (Case 2) // merge y with appropriate subtree of x let b be the first bit in y.prefix that is not in x.prefix x.child[b] ←Merge(x.child[b], y, extra) return x else if y.prefix ⊑ x.prefix (Case 3) // merge x with appropriate subtree of y let b be the first bit in x.prefix that is not in y.prefix y.child[b] ←Merge(y.child[b], x, extra) return y else if x.prefix and y.prefix are incomparable (Case 4) // use the extra node to create a new node that holds the common prefix let p be the longest common prefix of x.prefix and y.prefix let b be the first bit of x.prefix that differs from y.prefix allocate a new node z = extra with: z.prefix = p z.child[b] = x z.child[1 − b] = y return z 3.2.2 Merging Two Patricia Trees: Local View As described above, we think of each node in the tree as represented by a “virtual process” whose operations are carried out by one of the actual processes in the system. Allocating a new node consists of allocating a new virtual process at some physical process. We extend our messagepassing model to allow virtual processes to communicate with each other; this involves the corresponding physical processes sending messages on the virtual processes’ behalf, with appropriate tags to distinguish between a physical process’s multiple virtual processes. Each call to Merge is handled by the process that holds node x, and is triggered by a message (v, x, merge, y) from some initiator v, typically x’s parent. In response, x first queries y for its state with a message (x, y, getState) and y responds with (y, x, returnState,(y.prefix, y.child[0], y.child[1])).3 Upon receiving y’s state, x applies one of the four cases of the Merge procedure, issuing up to two merge messages and waiting for corresponding return messages before sending its own return back to the initiator v. Some additional machinery is needed to avoid sending merge operations to children before they are ready; details are given in the full paper. Theorem 2. Merging two Patricia trees of size n and m with W-bit keys requires O(W) time, O(1) contention, and O(min(n + m,W min(n, m))) messages of O(W) bits each. Proof. Start with the time bound. Consider a single call to Merge as executed by some process x. The time for x to complete this merge operation is the maximum time of any recursive merges, plus O(1) time to send and receive all the messages at the top level. Note that except for possibly receiving a return message, x is idle during the recursive calls, so its physical process can simulate other virtual processes with at most one time unit of delay (to receive the return) during this time without increasing the contention beyond O(1) as long as it simulates only one per branch of the tree. The running time increases by O(1) for each level of the tree, giving a total time of O(W). Now let us consider the total number of messages. Each recursive call to Merge generates O(1) additional messages, so we just need to bound the number of such calls. Each call returns a distinct node in the combined tree, and the number of internal nodes in the combined tree is bounded by n+ m, the number of leaves; this gives a bound of n + m. To get the other side of the bound, let T(w, n, m) be the number of calls needed to merge two trees where w is the number of bits that are not in the common prefix of both trees, and n and m are the number of elements in the two trees. Let n0 and n1 be the number of elements in the left and right subtrees of the first tree, and m0 and m1 be the number of elements in the corresponding subtrees of the second tree. Then we have T(w, n, m) ≤ 1 + T(w − 1, n0, m0) + T(w − 1, n1, m1), 3To escape the one-identifier-per-message restriction, this can be sent as three consecutive messages