正在加载图片...
simultaneous requests, will naturally form a kind of mul- 4 5 70 2 3 13 14 ticast tree for retrieving the web page. Once any Coral- distance(nodeids xor 4) Proxy obtains the full file, it inserts a much longer-lived reference to itself(e. g, I hour ) Because the insertion al- gorithm accounts for TTL, these longer-lived references RPC#1(2) will overwrite shorter-lived ones, and they can be stored 4.5,7 R on well-selected nodes even under high insertion load, as CoralProxies periodically renew referrals to resources L0 later described in section 4.2 34 67 RPC#2(1 their caches. A proxy should not evict a web object from its cache while a reference to it may persist in the DSHT. Ideally, proxies would adaptively set TTLs based on cache capacity, though this is not yet implemented 67 PC#3(0) 4 Coral: A Hierarchical Indexing system Figure 2: Example of routing operations in a system contain- This section describes the Coral indexing infrastructure, ing eight nodes with IDs (4, 5, 7, 0, 2, 3, 13, 14. In this illus- which CoralCDN leverages to achieve scalability, self- tration, node R with id 14 is looking up the node closest to organization, and efficient data retrieval. We describe how key k= 4, and we have sorted the nodes by their distance to Coral implements the put and get operations that form k. The top boxed row illustrates XOr distances for the nodes the basis of its distributed sloppy hash table(DSHT) ab- 10, 2, 3, 13, 14] that are initially known by R. R fi rst contacts a straction:the underlying key-based routing layer(4.1), known peer whose distance to k is closest to half of R's distance the dSht algorithms that balance load(4.2), and the (10/2= 5); in this illustration, this peer is node zero, whose changes that enable latency and data-placement optimiza- distance to k is 0 e 4=4. Data in RPC requests and responses tions within a hierarchical set of DSHTs(4.3). Finally, are shown in parentheses and braces, respectively: R asks node we describe the clustering mechanisms that manage this zero for its peers that are half-way closer to k, i.e., those at dis- hierarchical structure(4.4 tance ==2. R inserts these new references into its routing table (middle row ). R now repeats this process, contacting node fi ve. signed IDs in the same 160-bit identifier space. A nod, s L ose distance I is closest to Finally, R contacts node four, 4.1 Corals Key-Based Routing Layer ose distance is 0, and completes its search(bottom row) Corals keys are opaque 160-bit ID values; nodes are ID is the sha-1 hash of its ip address. Coral defines a est node to k or some node whose distance to k is at least distance metric on IDs. Henceforth, we describe a node one bit shorter than Rs. This permits R to visit a se- as being close to a key if the distance between the key and quence of nodes with monotonically decreasing distances the node's ID is small. A Coral put operation stores a [ d1 I to k, such that the encoding of di +1 as a bi- key/value pair at a node close to the key. a get operation nary number has one fewer bit than di. As a result, the searches for stored key/value pairs at nodes successively expected number of iterations for R to discover the clos- closer to the key. To support these operations, a node re- est node to k is logarithmic in the number of node quires some mechanism to discover other nodes close to Figure 2 illustrates the Coral routing algorithm, which any art successively visits nodes whose distances to the key Every DShT contains a routing table. For any key k, a approximately halved each iteration. Traditional key node R's routing table allows it to find a node closer to h, based routing layers attempt to route directly to the node unless R is already the closest node. These routing tables closest to the key whenever possible [25, 26, 31, 35],re- are based on Kademlia [17], which defines the distance sorting to several intermediate hops only when faced with between two values in the ID-space to be their bitwise incomplete routing information. By caching additional exclusive or(XOR), interpreted as an unsigned integer. routing state-beyond the necessary log(n)references- Using the XOR metric, IDs with longer matching prefixes these systems in practice manage to achieve routing in a (of most significant bits) are numerically closer constant number of hops. We observe that frequent refer- The size of a nodes routing table in a DSHT is logarith- ences to the same key can generate high levels of traffic in mic in the total number of nodes comprising the dShT. nodes close to the key. This congestion, called tree satul- If a node R is not the closest node to some key k, then ration, was first identified in shared-memory interconnec- R's routing table almost always contains either the clos- tion networks [24fetches a web page, all CoralProxies, other than the first simultaneous requests, will naturally form a kind of mul￾ticast tree for retrieving the web page. Once any Coral￾Proxy obtains the full file, it inserts a much longer-lived reference to itself (e.g., 1 hour). Because the insertion al￾gorithm accounts for TTL, these longer-lived references will overwrite shorter-lived ones, and they can be stored on well-selected nodes even under high insertion load, as later described in Section 4.2. CoralProxies periodically renew referrals to resources in their caches. A proxy should not evict a web object from its cache while a reference to it may persist in the DSHT. Ideally, proxies would adaptively set TTLs based on cache capacity, though this is not yet implemented. 4 Coral: A Hierarchical Indexing System This section describes the Coral indexing infrastructure, which CoralCDN leverages to achieve scalability, self￾organization, and efficient data retrieval. We describe how Coral implements the put and get operations that form the basis of its distributed sloppy hash table (DSHT) ab￾straction: the underlying key-based routing layer (4.1), the DSHT algorithms that balance load (4.2), and the changes that enable latency and data-placement optimiza￾tions within a hierarchical set of DSHTs (4.3). Finally, we describe the clustering mechanisms that manage this hierarchical structure (4.4). 4.1 Coral’s Key-Based Routing Layer Coral’s keys are opaque 160-bit ID values; nodes are as￾signed IDs in the same 160-bit identifier space. A node’s ID is the SHA-1 hash of its IP address. Coral defines a distance metric on IDs. Henceforth, we describe a node as being close to a key if the distance between the key and the node’s ID is small. A Coral put operation stores a key/value pair at a node close to the key. A get operation searches for stored key/value pairs at nodes successively closer to the key. To support these operations, a node re￾quires some mechanism to discover other nodes close to any arbitrary key. Every DSHT contains a routing table. For any key k, a node R’s routing table allows it to find a node closer to k, unless R is already the closest node. These routing tables are based on Kademlia [17], which defines the distance between two values in the ID-space to be their bitwise exclusive or (XOR), interpreted as an unsigned integer. Using the XOR metric, IDs with longer matching prefixes (of most significant bits) are numerically closer. The size of a node’s routing table in a DSHT is logarith￾mic in the total number of nodes comprising the DSHT. If a node R is not the closest node to some key k, then R’s routing table almost always contains either the clos- 6 7 9 4 6 7 9 10 0 1 3 4 10 RPC#3 (0) R 0 2 3 13 14 {4, 5, 7} 0 1 3 4 6 7 9 10 4 5 7 {4, 5, 7} RPC#1 (2) target 2 RPC#2 (1) target 0 distance (nodeids xor 4) target 5 nodeids Figure 2: Example of routing operations in a system contain￾ing eight nodes with IDs {4, 5, 7, 0, 2, 3, 13, 14}. In this illus￾tration, node R with id = 14 is looking up the node closest to key k = 4, and we have sorted the nodes by their distance to k. The top boxed row illustrates XOR distances for the nodes {0, 2, 3, 13, 14} that are initially known by R. R first contacts a known peer whose distance to k is closest to half of R’s distance (10/2 = 5); in this illustration, this peer is node zero, whose distance to k is 0 ⊕ 4 = 4. Data in RPC requests and responses are shown in parentheses and braces, respectively: R asks node zero for its peers that are half-way closer to k, i.e., those at dis￾tance 4 2 =2. R inserts these new references into its routing table (middle row). R now repeats this process, contacting node five, whose distance 1 is closest to 4 2 . Finally, R contacts node four, whose distance is 0, and completes its search (bottom row). est node to k, or some node whose distance to k is at least one bit shorter than R’s. This permits R to visit a se￾quence of nodes with monotonically decreasing distances [d1, d2, . . .] to k, such that the encoding of di+1 as a bi￾nary number has one fewer bit than di . As a result, the expected number of iterations for R to discover the clos￾est node to k is logarithmic in the number of nodes. Figure 2 illustrates the Coral routing algorithm, which successively visits nodes whose distances to the key are approximately halved each iteration. Traditional key￾based routing layers attempt to route directly to the node closest to the key whenever possible [25, 26, 31, 35], re￾sorting to several intermediate hops only when faced with incomplete routing information. By caching additional routing state—beyond the necessary log(n) references— these systems in practice manage to achieve routing in a constant number of hops. We observe that frequent refer￾ences to the same key can generate high levels of traffic in nodes close to the key. This congestion, called tree satu￾ration, was first identified in shared-memory interconnec￾tion networks [24]. 5
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有