[Figure 9: The 90th percentile latency stretch versus the number of i3 servers, for (a) a power-law random network topology with 5000 nodes, and (b) a transit-stub topology with 5000 nodes. The i3 servers are randomly assigned to all nodes in case (a), and only to the stub nodes in case (b). Curves: default Chord, closest finger set, closest finger replica.]

…identifier and (2) the $r - 1$ immediate successors of that finger. This heuristic was originally proposed in [5].

• Closest finger set: Each server $s$ chooses $\log_b N$ fingers as $successor(s + b^i)$, where $i < \log_b N$ and $b < 2$. To route a packet, server $s$ considers only the closest $\log_2 N$ fingers, in terms of network distance, among all its $\log_b N$ fingers.

Figure 9 plots the 90th percentile latency stretch as a function of i3's size for the baseline Chord protocol and the two heuristics. The number of replicas $r$ is 10, and $b$ is chosen such that $\log_b N = r \log_2 N$. Thus, with both heuristics, a server considers roughly the same number of routing entries. We vary the number of i3 servers from $2^{10}$ to $2^{16}$, and in each case we average routing latencies over 1000 routing queries. In all cases the i3 server identifiers are randomly generated.

As shown in Figure 9, both heuristics can reduce the 90th percentile latency stretch by a factor of 2-3 compared to the default Chord protocol. In practice, we choose the "closest finger set" heuristic. While this heuristic achieves latency stretch comparable to "closest finger replica", it is easier to implement and does not require increasing the routing table size. The only change to the Chord protocol is to sample the identifier space using base $b$ instead of 2, and to store only the closest $\log_2 N$ fingers among the nodes sampled so far.
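To make the parameters concrete (this derivation and code are ours, not the paper's): the condition $\log_b N = r \log_2 N$ is equivalent to $b = 2^{1/r}$, so with $r = 10$ a server samples ten times as many candidate fingers as plain Chord and retains only the $\log_2 N$ of them with the smallest measured network distance. A minimal C sketch of that selection follows, with hypothetical chord_successor() and rtt_to() helpers standing in for the real lookup and RTT-probe code, and identifiers truncated to 64 bits for brevity:

```c
/* Sketch of the "closest finger set" selection; chord_successor() and
 * rtt_to() are hypothetical stand-ins for the real lookup and RTT probe,
 * and identifiers are truncated to 64 bits to keep the example short.   */
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

extern uint64_t chord_successor(uint64_t id);   /* hypothetical lookup  */
extern double   rtt_to(uint64_t server_id);     /* hypothetical probe   */

typedef struct { uint64_t id; double rtt; } finger_t;

static int by_rtt(const void *a, const void *b) {
    double d = ((const finger_t *)a)->rtt - ((const finger_t *)b)->rtt;
    return (d > 0) - (d < 0);
}

/* Keep the log2(N) fingers closest in network distance, chosen among
 * log_b(N) = r * log2(N) sampled fingers, where b = 2^(1/r).            */
size_t closest_finger_set(uint64_t s, size_t n_servers, int r,
                          finger_t *out, size_t out_cap) {
    double b        = pow(2.0, 1.0 / r);
    size_t n_sample = (size_t)ceil(r * log2((double)n_servers));
    size_t n_keep   = (size_t)ceil(log2((double)n_servers));
    finger_t *cand  = malloc(n_sample * sizeof *cand);

    for (size_t i = 0; i < n_sample; i++) {
        uint64_t target = s + (uint64_t)pow(b, (double)i);   /* s + b^i */
        cand[i].id  = chord_successor(target);
        cand[i].rtt = rtt_to(cand[i].id);
    }
    qsort(cand, n_sample, sizeof *cand, by_rtt);

    if (n_keep > out_cap) n_keep = out_cap;
    for (size_t i = 0; i < n_keep; i++) out[i] = cand[i];
    free(cand);
    return n_keep;
}
```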
5.3 Implementation and Experiments

We have implemented a bare-bones version of i3 using the Chord protocol. The control protocol used to maintain the overlay network is fully asynchronous and is implemented on top of UDP. The implementation uses 256-bit identifiers ($m = 256$) and assumes that the matching procedure requires exact matching on the 128 most significant bits ($k = 128$). This choice makes it very unlikely that a packet will erroneously match a trigger, and at the same time gives applications up to 128 bits to encode application-specific information such as the host location (see Section 2.4.3).

For simplicity, in the current implementation we assume that all triggers that share the first 128 bits are stored on the same server. In theory, this allows us to use any of the proposed lookup algorithms that perform exact matching.
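For illustration, the following is a minimal sketch (ours, not the paper's implementation; the type, field, and function names are assumptions) of how a 256-bit identifier can be split into the 128-bit exact-match portion, which also serves as the hash-table key, and the 128-bit application-defined remainder:

```c
/* Sketch of the m = 256, k = 128 identifier split; type, field and
 * function names are illustrative, not taken from the i3 code base.    */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t prefix[16];   /* top k = 128 bits: exact-match portion       */
    uint8_t suffix[16];   /* low 128 bits: application-specific data,    */
                          /* e.g. an encoding of the host location       */
} i3_id_t;

/* A packet is eligible to match a trigger only if the k most
 * significant bits of the two identifiers are identical.               */
static bool id_prefix_matches(const i3_id_t *pkt, const i3_id_t *trig) {
    return memcmp(pkt->prefix, trig->prefix, sizeof pkt->prefix) == 0;
}

/* All triggers sharing a 128-bit prefix live on one server, so the
 * prefix alone can key the trigger hash table (FNV-1a, one option).    */
static uint64_t prefix_hash(const i3_id_t *id) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < sizeof id->prefix; i++) {
        h ^= id->prefix[i];
        h *= 1099511628211ULL;
    }
    return h;
}
```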
Both insert trigger requests and data packets share a common header of 48 bytes. In addition, data packets can carry a stack of up to four triggers (this feature isn't used in the experiments). Triggers need to be updated every 30 seconds or they will expire. The control protocol that maintains the overlay network is minimal. Each server performs stabilization every 30 seconds (see [26]). During every stabilization period all servers together generate approximately $N \log N$ control messages. Since in our experiments the number of servers $N$ is on the order of tens, we neglect the overhead due to the control protocol.

The testbed used for all of our experiments was a cluster of Pentium III 700 MHz machines running Linux. We ran tests on systems of up to 32 nodes, with each node running on its own processor. The nodes communicated over a shared 1 Gbps Ethernet. For time measurements, we use the Pentium timestamp counter (TSC). This method gives very accurate wall-clock times, but sometimes it includes interrupts and context switches as well. For this reason, the high extremes in the data are unreliable.

5.4 Performance

In this section, we present the overhead of the main operations performed by i3. Since these results are based on a very preliminary implementation, they should be seen as a proof of feasibility and not as a proof of efficiency. Other Chord-related performance metrics, such as the route length and system robustness, are presented in [5].

Trigger insertion: We consider the overhead of handling an insert trigger request locally, as opposed to forwarding the request to another server. Triggers are maintained in a hash table, so the insertion time is practically independent of the number of triggers. Inserting a trigger involves just a hash table lookup and a memory allocation. The average and the standard deviation of the trigger insertion operation over 10,000 insertions are 12.5 μsec and 7.12 μsec, respectively. This is mostly the time it takes the operating system to process the packet and hand it to the application. By comparison, memory allocation time is just 0.25 μsec on the test machine. Note that since each trigger is updated every 30 sec, a server would …
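As a rough illustration of the insertion path just described (a sketch under our own assumptions, reusing the hypothetical i3_id_t and prefix_hash() helpers from the earlier sketch, not the paper's code): an insert either refreshes the expiry time of an existing trigger or performs a single allocation for a new one, which is why the measured cost is dominated by kernel packet handling rather than by the data structure itself.

```c
/* Sketch of local trigger insertion with 30-second soft state. It reuses
 * the hypothetical i3_id_t and prefix_hash() from the previous sketch;
 * the table layout and names are assumptions, not the paper's code.     */
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TRIGGER_TTL_SEC 30
#define TABLE_BUCKETS   4096

typedef struct trigger {
    i3_id_t            id;       /* full 256-bit trigger identifier      */
    struct sockaddr_in dst;      /* where matching packets are forwarded */
    time_t             expires;  /* pushed forward on every refresh      */
    struct trigger    *next;     /* chaining within a bucket             */
} trigger_t;

static trigger_t *buckets[TABLE_BUCKETS];

/* Insert or refresh a trigger: one hash-table lookup plus, for a new
 * trigger, one memory allocation -- the two costs measured above.       */
void trigger_insert(const i3_id_t *id, struct sockaddr_in dst) {
    size_t b = (size_t)(prefix_hash(id) % TABLE_BUCKETS);
    for (trigger_t *t = buckets[b]; t != NULL; t = t->next) {
        if (memcmp(&t->id, id, sizeof *id) == 0) {  /* existing trigger */
            t->dst     = dst;
            t->expires = time(NULL) + TRIGGER_TTL_SEC;
            return;
        }
    }
    trigger_t *t = malloc(sizeof *t);               /* new trigger      */
    if (t == NULL) return;                          /* drop on OOM      */
    t->id      = *id;
    t->dst     = dst;
    t->expires = time(NULL) + TRIGGER_TTL_SEC;
    t->next    = buckets[b];
    buckets[b] = t;
}
```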