(e.g., some Solaris versions) to 511 seconds (e.g., most BSDs). This gives us a metric that characterizes when batch applications will fail; from our own experience, we believe that most users running interactive applications have a far lower threshold for declaring their Internet connectivity “dead.”

The problem with this small time-scale definition is that robustly measuring it and differentiating between a high packet loss rate and true disconnections is hard. However, since distributed applications suffer in either case, we assert that it is operationally useful to define a path outage as the length of time over which the packet loss rate is larger than some threshold.
We define outage(τ, p) = 1 if the observed packet loss rate averaged over an interval τ is larger than p on the path, and 0 otherwise. For values of τ on the order of several minutes, a measured value of p larger than 30% degrades TCP performance by orders of magnitude, by forcing TCP into frequent timeout-based retransmissions [16].

[Figure 9: scatterplot. Axes: 30-minute avg Internet loss rate and 30-minute avg RON loss rate, each from 0 to 1. Annotations: samples; 10 overlapping points at (0,1); 30% Internet loss line; 30% RON loss line; x=y.]

Figure 9: Packet loss rate averaged over 30-minute intervals for direct Internet paths vs. RON paths for the RON1 dataset. There are 32 points above the p = 0.3 horizontal line, and 20 points above p = 0.5, including overlapping points. In contrast, RON’s loss-optimizing router avoided these failures and never experienced a 30-minute loss rate larger than 30%.

How often is outage(τ, p) = 1, and how often is RON able to route around these situations? Figure 9 shows a scatterplot of the improvement in loss rate, averaged over τ = 1800 s, achieved by RON for the RON1 measurements. To identify outages, consider the 32 points above the p = 0.3 line parallel to the horizontal axis, which signifies a condition bad enough to kill most applications. There are no points to the right of the p = 0.3 line parallel to the vertical axis. The scatterplot conceals most of the data in the lower-left corner; we will revisit this data in the form of a CDF (Figure 11) in the next section.

The precise number of times outage(τ, p) was equal to 1 for τ = 1800 s is shown in Table 3. These statistics are obtained by calculating 13,650 30-minute loss-rate averages of a 51-hour subset of the RON1 packet trace, involving 132 different communication paths. We count a “RON Win” if the time-averaged loss rate on the Internet was ≥ p% and the loss rate with RON was < p%; “No Change” and “RON Loss” are analogously defined.
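The outage indicator and the per-sample bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the measurement code used for the paper: the function names are hypothetical, and the readings of “No Change” (both paths at or above p) and “RON Loss” (only the RON path at or above p) are assumptions, since the text says only that they are “analogously defined.”

```python
from typing import Optional, Sequence


def loss_rate(received: Sequence[bool]) -> float:
    """Fraction of probes lost within one averaging interval (e.g. 1800 s)."""
    if not received:
        raise ValueError("need at least one probe in the interval")
    return 1.0 - sum(received) / len(received)


def outage(avg_loss: float, p: float) -> int:
    """outage(tau, p): 1 if the interval-averaged loss rate is larger than p."""
    return 1 if avg_loss > p else 0


def classify(internet_loss: float, ron_loss: float, p: float) -> Optional[str]:
    """Label one 30-minute sample at threshold p (loss rates as fractions).

    "RON Win" follows the text: Internet loss >= p and RON loss < p.
    The other two labels are an assumed reading of "analogously defined".
    """
    internet_bad = internet_loss >= p
    ron_bad = ron_loss >= p
    if internet_bad and not ron_bad:
        return "RON Win"
    if internet_bad and ron_bad:
        return "No Change"
    if ron_bad:
        return "RON Loss"
    return None  # neither path reached the threshold; sample is not counted


if __name__ == "__main__":
    # A complete outage on the direct path that RON routed around.
    print(classify(internet_loss=1.0, ron_loss=0.0, p=1.0))
```

Note that the indicator uses a strict comparison, as in the definition, while the classification uses “at or above” so that a 100% threshold can still be met.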
We find that the number of complete communication outages was 10 in this dataset, which means that there were 10 instances when the sampled paths had a 100% loss rate. The numbers in this table are not the number of link or routing failures observed in the Internet across the sampled paths; the number of such failures could have been lower (e.g., multiple 30-minute averaged samples with a 100% loss rate may have been the result of the same problem) or higher (e.g., a time-averaged loss rate of 50% could have resulted from multiple link outages of a few minutes each).

We emphasize that Internet2 paths were never used to improve a non-Internet2 connection’s performance. In fact, the vast majority of the problems corrected by RON involved only commercial Internet connections, as is shown [in brackets] by the number of outages when we remove all Internet2 paths from consideration.

Loss Rate   RON Win      No Change   RON Loss
10%         526 [517]    58 [51]     47 [45]
20%         142 [140]    4 [3]       15 [15]
30%         32 [32]      0           0
40%         23 [23]      0           0
50%         20 [20]      0           0
60%         19 [19]      0           0
70%         15 [15]      0           0
80%         14 [14]      0           0
90%         12 [12]      0           0
100%        10 [10]      0           0

Table 3: Outage data for RON1. A “RON win” at p% means that the loss rate of the direct Internet path was ≥ p% and the RON loss rate was < p%. Numbers in brackets show the contribution to the total outage number after eliminating all the (typically more reliable) Internet2 paths, which reflects the public Internet better.

The numbers and percentage of outages for RON2 (Table 4) are noticeably higher than in RON1, showing the variability of path reliability in the Internet today. RON2 had 34,000 30-minute samples, about 2.5X more samples than RON1.

Loss Rate   RON Win   No Change   RON Loss
10%         557       165         113
20%         168       112         33
30%         131       84          18
40%         110       75          7
50%         106       69          7
60%         100       62          5
70%         93        57          1
80%         87        54          0
90%         85        48          2
100%        67        45          1

Table 4: Outage data for RON2.
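The counts in Tables 3 and 4 can be turned into the fraction of direct-path outages that RON routed around. The sketch below is an illustration under one assumption: that an “outage situation” at threshold p is a sample counted as either “RON Win” or “No Change” (the direct path was at or above p), so “RON Loss” samples are left out of the denominator. The dictionary and function names are hypothetical; the numbers are copied from the tables, without the bracketed Internet2-free counts.

```python
# (RON Win, No Change, RON Loss) per loss-rate threshold in percent.
RON1 = {
    10: (526, 58, 47), 20: (142, 4, 15), 30: (32, 0, 0), 40: (23, 0, 0),
    50: (20, 0, 0), 60: (19, 0, 0), 70: (15, 0, 0), 80: (14, 0, 0),
    90: (12, 0, 0), 100: (10, 0, 0),
}
RON2 = {
    10: (557, 165, 113), 20: (168, 112, 33), 30: (131, 84, 18),
    40: (110, 75, 7), 50: (106, 69, 7), 60: (100, 62, 5), 70: (93, 57, 1),
    80: (87, 54, 0), 90: (85, 48, 2), 100: (67, 45, 1),
}


def overcome_fraction(row):
    """Share of direct-path outages (wins + no change) that RON avoided."""
    wins, no_change, _ron_losses = row
    outages = wins + no_change
    if outages == 0:
        raise ValueError("no direct-path outages at this threshold")
    return wins / outages


if __name__ == "__main__":
    for name, table in (("RON1", RON1), ("RON2", RON2)):
        for threshold in (30, 50, 100):
            frac = overcome_fraction(table[threshold])
            print(f"{name} p={threshold}%: {frac:.2f}")
```

For RON2 at p = 30%, this is 131 / (131 + 84), roughly 0.61; for RON1 at the same threshold it is 32 / 32.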
These results show that RON offers substantial improvements during a large fraction of outages, but is not infallible in picking the best path at lower outage rates when p is between 0 and 20%. However, in RON1, RON’s outage detection and path selection machinery was able to successfully route around all the outage situations! This is especially revealing because it suggests that all the outages in RON1 were not on “edge” links connecting the site to the Internet, but elsewhere where path diversity allowed RON to provide connectivity. In RON2, about 60% of the serious outage situations were overcome; the remaining 40% were almost all due to individual sites being unreachable from any other site in the RON.²

² The one situation where RON made things worse at a 100% loss