Congestion Control for High Bandwidth-Delay Product Networks

Dina Katabi (MIT-LCS) dk@mit.edu, Mark Handley (ICSI) mjh@icsi.berkeley.edu, Charlie Rohrs† (Tellabs) crhors@mit.edu

ABSTRACT

Theory and experiments show that as the per-flow product of bandwidth and latency increases, TCP becomes inefficient and prone to instability, regardless of the queuing scheme. This failing becomes increasingly important as the Internet evolves to incorporate very high-bandwidth optical links and more large-delay satellite links.

To address this problem, we develop a novel approach to Internet congestion control that outperforms TCP in conventional environments, and remains efficient, fair, scalable, and stable as the bandwidth-delay product increases. This new eXplicit Control Protocol, XCP, generalizes the Explicit Congestion Notification proposal (ECN). In addition, XCP introduces the new concept of decoupling utilization control from fairness control. This allows a more flexible and analytically tractable protocol design and opens new avenues for service differentiation.

Using a control theory framework, we model XCP and demonstrate it is stable and efficient regardless of the link capacity, the round trip delay, and the number of sources. Extensive packet-level simulations show that XCP outperforms TCP in both conventional and high bandwidth-delay environments. Further, XCP achieves fair bandwidth allocation, high utilization, small standing queue size, and near-zero packet drops, with both steady and highly varying traffic. Additionally, the new protocol does not maintain any per-flow state in routers and requires few CPU cycles per packet, which makes it implementable in high-speed routers.

Categories and Subject Descriptors

C.2.5 [Computer Systems Organization]: Computer Communication Networks—Local and Wide-Area Networks

D. Katabi was supported by ARPA Agreement J958100, under contract F30602-00-20553. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policy or endorsements of DARPA or the US government.

†C. Rohrs was supported by the Office of Naval Research under contract 6872400-ONR.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGCOMM'02, August 19-23, 2002, Pittsburgh, Pennsylvania, USA. Copyright 2002 ACM 1-58113-570-X/02/0008 ...$5.00.

General Terms

Performance, Design.

Keywords

Congestion control, high-speed networks, large bandwidth-delay product.

1. INTRODUCTION

For the Internet to continue to thrive, its congestion control mechanism must remain effective as the network evolves. Technology trends indicate that the future Internet will have a large number of very high-bandwidth links. Less ubiquitous but still commonplace will be satellite and wireless links with high latency. These trends are problematic because TCP reacts adversely to increases in bandwidth or delay.

Mathematical analysis of current congestion control algorithms reveals that, regardless of the queuing scheme, as the delay-bandwidth product increases, TCP becomes oscillatory and prone to instability.
By casting the problem into a control theory framework, Low et al. [23] show that as capacity or delay increases, Random Early Discard (RED) [13], Random Early Marking (REM) [5], Proportional Integral Controller [15], and Virtual Queue [14] all eventually become oscillatory and prone to instability. They further argue that it is unlikely that any Active Queue Management scheme (AQM) can maintain stability over very high-capacity or large-delay links. Furthermore, Katabi and Blake [19] show that Adaptive Virtual Queue (AVQ) [22] also becomes prone to instability when the link capacity is large enough (e.g., gigabit links).

Inefficiency is another problem facing TCP in the future Internet. As the delay-bandwidth product increases, performance degrades. TCP's additive increase policy limits its ability to acquire spare bandwidth to one packet per RTT. Since the bandwidth-delay product of a single flow over very-high-bandwidth links may be many thousands of packets, TCP might waste thousands of RTTs ramping up to full utilization following a burst of congestion.

Further, the increase in link capacity does not improve the transfer delay of short flows (the majority of the flows in the Internet). Short TCP flows cannot acquire the spare bandwidth faster than "slow start" and will waste valuable RTTs ramping up even when bandwidth is available.

Additionally, since TCP's throughput is inversely proportional to the RTT, fairness too might become an issue as more flows in the Internet traverse satellite links or wireless WANs [25]. As users with substantially different RTTs compete for the same bottleneck capacity, considerable unfairness will result.
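To put the ramp-up problem above in perspective, consider a back-of-the-envelope sketch. The 1 Gb/s capacity, 200 ms RTT, and 1500-byte packets below are illustrative assumptions for this example, not figures from our measurements:

    # Rough sketch: bandwidth-delay product and TCP additive-increase ramp-up.
    # Link parameters below are illustrative assumptions.
    capacity_bps = 1e9          # assumed link capacity: 1 Gb/s
    rtt_s = 0.2                 # assumed round trip time: 200 ms
    packet_bytes = 1500         # assumed packet size

    # Bandwidth-delay product: packets "in flight" at full utilization.
    bdp_packets = capacity_bps * rtt_s / (packet_bytes * 8)
    print(f"BDP = {bdp_packets:.0f} packets")          # ~16,667 packets

    # After a multiplicative decrease halves the window, additive increase
    # regains one packet per RTT, so reclaiming half the BDP takes:
    ramp_rtts = bdp_packets / 2
    print(f"ramp-up = {ramp_rtts:.0f} RTTs = {ramp_rtts * rtt_s / 60:.0f} minutes")

Under these assumptions a single flow needs over eight thousand RTTs, roughly half an hour, to regain full utilization after a single halving.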
Although the full impact of large delay-bandwidth products is yet to come, we can see the seeds of these problems in the current Internet. For example, TCP over satellite links has revealed network utilization issues and TCP's undesirable bias against long RTT flows [4]. Currently, these problems are mitigated using ad hoc mechanisms such as ack spacing, split connection [4], or performance enhancing proxies [8].

This paper develops a novel protocol for congestion control that outperforms TCP in conventional environments, and further remains efficient, fair, and stable as the link bandwidth or the round-trip delay increases. This new eXplicit Control Protocol, XCP, generalizes the Explicit Congestion Notification proposal (ECN) [27]. Instead of the one-bit congestion indication used by ECN, our routers inform the senders about the degree of congestion at the bottleneck. Another new concept is the decoupling of utilization control from fairness control. To control utilization, the new protocol adjusts its aggressiveness according to the spare bandwidth in the network and the feedback delay. This prevents oscillations, provides stability in the face of high bandwidth or large delay, and ensures efficient utilization of network resources. To control fairness, the protocol reclaims bandwidth from flows whose rate is above their fair share and reallocates it to other flows.

By putting the control state in the packets, XCP needs no per-flow state in routers and can scale to any number of flows. Further, our implementation (Appendix A) requires only a few CPU cycles per packet, making it practical even for high-speed routers.

Using a control theory framework motivated by previous work [22, 15, 23], we show in §4 that a fluid model of the protocol is stable for any link capacity, feedback delay, or number of sources. In contrast to the various AQM schemes where parameter values depend on the capacity, delay, or number of sources, our analysis shows how to set the parameters of the new protocol to constant values that are effective independent of the environment.

Our extensive packet-level simulations in §5 show that, regardless of the queuing scheme, TCP's performance degrades significantly as either capacity or delay increases. In contrast, the new protocol achieves high utilization, small queues, and almost no drops, independent of capacity or delay. Even in conventional environments, the simulations show that our protocol exhibits better fairness, higher utilization, and smaller queue size, with almost no packet drops. Further, it maintains good performance in dynamic environments with many short web-like flows, and has no bias against long RTT flows. A unique characteristic of the new protocol is its ability to operate with almost zero drops.

Although we started with the goal of solving TCP's limitations in high-bandwidth large-delay environments, our design has several additional advantages.

First, decoupling fairness control from utilization control opens new avenues for service differentiation using schemes that provide desired bandwidth apportioning, yet are too aggressive or too weak for controlling congestion. In §6, we present a simple scheme that implements the shadow prices model [21].

Second, the protocol facilitates distinguishing error losses from congestion losses, which makes it useful for wireless environments. In XCP, drops caused by congestion are highly uncommon (e.g., less than one in a million packets in simulations).
Further, since the protocol uses explicit and precise congestion feedback, a congestion drop is likely to be preceded by explicit feedback that tells the source to decrease its congestion window. Losses that are preceded and followed by an explicit increase feedback are likely error losses.

Third, as shown in §7, XCP facilitates the detection of misbehaving sources.

Finally, XCP's performance provides an incentive for both end users and network providers to deploy the protocol. In §8 we present possible deployment paths.

2. DESIGN RATIONALE

Our initial objective is to step back and rethink Internet congestion control without caring about backward compatibility or deployment. If we were to build a new congestion control architecture from scratch, what might it look like?

The first observation is that packet loss is a poor signal of congestion. While we do not believe a cost-effective network can always avoid loss, dropping packets should be a congestion signal of last resort. As an implicit signal, loss is bad because congestion is not the only source of loss, and because a definite decision that a packet was lost cannot be made quickly. As a binary signal, loss only signals whether there is congestion (a loss) or not (no loss). Thus senders must probe the network to the point of congestion before backing off. Moreover, as the feedback is imprecise, the increase policy must be conservative and the decrease policy must be aggressive.

Tight congestion control requires explicit and precise congestion feedback. Congestion is not a binary variable, so congestion signalling should reflect the degree of congestion. We propose using precise congestion signalling, where the network explicitly tells the sender the state of congestion and how to react to it. This allows the senders to decrease their sending windows quickly when the bottleneck is highly congested, while performing small reductions when the sending rate is close to the bottleneck capacity. The resulting protocol is both more responsive and less oscillatory.

Second, the aggressiveness of the sources should be adjusted according to the delay in the feedback loop. The dynamics of congestion control may be abstracted as a control loop with feedback delay. A fundamental characteristic of such a system is that it becomes unstable for some large feedback delay. To counter this destabilizing effect, the system must slow down as the feedback delay increases. In the context of congestion control, this means that as delay increases, the sources should change their sending rates more slowly. This issue has been raised by other researchers [23, 26], but the important question is how exactly feedback should depend on delay to establish stability. Using tools from control theory, we conjecture that congestion feedback based on rate-mismatch should be inversely proportional to delay, and feedback based on queue-mismatch should be inversely proportional to the square of delay.

Robustness to congestion should be independent of unknown and quickly changing parameters, such as the number of flows. A fundamental principle from control theory states that a controller must react as quickly as the dynamics of the controlled signal; otherwise the controller will always lag behind the controlled system and will be ineffective. In the context of current proposals for congestion control, the controller is an Active Queue Management scheme (AQM).
The controlled signal is the aggregate traffic traversing the link. The controller seeks to match input traffic to link capacity. However, this objective might be unachievable when the input traffic consists of TCP flows, because the dynamics of a TCP aggregate depend on the number of flows (N). The aggregate rate increases by N packets per RTT, or decreases proportionally to 1/N. Since the number of flows in the aggregate is not constant and changes over time, no AQM controller with constant parameters can be fast enough to operate with an arbitrary number of TCP flows. Thus, a third objective of our system is to make the dynamics of the aggregate traffic independent from the number of flows.

This leads to the need for decoupling efficiency control (i.e., control of utilization or congestion) from fairness control. Robustness to congestion requires the behavior of aggregate traffic to be independent of the number of flows in it. However, any fair bandwidth allocation intrinsically depends on the number of flows traversing the bottleneck. Thus, the rule for dividing bandwidth among individual flows in an aggregate should be independent from the control law that governs the dynamics of the aggregate.
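To see why a fixed-gain controller struggles with the N-dependent TCP dynamics described above, consider a rough numeric sketch; the capacity and flow counts are assumptions chosen for illustration:

    # Rough sketch: TCP aggregate dynamics depend on the number of flows N.
    # Link parameters are illustrative assumptions.
    capacity_pkts_per_rtt = 10_000   # assumed bottleneck capacity, packets per RTT

    for n_flows in (10, 100, 1000):
        # Additive increase: each flow adds 1 packet per RTT, so the
        # aggregate grows by N packets per RTT...
        increase = n_flows
        # ...while a halving by one flow removes roughly 1/(2N) of the
        # aggregate (assuming fair shares), so the loop gain scales with N
        # in one direction and 1/N in the other.
        decrease_fraction = 1 / (2 * n_flows)
        print(f"N={n_flows:5d}: +{increase} pkts/RTT "
              f"({increase / capacity_pkts_per_rtt:.2%} of capacity), "
              f"one halving removes {decrease_fraction:.2%} of the aggregate")

Across this range of N, the aggregate's response varies by two orders of magnitude, which is exactly the variation a constant-parameter AQM cannot track.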
Traditionally, efficiency and fairness are coupled, since the same control law (such as AIMD in TCP) is used to obtain both fairness and efficiency simultaneously [3, 9, 17, 18, 16]. Conceptually, however, efficiency and fairness are independent. Efficiency involves only the aggregate traffic's behavior. When the input traffic rate equals the link capacity, no queue builds and utilization is optimal. Fairness, on the other hand, involves the relative throughput of flows sharing a link. A scheme is fair when the flows sharing a link have the same throughput irrespective of congestion.

In our new paradigm, a router has both an efficiency controller (EC) and a fairness controller (FC). This separation simplifies the design and analysis of each controller by reducing the requirements imposed. It also permits modifying one of the controllers without redesigning or re-analyzing the other. Furthermore, it provides a flexible framework for integrating differential bandwidth allocations. For example, allocating bandwidth to senders according to their priorities or the price they pay requires changing only the fairness controller and does not affect the efficiency or the congestion characteristics.

3. PROTOCOL

XCP provides a joint design of end-systems and routers. Like TCP, XCP is a window-based congestion control protocol intended for best effort traffic. However, its flexible architecture can easily support differentiated services as explained in §6. The description of XCP in this section assumes a pure XCP network. In §8, we show that XCP can coexist with TCP in the same Internet and be TCP-friendly.

3.1 Framework

First we give an overview of how control information flows in the network, then in §3.5 we explain feedback computation. Senders maintain their congestion window cwnd and round trip time rtt¹ and communicate these to the routers via a congestion header in every packet. Routers monitor the input traffic rate to each of their output queues. Based on the difference between the link bandwidth and its input traffic rate, the router tells the flows sharing that link to increase or decrease their congestion windows. It does this by annotating the congestion header of data packets. Feedback is divided between flows based on their cwnd and rtt values so that the system converges to fairness. A more congested router later in the path can further reduce the feedback in the congestion header by overwriting it. Ultimately, the packet will contain the feedback from the bottleneck along the path. When the feedback reaches the receiver, it is returned to the sender in an acknowledgment packet, and the sender updates its cwnd accordingly.

¹In this document, the notation RTT refers to the physical round trip time, rtt refers to the variable maintained by the source's software, and H_rtt refers to a field in the congestion header.

3.2 The Congestion Header

Each XCP packet carries a congestion header (Figure 1), which is used to communicate a flow's state to routers and feedback from the routers on to the receivers.

Figure 1: Congestion header. Fields: H_cwnd (set to the sender's current cwnd), H_rtt (set to the sender's RTT estimate), and H_feedback (initialized to the sender's demands; modified by the routers).
The field H_cwnd is the sender's current congestion window, whereas H_rtt is the sender's current RTT estimate. These are filled in by the sender and never modified in transit. The remaining field, H_feedback, takes positive or negative values and is initialized by the sender. Routers along the path modify this field to directly control the congestion windows of the sources.

3.3 The XCP Sender

As with TCP, an XCP sender maintains a congestion window of the outstanding packets, cwnd, and an estimate of the round trip time, rtt. On packet departure, the sender attaches a congestion header to the packet and sets the H_cwnd field to its current cwnd and H_rtt to its current rtt. In the first packet of a flow, H_rtt is set to zero to indicate to the routers that the source does not yet have a valid estimate of the RTT.

The sender initializes the H_feedback field to request its desired window increase. For example, when the application has a desired rate r, the sender sets H_feedback to the desired increase in the congestion window (r · rtt − cwnd) divided by the number of packets in the current congestion window. If bandwidth is available, this initialization allows the sender to reach the desired rate after one RTT.

Whenever a new acknowledgment arrives, positive feedback increases the sender's cwnd and negative feedback reduces it:

    cwnd = max(cwnd + H_feedback, s),

where s is the packet size.

In addition to direct feedback, XCP still needs to respond to losses, although they are rare. It does this in a similar manner to TCP.

3.4 The XCP Receiver

An XCP receiver is similar to a TCP receiver except that when acknowledging a packet, it copies the congestion header from the data packet to its acknowledgment.
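To make §3.2-3.4 concrete, here is a minimal end-system sketch in Python. It is an illustrative model, not the paper's implementation: the class and method names are invented, units are simplified, and loss handling is omitted.

    # Minimal sketch of XCP end-system behavior (illustrative, not the
    # paper's code). Class and method names are invented for this example.
    from dataclasses import dataclass

    @dataclass
    class CongestionHeader:
        H_cwnd: float      # sender's current congestion window (bytes)
        H_rtt: float       # sender's RTT estimate (seconds); 0 = no estimate yet
        H_feedback: float  # desired window change (bytes); routers may reduce it

    class XcpSender:
        def __init__(self, packet_size: float, desired_rate: float):
            self.s = packet_size          # packet size in bytes
            self.cwnd = packet_size       # start with one packet outstanding
            self.rtt = 0.0                # no valid RTT estimate yet
            self.desired_rate = desired_rate

        def make_header(self) -> CongestionHeader:
            # Request the window increase needed to reach the desired rate,
            # spread over the packets of the current window (Section 3.3).
            if self.rtt == 0.0:
                return CongestionHeader(self.cwnd, 0.0, 0.0)  # no RTT yet
            pkts_in_cwnd = max(self.cwnd / self.s, 1.0)
            wanted = (self.desired_rate * self.rtt - self.cwnd) / pkts_in_cwnd
            return CongestionHeader(self.cwnd, self.rtt, wanted)

        def on_ack(self, echoed: CongestionHeader) -> None:
            # cwnd = max(cwnd + H_feedback, s): feedback may be negative,
            # but the window never falls below one packet.
            self.cwnd = max(self.cwnd + echoed.H_feedback, self.s)

    def receiver_ack(data_header: CongestionHeader) -> CongestionHeader:
        # An XCP receiver simply echoes the (possibly router-modified)
        # congestion header back to the sender (Section 3.4).
        return CongestionHeader(data_header.H_cwnd, data_header.H_rtt,
                                data_header.H_feedback)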
3.5 The XCP Router: The Control Laws

The job of an XCP router is to compute the feedback that causes the system to converge to optimal efficiency and min-max fairness. Thus, XCP does not drop packets. It operates on top of a dropping policy such as DropTail, RED, or AVQ. The objective of XCP is to prevent, as much as possible, the queue from building up to the point at which a packet has to be dropped.

To compute the feedback, an XCP router uses an efficiency controller and a fairness controller. Both of these compute estimates over the average RTT of the flows traversing the link, which smooths the burstiness of a window-based control protocol. Estimating parameters over intervals longer than the average RTT leads to sluggish response, while estimating parameters over shorter intervals leads to erroneous estimates. The average RTT is computed using the information in the congestion header.

XCP controllers make a single control decision every average RTT (the control interval). This is motivated by the need to observe the results of previous control decisions before attempting a new control. For example, if the router tells the sources to increase their congestion windows, it should wait to see how much spare bandwidth remains before telling them to increase again.

The router maintains a per-link estimation-control timer that is set to the most recent estimate of the average RTT on that link. Upon timeout the router updates its estimates and its control decisions. In the remainder of this paper, we refer to the router's current estimate of the average RTT as d to emphasize that this is the feedback delay.

3.5.1 The Efficiency Controller (EC)

The efficiency controller's purpose is to maximize link utilization while minimizing drop rate and persistent queues. It looks only at aggregate traffic and need not care about fairness issues, such as which flow a packet belongs to.

As XCP is window-based, the EC computes a desired increase or decrease in the number of bytes that the aggregate traffic transmits in a control interval (i.e., an average RTT). This aggregate feedback φ is computed each control interval:

    φ = α · d · S − β · Q,    (1)

where α and β are constant parameters, whose values are set based on our stability analysis (§4) to 0.4 and 0.226, respectively. The term d is the average RTT, and S is the spare bandwidth, defined as the difference between the link capacity and the input traffic rate. (Note that S can be negative.) Finally, Q is the persistent queue size (i.e., the queue that does not drain in a round trip propagation delay), as opposed to a transient queue that results from the bursty nature of all window-based protocols. We compute Q by taking the minimum queue seen by an arriving packet during the last propagation delay, which we estimate by subtracting the local queuing delay from the average RTT.

Equation 1 makes the feedback proportional to the spare bandwidth because, when S > 0, the link is underutilized and we want to send positive feedback, while when S < 0, the link is congested and we want to send negative feedback. However this alone is insufficient because it would mean we give no feedback when the input traffic matches the capacity, and so the queue does not drain. To drain the persistent queue we make the aggregate feedback proportional to the persistent queue too. Finally, since the feedback is in bytes, the spare bandwidth S is multiplied by the average RTT.

To achieve efficiency, we allocate the aggregate feedback to single packets as H_feedback. Since the EC deals only with the aggregate behavior, it does not care which packets get the feedback and by how much each individual flow changes its congestion window. All the EC requires is that the total traffic changes by φ over this control interval. How exactly we divide the feedback among the packets (and hence the flows) affects only fairness, and so is the job of the fairness controller.
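The efficiency controller reduces to a few lines. The sketch below is an illustrative restatement of Equation 1, not the paper's Appendix A code; the function name, units, and example numbers are assumptions of this example. Note that dividing φ by d gives a rate change of roughly α·S − β·Q/d per RTT, the delay-scaled behavior argued for in §2.

    ALPHA, BETA = 0.4, 0.226   # constants from the stability analysis (Section 4)

    def aggregate_feedback(capacity_Bps: float, input_rate_Bps: float,
                           avg_rtt_s: float, persistent_queue_B: float) -> float:
        """Equation 1: phi = alpha*d*S - beta*Q, in bytes per control interval.

        Illustrative sketch: S is the spare bandwidth (capacity minus input
        rate), d the average RTT, and Q the persistent queue, all in bytes
        and seconds.
        """
        spare = capacity_Bps - input_rate_Bps          # S, may be negative
        return ALPHA * avg_rtt_s * spare - BETA * persistent_queue_B

    # Example: a 10% underutilized 1 Gb/s link (125 MB/s) with a 50 ms average
    # RTT and a 60 KB persistent queue (all numbers are assumptions):
    phi = aggregate_feedback(125e6, 112.5e6, 0.05, 60e3)
    print(f"phi = {phi / 1e3:.0f} KB this control interval")  # positive: grow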
3.5.2 The Fairness Controller (FC)

The job of the fairness controller (FC) is to apportion the feedback to individual packets to achieve fairness. The FC relies on the same principle TCP uses to converge to fairness, namely Additive-Increase Multiplicative-Decrease (AIMD). Thus, we want to compute the per-packet feedback according to the policy:

If φ > 0, allocate it so that the increase in throughput of all flows is the same.

If φ < 0, allocate it so that the decrease in throughput of a flow is proportional to its current throughput.

This ensures continuous convergence to fairness as long as the aggregate feedback φ is not zero. To prevent convergence stalling when efficiency is around optimal (φ ≈ 0), we introduce the concept of bandwidth shuffling. This is the simultaneous allocation and deallocation of bandwidth such that the total traffic rate (and consequently the efficiency) does not change, yet the throughput of each individual flow changes gradually to approach the flow's fair share. The shuffled traffic is computed as follows:

    h = max(0, γ · y − |φ|),    (2)

where y is the input traffic in an average RTT and γ is a constant set to 0.1. This equation ensures that, every average RTT, at least 10% of the traffic is redistributed according to AIMD. The choice of 10% is a tradeoff between the time to converge to fairness and the disturbance the shuffling imposes on a system that is around optimal efficiency.

Next, we compute the per-packet feedback that allows the FC to enforce the above policies. Since the increase law is additive whereas the decrease is multiplicative, it is convenient to compute the feedback assigned to packet i as the combination of a positive feedback p_i and a negative feedback n_i:

    H_feedback_i = p_i − n_i.    (3)

First, we compute the case when the aggregate feedback is positive (φ > 0). In this case, we want to increase the throughput of all flows by the same amount. Thus, we want the change in the throughput of any flow i to be proportional to the same constant (i.e., Δthroughput_i ∝ constant). Since we are dealing with a window-based protocol, we want to compute the change in congestion window rather than the change in throughput. The change in the congestion window of flow i is the change in its throughput multiplied by its RTT. Hence, the change in the congestion window of flow i should be proportional to the flow's RTT (i.e., Δcwnd_i ∝ rtt_i).

The next step is to translate this desired change of congestion window to per-packet feedback that will be reported in the congestion header. The total change in congestion window of a flow is the sum of the per-packet feedback it receives. Thus, we obtain the per-packet feedback by dividing the change in congestion window by the expected number of packets from flow i that the router sees in a control interval d. This number is proportional to the flow's congestion window divided by its packet size (both in bytes), cwnd_i / s_i, and inversely proportional to its round trip time, rtt_i. Thus, the per-packet positive feedback is proportional to the square of the flow's RTT, and inversely proportional to its congestion window divided by its packet size (i.e., p_i ∝ rtt_i² · s_i / cwnd_i). Thus, positive feedback p_i is given by:

    p_i = ξ_p · (rtt_i² · s_i) / cwnd_i,    (4)

where ξ_p is a constant. The total increase in the aggregate traffic rate is (h + max(φ, 0)) / d, where max(φ, 0) ensures that we are computing the positive feedback. This is equal to the sum of the increase in the rates of all flows in the aggregate, which is the sum of the positive feedback a flow has received divided by its RTT, and so:

    (h + max(φ, 0)) / d = Σ_{i=1..L} (p_i / rtt_i),    (5)

where L is the number of packets seen by the router in an average RTT (the sum is over packets). From this, ξ_p can be derived as:

    ξ_p = (h + max(φ, 0)) / (d · Σ (rtt_i · s_i / cwnd_i)).    (6)
Similarly, we compute the per-packet negative feedback given when the aggregate feedback is negative (φ < 0). In this case, we want the decrease in the throughput of flow i to be proportional to its current throughput (i.e., Δthroughput_i ∝ throughput_i). Consequently, the desired change in the flow's congestion window is proportional to its current congestion window (i.e., Δcwnd_i ∝ cwnd_i). Again, the desired per-packet feedback is the desired change in the congestion window divided by the expected number of packets from this flow that the router sees in an interval d. Thus, we finally find that the per-packet negative feedback should be proportional to the packet size multiplied by its flow's RTT (i.e., n_i ∝ rtt_i · s_i). Thus negative feedback n_i is given by:

    n_i = ξ_n · rtt_i · s_i,    (7)

where ξ_n is a constant. As with the increase case, the total decrease in the aggregate traffic rate is the sum of the decrease in the rates of all flows in the aggregate:

    (h + max(−φ, 0)) / d = Σ (n_i / rtt_i),    (8)

from which ξ_n can be derived as:

    ξ_n = (h + max(−φ, 0)) / (d · Σ s_i),    (9)

where the sum is over all packets in a control interval (average RTT).

3.5.3 Notes on the Efficiency and Fairness Controllers

This section summarizes the important points about the design of the efficiency controller and the fairness controller.

As mentioned earlier, the efficiency and fairness controllers are decoupled. Specifically, the efficiency controller uses a Multiplicative-Increase Multiplicative-Decrease law (MIMD), which increases the traffic rate proportionally to the spare bandwidth in the system (instead of increasing by one packet/RTT/flow as TCP does). This allows XCP to quickly acquire the positive spare bandwidth even over high capacity links. The fairness controller, on the other hand, uses an Additive-Increase Multiplicative-Decrease law (AIMD), which converges to fairness [10]. Thus, the decoupling allows each controller to use a suitable control law.

The particular control laws used by the efficiency controller (MIMD) and the fairness controller (AIMD) are not the only possible choices. For example, in [20] we describe a fairness controller that uses a binomial law similar to those described in [6]. We chose the control laws above because our analysis and simulation demonstrate their good performance.

We note that the efficiency controller satisfies the requirements in §2. The dynamics of the aggregate traffic are specified by the aggregate feedback and stay independent of the number of flows traversing the link. Additionally, in contrast to TCP, where the increase/decrease rules are indifferent to the degree of congestion in the network, the aggregate feedback sent by the EC is proportional to the degree of under- or over-utilization. Furthermore, since the aggregate feedback is given over an average RTT, XCP becomes less aggressive as the round trip delay increases.²

Although the fairness controller uses AIMD, it converges to fairness faster than TCP. Note that in AIMD, all flows increase equally regardless of their current rate. Therefore, it is the multiplicative-decrease that helps converging to fairness. In TCP, multiplicative-decrease is tied to the occurrence of a drop, which should be a rare event. In contrast, with XCP, multiplicative-decrease is decoupled from drops and is performed every average RTT.

²The relation between XCP's dynamics and feedback delay is hard to fully grasp from Equation 1. We refer the reader to Equation 16, which shows that the change in throughput based on rate-mismatch is inversely proportional to delay, and the change based on queue-mismatch is inversely proportional to the square of delay.
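A compact sketch of the fairness controller's per-packet computation (Equations 2-9) follows. It is an illustrative two-pass model, not the paper's single-pass Appendix A code: it assumes the router can see a whole control interval's packets up front so that ξ_p and ξ_n are known before feedback is assigned, and all names are invented for this example.

    # Illustrative two-pass sketch of the fairness controller (Eqs. 2-9).
    # Names are invented; the paper's Appendix A uses a single-pass estimator.

    def per_packet_feedback(packets, phi, y, d, gamma=0.1):
        """packets: list of (rtt_i, s_i, cwnd_i) tuples for the packets seen
        in one control interval of length d (seconds); phi: aggregate
        feedback in bytes (Eq. 1); y: input traffic in bytes per average RTT.
        Returns a list of H_feedback values, one per packet."""
        # Eq. 2: shuffled traffic keeps reallocating bandwidth near phi ~ 0.
        h = max(0.0, gamma * y - abs(phi))

        # Eq. 6: xi_p scales positive feedback so the rate increases sum
        # to (h + max(phi, 0)) / d.
        denom_p = sum(rtt * s / cwnd for rtt, s, cwnd in packets)
        xi_p = (h + max(phi, 0.0)) / (d * denom_p) if denom_p else 0.0

        # Eq. 9: xi_n scales negative feedback by total bytes in the interval.
        denom_n = sum(s for _, s, _ in packets)
        xi_n = (h + max(-phi, 0.0)) / (d * denom_n) if denom_n else 0.0

        feedback = []
        for rtt, s, cwnd in packets:
            p_i = xi_p * rtt * rtt * s / cwnd   # Eq. 4: equal throughput gain
            n_i = xi_n * rtt * s                # Eq. 7: loss proportional to rate
            feedback.append(p_i - n_i)          # Eq. 3: H_feedback_i = p_i - n_i
        return feedback

Because each packet's congestion header carries cwnd_i and rtt_i, the router can evaluate these sums without keeping any per-flow state.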
XCP is fairly robust to estimation errors. For example, we estimate the value of $\xi_p$ every $d$ and use it as a prediction of $\xi_p$ during the following control interval (i.e., the following $d$). If we underestimate $\xi_p$, we will fail to allocate all of the positive feedback in the current control interval. Nonetheless, the bandwidth we fail to allocate will appear in our next estimate of the input traffic as spare bandwidth, which will be allocated (or partially allocated) in the following control interval. Thus, in every control interval, a portion of the spare bandwidth is allocated until none is left. Since our underestimation of $\xi_p$ causes reduced allocation, the convergence to efficiency is slower than if our prediction of $\xi_p$ had been correct. Yet the error does not stop XCP from reaching full utilization. Similarly, if we overestimate $\xi_p$, then we will allocate more feedback to flows at the beginning of a control interval and run out of aggregate feedback quickly. This uneven spread of feedback over the allocation interval does not affect convergence to utilization, but it slows down convergence to fairness. A similar argument can be made about other estimation errors; they mainly affect the convergence time rather than the correctness of the controllers.³

XCP's parameters (i.e., $\alpha$ and $\beta$) are constants whose values are independent of the number of sources, the delay, and the capacity of the bottleneck. This is a significant improvement over previous approaches where specific values for the parameters work only in specific environments (e.g., RED), or the parameters have to be chosen differently depending on the number of sources, the capacity, and the delay (e.g., AVQ). In §4, we show how these constant values are chosen.

Finally, implementing the efficiency and fairness controllers is fairly simple and requires only a few lines of code, as shown in Appendix A. We note that an XCP router performs only a few additions and 3 multiplications per packet, making it an attractive choice even as a backbone router.

4. STABILITY ANALYSIS

We use a fluid model of the traffic to analyze the stability of XCP. Our analysis considers a single link traversed by multiple XCP flows. For the sake of simplicity and tractability, similarly to previous work [22, 15, 23, 24], our analysis assumes all flows have a common, finite, and positive round trip delay, and neglects boundary conditions (i.e., queues are bounded, rates cannot be negative). Later, we demonstrate through extensive simulations that even with larger topologies, different RTTs, and boundary conditions, our results still hold.

The main result can be stated as follows.

THEOREM 1. Suppose the round trip delay is $d$. If the parameters $\alpha$ and $\beta$ satisfy:

$$0 < \alpha < \frac{\pi}{4\sqrt{2}} \quad \text{and} \quad \beta = \alpha^2\sqrt{2},$$

then the system is stable independently of delay, capacity, and number of sources.

³There is one type of error that may prevent the convergence to complete efficiency, which is the unbalanced allocation and deallocation of the shuffled traffic. For example, if by the end of a control interval we deallocate all of the shuffled traffic but fail to allocate it, then the shuffling might prevent us from reaching full link utilization. Yet note that the shuffled traffic is only 10% of the input traffic. Furthermore, shuffling exists only when $|\phi| < 0.1 \cdot y$, where $y$ is the input traffic rate.
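As a quick sanity check (our own arithmetic, not from the paper), the constant values used throughout the simulations in §5 do satisfy the condition of Theorem 1:

```python
import math

alpha = 0.4
beta = 0.226

# Theorem 1 requires 0 < alpha < pi / (4*sqrt(2)) and beta = alpha^2 * sqrt(2).
upper_bound = math.pi / (4 * math.sqrt(2))  # ~0.5554
print(0 < alpha < upper_bound)              # True: 0.4 is below the bound
print(alpha ** 2 * math.sqrt(2))            # ~0.2263, i.e. beta rounded to 0.226
```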
Figure 2: A single bottleneck topology.

Figure 3: A parking lot topology.

The details of the proof are given in Appendix B. The idea underlying the stability proof is the following. Given the assumptions above, our system is a linear feedback system with delay. The stability of such systems may be studied by plotting their open-loop transfer function in a Nyquist plot. We prove that by choosing $\alpha$ and $\beta$ as stated above, the system satisfies the Nyquist stability criterion. Further, the gain margin is greater than one and the phase margin is positive independently of delay, capacity, and number of sources.⁴

5. PERFORMANCE

In this section, we demonstrate through extensive simulations that XCP outperforms TCP both in conventional and high bandwidth-delay environments. Our simulations also show that XCP has the unique characteristic of almost never dropping packets.

We also demonstrate that by complying with the conditions in Theorem 1, we can choose constant values for $\alpha$ and $\beta$ that work with any capacity and delay, as well as any number of sources. Our simulations cover capacities in [1.5 Mb/s, 4 Gb/s], propagation delays in [10 ms, 1.4 sec], and numbers of sources in [1, 1000]. Further, we simulate 2-way traffic (with the resulting ack compression) and dynamic environments with arrivals and departures of short web-like flows. In all of these simulations, we set $\alpha = 0.4$ and $\beta = 0.226$, showing the robustness of our results.

Additionally, the simulations show that in contrast to TCP, the new protocol dampens oscillations and smoothly converges to high utilization, small queue size, and fair bandwidth allocation. We also demonstrate that the protocol is robust to highly varying traffic demands and high variance in flows' round trip times.

5.1 Simulation Setup

Our simulations use the packet-level simulator ns-2 [1], which we have extended with an XCP module.⁵ We compare XCP with TCP Reno over the following queuing disciplines:

Random Early Discard (RED [13]). Our experiments use the "gentle" mode and set the parameters according to the authors' recommendations in [2]. The minimum and maximum thresholds are set to one third and two thirds of the buffer size, respectively.

⁴The gain margin is the magnitude of the transfer function at the frequency where its phase is $-\pi$. The phase margin is the phase of the transfer function at the frequency where its magnitude becomes 1. They are used to prove robust stability.

⁵The code is available at www.ana.lcs.mit.edu/dina/XCP.

Figure 4: XCP significantly outperforms TCP in high bandwidth environments. The graphs compare the efficiency of XCP with that of TCP over RED, CSFQ, REM, and AVQ as a function of capacity. (The three panels plot bottleneck utilization, average bottleneck queue in packets, and bottleneck drops in packets against bottleneck capacity in Mb/s.)

Random Early Marking (REM [5]). Our experiments set REM parameters according to the authors' recommendation provided with their code. In particular, $\phi = 1.001$, $\gamma = 0.001$, the update interval is set to the transmission time of 10 packets, and qref is set to one third of the buffer size.
Adaptive Virtual Queue (AVQ [22]). As recommended by the authors, our experiments use $\gamma = 0.98$ and compute $\alpha$ based on the equation in [22]. Yet, as shown in [19], the equation for setting $\alpha$ does not admit a solution for high capacities. In these cases, we use $\alpha = 0.15$ as used in [22].

Core Stateless Fair Queuing (CSFQ [28]). In contrast to the above AQMs, whose goal is to achieve high utilization and a small queue size, CSFQ aims to provide high fairness in a network cloud with no per-flow state in core routers. We compare CSFQ with XCP to show that XCP can be used within the CSFQ framework to improve its fairness and efficiency. Again, the parameters are set to the values chosen by the authors in their ns implementation.

The simulator code for these AQM schemes is provided by their authors. Further, to allow these schemes to exhibit their best performance, we simulate them with ECN enabled.

In all of our simulations, the XCP parameters are set to $\alpha = 0.4$ and $\beta = 0.226$. We experimented with XCP with both Drop-Tail and RED dropping policies. There was no difference between the two cases because XCP almost never dropped packets.

Most of our simulations use the topology in Figure 2. The bottleneck capacity, the round trip delay, and the number of flows vary according to the objective of the experiment. The buffer size is always set to the delay-bandwidth product. The data packet size is 1000 bytes. Simulations over the topology in Figure 3 are used to show that our results generalize to larger and more complex topologies.
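Since the buffer is always sized to the delay-bandwidth product, the buffer for a given configuration follows directly; for example (our arithmetic, using the 150 Mb/s, 80 ms configuration of §5.2):

```python
# Buffer sizing used in the setup above: buffer = delay-bandwidth product.
# Numbers assume the 150 Mb/s / 80 ms RTT configuration of Section 5.2.
capacity_bps = 150e6       # bottleneck capacity in bits/s
rtt_s = 0.080              # round trip propagation delay in seconds
packet_bytes = 1000        # data packet size used in the simulations

buffer_bits = capacity_bps * rtt_s           # 12,000,000 bits
buffer_packets = buffer_bits / (packet_bytes * 8)
print(buffer_packets)                        # 1500 packets
```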
Figure 5: XCP significantly outperforms TCP in high delay environments. The graphs compare bottleneck utilization, average queue, and number of drops as round trip delay increases when flows are XCPs and when they are TCPs over RED, CSFQ, REM, and AVQ.

When unspecified, the reader should assume that the simulation topology is that in Figure 2, the flows' RTTs are equivalent, and the sources are long-lived FTP flows. Simulations' running times vary depending on the propagation delay but are always larger than 300 RTTs. All simulations were run long enough to ensure that the system has reached consistent behavior.

5.2 Comparison with TCP and AQM Schemes

Impact of Capacity: We show that an increase in link capacity (with the resulting increase of per-flow bandwidth) causes a significant degradation in TCP's performance, irrespective of the queuing scheme.

In this experiment, 50 long-lived FTP flows share a bottleneck. The round trip propagation delay is 80 ms. Additionally, there are 50 flows traversing the reverse path, used merely to create a 2-way traffic environment with the potential for ack compression. Since XCP is based on a fluid model and estimates some parameters, the existence of reverse traffic, with the resulting burstiness, tends to stress the protocol.

Figure 4 demonstrates that as capacity increases, TCP's bottleneck utilization decreases significantly. This happens regardless of the queuing scheme. In contrast, XCP's utilization is always near optimal, independent of the link capacity. Furthermore, XCP never drops any packet, whereas TCP drops thousands of packets despite its use of ECN. Although the XCP queue increases with the capacity, the queuing delay does not increase because the larger capacity causes the queue to drain faster.

Impact of Feedback Delay: We fix the bottleneck capacity at 150 Mb/s and study the impact of increased delay on the performance of congestion control. All other parameters have the same values used in the previous experiment.

Figure 5 shows that as the propagation delay increases, TCP's utilization degrades considerably, regardless of the queuing scheme. In contrast, XCP maintains high utilization independently of delay.

The adverse impact of large delay on TCP's performance has been noted over satellite links. The bursty nature of TCP has been suggested as a potential explanation, and packet pacing has been proposed as a solution [4]; however, this experiment shows that burstiness is a minor factor. In particular, XCP is a bursty window-based protocol, but it copes with delay much better than TCP. It does so by adjusting its aggressiveness according to the round trip delay.

Figure 6: XCP is efficient with any number of flows. The graphs compare the efficiency of XCP and TCP with various queuing schemes as a function of the number of flows. (The three panels plot bottleneck utilization, average bottleneck queue in packets, and bottleneck drops in packets against the number of FTP flows.)
Impact of Number of Flows: We fix the bottleneck capacity at 150 Mb/s and the round trip propagation delay at 80 ms and repeat the same experiment with a varying number of FTP sources. Other parameters have the same values used in the previous experiment.

Figure 6 shows that overall, XCP exhibits good utilization, reasonable queue size, and no packet losses. The increase in XCP's queue as the number of flows increases is a side effect of its high fairness (see Figure 8). When the number of flows is larger than 500, the fair congestion window is between two and three packets; a back-of-the-envelope check follows this paragraph. In particular, the fair congestion window is a real number, but the effective (i.e., used) congestion window is an integer number of packets. Thus, as the fair window size decreases, the effect of the rounding error increases, causing a disturbance. Consequently, the queue increases to absorb this disturbance.
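The fair-window claim above follows directly from the pipe size (our arithmetic, assuming the 1000-byte packets used throughout):

```python
# Fair congestion window vs. number of flows on a 150 Mb/s, 80 ms pipe.
pipe_bits = 150e6 * 0.080              # delay-bandwidth product: 12 Mbit
pipe_packets = pipe_bits / (1000 * 8)  # = 1500 full-sized packets in flight

print(pipe_packets / 500)              # 3.0 packets/flow with 500 flows
print(pipe_packets / 750)              # 2.0 packets/flow with 750 flows
```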
Impact of Short Web-Like Traffic: Since a large number of flows in the Internet are short web-like flows, it is important to investigate the impact of such dynamic flows on congestion control. In this experiment, we have 50 long-lived FTP flows traversing the bottleneck link. Also, there are 50 flows traversing the reverse path, whose presence emulates a 2-way traffic environment with the resulting ack compression. The bottleneck bandwidth is 150 Mb/s and the round trip propagation delay is 80 ms. Short flows arrive according to a Poisson process. Their transfer size is derived from a Pareto distribution with an average of 30 packets (ns implementation with shape $= 1.35$), which complies with real web traffic [11].

Figure 7: XCP is robust and efficient in environments with arrivals and departures of short web-like flows. The graphs compare the efficiency of XCP to that of TCP over various queuing schemes as a function of the arrival rate of web-like flows. (The three panels plot utilization, average queue in packets, and packet drops against the mice arrival rate in new mice/sec.)

Figure 7 graphs bottleneck utilization, average queue size, and total number of drops, all as functions of the arrival rate of short flows. The figure demonstrates XCP's robustness in dynamic environments with a large number of flow arrivals and departures. XCP continues to achieve high utilization, small queue size, and zero drops even as the arrival rate of short flows becomes significantly high. At arrival rates higher than 800 flows/s (more than 10 new flows every RTT), XCP starts dropping packets. This behavior is not caused by the environment being highly dynamic. It happens because at such high arrival rates the number of simultaneously active flows is a few thousand. Thus, there is no space in the pipe to maintain a minimum of one packet from each flow, and drops become inevitable. In this case, XCP's behavior approaches the underlying dropping policy, which is RED for Figure 7.

Fairness: This experiment shows that XCP is significantly fairer than TCP, regardless of the queuing scheme. We have 30 long-lived FTP flows sharing a single 30 Mb/s bottleneck. We conduct two sets of simulations. In the first set, all flows have a common round-trip propagation delay of 40 ms. In the second set of simulations, the flows have different RTTs in the range [40 ms, 330 ms] ($RTT_{i+1} = RTT_i + 10\,\text{ms}$).

Figure 8: XCP is fair to both equal and different RTT flows. The graphs compare XCP's fairness to that of TCP over RED, CSFQ, REM, and AVQ. Graph (b) also shows XCP's robustness to environments with different RTTs. (The two panels plot per-flow throughput in Mb/s against flow ID, for equal-RTT and different-RTT flows.)

Figures 8-a and 8-b demonstrate that, in contrast to other approaches, XCP provides a fair bandwidth allocation and does not have any bias against long-RTT flows. Furthermore, Figure 8-b demonstrates XCP's robustness to high variance in the RTT distribution. Thus, although XCP computes an estimate of the average RTT of the system, it still operates correctly in environments where different flows have substantially different RTTs. For further information on this point, see Appendix C.

A More Complex Topology: This experiment uses the 9-link topology in Figure 3, although results are very similar for topologies with more links. Link 5 has the lowest capacity, namely 50 Mb/s, whereas the others are 100 Mb/s links. All links have a 20 ms one-way propagation delay.
Fifty flows, represented by the solid arrow, traverse all links in the forward direction. Fifty cross flows, illustrated by the small dashed arrows, traverse each individual link in the forward direction. Fifty flows also traverse all links along the reverse path.

Figure 9 illustrates the average utilization, queue size, and number of drops at every link. In general, all schemes maintain a reasonably high utilization at all links (note the y-scale). However, the trade-off between optimal utilization and small queue size is handled differently by XCP than by the various AQM schemes. XCP trades a few percent of utilization for a considerably smaller queue size.

XCP's lower utilization in this experiment compared to previous ones is due to the disturbance introduced by shuffling. In particular, at links 1, 2, 3, and 4 (i.e., the set of links preceding the lowest-capacity link along the path), the fairness controller tries to shuffle bandwidth from the cross flows to the long-distance flows, which have lower throughput. Yet, these long-distance flows are throttled downstream at link 5, and so cannot benefit from this positive feedback. This effect is mitigated at links downstream from link 5 because they can observe the upstream throttling and correspondingly reduce the amount of negative feedback given (see the implementation in Appendix A). In any event, as the total shuffled bandwidth is less than 10%, the utilization is always higher than 90%.
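The 10% bound quoted above comes from the fairness controller's shuffling rule, which (as defined with the fairness controller in §3.5) computes the shuffled bandwidth as $h = \max(0, 0.1 \cdot y - |\phi|)$, where $y$ is the input traffic; this is also the rule behind footnote 3, since shuffling is non-zero exactly when $|\phi| < 0.1 \cdot y$. A one-line sketch, assuming that form:

```python
def shuffled_traffic(y, phi):
    """Bandwidth shuffled for fairness: h = max(0, 0.1*y - |phi|).

    Non-zero only while |phi| < 0.1*y, and never more than 10% of the
    input traffic y, which is why utilization stays above 90% even when
    the shuffle is completely wasted, as at links 1-4 above.
    """
    return max(0.0, 0.1 * y - abs(phi))
```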
Figure 9: Simulation with multiple congested queues. Utilization, average queue size, and number of drops at nine consecutive links (topology in Figure 3). Link 5 has the lowest capacity along the path.

It is possible to modify XCP to maintain 100% utilization in the presence of multiple congested links. In particular, we could modify XCP so that it maintains the queue around a target value rather than draining all of it. This would cause the disturbance induced by shuffling to appear as a fluctuation in the queue rather than as a drop in utilization. However, we believe that maintaining a small queue size is more valuable than a few percent increase in utilization when flows traverse multiple congested links. In particular, it leaves a safety margin for bursty arrivals of new flows. In contrast, the large queues maintained at all links in the TCP simulations cause every packet to wait at all of the nine queues, which considerably increases end-to-end latency.

At the end of this section, it is worth noting that, in all of our simulations, the average drop rate of XCP was less than $10^{-6}$, which is three orders of magnitude smaller than that of the other schemes despite their use of ECN. Thus, in environments where the fair congestion window of a flow is larger than one or two packets, XCP can control congestion with almost no drops. (As the number of competing flows increases to the point where the fair congestion window is less than one packet, drops become inevitable.)

5.3 The Dynamics of XCP

While the simulations presented above focus on long-term average behavior, this section shows the short-term dynamics of XCP. In particular, we show that XCP's utilization, queue size, and throughput exhibit very limited oscillations. Therefore, the average behavior presented in the section above is highly representative of the general behavior of the protocol.

Convergence Dynamics: We show that XCP dampens oscillations and converges smoothly to high utilization, small queues, and fair bandwidth allocation. In this experiment, 5 long-lived flows share a 45 Mb/s bottleneck and have a common RTT of 40 ms. The flows start their transfers two seconds apart at 0, 2, 4, 6, and 8 seconds.

Figure 10: XCP's smooth convergence to high fairness, good utilization, and small queue size. Five XCP flows share a 45 Mb/s bottleneck. They start their transfers at times 0, 2, 4, 6, and 8 seconds. (The three panels plot per-flow throughput, bottleneck utilization, and the bottleneck queue at packet arrival against time in seconds.)

Figure 10-a shows that whenever a new flow starts, the fairness controller reallocates bandwidth to maintain min-max fairness. Figure 10-b shows that decoupling utilization and fairness control ensures that this reallocation is achieved without disturbing the utilization. Finally, Figure 10-c shows the instantaneous queue, which effectively absorbs the new traffic and drains afterwards.
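For reference, the min-max fair share each flow converges to in this experiment is simply the bottleneck capacity divided by the number of active flows (our arithmetic, not a value quoted in the figure):

```python
# Min-max fair share on the 45 Mb/s bottleneck as a new flow joins every 2 s.
capacity_mbps = 45.0
for active_flows in range(1, 6):
    print(active_flows, capacity_mbps / active_flows)
# 1 45.0, 2 22.5, 3 15.0, 4 11.25, 5 9.0  (Mb/s per flow)
```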
Robustness to Sudden Increase or Decrease in Traffic Demands: In this experiment, we examine performance as traffic demands and dynamics vary considerably. We start the simulation with 10 long-lived FTP flows sharing a 100 Mb/s bottleneck with a round trip propagation delay of 40 ms. At $t = 4$ seconds, we start 100 new flows and let them stabilize. At $t = 8$ seconds, we stop these 100 flows, leaving the original 10 flows in the system.

Figure 11 shows that XCP adapts quickly to a sudden increase or decrease in traffic. It shows the utilization and queue, both for the case when the flows are XCPs and for when they are TCPs traversing RED queues. XCP absorbs the new burst of flows without dropping any packets, while maintaining high utilization. TCP, on the other hand, is highly disturbed by the sudden increase in traffic and takes a long time to restabilize. When the flows are suddenly stopped at $t = 8$ seconds, XCP quickly reallocates the spare bandwidth and continues to have high utilization. In contrast, the sudden decrease in demand destabilizes TCP and causes a large sequence of oscillations.
Figure 11: XCP is more robust against sudden increases or decreases in traffic demands than TCP. Ten FTP flows share a bottleneck. At time $t = 4$ seconds, we start 100 additional flows. At $t = 8$ seconds, these 100 flows are suddenly stopped and the original 10 flows are left to stabilize again. (The four panels plot utilization averaged over an RTT and the queue at packet arrival against time, for XCP and for TCP.)

6. DIFFERENTIAL BANDWIDTH ALLOCATION

By decoupling efficiency and fairness, XCP provides a flexible framework for designing a variety of bandwidth allocation schemes. In particular, the min-max fairness controller, described in §3, may be replaced by a controller that causes the flows' throughputs to converge to a different bandwidth allocation (e.g., weighted fairness, proportional fairness, priority, etc.). To do so, the designer needs to replace the AIMD policy used by the FC with a policy that allocates the aggregate feedback to the individual flows so that they converge to the desired rates.

In this section, we modify the fairness controller to provide differential bandwidth allocation. Before describing our bandwidth differentiation scheme, we note that in XCP, the only interesting quality of service (QoS) schemes are the ones that address bandwidth allocation. Since XCP provides small queue size and near-zero drops, QoS schemes that guarantee small queuing delay or low jitter are redundant.

We describe a simple scheme that provides differential bandwidth allocation according to the shadow prices model defined by Kelly [21]. In this model, a user chooses the price per unit time she is willing to pay. The network allocates bandwidth so that the throughputs of users competing for the same bottleneck are proportional to their prices (i.e., $\text{throughput}_i / \text{price}_i = \text{throughput}_j / \text{price}_j$).

To provide bandwidth differentiation, we replace the AIMD policy by the following (a worked example follows the Figure 12 caption below):

If $\phi > 0$, allocate it so that the increase in throughput of a flow is proportional to its price.

If $\phi < 0$, allocate it so that the decrease in throughput of a flow is proportional to its current throughput.

We can implement the above policy by modifying the congestion header. In particular, the sender replaces the H_cwnd field by the current congestion window divided by the price she is willing to pay (i.e., cwnd/price). This minor modification is enough to produce a service that complies with the above model.

Next, we show simulation results that support our claims. Three XCP sources share a 10 Mb/s bottleneck. The corresponding prices are $p_1 = 5$, $p_2 = 10$, and $p_3 = 15$. Each source wants to transfer a file of 10 Mbytes, and they all start together at $t = 0$. The results in Figure 12 show that the transfer rate depends on the price the source pays. At the beginning, when all flows are active, their throughputs are $1\frac{2}{3}$ Mb/s, $3\frac{1}{3}$ Mb/s, and 5 Mb/s, which are proportional to their corresponding prices. After Flow 3 finishes its transfer, the remaining flows grab the freed bandwidth such that their throughputs continue being proportional to their prices.

Figure 12: Providing differential bandwidth allocation using XCP. Three XCP flows, each transferring a 10 Mbyte file over a shared 10 Mb/s bottleneck. Flow 1's price is 5, Flow 2's price is 10, and Flow 3's price is 15. Throughput is averaged over 200 ms (5 RTTs).
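The shares quoted above follow from equalizing throughput/price across the three flows (our arithmetic):

```python
# Shadow-price allocation on the shared 10 Mb/s bottleneck:
# throughput_i = capacity * price_i / sum(prices).
capacity = 10.0                    # Mb/s
prices = {1: 5, 2: 10, 3: 15}
total_price = sum(prices.values())

for flow, price in prices.items():
    print(flow, round(capacity * price / total_price, 2))
# 1 1.67, 2 3.33, 3 5.0 (Mb/s): proportional to the prices
```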
Note the high responsiveness of the system. In particular, when Flow 3 finishes its transfer, freeing half of the link capacity, the other flows' sending rates adapt in a few RTTs.

7. SECURITY

Similarly to TCP, in XCP security against misbehaving sources requires an additional mechanism that polices the flows and ensures that they obey the congestion control protocol. This may be done by policing agents located at the edges of the network. The agents maintain per-flow state and monitor the behavior of the flows to detect network attacks and isolate unresponsive sources.

Unlike TCP, XCP facilitates the job of these policing agents because of its explicit feedback. Isolating a misbehaving source becomes faster and easier because the agent can use the explicit feedback to test the source. More precisely, in TCP, isolating an unresponsive source requires the agent/router to monitor the average rate of a suspect source over a fairly long interval to decide whether the source is reacting according to AIMD. Also, since the source's RTT is unknown, its correct sending rate is not specified, which complicates the task even further. In contrast, in XCP, isolating a suspect flow is easy. The router can send the flow a test feedback requiring it to decrease its congestion window to a particular value.
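The paper does not give code for this test, but the check it implies is straightforward: for a fixed RTT, a compliant flow's rate should shrink in proportion to the requested window reduction. A hypothetical sketch (the function name and the tolerance are ours):

```python
def looks_unresponsive(old_cwnd, test_cwnd, old_rate, new_rate, slack=1.2):
    """Return True if a flow appears to have ignored an explicit test feedback.

    After the router asks the flow to shrink its window from old_cwnd to
    test_cwnd, a compliant flow's rate (~ cwnd / RTT) should fall by the
    same factor within about one RTT. `slack` absorbs measurement noise.
    """
    expected_rate = old_rate * (test_cwnd / old_cwnd)
    return new_rate > slack * expected_rate
```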