INTERNETWORKING: RESEARCH AND EXPERIENCE, VOL. 1, 3-26 (1990)

Analysis and Simulation of a Fair Queueing Algorithm

ALAN DEMERS
Xerox PARC, 3333 Coyote Hill Road, Palo Alto, CA 94304, U.S.A.

SRINIVASAN KESHAV
Computer Science Division, Department of EECS, University of California at Berkeley, Berkeley, CA 94720, U.S.A.

SCOTT SHENKER
Xerox PARC, 3333 Coyote Hill Road, Palo Alto, CA 94304, U.S.A.

SUMMARY

We discuss gateway queueing algorithms and their role in controlling congestion in datagram networks. A fair queueing algorithm, based on an earlier suggestion by Nagle, is proposed. Analysis and simulations are used to compare this algorithm to other congestion control schemes. We find that fair queueing provides several important advantages over the usual first-come-first-served queueing algorithm: fair allocation of bandwidth, lower delay for sources using less than their full share of bandwidth, and protection from ill-behaved sources.

KEY WORDS  Congestion control  Queueing algorithms  Fair queueing

INTRODUCTION

Datagram networks have long suffered from performance degradation in the presence of congestion (Gerla and Kleinrock, 1980). The rapid growth, in both use and size, of
computer networks has sparked a renewed interest in methods of congestion control (Jain and Ramakrishnan, 1988; Ramakrishnan and Jain, 1988; Chiu and Jain, 1989; Ramakrishnan, Chiu and Jain, 1987; Jain, Ramakrishnan and Chiu, 1987; Jacobson, 1988; Mankin and Thompson, 1989; Nagle, 1984, 1987). These methods have two points of implementation. The first is at the source, where flow control algorithms vary the rate at which the source sends packets. Of course, flow control algorithms are designed primarily to ensure the presence of free buffers at the destination host, but we are more concerned with their role in limiting the overall network traffic. The second point of implementation is at the gateway. Congestion can be controlled at gateways through routing and queueing algorithms. Adaptive routing, if properly implemented, lessens congestion by routing packets away from network bottlenecks. Queueing algorithms, which control the order in which packets are sent and the usage of the gateway's buffer space, do not affect congestion directly, in that they do not change the total traffic on the gateway's outgoing line. Queueing algorithms do, however, determine the way in which packets from different sources interact with each other which, in turn, affects the collective behavior of flow control algorithms. We shall argue that this effect, which is often ignored, makes queueing algorithms a crucial component in effective congestion control.

Queueing algorithms can be thought of as allocating three nearly independent quantities: bandwidth (which packets get transmitted), promptness (when those packets get transmitted), and buffer space (which packets are discarded by the gateway, and when). Currently, the most common queueing algorithm is first-come-first-served (FCFS). FCFS queueing essentially relegates all congestion control to the sources, since the order of arrival completely determines the bandwidth, promptness, and buffer space allocations. Thus, FCFS inextricably intertwines these three allocation issues. There may indeed be flow control algorithms that, when universally implemented throughout a network with FCFS gateways, can overcome these limitations and provide reasonably fair and efficient congestion control. This point is discussed more fully later in the paper, when several flow control algorithms are compared. However, with today's diverse and decentralized computing environments, it is unrealistic to expect universal implementation of any given flow control algorithm. This is not merely a question of standards, but also one of compliance. Even if a universal standard such as OSI (International Organization for Standardization, 1986) were adopted, malfunctioning hardware and software could violate the standard, and there is always the possibility that individuals would alter the algorithms on their own machines to improve their performance at the expense of others. Consequently, congestion control algorithms should function well even in the presence of ill-behaved sources. Unfortunately, irrespective of the flow control algorithm used by the well-behaved sources, networks with FCFS gateways do not have this property: a single source, sending packets to a gateway at a sufficiently high speed, can capture an arbitrarily high fraction of the bandwidth of the outgoing line.
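To make this vulnerability concrete, the following sketch (an illustration added for this discussion, not taken from the paper; it assumes fixed-size packets, deterministic arrivals and a drop-tail buffer) simulates an FCFS gateway shared by a well-behaved source and a much faster one; the fast source ends up with nearly the entire outgoing line.

```python
from bisect import bisect_right

def fcfs_shares(rate_good=1.0, rate_bad=20.0, line_speed=10.0,
                buffer_size=20, horizon=1000.0):
    """Toy drop-tail FCFS gateway.  Two sources send fixed-size packets at the
    given rates (packets per unit time); the outgoing line transmits line_speed
    packets per unit time; at most buffer_size packets are held.  Returns each
    source's share of the packets actually transmitted."""
    service = 1.0 / line_speed
    arrivals = sorted(
        [(n / rate_good, "well-behaved") for n in range(int(rate_good * horizon))]
        + [(n / rate_bad, "ill-behaved") for n in range(int(rate_bad * horizon))]
    )
    departures = []                       # departure times of accepted packets
    served = {"well-behaved": 0, "ill-behaved": 0}
    last_departure = 0.0
    for t, src in arrivals:
        # Packets accepted earlier but not yet fully transmitted by time t.
        backlog = len(departures) - bisect_right(departures, t)
        if backlog >= buffer_size:
            continue                      # drop-tail: the arriving packet is discarded
        last_departure = max(last_departure, t) + service
        departures.append(last_departure)
        served[src] += 1
    total = sum(served.values())
    return {s: round(served[s] / total, 2) for s in served}

# The source sending 20 packets per unit time captures roughly 95 per cent of
# the 10-packet-per-unit-time line; raising its rate pushes its share toward 1.
print(fcfs_shares())
```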
Thus, FCFS queueing is not adequate; more discriminating queueing algorithms must be used in conjunction with source flow control algorithms to control congestion effectively in noncooperative environments. Following a similar line of reasoning, Nagle (1987) proposed a fair queueing (FQ) algorithm in which gateways maintain separate queues for packets from each individual source. The queues are serviced in a round-robin manner. This prevents a source from arbitrarily increasing its share of the bandwidth or the delay of other sources. In fact, when a source sends packets too quickly, it merely increases the length of its own queue.
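For concreteness, here is a minimal sketch of the per-source round-robin discipline Nagle proposed (illustrative only; the fixed per-source queue limit and counting in packets rather than bytes are assumptions made here, not part of the proposal):

```python
from collections import deque, defaultdict

class NagleFairQueue:
    """Per-source queues served one packet at a time in round-robin order."""

    def __init__(self, per_source_limit=16):
        self.queues = defaultdict(deque)   # source id -> queue of packets
        self.order = deque()               # round-robin order of active sources
        self.per_source_limit = per_source_limit

    def enqueue(self, source, packet):
        q = self.queues[source]
        if len(q) >= self.per_source_limit:
            return False                   # only this source's packets are dropped
        if not q:
            self.order.append(source)      # source becomes active
        q.append(packet)
        return True

    def dequeue(self):
        """Return the next (source, packet) to transmit, or None if idle."""
        if not self.order:
            return None
        source = self.order.popleft()
        packet = self.queues[source].popleft()
        if self.queues[source]:
            self.order.append(source)      # still active: rejoin the end of the cycle
        return source, packet

# A flooding source only grows (and overflows) its own queue; the other source
# still gets its turn in every cycle of the non-empty queues.
fq = NagleFairQueue()
for i in range(100):
    fq.enqueue("flood", f"f{i}")
fq.enqueue("polite", "p0")
print([fq.dequeue()[0] for _ in range(4)])   # ['flood', 'polite', 'flood', 'flood']
```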
Nagle's algorithm, by changing the way packets from different sources interact, does not reward, nor leave others vulnerable to, antisocial behavior. On the surface, this proposal appears to have considerable merit, but we are not aware of any published data on the performance of datagram networks with such fair queueing gateways. In this paper, we will first describe a modification of Nagle's algorithm, and then provide simulation data comparing networks with FQ gateways and those with FCFS gateways.

The three different components of congestion control introduced above (source flow control, gateway routing and gateway queueing algorithms) interact in interesting and complicated ways. It is impossible to assess the effectiveness of any algorithm without reference to the other components of congestion control in operation. We will evaluate our proposed queueing algorithm in the context of static routing and several widely used flow control algorithms. The aim is to find a queueing algorithm that functions well in current computing environments. The algorithm might, and indeed should, enable new and improved routing and flow control algorithms, but it must not require them.

We had three goals in writing this paper. The first was to describe a new fair queueing algorithm. In the next section, we discuss the design requirements for an effective queueing algorithm, outline how Nagle's original proposal fails to meet them, and then propose a new fair queueing algorithm which does meet these requirements. Our second goal was to provide some rigorous understanding of the performance of this new fair queueing algorithm; we present a delay-throughput curve given by this algorithm for a specific configuration of sources, and then compare this performance to that given by the FCFS algorithm. Our third goal was to evaluate our new queueing proposal in the context of real networks. To this end we discuss some currently implemented flow control algorithms and present simulation data comparing several combinations of flow control and queueing algorithms on six benchmark networks.

In circuit-switched networks where there is explicit buffer reservation and uniform packet sizes, it has been established that round-robin service disciplines allocate bandwidth fairly (Hahne, 1986; Katevenis, 1987). Recently, Morgan (1989) has examined the role such queueing algorithms play in controlling congestion in circuit-switched networks; although his application context is quite different from ours, his conclusions are qualitatively similar. In other related work, the DATAKIT queueing algorithm combines round-robin service and FIFO priority service, and has been analyzed extensively (Lo, 1987; Fraser and Morgan, 1984). Also, Luan and Lucantoni (1988) present a different form of bandwidth management policy for circuit-switched networks. Since the completion of this work, we have learned of similar work by Zhang (1989); her Virtual Clock gateway queueing algorithm is essentially identical to the fair queueing algorithm presented here. Zhang analyzes this algorithm in the context of a proposed resource reservation scheme, the Flow Network, whereas we do not consider resource reservation. Heybey and Davin (1989) have simulated a simplified version of our fair queueing algorithm, investigating issues of buffer allocation and policy-based bandwidth allocation. McKenney (1990) and Keshav (1990) have investigated the implementation aspects of fair queueing.
In addition, Greenberg and Madras (1990) have established some performance bounds on the fairness of our fair queueing scheme and other similar schemes.

DATAKIT is a trademark of AT&T.
FAIR QUEUEING

Motivation

What are the requirements for a queueing algorithm that will allow source flow control algorithms to provide adequate congestion control even in the presence of ill-behaved sources? We start with Nagle's observation that such queueing algorithms must provide protection, so that ill-behaved sources can only have a limited negative impact on well-behaved sources. Allocating bandwidth and buffer space in a fair manner, to be defined later, automatically ensures that ill-behaved sources can get no more than their fair share. This led us to adopt, as our central design consideration, the requirement that the queueing algorithm allocate bandwidth and buffer space fairly. The ability to control the promptness, or delay, allocation somewhat independently of the bandwidth and buffer allocation is also desirable. Finally, we require that the gateway should provide service that, in some sense, does not depend discontinuously on a packet's time of arrival (this continuity condition will be made precise in the context of defining our algorithm). This continuity requirement attempts to prevent the efficiency of source flow control implementations from being overly sensitive to timing details (timers are the Bermuda Triangle of flow control algorithms). Nagle's proposal does not satisfy these requirements. The most obvious flaw is its lack of consideration of packet lengths. A source using long packets gets more bandwidth than one using short packets, so bandwidth is not allocated fairly. Also, the proposal has no explicit promptness allocation other than that provided by the round-robin service discipline. In addition, the static round-robin ordering violates the continuity requirement. These defects are corrected in our version of fair queueing, which we define after first discussing our definition of fairness.

In stating our requirements for queueing algorithms, we have left the term fair undefined. The term fair has a clear colloquial meaning, but it also has a technical definition (actually several, but only one is considered here). Consider, for example, the allocation of a single resource among N users. Assume there is an amount μ_total of this resource, and that each of the users requests an amount ρ_α and, under a particular allocation, receives an amount μ_α. What is a fair allocation? The max-min fairness criterion (Ramakrishnan, Chiu and Jain, 1987; Hahne, 1986; Gafni and Bertsekas, 1984) states that an allocation is fair if (1) no user receives more than its request, (2) no other allocation scheme satisfying condition 1 has a higher minimum allocation, and (3) condition 2 remains recursively true as we remove the minimal user and reduce the total resource accordingly, μ_total → μ_total − μ_min. This condition reduces to μ_α = MIN(μ_fair, ρ_α) in the simple example, with μ_fair, the fair share, being set so that Σ_α μ_α = μ_total. This concept of fairness easily generalizes to the multiple resource case (Ramakrishnan, Chiu and Jain, 1987). Note that implicit in the max-min definition of fairness is the assumption that the users have equal rights to the resource.

In our communication application, the bandwidth and buffer demands are clearly represented by the packets that arrive at the gateway. (Demands for promptness are not explicitly communicated, and we return to this issue later.) However, it is not clear what constitutes a user.
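Before taking up that question, the following sketch (added for illustration; it is not part of the algorithm definition) shows how the fair share μ_fair of the max-min criterion can be computed for a single resource: the fair share is raised until either every remaining request is satisfied or the resource is exhausted, and each user receives MIN(μ_fair, ρ_α).

```python
def max_min_allocation(requests, total):
    """Max-min fair allocation of a single resource.
    `requests` maps each user to its demand; `total` is the amount available.
    No user exceeds its request, and the minimum allocation is (recursively)
    as large as possible."""
    alloc = {}
    remaining = total
    pending = dict(requests)
    while pending and remaining > 0:
        fair_share = remaining / len(pending)
        # Users requesting no more than the current fair share are fully
        # satisfied; their leftover share is redistributed among the rest.
        satisfied = {u: r for u, r in pending.items() if r <= fair_share}
        if not satisfied:
            for u in pending:
                alloc[u] = fair_share
            return alloc
        for u, r in satisfied.items():
            alloc[u] = r
            remaining -= r
            del pending[u]
    for u in pending:
        alloc[u] = 0.0
    return alloc

# Three users request 2, 4 and 10 units of a 9-unit resource: the fair share
# works out to 3.5, so the allocation is {'a': 2, 'b': 3.5, 'c': 3.5}.
print(max_min_allocation({"a": 2, "b": 4, "c": 10}, total=9))
```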
The user associated with a packet could refer to the source of the packet, the destination, the source-destination pair, or even to an individual process running on a source host. Each of these definitions has limitations. Allocation per source
unnaturally restricts sources such as file servers, which typically consume considerable bandwidth. Ideally the gateways could know that some sources deserve more bandwidth than others, but there is no adequate mechanism for establishing that knowledge in today's networks. Allocation per receiver allows a receiver's useful incoming bandwidth to be reduced by a broken or malicious source sending unwanted packets to it. Allocation per process on a host encourages human users to start several processes communicating simultaneously, thereby avoiding the original intent of fair allocation. Allocation per source-destination pair allows a malicious source to consume an unlimited amount of bandwidth by sending many packets all to different destinations. While this does not allow the malicious source to do useful work, it can prevent other sources from obtaining sufficient bandwidth. Overall, allocation on the basis of source-destination pairs, or conversations, seems the best trade-off between security and efficiency, and will be used here. However, our treatment will apply to any of these interpretations of user. Given the requirements for an adequate queueing algorithm, coupled with the definitions of fairness and user, we now turn to the description of our fair queueing algorithm.

Definition

It is simple to allocate buffer space fairly by dropping packets, when necessary, from the conversation with the largest queue. Allocating bandwidth fairly is less straightforward. Pure round-robin service provides a fair allocation of packets-sent but fails to guarantee a fair allocation of bandwidth because of variations in packet sizes. To see how this unfairness can be avoided, we first consider a hypothetical service discipline where transmission occurs in a bit-by-bit round-robin (BR) fashion (as in a head-of-queue processor sharing discipline). This service discipline allocates bandwidth fairly since at every instant in time each conversation is receiving its fair share. Let R(t) denote the number of rounds made in the round-robin service discipline up to time t (R(t) is a continuous function, with the fractional part indicating partially completed rounds). Let N_ac(t) denote the number of active conversations, i.e. those that have bits in their queue at time t. Then ∂R/∂t = μ/N_ac(t), where μ is the linespeed of the gateway's outgoing line (we will, for convenience, work in units such that μ = 1). A packet of size P whose first bit gets serviced at time t_0 will have its last bit serviced P rounds later, at time t such that R(t) = R(t_0) + P. Let t_α^i be the time that packet i belonging to conversation α arrives at the gateway, and define the numbers S_α^i and F_α^i as the values of R(t) when the packet started and finished service. With P_α^i denoting the size of the packet, the following relations hold: F_α^i = S_α^i + P_α^i and S_α^i = MAX(F_α^{i-1}, R(t_α^i)). Since R(t) is a strictly monotonically increasing function whenever there are bits at the gateway, the ordering of the F_α^i values is the same as the ordering of the finishing times of the various packets in the BR discipline. Sending packets in a bit-by-bit round-robin fashion, while satisfying our requirements for an adequate queueing algorithm, is obviously unrealistic. We hope to emulate this impractical algorithm by a practical packet-by-packet transmission scheme.
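The bookkeeping behind R(t), S_α^i and F_α^i is small. The sketch below is an illustration under simplifying assumptions (μ = 1, and the number of active conversations is supplied by the caller and treated as constant between updates); it advances the round number at each arrival and stamps the packet with its start and finish numbers.

```python
class RoundTracker:
    """Tracks the bit-by-bit round-robin round number R(t) and assigns
    start/finish numbers to arriving packets (linespeed taken as 1)."""

    def __init__(self):
        self.R = 0.0           # current round number R(t)
        self.last_t = 0.0      # time of the last update
        self.last_finish = {}  # conversation -> finish number of its latest packet

    def on_arrival(self, now, conversation, size, n_active):
        """Record a packet of `size` arriving on `conversation` at time `now`,
        given that n_active conversations have been active since the last update.
        Returns the packet's start and finish numbers (S, F)."""
        # dR/dt = 1 / N_ac(t); when no conversation is active, R is frozen.
        if n_active > 0:
            self.R += (now - self.last_t) / n_active
        self.last_t = now
        start = max(self.last_finish.get(conversation, 0.0), self.R)  # S = MAX(F_prev, R(t))
        finish = start + size                                         # F = S + P
        self.last_finish[conversation] = finish
        return start, finish
```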
Note that the functions R(t) and N_ac(t) and the quantities S_α^i and F_α^i depend only on the packet arrival times t_α^i, and not on the actual packet transmission times, as long as we define a conversation α to be active whenever R(t) ≤ F_α^i for i = MAX{i | t_α^i ≤ t}. We are thus free to use these quantities in defining our packet-by-packet transmission algorithm. A natural
way to emulate the bit-by-bit round-robin algorithm is to let the quantities F_α^i define the sending order of the packets. Our packet-by-packet transmission algorithm is simply defined by the rule that, whenever a packet finishes transmission, the next packet sent is the one with the smallest value of F_α^i. The continuity requirement mentioned earlier can be restated precisely as demanding that the relative transmission priorities depend continuously on the packet arrival times. The fact that the F_α^i depend continuously on the t_α^i means that our algorithm satisfies this continuity requirement.

In a preemptive version of this algorithm, newly arriving packets whose finishing number F_α^i is smaller than that of the packet currently in transmission preempt the transmitting packet. For practical reasons, we have implemented the nonpreemptive version, but the preemptive algorithm (with resumptive service) is more tractable analytically. Clearly the preemptive and nonpreemptive packetized algorithms do not give the same instantaneous bandwidth allocation as the BR version. However, for each conversation the total bits sent at a given time by these three algorithms are always within P_max of each other, where P_max is the maximum packet size (this allocation discrepancy bound is proved in Greenberg and Madras (1990)). Thus, over sufficiently long conversations, the packetized algorithms asymptotically approach the fair bandwidth allocation of the BR scheme.

Recall that a user's request for promptness is not made explicit. The IP protocol (USC Information Sciences Institute, 1981a) does have a field for type-of-service, but not enough applications make intelligent use of this option to render it a useful hint. Consequently, promptness allocation must be based solely on data already available at the gateway. One such allocation strategy is to give more promptness (less delay) to users who use less than their fair share of bandwidth. Separating the promptness allocation from the bandwidth allocation can be accomplished by introducing a nonnegative parameter δ and defining a new quantity, the bid B_α^i, via B_α^i = P_α^i + MAX(F_α^{i-1}, R(t_α^i) − δ). The quantities R(t), N_ac(t), F_α^i and S_α^i remain as before, but now the sending order is determined by the B's, not the F's. The asymptotic bandwidth allocation is independent of δ, since the F's control the bandwidth allocation, but the algorithm gives slightly faster service to packets that arrive at an inactive conversation. The parameter δ controls the extent of this additional promptness. Note that the bid B_α^i is continuous in t_α^i, so that the aforementioned continuity requirement is met.

The role of the term δ can be seen more clearly by considering the two extreme cases δ = 0 and δ = ∞. If an arriving packet has R(t_α^i) ≤ F_α^{i-1}, then the conversation α is active (i.e. the corresponding conversation in the BR algorithm would have bits in the queue). In this case, the value of δ is irrelevant and the bid number depends only on the finishing number of the previous packet. However, if R(t_α^i) > F_α^{i-1}, so that the α conversation is inactive, the two cases are quite different. With δ = 0, the bid number is given by B_α^i = P_α^i + R(t_α^i) and is completely independent of the previous history of user α. With δ = ∞, the bid number is B_α^i = P_α^i + F_α^{i-1} and depends only on the previous packet's finishing number, no matter how many rounds ago.
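Collecting the pieces, a gateway needs only to stamp each arriving packet with its bid and, whenever the line frees up, transmit the queued packet with the smallest bid. The sketch below is illustrative rather than the authors' implementation: it assumes μ = 1, approximates the set of active conversations by the set of conversations with buffered packets, and includes the buffer policy discussed next (drop from the conversation using the most buffer space, leaving F unchanged).

```python
from dataclasses import dataclass

@dataclass
class Packet:
    conv: str       # conversation (source-destination pair)
    size: float
    bid: float = 0.0

class FairQueueGateway:
    """Sketch of the nonpreemptive packet-by-packet discipline.  Bids follow
    B = P + MAX(F_prev, R(t) - delta); the queued packet with the smallest bid
    is sent next; on buffer overflow a packet is dropped from the conversation
    holding the most buffered bytes, with F left unchanged."""

    def __init__(self, delta=0.0, buffer_limit=20):
        self.delta = delta
        self.buffer_limit = buffer_limit   # total packets held across all conversations
        self.R = 0.0                       # round number R(t); linespeed taken as 1
        self.last_t = 0.0
        self.last_finish = {}              # conversation -> finish number of its latest packet
        self.queue = []                    # buffered packets in arrival order

    def _advance(self, now):
        # Approximation: a conversation counts as active while it has buffered packets.
        active = {p.conv for p in self.queue}
        if active:
            self.R += (now - self.last_t) / len(active)
        self.last_t = now

    def arrive(self, now, conv, size):
        self._advance(now)
        prev_f = self.last_finish.get(conv, 0.0)
        self.last_finish[conv] = max(prev_f, self.R) + size            # F = S + P
        pkt = Packet(conv, size, bid=size + max(prev_f, self.R - self.delta))
        self.queue.append(pkt)
        if len(self.queue) > self.buffer_limit:
            # Drop the most recently queued packet of the conversation using the
            # most buffer space; F and S are deliberately left unchanged (see below).
            usage = {}
            for p in self.queue:
                usage[p.conv] = usage.get(p.conv, 0.0) + p.size
            worst = max(usage, key=usage.get)
            victim = max(i for i, p in enumerate(self.queue) if p.conv == worst)
            del self.queue[victim]

    def send(self, now):
        """Called whenever the outgoing line is free; returns the packet to transmit."""
        self._advance(now)
        if not self.queue:
            return None
        pkt = min(self.queue, key=lambda p: p.bid)
        self.queue.remove(pkt)
        return pkt
```

Note that δ enters only through the bid; the finish numbers, and hence the long-run bandwidth shares, are unaffected by it.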
For intermediate values of δ, scheduling decisions for packets arriving at inactive conversations depend on the previous packet's finishing round as long as it was not too long ago, and δ controls how far back this dependence goes. Recall that when the queue is full and a new packet arrives, the last packet from the conversation currently using the most buffer space is dropped. We have chosen to leave
the quantities F_α^i and S_α^i unchanged when we drop a packet. This provides a small penalty for ill-behaved hosts, in that they will be charged for throughput that, because of their own poor flow control, they could not use. Subsequent work (Heybey and Davin, 1989) raises questions about the desirability of this aspect of our algorithm.

Performance

The desired bandwidth and buffer allocations are completely specified by the definition of fairness, and we have demonstrated that our algorithm achieves those goals. However, we have not been able to characterize the promptness allocation for an arbitrary arrival stream of packets. To obtain some quantitative results on the promptness, or delay, performance of a single FQ gateway, we consider a very restricted class of arrival streams in which there are only two types of sources. There are FTP-like file transfer sources, which always have packets to send and transmit them whenever permitted by the source flow control (which, for simplicity, is taken to be sliding window flow control), and there are Telnet-like interactive sources, which produce packets intermittently according to some unspecified generation process. What are the quantities of interest? An FTP source is typically transferring a large file, so the quantity of interest is the transfer time of the file, which for asymptotically large files depends only on the bandwidth allocation. Given the configuration of sources, this bandwidth allocation can be computed a priori by using the fairness property of FQ gateways. The interesting quantity for Telnet sources is the average delay of each packet, and it is for this quantity that we now provide a rather limited result.

Consider a single FQ gateway with N FTP sources sending packets of size P_F, and allow a single packet of size P_T from a Telnet source to arrive at the gateway at time t. It will be assigned a bid number B = R(t) + P_T − δ; thus, the dependence of the queueing delay on the quantities P_T and δ is only through the combination P_T − δ. We will denote the queueing delay of this packet by Δ(t), which is a periodic function of t with period N·P_F. We are interested in the average delay

    Δ = (1/(N·P_F)) ∫_0^{N·P_F} Δ(t) dt.

The finishing numbers F_α^i for the N FTPs can be expressed, after perhaps renumbering the packets, by F_α^i = (i + l_α)P_F, where the l_α obey 0 ≤ l_α < 1. The queueing delay of the Telnet packet depends on the configuration of l's whenever P_T < P_F. One can show that the delay is bounded by the extremal cases of l_α = 0 for all α, and l_α = α/N for α = 0, ..., N − 1. The delay values for these extremal cases are straightforward to calculate; for the sake of brevity we omit the derivation and merely display the result below. The average queueing delay is given by Δ = A(P_T − δ), where the function A(P), the delay with δ = 0, is defined below (with integer k and constant ε, 0 ≤ ε < 1, defined via P_T = P_F(k + ε)/N).
For the preemptive algorithm,

    A(P) = N·P²/(2·P_F)            for P ≤ P_F
    A(P) = N·(P − P_F/2)           for P ≥ P_F

and for the nonpreemptive algorithm,

    A(P) = (P_F/(2N))·[1 + N·k² + k(2ε − 1)]     for P ≤ P_F.

Now consider a general Telnet packet generation process (ignoring the effects of flow control) and characterize this generation process by the function D_0(P_T), which denotes the queueing delay of the Telnet source when it is the sole source at the gateway. In the BR algorithm, the queueing delay of the Telnet source in the presence of N FTP sources is merely D_0[(N+1)P_T]. For the packetized preemptive algorithm with δ = 0, we can express the queueing delay in the presence of N FTP sources, call it D_N(P_T), in terms of D_0 via the relation (averaging over all relative synchronizations between the FTPs and the Telnet)

    D_N(P_T) = D_0[(N+1)P_T] + A(P_T)

where the term A(P_T) reflects the extra delay incurred when emulating the BR algorithm by the preemptive packetized algorithm. For nonzero values of δ, the generation process must be further characterized by the quantity l_0(P_T, t) which, in a system where the Telnet is the sole source, is the probability that a packet arrives at a queue which has been idle for time t. The delay is given by

    D_N(P_T) = D_0[(N+1)P_T] + A(P_T) − l_0(P_T, (N+1)P_T)·{A(P_T) − A(P_T − MIN(δ, P_T))}

where the last term represents the reduction in delay due to the nonzero δ. These expressions for D_N, which were derived for the preemptive case, are also valid for the nonpreemptive algorithm when P_T ≥ P_F.
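As an illustrative check, added here and following only from the definitions above rather than from the omitted derivation, consider the limit ρ → 0 with P_T = P_F = P and δ = P. An arriving Telnet packet then bids

    B = R(t) + P_T − δ = R(t),

which, up to the packet-size discrepancy bound mentioned earlier, is below the bids of the queued FTP packets; under the nonpreemptive algorithm the packet therefore waits essentially only for the packet already in transmission, an expected residual of about P/2. This agrees with the low-load δ = P delay quoted in the discussion of Figure 1 below.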
What do these forbidding formulae mean? Consider, for concreteness, a Poisson arrival process with rate λ, packet sizes P_T = P_F = P, a linespeed μ = 1 and an FTP synchronization described by l_α = α/N for α = 0, 1, ..., N − 1. Define ρ to be the average bandwidth of the Telnet stream, measured relative to the fair share of the Telnet: ρ = λP(N + 1). Substituting the Poisson forms of D_0 and l_0 into the relation for D_N above yields, for the nonpreemptive algorithm, a closed-form expression for D_N(P) as a function of ρ. This equation represents the throughput/delay curve the FQ gateway offers the Poisson Telnet source (the formulae for different FTP synchronizations are substantially more complicated, but have the same qualitative behavior). This throughput/delay curve can be contrasted with that offered by an FCFS gateway; however, the FCFS results depend in detail on the flow control used by the FTP sources and on the surrounding network environment. We consider a network where all other communication speeds are infinitely fast in relation to the outgoing linespeed of the gateway, and where the FTPs all use sliding window flow control with window size w, so there are always Nw FTP packets in the queue or in transmission. Figure 1 shows the throughput/delay curves of such a system with parameters N = 3 and w = 5 for both an FCFS gateway and an FQ gateway with δ = 0 and δ = P. For ρ → 0, FCFS gives a large queueing delay of (Nw − 2)P, whereas FQ gives a queueing delay of NP/2 for δ = 0 and P/2 for δ = P; with N = 3 and w = 5 this is 13P, as against 1.5P and 0.5P. This ability to provide a lower delay to lower-throughput sources, completely independent of the window sizes of the FTPs, is one of the most important features of fair queueing. Note also that the FQ queueing delay diverges as ρ → 1, reflecting FQ's insistence that no conversation gets more than its fair share. In contrast, the FCFS curve remains finite for all ρ < (N + 1), showing that an ill-behaved source can consume an arbitrarily large fraction of the bandwidth.

What happens in a network of FQ gateways? There are few results here, but Hahne (1986) has shown that for strict round-robin service gateways and only FTP sources there is fair allocation of bandwidth (in the multiple resource sense) when the window sizes are sufficiently large. She also provides examples where insufficient window sizes, even ones much larger than the communication path, result in unfair allocations. We believe, but have been unable to prove, that both of these properties hold for our fair queueing algorithm.

FLOW CONTROL ALGORITHMS

Flow control algorithms are both the benchmarks against which the congestion control properties of fair queueing are measured, and also the environment within which FQ gateways will operate. We already know that, when combined with FCFS gateways, these flow control algorithms all suffer from the fundamental problem of vulnerability to ill-behaved sources. Also, there is no mechanism for separating the promptness allocation from the bandwidth and buffer allocation. The remaining question is then how fairly these flow control algorithms allocate bandwidth. Before proceeding, note that there are really two distinct problems in controlling congestion. Congestion recovery allows the network to recover from a badly congested state, whereas congestion avoidance attempts to prevent the congestion from occurring (Jain and Ramakrishnan, 1988). In this paper, we are focusing on congestion avoidance and will not discuss congestion recovery mechanisms at length.
Figure 1. Queueing delay vs. throughput for a single Poisson Telnet source sharing a gateway with three FTP sources; the linespeed μ = 1 and all packets are of size P

A generic version of source flow control, as implemented in XNS's SPP (Xerox Corporation, 1981) or in TCP (USC Information Sciences Institute, 1981b), has two parts. There is a timeout mechanism, which provides for congestion recovery, whereby packets that have not been acknowledged before the timeout period are retransmitted (and a new timeout period set). The timeout periods are given by β·rtt, where typically β = 2 and rtt is the exponentially averaged estimate of the round trip time (the rtt estimate for retransmitted packets is the time from their first transmission to their acknowledgement). The congestion avoidance part of the algorithm is sliding window flow control, with some