正在加载图片...
EEE/ACM TRANSACTIONS ON NETWORKING. VOL 7. NO. 2. APRIL 1999 The isLIP Scheduling algorithm for Input-Queued Switches Nick McKeown, Senior Member, IEEE Abstract-An increasing number of high performance inter- is nonblocking, it allows multiple cells to be transferred across networking protocol routers, LAN and asynchronous transfer the fabric simultaneously, alleviating the congestion found on rossbar switch. Most offer a switched backplane based on a a conventional shared backplane. In this paper, we describe an hold packets waiting to traverse the switching fabric. It is algorithm that is designed to configure a crossbar switch using well known that if simple first in first out(FIFO) input queues a single-chip centralized scheduler. The algorithm presented are used to hold packets then, even under benign conditions, here attempts to achieve high throughput for best-effort unicast head-of-line (HOL) blocking limits the achievable bandwidth traffic, and is designed to be simple to implement in hardware to approximately 58.6% of the maximum. HOL blocking can Our work was motivated by the design of two such systems described in this paper. A scheduling algorithm is used to con- the Cisco 12 000 GSR, a 50-Gb/s IP router, and the Tiny Tera figure the crossbar switch, deciding the order in which packets a 0.5-Tb/ MPLS switch [7] will be served. Recent results have shown that with a suitable Before using a crossbar switch as a switching fabric. it is scheduling algorithm, 100% throughput can be achieved. In important to consider some of the potential drawbacks, we An iterative round-robin algorithm. i sliP can achieve 100% consider three here. First, the implementation complexity of an are presented, along with modified versions for prioritized traffic. Fortunately, the majority of high-performance switches and Simulation results are presented to indicate the performance routers today have only a relatively small number of ports of iSLIP under benign and bursty traffie conditions. Prototyp and commercial implementations of iSLIP exist in systems with (usually between 8 and 32). This is because the highest aggregate bandwidths ranging from 50 to 500 Gb/s. When the performance devices are used at aggregation points where port traffic is nonuniform, isLiP quickly adapts to a fair scheduling density is low.Our work is, therefore, focussed on systems policy that is guaranteed never to starve an input queue. Finally, with low port density. A second potential drawback of crossbar we describe the implementation complexity of isLIP. Based on switches is that they make it difficult to provide guaranteed hedulers have been built supporting up to 32 ports, nilele-chip qualities of service. This is because cells arriving to the switch a two-dimensional(2-D) array of priority encoders, sing must contend for access to the fabric with cells at both the input and the output. The time at which they leave the input Index Termns-ATM switch, crossbar switch, inpul-queueing, queues and enter the crossbar switching fabric is dependent on router, scheduling. other traffic in the system, making it difficult to control when a cell will depart. There are two common ways to mitigate L INTRODUCTION this problem. One is to schedule the transfer of cells from N AN ATTEMPT to take advantage of the cell-switching inputs to outputs in a similar manner to that used in a time- hag a pacity of the asynchronous transfer mode(ATM), there slot interchanger, providing peak bandwidth allocation for recently been a merging of ATM switches and Inter- reserved flows. This method has been implemented in at least net Protocol(IP) routers [29[32]. This idea is already two commercial switches and routers. The second approach Deing carried one step further, with cell switches forming is to employ "speedup "in which the core of the switch the core, or backplane, of high-performance IP routers [26), runs faster than the connected lines. Simulation and analytical 31](61, [4]. Each of these high-speed switches and routers results indicate that with a small speedup, a switch will deliver is built around a crossbar switch that is configured using a cells quickly to their outgoing port, apparently independent of centralized scheduler, and each uses a fixed-size cell as a contending traffic [271,[371-[41]. While these techniques are transfer unit. Variable-length packets are segmented as they of growing importance, we restrict our focus in this paper to arrive, transferred across the central switching fabric, and the efficient and fast scheduling of best-effort traffic then reassembled again into packets before they depart. A crossbar switch is used because it is simple to implement and I Some people believe that this situation will change in the future, and that switches and routers with Manuscript received November 19 r even thousands of port se systems become real, then cross proved by IEEE/ACM TRANSACTIONS not be suitable g, Stanford However, the techniques described here will be suitable for a few years University, Stanford, CA 94305-9030 ford. edu) Publisher Item Identifier S 1063-6692(99)03593.1 Gigaswitch/ATM (2)and the Cisco Systems LS2020 ATM Switc 1063-6692/99s1000@1999IEEE188 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 7, NO. 2, APRIL 1999 The iSLIP Scheduling Algorithm for Input-Queued Switches Nick McKeown, Senior Member, IEEE Abstract—An increasing number of high performance inter￾networking protocol routers, LAN and asynchronous transfer mode (ATM) switches use a switched backplane based on a crossbar switch. Most often, these systems use input queues to hold packets waiting to traverse the switching fabric. It is well known that if simple first in first out (FIFO) input queues are used to hold packets then, even under benign conditions, head-of-line (HOL) blocking limits the achievable bandwidth to approximately 58.6% of the maximum. HOL blocking can be overcome by the use of virtual output queueing, which is described in this paper. A scheduling algorithm is used to con- figure the crossbar switch, deciding the order in which packets will be served. Recent results have shown that with a suitable scheduling algorithm, 100% throughput can be achieved. In this paper, we present a scheduling algorithm called iSLIP. An iterative, round-robin algorithm, iSLIP can achieve 100% throughput for uniform traffic, yet is simple to implement in hardware. Iterative and noniterative versions of the algorithms are presented, along with modified versions for prioritized traffic. Simulation results are presented to indicate the performance of iSLIP under benign and bursty traffic conditions. Prototype and commercial implementations of iSLIP exist in systems with aggregate bandwidths ranging from 50 to 500 Gb/s. When the traffic is nonuniform, iSLIP quickly adapts to a fair scheduling policy that is guaranteed never to starve an input queue. Finally, we describe the implementation complexity of iSLIP. Based on a two-dimensional (2-D) array of priority encoders, single-chip schedulers have been built supporting up to 32 ports, and making approximately 100 million scheduling decisions per second. Index Terms— ATM switch, crossbar switch, input-queueing, IP router, scheduling. I. INTRODUCTION I N AN ATTEMPT to take advantage of the cell-switching capacity of the asynchronous transfer mode (ATM), there has recently been a merging of ATM switches and Inter￾net Protocol (IP) routers [29], [32]. This idea is already being carried one step further, with cell switches forming the core, or backplane, of high-performance IP routers [26], [31], [6], [4]. Each of these high-speed switches and routers is built around a crossbar switch that is configured using a centralized scheduler, and each uses a fixed-size cell as a transfer unit. Variable-length packets are segmented as they arrive, transferred across the central switching fabric, and then reassembled again into packets before they depart. A crossbar switch is used because it is simple to implement and Manuscript received November 19, 1996; revised February 9, 1998; ap￾proved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor H. J. Chao. The author is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9030 USA (e-mail: nickm@stanford.edu). Publisher Item Identifier S 1063-6692(99)03593-1. is nonblocking; it allows multiple cells to be transferred across the fabric simultaneously, alleviating the congestion found on a conventional shared backplane. In this paper, we describe an algorithm that is designed to configure a crossbar switch using a single-chip centralized scheduler. The algorithm presented here attempts to achieve high throughput for best-effort unicast traffic, and is designed to be simple to implement in hardware. Our work was motivated by the design of two such systems: the Cisco 12 000 GSR, a 50-Gb/s IP router, and the Tiny Tera: a 0.5-Tb/s MPLS switch [7]. Before using a crossbar switch as a switching fabric, it is important to consider some of the potential drawbacks; we consider three here. First, the implementation complexity of an -port crossbar switch increases with making crossbars impractical for systems with a very large number of ports. Fortunately, the majority of high-performance switches and routers today have only a relatively small number of ports (usually between 8 and 32). This is because the highest performance devices are used at aggregation points where port density is low.1 Our work is, therefore, focussed on systems with low port density. A second potential drawback of crossbar switches is that they make it difficult to provide guaranteed qualities of service. This is because cells arriving to the switch must contend for access to the fabric with cells at both the input and the output. The time at which they leave the input queues and enter the crossbar switching fabric is dependent on other traffic in the system, making it difficult to control when a cell will depart. There are two common ways to mitigate this problem. One is to schedule the transfer of cells from inputs to outputs in a similar manner to that used in a time￾slot interchanger, providing peak bandwidth allocation for reserved flows. This method has been implemented in at least two commercial switches and routers.2 The second approach is to employ “speedup,” in which the core of the switch runs faster than the connected lines. Simulation and analytical results indicate that with a small speedup, a switch will deliver cells quickly to their outgoing port, apparently independent of contending traffic [27], [37]–[41]. While these techniques are of growing importance, we restrict our focus in this paper to the efficient and fast scheduling of best-effort traffic. 1Some people believe that this situation will change in the future, and that switches and routers with large aggregate bandwidths will support hundreds or even thousands of ports. If these systems become real, then crossbar switches—and the techniques that follow in this paper—may not be suitable. However, the techniques described here will be suitable for a few years hence. 2A peak-rate allocation method was supported by the DEC AN2 Gigaswitch/ATM [2] and the Cisco Systems LS2020 ATM Switch. 1063–6692/99$10.00  1999 IEEE
向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有