Copyright Hari Balakrishnan, 2001-2005, and Nick Feamster, 2005. All rights reserved. Please do not redistribute without permission.

LECTURE 4
Interdomain Internet Routing

The goal of this lecture is to explain how routing between different administrative domains works in the Internet. We discuss how Internet Service Providers (ISPs) exchange routing information (and packets) with each other, and how the way in which they buy service from and sell service to each other and their customers influences the technical research agenda of Internet routing in the real world. We discuss the salient features of the Border Gateway Protocol, Version 4 (BGP4), the current interdomain routing protocol in the Internet.

4.1 Autonomous Systems

An abstract, highly idealized view of the Internet is shown in Figure 4-1, where end-hosts hook up to routers, which hook up with other routers to form a nice connected graph of essentially “peer” routers that cooperate nicely using routing protocols that exchange “shortest-path” or similar information and provide global connectivity. The same view posits that the graph induced by the routers and their links has a large amount of redundancy and the Internet’s routing algorithms are designed to rapidly detect faults and problems in the routing substrate and route around them. Some would even posit that the same routing protocols today perform load-sensitive routing to dynamically shed load away from congested paths on to less-loaded paths.

Unfortunately, while simple, this abstraction is actually quite misleading. The real story of the Internet routing infrastructure is that the Internet service is provided by a large number of commercial enterprises, generally in competition with each other. Cooperation, required for global connectivity, is generally at odds with the need to be a profitable commercial enterprise, which often occurs at the expense of one’s competitors, the same people with whom one needs to cooperate.
How this is achieved in practice (although there’s lots of room for improvement), and how we might improve things, is an interesting and revealing study of how good technical research can be shaped and challenged by commercial realities.

Figure 4-1: This is a rather misleading abstraction of the Internet routing layer. (End-hosts attach to routers connected in a fault-tolerant structure.)

A second pass at developing a good picture of the Internet routing substrate is shown in Figure 4-2, which depicts a group of Internet Service Providers (ISPs) somehow cooperating to provide global connectivity to end-customers. This picture is closer to the truth, but the main thing it’s missing is that not all ISPs are created equal. Some are bigger and more “connected” than others, and still others have global reachability in their routing tables. There are names given to these “small,” “large,” and “really huge” ISPs: Tier-3 ISPs are ones that have a small number of usually localized (in geography) end-customers; Tier-2 ISPs generally have regional scope (e.g., state-wide, region-wide, or non-US country-wide); and Tier-1 ISPs, of which there are a handful, have global scope in the sense that their routing tables actually have routes to all currently reachable Internet prefixes (i.e., they have no default routes). This organization is shown in Figure 4-3.

The current wide-area routing protocol, which exchanges reachability information about routeable IP-address prefixes between routers at the boundary between ISPs, is BGP (Border Gateway Protocol, Version 4) [13, 14]. More precisely, the wide-area routing architecture is divided into autonomous systems (ASes) that exchange reachability information. An AS is owned and administered by a single commercial entity, and implements some set of policies in deciding how to route its packets to the rest of the Internet, and how to export its routes (its own, those of its customers, and other routes it may have learned from other ASes) to other ASes. Each AS is identified by a unique number, 16 bits long in the original BGP4 design (32-bit AS numbers were standardized later).

A different routing protocol operates within each AS. These routing protocols are called Interior Gateway Protocols (IGPs), and include protocols like Routing Information Protocol (RIP) [8], Open Shortest Path First (OSPF) [11], Intermediate System-Intermediate System (IS-IS) [12], and E-IGRP.
In contrast, interdomain protocols like BGP are also called EGPs (Exterior Gateway Protocols). Operationally, a key difference between EGPs like BGP and IGPs is that the former are concerned with providing reachability information and facilitating routing policy implementation in a scalable manner, whereas the latter are typically concerned with optimizing a path metric. Scalability is typically not a major concern in the design of IGPs (and no known IGP scales as well as BGP does).

The rest of this lecture is in two parts: first, we will look at inter-AS relationships (transit and peering); then, we will study some salient features of BGP. We don’t have time to survey IGPs in this lecture, but you should be familiar with the more well-known ones like RIP and OSPF (or at least with distance-vector and link-state protocols). To learn more about IGPs if you’re not familiar with them, read a standard networking textbook (e.g., Peterson & Davie, Kurose & Ross, Tanenbaum) or a book on routing protocols (e.g., Huitema).

Figure 4-2: The Internet is actually composed of many competing Internet Service Providers (ISPs) that cooperate to provide global connectivity. This picture suggests that all ISPs are “equal,” which isn’t actually true.

4.2 Inter-AS Relationships: Transit and Peering

The Internet is composed of many different types of ASes, from universities to corporations to regional Internet Service Providers (ISPs) to nationwide ISPs. Smaller ASes (e.g., universities, corporations, etc.) typically purchase Internet connectivity from ISPs. Smaller regional ISPs, in turn, purchase connectivity from larger ISPs with “backbone” networks.

Consider the picture shown in Figure 4-4. It shows an ISP, with AS number X, directly connected to a provider (from whom it buys Internet service) and a few customers (to whom it sells Internet service). In addition, the figure shows two other ISPs to whom it is directly connected, with whom X exchanges routing information via BGP.

The different types of ASes lead to different types of business relationships between them, which in turn translate to different policies for exchanging and selecting routes. There are two prevalent forms of AS-AS interconnection. The first form is provider-customer transit (aka “transit”), wherein one ISP (the “provider” P in Figure 4-4) provides access to all (or most) destinations in its routing tables.
Transit is almost always meaningful in an inter-AS relationship where financial settlement is involved; the provider charges its customers for Internet access, in return for forwarding packets on behalf of customers to destinations (and in the opposite direction in many cases). Another example of a transit relationship in Figure 4-4 is between X and its customers (the Ci’s).

The second prevalent form is called peering. Here, two ASes (typically ISPs) provide mutual access to a subset of each other’s routing tables. The subset of interest here is their own transit customers (and the ISPs’ own internal addresses). Like transit, peering is a business deal, but it may not involve financial settlement. While paid peering is common in some parts of the world, in many cases peering deals are reciprocal agreements. As long as the traffic ratio between the concerned ASes is not highly asymmetric (e.g., 4:1 is a commonly believed and quoted ratio), there’s usually no financial settlement. Peering deals are almost always under non-disclosure and are confidential.

Figure 4-3: A more accurate picture of the wide-area Internet routing infrastructure, with various types of ISPs defined by their respective reach. Tier-1 ISPs have “default-free” routing tables (i.e., they don’t have any default routes), and typically have global reachability information. There are a handful of these today (about five or so).

4.2.1 Peering v. Transit

A key point to note about peering relationships is that they are often between business competitors. The common reason for peering is the observation by each party that a nontrivial fraction of the packets emanating from each one is destined for the other’s direct transit customers. Of course, the best thing for each of the ISPs to try to do would be to wean away the other’s customers, but that may be hard to do. The next best thing, which would be in their mutual interest, would be to avoid paying transit costs to their respective providers, and instead set up a transit-free link between each other to forward packets for their direct customers.
In addition, this approach has the advantage that the more direct path would lead to better end-to-end performance (in terms of latency, packet loss rate, and throughput) for their customers. It’s also worth noticing that a Tier-1 ISP usually will find it essential to be involved in peering relationships with other ISPs (especially other Tier-1 ISPs) to obtain global routing information in a default-free manner.

Balancing these potential benefits are some forces against peering. Transit relationships generate revenue; peering relationships usually don’t. Peering relationships typically need to be renegotiated often, and asymmetric traffic ratios require care to handle in a way that’s mutually satisfactory. Above all, these relationships are often between competitors vying for the same customer base.

Figure 4-4: Inter-AS relationships; transit and peering. (ISP X buys transit from provider P, peers with ISPs Y and Z, and sells transit to customers C1, C2, and C3; C′P is another customer of P.)

In the discussion so far, we have implicitly used an important property of current interdomain routing: A route advertisement from B to A for a destination prefix is an agreement by B that it will forward packets sent via A destined for any destination in the prefix. This (implicit) agreement implies that one way to think about Internet economics is to view ISPs as charging customers for entries in their routing tables. Of course, the data rate of the interconnection is also crucial, and is the major determinant of an ISP’s pricing policy.

4.2.2 Exporting Routes: Route Filtering

Each AS (ISP) needs to make decisions on which routes to export to its neighboring ISPs using BGP. Export policies are important because no ISP wants to act as transit for packets that it isn’t somehow making money on. Because packets flow in the opposite direction to the (best) route advertisement for any destination, an AS should advertise routes to neighbors with care.

Transit customer routes. To an ISP, its customer routes are likely the most important, because what it offers its customers is the sense that all potential senders in the Internet can reach them. It is in the ISP’s best interest to advertise routes to its transit customers to as many other connected ASes as possible. The more traffic that an ISP carries on behalf of a customer, the “fatter” the pipe that the customer would need, implying higher revenue for the ISP.
Hence, if a destination were advertised by multiple neighbors, an ISP should prefer the advertisement made by a customer over all other choices (in particular, over peers and transit providers).
Transit provider routes. Does an ISP want to provide transit to the routes exported by its provider to it? Most likely not, because the ISP isn’t making any money on providing such transit facilities. An example of this situation is shown in Figure 4-4, where C′P is a customer of P, and P has exported a route to C′P to X. It isn’t in X’s interest to advertise this route to everyone, e.g., to other ISPs with whom X has a peering relationship. An important exception to this, of course, is X’s transit customers, who are paying X for service: the service X provides its customers (the Ci’s) is that they can reach any location on the Internet via X, so it makes sense for X to export as many routes as possible to its customers.

Peer routes. It usually makes sense for an ISP to export only selected routes from its routing tables to other peering ISPs. It obviously makes sense to export to peers the routes to all of one’s transit customers. It also makes sense to export routes to addresses within the ISP itself. However, it does not make sense to export an ISP’s transit provider routes to other peering ISPs, because that may cause a peering ISP to use the advertising ISP to reach a destination advertised by a transit provider. Doing so would expend ISP resources but not lead to revenue. The same situation applies to routes learned from other peering relationships. Consider ISP Z in Figure 4-4, with its own transit customers. It doesn’t make sense for X to advertise routes to Z’s customers to another peering ISP (Y), because X doesn’t make any money on Y using X to get packets to Z’s customers!

These arguments show that most ISPs end up providing selective transit: typically, full transit capabilities for their own transit customers in both directions, some transit (between mutual customers) in a peering relationship, and transit only for one’s transit customers (and ISP-internal addresses) to one’s providers.
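The export rules above can be summarized as a small filter. The following Python sketch is only an illustration of the selective-transit idea, not how any router is actually configured; the names `Relationship`, `should_export`, `exported_routes`, and `ROUTES_AT_X`, as well as the prefixes, are hypothetical and chosen to mirror ISP X in Figure 4-4.

```python
from enum import Enum
from typing import Dict, List, Optional


class Relationship(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Optional[Relationship],
                  neighbor: Relationship) -> bool:
    """Decide whether a route is advertised to a neighbor.

    learned_from is None for routes to the ISP's own (internal) addresses.
    """
    # Own routes and transit-customer routes are advertised to everyone.
    if learned_from is None or learned_from is Relationship.CUSTOMER:
        return True
    # Routes learned from peers or providers go only to paying customers.
    return neighbor is Relationship.CUSTOMER


def exported_routes(routes: Dict[str, Optional[Relationship]],
                    neighbor: Relationship) -> List[str]:
    """Return the prefixes advertised to a neighbor of the given type."""
    return sorted(prefix for prefix, learned_from in routes.items()
                  if should_export(learned_from, neighbor))


# Hypothetical routing table of ISP X (prefixes are made up).
ROUTES_AT_X: Dict[str, Optional[Relationship]] = {
    "10.0.0.0/16": None,                   # X's own internal addresses
    "10.1.0.0/16": Relationship.CUSTOMER,  # learned from customer C1
    "10.2.0.0/16": Relationship.PEER,      # Z's customer, learned from peer Z
    "10.3.0.0/16": Relationship.PROVIDER,  # C'P, learned from provider P
}

if __name__ == "__main__":
    for neighbor in Relationship:
        print(neighbor.value, exported_routes(ROUTES_AT_X, neighbor))
```

Under these assumptions, a peer or provider of X sees only X's own and customer prefixes, while a customer of X sees all four, which matches the "selective transit" summary.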
The discussion so far may make it sound like BGP is the only way in which to exchange reachability information between an ISP and its customers or between two ASes. That is not true: a large fraction of end-customers (typically customers who don’t provide large amounts of further transit and/or aren’t ISPs) do not run BGP sessions with their providers. The reason is that BGP is complicated to configure, administer, and manage, and isn’t particularly useful if the set of addresses in the customer is relatively invariant. These customers interact with their providers via static routes, which are usually manually configured. Of course, information about customer address blocks will in general be exchanged by a provider using BGP with other ASes (ISPs) to achieve global reachability to the customer premises.

4.2.3 Importing Routes

The previous section described the issues considered by an AS (specifically, routers in an AS involved in BGP sessions with routers in other ASes) while deciding which routes to export. In a similar manner, when a router hears many possible routes to a destination network, it needs to decide which route to install in its forwarding tables.

This is a fairly involved process in BGP and requires a consideration of several attributes of the advertised routes. At this stage, we consider only one of the many things that a router needs to consider, but it’s the most important consideration. It has to do with who advertised the route.

First, when a router (e.g., X in Figure 4-4) hears advertisements to its transit customers from other ASes (e.g., because the customer is multi-homed), it needs to ensure that packets to the customer do not traverse additional ASes unnecessarily. This usually means that customer routes are prioritized over routes to the same network advertised by providers or peers. Second, peer routes are likely preferable to provider routes, since the purpose of peering was to exchange reachability information about mutual transit customers. These two observations imply that typically routes are imported in the following priority order:

customer > peer > provider

This rule (and many others like it) can be implemented in BGP using a special attribute that’s locally maintained by routers in an AS, called the LOCAL_PREF attribute. The first rule in route selection with BGP is to pick a route based on this attribute. Only when this attribute is not set for a route, or does not distinguish the candidate routes, are other attributes considered. Note, however, that in practice most routes in most ASes are not selected using the LOCAL_PREF attribute; selection based on other attributes, such as the length of the AS path, tends to be quite common. We discuss these other route attributes and the details of the BGP route selection process, also called the decision process, in the next section.
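To make the import ordering concrete, here is a minimal Python sketch of choosing among candidate routes to one prefix. It is a simplification under stated assumptions: the numeric values in `LOCAL_PREF_BY_RELATIONSHIP` are hypothetical (only their ordering matters), the `Route` class and `best_route` function are illustrative names, and AS-path length is used as the only tie-breaker, whereas the full BGP decision process considers more attributes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical preference values; a higher value is preferred.
# Only the ordering customer > peer > provider comes from the text.
LOCAL_PREF_BY_RELATIONSHIP = {"customer": 300, "peer": 200, "provider": 100}


@dataclass(frozen=True)
class Route:
    prefix: str
    neighbor: str             # AS that advertised the route
    relationship: str         # "customer", "peer", or "provider"
    as_path: Tuple[str, ...]  # ASes the advertisement traversed


def local_pref(route: Route) -> int:
    """Assign a local preference from the advertising neighbor's role."""
    return LOCAL_PREF_BY_RELATIONSHIP[route.relationship]


def best_route(candidates: List[Route]) -> Route:
    """Pick the highest local preference; break ties by shorter AS path."""
    return max(candidates, key=lambda r: (local_pref(r), -len(r.as_path)))


if __name__ == "__main__":
    # A multi-homed customer C1 heard directly and via a peer and a provider.
    candidates = [
        Route("10.1.0.0/16", "P", "provider", ("P", "C1")),
        Route("10.1.0.0/16", "Y", "peer", ("Y", "C1")),
        Route("10.1.0.0/16", "C1", "customer", ("C1",)),
    ]
    print(best_route(candidates).neighbor)
```

In this sketch the route heard directly from the customer wins regardless of path length, and path length only matters among routes with equal preference.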
the division of the Internet into ASes under independent administration was done while the backbone of the then internet was under the administration of the for BGP was to ensure that the Internet routing infrastructure remained scalable as the number of connected networks increased 2. Policy. The ability for each As to implement and enforce various forms of routing gn go opment of the BGP attribute structure for route announcements, and allowing route 3. Cooperation under competitive circumstances. BGP was designed in large part to handle the transition from the nsfnet to a situation where the" backbone"Inter net infrastructure would no longer be run by a single administrative entity. This structure implies that the routing protocol should allow ASes to make purely local decisions on how to route packets, from among any set of choices In the old NSFNET, the backbone routers exchanged routing information over a tree ology, using a routing protocol called EGP. (While the modern use of the term EGP
usually means that customer routes are prioritized over routes to the same network advertised by providers or peers. Second, peer routes are likely preferable to provider routes, since the purpose of peering was to exchange reachability information about mutual transit customers. These two observations imply that routes are typically imported in the following priority order:

customer > peer > provider

This rule (and many others like it) can be implemented in BGP using a special attribute that’s locally maintained by routers in an AS, called the LOCAL PREF attribute. The first rule in route selection with BGP is to pick a route based on this attribute. Only if this attribute does not decide between the candidate routes (because it is not set, or because the values are equal) are other attributes of a route even considered. Note, however, that in practice most routes in most ASes are not selected using the LOCAL PREF attribute; other attributes, like the length of the AS path, are commonly the deciding factor. We discuss these other route attributes and the details of the BGP route selection process, also called the decision process, in the next section.

4.3 BGP

We now turn to how reachability information is exchanged using BGP, and how routing policies like the ones explained in the previous section can be expressed and enforced. We start with a discussion of the main design goals in BGP and summarize the protocol. Most of the complexity in wide-area routing is not in the protocol, but in how BGP routers are configured to implement policy, and in how routes learned from other ASes are disseminated within an AS. The rest of the section discusses these issues.

4.3.1 Design Goals

The design of BGP, and its current version (4), was motivated by three important needs:

1. Scalability. The division of the Internet into ASes under independent administration was done while the backbone of the Internet of the time was under the administration of the NSFNET.
An important requirement for BGP was to ensure that the Internet routing infrastructure remained scalable as the number of connected networks increased.

2. Policy. The ability for each AS to implement and enforce various forms of routing policy was an important design goal. One of the consequences of this was the development of the BGP attribute structure for route announcements, and of support for route filtering.

3. Cooperation under competitive circumstances. BGP was designed in large part to handle the transition from the NSFNET to a situation where the “backbone” Internet infrastructure would no longer be run by a single administrative entity. This structure implies that the routing protocol should allow ASes to make purely local decisions on how to route packets, from among any set of choices.

In the old NSFNET, the backbone routers exchanged routing information over a tree topology, using a routing protocol called EGP. (While the modern use of the term EGP
is as a family of exterior gateway protocols, its use in the context of NSFNET refers to the specific one used in that network.) Because the backbone routing information was exchanged over a tree, the routing protocol was relatively simple. The evolution of the Internet from a singly administered backbone to its current commercial structure made the NSFNET EGP obsolete and required a more sophisticated protocol.

4.3.2 The Protocol

As protocols go, the operation of BGP is quite straightforward. The basic elements of BGP (the protocol state machine, the format of routing messages, and the propagation of routing updates) are all defined in the protocol standard [13]. BGP runs over TCP, on a well-known port (179). To start participating in a BGP session with another router, a router sends an OPEN message after establishing a TCP connection to it on the BGP port. After the OPEN exchange is completed, both routers exchange their tables of all active routes (of course, applying all applicable route filtering rules). This process may take several minutes to complete, especially on sessions that have a large number of active routes.

After this initialization, there are two main types of messages on the BGP session. First, BGP routers send route UPDATE messages on the session. These updates carry only the routing entries that have changed since the last update (or since the transmission of all active routes). There are two kinds of updates: announcements, which are changes to existing routes or new routes, and withdrawals, which inform the receiver that the named routes no longer exist. A withdrawal usually happens when some previously announced route can no longer be used (e.g., because of a failure or a change in policy). Because BGP uses TCP, which provides reliable and in-order delivery, routes do not need to be periodically announced, unless they change.
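To make the incremental nature of UPDATE messages concrete, here is a minimal Python sketch of how a receiver might apply announcements and withdrawals to the table of routes learned from one neighbor. The `Update` class, the `apply_update` function, the attribute dictionaries, and the example prefixes and AS numbers are all illustrative assumptions; real UPDATE messages are binary and carry more fields than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    """Simplified stand-in for a BGP UPDATE message (hypothetical structure)."""
    announced: dict = field(default_factory=dict)  # prefix -> attributes
    withdrawn: list = field(default_factory=list)  # prefixes no longer reachable


def apply_update(routes, update):
    """Apply one UPDATE to the table of routes learned from a neighbor.

    Only the prefixes named in the update are touched; every other
    entry is left alone, which is why no periodic refresh is needed.
    """
    for prefix in update.withdrawn:
        routes.pop(prefix, None)        # withdrawal: the route no longer exists
    for prefix, attrs in update.announced.items():
        routes[prefix] = attrs          # announcement: new or changed route
    return routes


# Initial exchange of all active routes, followed by one incremental update.
routes = {}
apply_update(routes, Update(announced={
    "10.1.0.0/16": {"as_path": [65001]},
    "10.2.0.0/16": {"as_path": [65001, 65002]},
}))
apply_update(routes, Update(
    announced={"10.2.0.0/16": {"as_path": [65001, 65003]}},
    withdrawn=["10.1.0.0/16"],
))
print(routes)  # {'10.2.0.0/16': {'as_path': [65001, 65003]}}
```

The second update replaces the route for one prefix and withdraws the other, without retransmitting anything that did not change.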
But, in the absence of periodic routing updates, how does a router know whether the neighbor at the other end of a session is still functioning properly? One possible solution might be for BGP to run over a transport protocol that implements its own “is the peer alive” message protocol. Such messages are also called “keepalive” messages. TCP, however, does not provide a transport-layer “keepalive” suited to this purpose (TCP keepalives are optional and, where implemented, typically use very long default timers), so BGP uses its own. Each BGP session has a configurable keepalive timer, and the router guarantees that it will attempt to send at least one BGP message during that time. If there are no UPDATE messages, then the router sends the second type of message on the session: KEEPALIVE messages. The absence of a certain number of BGP KEEPALIVE messages on a session causes the router to terminate that session. The number of missing messages depends on a configurable timer called the hold timer; the specification recommends that the hold timer be at least as long as the keepalive timer duration negotiated on the session. More details about the BGP state machine may be found in [2, 13].

Unlike many IGPs, BGP does not simply optimize a metric such as shortest path or delay. Because its goals are to provide reachability information and enable routing policies, its announcements do not simply announce some metric like hop-count. Rather, they have the following format:

IP prefix : Attributes

where for each announced IP prefix, one or more attributes are also announced. There are a substantial number of standardized attributes in BGP, and we’ll look at some of them in
more detail in the rest of this lecture.

Recall that one BGP attribute has already been introduced, the LOCAL PREF attribute. This attribute isn’t disseminated with route announcements to other ASes, but is an important attribute used locally (within the AS) while selecting a route for a destination. When a route is advertised from a neighboring AS, the receiving BGP router consults its configuration and may set a LOCAL PREF for this route.

4.3.3 Disseminating Routes within an AS: eBGP and iBGP

There are two types of BGP sessions: eBGP sessions are between BGP-speaking routers in different ASes, while iBGP sessions are between BGP routers in the same AS. They serve different purposes, but use exactly the same protocol.

eBGP is the “standard” mode in which BGP is used; after all, BGP was designed to exchange network routing information between different ASes in the Internet. eBGP sessions are shown in Figure 4-5, where the BGP routers implement route filtering rules and exchange a subset of their routes with routers in other ASes.

Figure 4-5: eBGP and iBGP. (Labels in the figure: routers R1 and R2, destination D, “Info about D”, iBGP, eBGP.)

In general, each AS will have more than one router that participates in eBGP sessions with neighboring ASes. During this process, each router will obtain information about some subset of all the prefixes that the entire AS knows about. Each such eBGP router must disseminate routes to the external prefixes it learns to all the other routers in the AS. This dissemination must be done with care to meet two important goals:

1. Loop-free forwarding. After the dissemination of eBGP-learned routes, the resulting routes (and the subsequent forwarding paths of packets sent along those routes) picked by all routers should be free of deflections and forwarding loops [4, 7].

2. Complete visibility. One of the goals of BGP is to allow each AS to be treated as a single monolithic entity.
This means that the several eBGP-speaking routers in the AS must exchange external route information so that they have a complete view of all external routes. For instance, consider Figure 4-5, and prefix D. Router R2 needs to know how
to forward packets destined for D, but R2 hasn’t heard a direct announcement on any of its eBGP sessions for D (footnote 1). By “complete visibility”, we mean the following: for every external destination, each router picks the same route that it would have picked had it seen the best routes from each eBGP router in the AS.

The dissemination of externally learned routes to routers inside an AS is done over internal BGP (iBGP) sessions running in each AS.

An important question concerns the topology over which iBGP sessions should be run. One possibility is to use an arbitrary connected graph and “flood” updates of external routes to all BGP routers in an AS. Of course, an approach based on flooding would require additional techniques to avoid routing loops. The original BGP specification solved this problem by simply setting up a full mesh of iBGP sessions (see Figure 4-6), where every eBGP router maintains an iBGP session with every other BGP router in the AS. Flooding updates is now straightforward: an eBGP router simply sends UPDATE messages to its iBGP neighbors. A router that has only iBGP sessions does not have to send any UPDATE messages, because it has no eBGP sessions with routers in other ASes.

Figure 4-6: Small ASes establish a “full mesh” of iBGP sessions. Each circle represents a router within an AS. Only eBGP-learned routes are re-advertised over iBGP sessions.

It is important to note that iBGP is not an IGP like RIP or OSPF, and it cannot be used to set up routing state that allows packets to be forwarded correctly between internal nodes in an AS. Rather, iBGP sessions, running over TCP, provide a way by which routers inside an AS can use BGP to exchange information about external routes. In fact, iBGP sessions and messages are themselves routed between the BGP routers in the AS via whatever IGP is being used in the AS!
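The full-mesh rule can be illustrated with a small Python sketch. It builds the iBGP sessions described above (every eBGP router paired with every other BGP router in the AS) and performs one round of dissemination in which only eBGP-learned routes are sent and receivers do not pass them on. The function names `ibgp_sessions` and `disseminate`, the router name R3, and the representation of a route as a (prefix, egress router) pair are assumptions made for illustration; route selection and attributes are ignored entirely.

```python
from itertools import combinations


def ibgp_sessions(ebgp_routers, interior_routers):
    """Sessions in which at least one endpoint is an eBGP router."""
    everyone = sorted(set(ebgp_routers) | set(interior_routers))
    return {(a, b) for a, b in combinations(everyone, 2)
            if a in ebgp_routers or b in ebgp_routers}


def disseminate(ebgp_learned, sessions):
    """One round of dissemination over the iBGP sessions.

    ebgp_learned maps a router to the prefixes it heard over eBGP.
    Returns, per router, the set of (prefix, egress_router) pairs it knows.
    Routes learned over iBGP are never re-sent to other iBGP neighbors.
    """
    known = {}
    for a, b in sessions:
        known.setdefault(a, set())
        known.setdefault(b, set())
    for router, prefixes in ebgp_learned.items():
        known.setdefault(router, set())
        for prefix in prefixes:
            known[router].add((prefix, router))
    for a, b in sessions:
        for sender, receiver in ((a, b), (b, a)):
            # Only routes the sender itself learned over eBGP are sent.
            for prefix in ebgp_learned.get(sender, ()):
                known[receiver].add((prefix, sender))
    return known


# R1 and R2 are eBGP routers, R3 is a hypothetical interior router.
sessions = ibgp_sessions({"R1", "R2"}, {"R3"})
known = disseminate({"R1": ["D"]}, sessions)
print(known["R2"])  # {('D', 'R1')}
```

Because every router has a direct session with R1, each one learns about D in a single step, and no router needs to forward an iBGP-learned route, which is what keeps the flooding free of loops in this simplified model.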
One might wonder why iBGP is needed, and why one can’t simply use whatever IGP is being used in the AS to also send BGP updates. There are several reasons why introducing eBGP routes into an IGP is inconvenient. The first reason is that most IGPs don’t scale as well as BGP does, and often rely on periodic routing announcements rather than incremental updates (i.e., their state machines are different). Second, IGPs usually don’t implement the rich set of attributes present in BGP. To preserve all the information about routes gleaned from eBGP sessions, it is best to run BGP sessions inside an AS as well.

Footnote 1: It turns out that each router inside the AS doesn’t know about all the external routes to a destination. Rather, we will strive for each router being able to discover the best routes of the egress routers in the AS for a destination.

The requirement that the iBGP routers be connected via a complete mesh limits scalability: a network with e eBGP routers and i other interior routers requires e(e − 1)/2 + ei iBGP