《信息网络协议基础》课程教学资源（学习资料）lookup-arch（IP ADDRESS LOOKUP）.pdf_大学文库

CHAPTER 2 IP ADDRESS LOOKUP 2.1 OVERVIEW The primary role of routers is to forward packets toward their final destinations. To this purpose, a router must decide for each incoming packet where to send it next, that is, finding the address of the next-hop router as well as the egress port through which the packet should be sent. This forwarding information is stored in a forwarding table that the router computes based on the information gathered by routing protocols. To consult the forwarding table, the router uses the packet’s destination address as a key – this operation is called address lookup [1]. Once the forwarding information is retrieved, the router can transfer the packet from the incoming link to the appropriate outgoing link. Classful Addressing Scheme. IPv4 IP addresses are 32 bits in length and are divided into 4 octets. Each octet has 8 bits that are separated by dots. For example, the address 10000010 01010110 00010000 01000010 corresponds in dotted-decimal notation to 130.86.16.66. The bits in an IP address are ordered as shown in Figure 2.1, where the 1st bit is the most significant bit (MSB) that lies in the leftmost position. The 32nd bit is the least significant bit (LSB) and it lies in the rightmost position. The IP address consists of two parts. The first part contains the IP addresses for networks and the second part contains the IP addresses for hosts. The network part corresponds to the first bits of the IP address, called the address prefix. We will write prefixes as bit strings of up to 32 bits in IPv4 followed by an asterisk(*). For example, the prefix 10000010 01010110* represents all the 216 addresses that begin with the bit pattern 10000010 01010110. Alternatively, prefixes can be indicated using the dotted-decimal notation, so the same prefix can be written as 130.86/16, where the number after the slash indicates the length of the prefix. High Performance Switches and Routers, by H. Jonathan Chao and Bin Liu Copyright © 2007 John Wiley & Sons, Inc. 25

26 IP ADDRESS LOOKUP MSB LSB 30 31 32 Figure 2.1 IP address bit positions Since routing occurs at the network level to locate the destination network,routers only forward packets based on network level IP addresses.Thus,all hosts attached to the network can be stored in the router's forwarding table by a single network IP address,known as address aggregation.A group of addresses are represented by prefixes.An example of a router's forwarding table is shown in Table 2.1.Each entry in the forwarding table contains a prefix,next-hop IP address,and output interface number.The forwarding information is located by searching for the prefix of the destination address. The Internet addressing architecture was first designed using an allocation scheme known as classful addressing.Classful addressing defines three different sized networks of classes: A,B,or C(Fig.2.2).The classes are based on the amount of IP addresses contained in the network partition.With the IPv4 address space of 32 bits,Class A has a network size of 8 bits and a host size of 24 bits.Class B has a network size of 16bits and a host size of 16 bits.Class C has a network size of 24 bits and a host size of 8 bits.Class D is for multicasting applications. The classful addressing scheme created very few class A networks.Their address space contains 50 percent of the total IPv4 address space (231 addresses out of a total of 232). Class B address space contains 16.384(214)networks with up to65,534 hosts per network. Class Caddress space contains 2,097.152(221)networks with up to 256 hosts per network. Classless Inter-Domain Routing(CIDR)Addressing Scheme.The evolution and growth of the Internet in recent years has proven that the classful address scheme is inflexible and wasteful.For most organizations,class C is too small while class B is too large.The three choices resulted in address space being exhausted very rapidly,even though only a small fraction of the addresses allocated were actually in use.The lack of a network class of a size that is appropriate for mid-sized organizations results in the exhaustion of the class B network address space.In order to use the address space efficiently,bundles of class C addresses were given out instead of class B addresses.This causes a massive growth of forwarding table entries. CIDR [2]was introduced to remedy the inefficiencies of classful addressing.The Inter- net Engineering Task Force (IETF)began implementing CIDR in the early 1990s [2,3]. With CIDR,IP address space is better conserved through arbitrary aggregation of network TABLE 2.1 Router's Forwarding Table Structure [1] Destination Address Prefix Next Hop IP Address Output Interface 24.40.32/20 192.41.177.148 130.86/16 192.41.177.181 6 208.12.16/20 192.41.177.241 4 208.12.21/24 192.41.177.196 167.24.103/24 192.41.177.3 4

26 IP ADDRESS LOOKUP Figure 2.1 IP address bit positions. Since routing occurs at the network level to locate the destination network, routers only forward packets based on network level IP addresses. Thus, all hosts attached to the network can be stored in the router’s forwarding table by a single network IP address, known as address aggregation. A group of addresses are represented by prefixes. An example of a router’s forwarding table is shown in Table 2.1. Each entry in the forwarding table contains a prefix, next-hop IP address, and output interface number. The forwarding information is located by searching for the prefix of the destination address. The Internet addressing architecture was first designed using an allocation scheme known as classful addressing. Classful addressing defines three different sized networks of classes: A, B, or C (Fig. 2.2). The classes are based on the amount of IP addresses contained in the network partition. With the IPv4 address space of 32 bits, Class A has a network size of 8 bits and a host size of 24 bits. Class B has a network size of 16 bits and a host size of 16 bits. Class C has a network size of 24 bits and a host size of 8 bits. Class D is for multicasting applications. The classful addressing scheme created very few class A networks. Their address space contains 50 percent of the total IPv4 address space (231 addresses out of a total of 232). Class B address space contains 16,384 (214) networks with up to 65,534 hosts per network. Class C address space contains 2,097,152 (221) networks with up to 256 hosts per network. Classless Inter-Domain Routing (CIDR) Addressing Scheme. The evolution and growth of the Internet in recent years has proven that the classful address scheme is inflexible and wasteful. For most organizations, class C is too small while class B is too large. The three choices resulted in address space being exhausted very rapidly, even though only a small fraction of the addresses allocated were actually in use. The lack of a network class of a size that is appropriate for mid-sized organizations results in the exhaustion of the class B network address space. In order to use the address space efficiently, bundles of class C addresses were given out instead of class B addresses. This causes a massive growth of forwarding table entries. CIDR [2] was introduced to remedy the inefficiencies of classful addressing. The Internet Engineering Task Force (IETF) began implementing CIDR in the early 1990s [2, 3]. With CIDR, IP address space is better conserved through arbitrary aggregation of network TABLE 2.1 Router’s Forwarding Table Structure [1] Destination Address Prefix Next Hop IP Address Output Interface 24.40.32/20 192.41.177.148 2 130.86/16 192.41.177.181 6 208.12.16/20 192.41.177.241 4 208.12.21/24 192.41.177.196 1 167.24.103/24 192.41.177.3 4

2.2 TRIE-BASED ALGORITHMS 29 Lookup Speed.The explosive growth of link bandwidth requires faster IP lookups.For example,links running at 10 Gbps can carry 31.25 million packets per second(mpps) (assuming minimum sized 40-byte IP packets). Storage Requirement.Small storage means fast memory access speed and low power consumption,which are important for cache-based software algorithms and SRAM (static RAM)-based hardware algorithms. Update Time.Currently,the Internet has a peak of a few hundred BGP(Border Gateway Protocol)updates per second.Thus,a certain algorithm should be able to perform 1k updates per second to avoid routing instabilities.These updates should interfere little with normal lookup operations. Scalability.It is expected that the size of forwarding tables will increase at a speed of 25k entries per year.Hence,there will be around 250k entries for the next five years. The ability of an algorithm to handle large forwarding tables is required. Flexibility in Implementation.Most current lookup algorithms can be implemented in either software or hardware.Some of them have the flexibility of being implemented in different ways,such as ASIC,a network processor,or a generic processor. 2.2 TRIE-BASED ALGORITHMS 2.2.1 Binary Trie A trie structure is a multi-way tree where each node contains zero or more pointers to point its child nodes,allowing the organization of prefixes on a digital basis by using the bits of prefixes to direct the branching.In the binary trie(or 1-bit trie)structure [5],each node contains two pointers,the 0-pointer and the 1-pointer. Data Structure.A node X at the level h of the trie represents the set of all addresses that begin with the sequence of h bits consisting of the string of bits labeling the path from the root to that node.Depending on the value of the (h+1)th bit,0 or 1,each pointer of the node X points to the corresponding subtree (if it exists),which represents the set of all route prefixes that have the same(h+1)bits as their first bits.An example data structure of each node (i.e.,the entry in a memory)is shown in Figure 2.6,including the next hop address (if it is a prefix node),a left pointer pointing to the left node location (with an address bit 0) and a right pointer pointing to the right node location (with an address bit 1). A prefix database is defined as a collection of all prefixes in a forwarding table.A prefix database example is shown in Figure 2.6 [6],where the prefixes are arranged in an ascending order of their lengths for easy illustration. To add a route prefix,say 10111*,simply follow the pointer to where 10111 would be in the trie.If no pointer exists for that prefix,it should be added.If the node for the prefix already exists,it needs to be marked with a label as being in the forwarding table(for example,Pi).The nodes in gray are prefix nodes.When deleting a route prefix that has no children,the node and the pointer pointing to it are deleted and the parent node is examined. If the parent node has another child or it is a gray node,it is left alone.Otherwise,that node is also deleted and its parent node is examined.The deletion process is repeated up to the trie until a node that has another child or a gray node is found

2.2 TRIE-BASED ALGORITHMS 29 Lookup Speed. The explosive growth of link bandwidth requires faster IP lookups. For example, links running at 10 Gbps can carry 31.25 million packets per second (mpps) (assuming minimum sized 40-byte IP packets). Storage Requirement. Small storage means fast memory access speed and low power consumption, which are important for cache-based software algorithms and SRAM (static RAM)-based hardware algorithms. Update Time. Currently, the Internet has a peak of a few hundred BGP (Border Gateway Protocol) updates per second. Thus, a certain algorithm should be able to perform 1k updates per second to avoid routing instabilities. These updates should interfere little with normal lookup operations. Scalability. It is expected that the size of forwarding tables will increase at a speed of 25k entries per year. Hence, there will be around 250 k entries for the next five years. The ability of an algorithm to handle large forwarding tables is required. Flexibility in Implementation. Most current lookup algorithms can be implemented in either software or hardware. Some of them have the flexibility of being implemented in different ways, such as ASIC, a network processor, or a generic processor. 2.2 TRIE-BASED ALGORITHMS 2.2.1 Binary Trie A trie structure is a multi-way tree where each node contains zero or more pointers to point its child nodes, allowing the organization of prefixes on a digital basis by using the bits of prefixes to direct the branching. In the binary trie (or 1-bit trie) structure [5], each node contains two pointers, the 0-pointer and the 1-pointer. Data Structure. A node X at the level h of the trie represents the set of all addresses that begin with the sequence of h bits consisting of the string of bits labeling the path from the root to that node. Depending on the value of the (h + 1)th bit, 0 or 1, each pointer of the node X points to the corresponding subtree (if it exists), which represents the set of all route prefixes that have the same (h + 1) bits as their first bits. An example data structure of each node (i.e., the entry in a memory) is shown in Figure 2.6, including the next hop address (if it is a prefix node), a left pointer pointing to the left node location (with an address bit 0) and a right pointer pointing to the right node location (with an address bit 1). A prefix database is defined as a collection of all prefixes in a forwarding table. A prefix database example is shown in Figure 2.6 [6], where the prefixes are arranged in an ascending order of their lengths for easy illustration. To add a route prefix, say 10111*, simply follow the pointer to where 10111 would be in the trie. If no pointer exists for that prefix, it should be added. If the node for the prefix already exists, it needs to be marked with a label as being in the forwarding table (for example, Pi). The nodes in gray are prefix nodes. When deleting a route prefix that has no children, the node and the pointer pointing to it are deleted and the parent node is examined. If the parent node has another child or it is a gray node, it is left alone. Otherwise, that node is also deleted and its parent node is examined. The deletion process is repeated up to the trie until a node that has another child or a gray node is found

30 IP ADDRESS LOOKUP Prefix database P1 P2 1$ Next hop information 00* (If a prefix node) P4 1018 Left pointer Right pointer P5 111* P6 1000* Data structure of a node in P7 11101* the binary trie P8 111001* 1000011* Figure 2.6 Data structure of a 1-bit binary trie. Route Lookup.Each IP lookup starts at the root node of the trie.Based on the value of each bit of the destination address of the packet,the lookup algorithm determines whether the left or the right node is to be visited.The next hop of the longer matching prefix found along the path is maintained while the trie is traversed.An example is shown in Figure 2.6. Suppose that a destination address 11101000 is given.The IP lookup starts at the root, traverses the path indicated by the destination address,and remembers the last time a gray node was visited.The first bit of 11101000 is 1,so we go to the right and get to the node 1*,which is a gray node,the longest prefix match so far.The 2nd-5th bits of the key are 1, 1,0,and 1,respectively.So,we turn right,right,left,and right in sequence,and come to a leaf node P7.It is a prefix node and its associated next hop information is returned. Performance.The drawback of using the binary trie structure for IP route lookup is that the number of memory accesses in the worst case is 32 for IPv4.To add a prefix to the trie,in the worst case it needs to add 32 nodes.In this case,the storing complexity is 32N.S,where N denotes the number of prefixes in the forwarding table and S denotes the memory space required for each node.In summary,the lookup complexity is O(W), so is the update complexity,where W is the maximum length of the prefix.The storage complexity is O(NW). Variants of Binary Tries.The 1-bit binary trie in Figure 2.6 can be expanded to a complete trie,where every bottom leaf node is a prefix.There will be 128 leaf nodes. The data structure will be a memory with 128 entries.Each stores a prefix and can be directly accessed by a memory lookup using the seven bits of the destination address.One drawback is that the memory size becomes too big to be practical when the address has 32 bits,requiring a memory with 232 entries. One way to avoid the use of the longest prefix match rule and still find the most specific forwarding information is to transform a given set of prefixes into a set of disjoint prefixes. Disjoint prefixes do not overlap,and thus no address prefix is itself a prefix of another.A trie representing a set of disjoint prefixes will have prefixes at the leaves but not at internal nodes.To obtain a disjoint-prefix binary trie,we simply add leaves to nodes that have only one child.These new leaves are new prefixes that inherit the forwarding information of the closest ancestor marked as a prefix.Finally,internal nodes marked as prefixes are unmarked

30 IP ADDRESS LOOKUP Figure 2.6 Data structure of a 1-bit binary trie. Route Lookup. Each IP lookup starts at the root node of the trie. Based on the value of each bit of the destination address of the packet, the lookup algorithm determines whether the left or the right node is to be visited. The next hop of the longer matching prefix found along the path is maintained while the trie is traversed. An example is shown in Figure 2.6. Suppose that a destination address 11101000 is given. The IP lookup starts at the root, traverses the path indicated by the destination address, and remembers the last time a gray node was visited. The first bit of 11101000 is 1, so we go to the right and get to the node 1*, which is a gray node, the longest prefix match so far. The 2nd–5th bits of the key are 1, 1, 0, and 1, respectively. So, we turn right, right, left, and right in sequence, and come to a leaf node P7. It is a prefix node and its associated next hop information is returned. Performance. The drawback of using the binary trie structure for IP route lookup is that the number of memory accesses in the worst case is 32 for IPv4. To add a prefix to the trie, in the worst case it needs to add 32 nodes. In this case, the storing complexity is 32N · S, where N denotes the number of prefixes in the forwarding table and S denotes the memory space required for each node. In summary, the lookup complexity is O(W), so is the update complexity, where W is the maximum length of the prefix. The storage complexity is O(NW). Variants of Binary Tries. The 1-bit binary trie in Figure 2.6 can be expanded to a complete trie, where every bottom leaf node is a prefix. There will be 128 leaf nodes. The data structure will be a memory with 128 entries. Each stores a prefix and can be directly accessed by a memory lookup using the seven bits of the destination address. One drawback is that the memory size becomes too big to be practical when the address has 32 bits, requiring a memory with 232 entries. One way to avoid the use of the longest prefix match rule and still find the most specific forwarding information is to transform a given set of prefixes into a set of disjoint prefixes. Disjoint prefixes do not overlap, and thus no address prefix is itself a prefix of another. A trie representing a set of disjoint prefixes will have prefixes at the leaves but not at internal nodes. To obtain a disjoint-prefix binary trie, we simply add leaves to nodes that have only one child. These new leaves are new prefixes that inherit the forwarding information of the closest ancestor marked as a prefix. Finally, internal nodes marked as prefixes are unmarked

2.2 TRIE-BASED ALGORITHMS 31 P2a P2b )P6a P6b )P5b○P8 ○P6c (P9 Figure 2.7 Disjoint-prefix binary trie. For example,Figure 2.7 shows the disjoint-prefix binary trie that corresponds to the trie in Figure 2.6.Prefixes P2a and P2b have inherited the forwarding information of the original prefix P2,similar to other nodes such as Pla,P5b,P6a,P6b,and P6c.Since prefixes at internal nodes are expanded or pushed down to the leaves of the trie,this technique is called 'leaf pushing'by Srinivasan and Varghese [7]. 2.2.2 Path-Compressed Trie Path compression technique was first adopted in the Patricia trees [8].A path-compressed trie is based on the observation that each internal node of the trie that does not contain a route prefix and has only one child node can be removed in order to shorten the path from the root node. Data Structure.By removing some internal nodes,the technique requires a mechanism to record which nodes are missing.A simple mechanism is to store in each node: .A number,the skip value,that indicates how many bits have been skipped on the path. A variable-length bit-string,segment,that indicates the missing bit-string from the last skip operation. The path-compressed version of the binary trie in Figure 2.6 is shown in Figure 2.8.The node structure has two more fields-skip value and segment.Note that some gray nodes have a skip value 1 or >1.For instance,for node P9,its skip value =2 and the segment is '11'.As compared to P9 in Figure 2.6,the P9 node in Figure 2.8 moved up the level by 2 and missed the examination of two bits'11'.Therefore,when we traverse the trie in Figure 2.8 and reach P9,the immediate two bits of the key need to be checked with the 2-bit segment. Route Lookup.Suppose that a destination address 11101000(i.e.,the key)is given.The route lookup starts at the root and traverses the path based on the destination address bits. It also records the last gray node that was visited.The first bit of 11101000 is 1,so we go to the right and get to the prefix node P2.As the second bit of the key is 1,we go right again

2.2 TRIE-BASED ALGORITHMS 31 Figure 2.7 Disjoint-prefix binary trie. For example, Figure 2.7 shows the disjoint-prefix binary trie that corresponds to the trie in Figure 2.6. Prefixes P2a and P2b have inherited the forwarding information of the original prefix P2, similar to other nodes such as P1a, P5b, P6a, P6b, and P6c. Since prefixes at internal nodes are expanded or pushed down to the leaves of the trie, this technique is called ‘leaf pushing’ by Srinivasan and Varghese [7]. 2.2.2 Path-Compressed Trie Path compression technique was first adopted in the Patricia trees [8]. A path-compressed trie is based on the observation that each internal node of the trie that does not contain a route prefix and has only one child node can be removed in order to shorten the path from the root node. Data Structure. By removing some internal nodes, the technique requires a mechanism to record which nodes are missing. A simple mechanism is to store in each node: • A number, the skip value, that indicates how many bits have been skipped on the path. • A variable-length bit-string, segment, that indicates the missing bit-string from the last skip operation. The path-compressed version of the binary trie in Figure 2.6 is shown in Figure 2.8. The node structure has two more fields – skip value and segment. Note that some gray nodes have a skip value = 1 or >1. For instance, for node P9, its skip value = 2 and the segment is ‘11’. As compared to P9 in Figure 2.6, the P9 node in Figure 2.8 moved up the level by 2 and missed the examination of two bits ‘11’. Therefore, when we traverse the trie in Figure 2.8 and reach P9, the immediate two bits of the key need to be checked with the 2-bit segment. Route Lookup. Suppose that a destination address 11101000 (i.e., the key) is given. The route lookup starts at the root and traverses the path based on the destination address bits. It also records the last gray node that was visited. The first bit of 11101000 is 1, so we go to the right and get to the prefix node P2. As the second bit of the key is 1, we go right again