CHAPTER 2

IP ADDRESS LOOKUP

2.1 OVERVIEW

The primary role of routers is to forward packets toward their final destinations. To this purpose, a router must decide for each incoming packet where to send it next, that is, it must find the address of the next-hop router as well as the egress port through which the packet should be sent. This forwarding information is stored in a forwarding table that the router computes based on the information gathered by routing protocols. To consult the forwarding table, the router uses the packet's destination address as a key; this operation is called address lookup [1]. Once the forwarding information is retrieved, the router can transfer the packet from the incoming link to the appropriate outgoing link.

Classful Addressing Scheme. IPv4 addresses are 32 bits in length and are divided into four octets of 8 bits each, with dots separating the octets in written form. For example, the address 10000010 01010110 00010000 01000010 corresponds in dotted-decimal notation to 130.86.16.66. The bits in an IP address are ordered as shown in Figure 2.1: the 1st bit is the most significant bit (MSB) and lies in the leftmost position, while the 32nd bit is the least significant bit (LSB) and lies in the rightmost position.

The IP address consists of two parts: the first part identifies the network and the second part identifies the host within that network. The network part corresponds to the first bits of the IP address, called the address prefix. We will write prefixes as bit strings of up to 32 bits in IPv4 followed by an asterisk (*). For example, the prefix 10000010 01010110* represents all the 2^16 addresses that begin with the bit pattern 10000010 01010110. Alternatively, prefixes can be indicated using the dotted-decimal notation, so the same prefix can be written as 130.86/16, where the number after the slash indicates the length of the prefix.
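To make the prefix notation concrete, the short sketch below converts between the dotted-decimal and bit-string forms and checks whether an address falls under a prefix. It is a minimal illustration only; the helper names (to_bits, matches_prefix) are ours, not from the text.

```python
# Minimal sketch: relating the bit-string and dotted-decimal forms of a prefix.

def to_bits(dotted: str) -> str:
    """Return the bit string of a dotted-decimal IPv4 address."""
    return "".join(f"{int(octet):08b}" for octet in dotted.split("."))

def matches_prefix(address: str, prefix: str, length: int) -> bool:
    """True if 'address' begins with the first 'length' bits of 'prefix'."""
    return to_bits(address)[:length] == to_bits(prefix)[:length]

print(to_bits("130.86.16.66"))      # 10000010010101100001000001000010
print(to_bits("130.86.0.0")[:16])   # 1000001001010110 -> the prefix 130.86/16
print(matches_prefix("130.86.16.66", "130.86.0.0", 16))   # True
```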
Figure 2.1 IP address bit positions.

Since routing occurs at the network level to locate the destination network, routers only forward packets based on network-level IP addresses. Thus, all hosts attached to a network can be represented in the router's forwarding table by a single network IP address, a technique known as address aggregation. A group of addresses is represented by a prefix. An example of a router's forwarding table is shown in Table 2.1. Each entry in the forwarding table contains a prefix, a next-hop IP address, and an output interface number. The forwarding information is located by searching for the prefix of the destination address.

TABLE 2.1 Router's Forwarding Table Structure [1]

  Destination Address Prefix   Next Hop IP Address   Output Interface
  24.40.32/20                  192.41.177.148        2
  130.86/16                    192.41.177.181        6
  208.12.16/20                 192.41.177.241        4
  208.12.21/24                 192.41.177.196        1
  167.24.103/24                192.41.177.3          4

The Internet addressing architecture was first designed using an allocation scheme known as classful addressing. Classful addressing defines three network sizes, or classes: A, B, and C (Fig. 2.2). The classes differ in the number of IP addresses contained in a network of that class. With the IPv4 address space of 32 bits, class A has a network part of 8 bits and a host part of 24 bits, class B has a network part of 16 bits and a host part of 16 bits, and class C has a network part of 24 bits and a host part of 8 bits. Class D is used for multicast applications.

Figure 2.2 Classful addresses [1].

The classful addressing scheme created very few class A networks, yet their address space contains 50 percent of the total IPv4 address space (2^31 addresses out of a total of 2^32). The class B address space contains 16,384 (2^14) networks with up to 65,534 hosts per network, and the class C address space contains 2,097,152 (2^21) networks with up to 254 hosts per network.
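As a small illustration of the classful scheme, the sketch below derives the class of an address from the leading bits of its first octet (0 for class A, 10 for class B, 110 for class C, 1110 for class D, as depicted in Fig. 2.2). The function name is ours.

```python
# Sketch: determine the class of an IPv4 address from the leading bits of its
# first octet (class A: 0..., B: 10..., C: 110..., D: 1110..., per Fig. 2.2).
def address_class(dotted: str) -> str:
    bits = f"{int(dotted.split('.')[0]):08b}"
    if bits.startswith("0"):
        return "A"      # 8-bit network part, 24-bit host part
    if bits.startswith("10"):
        return "B"      # 16-bit network part, 16-bit host part
    if bits.startswith("110"):
        return "C"      # 24-bit network part, 8-bit host part
    if bits.startswith("1110"):
        return "D"      # multicast
    return "reserved"   # remaining addresses, not discussed here

print(address_class("130.86.16.66"))   # 'B' (first octet 130 = 10000010)
```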
Classless Inter-Domain Routing (CIDR) Addressing Scheme. The evolution and growth of the Internet in recent years have shown that the classful addressing scheme is inflexible and wasteful. For most organizations, class C is too small while class B is too large. Having only these three choices caused the address space to be exhausted very rapidly, even though only a small fraction of the allocated addresses were actually in use. The lack of a network class of a size appropriate for mid-sized organizations resulted in the exhaustion of the class B network address space. In order to use the address space more efficiently, bundles of class C addresses were given out instead of class B addresses, which in turn caused a massive growth of forwarding table entries.

CIDR [2] was introduced to remedy the inefficiencies of classful addressing. The Internet Engineering Task Force (IETF) began implementing CIDR in the early 1990s [2, 3]. With CIDR, IP address space is better conserved through arbitrary aggregation of network addresses rather than being limited to 8, 16, or 24 bits in length for the network part. This finer granularity provides an organization with a closer match to its IP address space requirements. The growth of forwarding table entries is also slowed by allowing address aggregation to occur at several levels within the hierarchy of the Internet's topology. Backbone routers can now maintain forwarding information at the level of arbitrary aggregates of networks, instead of at the network level only.

For example, consider the networks represented by the network numbers 208.12.16/24 through 208.12.31/24 (see Fig. 2.3), and suppose that in a router all these network addresses are reachable through the same service provider. The leftmost 20 bits of all the addresses in this range are the same (11010000 00001100 0001). Thus, these 16 networks can be aggregated into one 'supernetwork' represented by the 20-bit prefix, which in decimal notation gives 208.12.16/20. Indicating the prefix length is necessary in decimal notation, because the same value may be associated with prefixes of different lengths; for instance, 208.12.16/20 (11010000 00001100 0001*) is different from 208.12.16/22 (11010000 00001100 000100*).

Figure 2.3 Prefix aggregation [1].
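As a quick check of this aggregation, the sketch below computes the number of leading bits shared by the 16 network numbers; it is illustrative only, and the helper name to_bits is ours.

```python
# Sketch: verify that 208.12.16/24 through 208.12.31/24 share a 20-bit prefix.
def to_bits(dotted: str) -> str:
    return "".join(f"{int(o):08b}" for o in dotted.split("."))

networks = [f"208.12.{x}.0" for x in range(16, 32)]   # 208.12.16/24 ... 208.12.31/24
prefixes = [to_bits(n)[:24] for n in networks]        # the 24-bit network parts

common = 0
while common < 24 and all(p[common] == prefixes[0][common] for p in prefixes):
    common += 1

print(common)                 # 20 -> the 16 networks aggregate into 208.12.16/20
print(prefixes[0][:common])   # 11010000000011000001
```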
Address aggregation does not reduce the entries in the router's forwarding table in all cases. Consider the scenario where a customer owns the network 208.12.21/24 and changes its service provider, but does not want to renumber its network. Now, all the networks from 208.12.16/24 through 208.12.31/24 can be reached through the same service provider, except for the network 208.12.21/24 (see Fig. 2.4). We cannot perform aggregation as before, and instead of only one entry, 16 entries need to be stored in the forwarding table. One solution is to aggregate in spite of the exception networks and additionally store entries for the exception networks. In this example, this results in only two entries in the forwarding table: 208.12.16/20 and 208.12.21/24 (see Fig. 2.5 and Table 2.1). However, some addresses will now match both entries because the prefixes overlap. In order to always make the correct forwarding decision, routers need to do more than search for a prefix that matches. Since exceptions in the aggregations may exist, a router must find the most specific match, which is the longest matching prefix. In summary, the address lookup problem in routers requires searching the forwarding table for the longest prefix that matches the destination address of a packet.

Figure 2.4 Prefix ranges [1].

Figure 2.5 Exception prefix [1].
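The rule can be illustrated with a naive linear scan over the entries of Table 2.1. This sketch only demonstrates the matching rule, not how routers implement it (the trie structures below do that); the helper names are ours.

```python
# Sketch: naive longest-prefix match over the entries of Table 2.1.
def to_bits(dotted: str) -> str:
    return "".join(f"{int(o):08b}" for o in dotted.split("."))

# (prefix bits, next-hop IP address, output interface), taken from Table 2.1
TABLE = [
    (to_bits("24.40.32.0")[:20],   "192.41.177.148", 2),
    (to_bits("130.86.0.0")[:16],   "192.41.177.181", 6),
    (to_bits("208.12.16.0")[:20],  "192.41.177.241", 4),
    (to_bits("208.12.21.0")[:24],  "192.41.177.196", 1),
    (to_bits("167.24.103.0")[:24], "192.41.177.3",   4),
]

def lookup(destination: str):
    dest, best = to_bits(destination), None
    for prefix, next_hop, port in TABLE:
        if dest.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, next_hop, port)
    return best

# 208.12.21.5 matches both 208.12.16/20 and 208.12.21/24; the longer /24 entry wins.
print(lookup("208.12.21.5")[1:])   # ('192.41.177.196', 1)
print(lookup("208.12.30.7")[1:])   # ('192.41.177.241', 4)
```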
Obviously, the longest prefix match is harder than the exact match used for class-based addressing, because the destination address of an arriving packet does not carry with it the information needed to determine the length of the longest matching prefix. Hence, we need to search among the space of all prefix lengths, as well as the space of all prefixes of a given length. Many algorithms have been proposed in recent years for the longest prefix match. This chapter provides a survey of these techniques. But before that, we introduce some performance metrics [4] for the comparison of these lookup algorithms.

Lookup Speed. The explosive growth of link bandwidth requires faster IP lookups. For example, links running at 10 Gbps can carry 31.25 million packets per second (mpps), assuming minimum-sized 40-byte IP packets.

Storage Requirement. Small storage means fast memory access speed and low power consumption, which are important for cache-based software algorithms and SRAM (static RAM)-based hardware algorithms.

Update Time. Currently, the Internet sees peaks of a few hundred BGP (Border Gateway Protocol) updates per second. Thus, a lookup algorithm should be able to perform about 1000 updates per second to avoid routing instabilities, and these updates should interfere little with normal lookup operations.

Scalability. The size of forwarding tables is expected to increase at a rate of about 25,000 entries per year, so a table will hold around 250,000 entries within the next five years. The ability of an algorithm to handle large forwarding tables is required.

Flexibility in Implementation. Most current lookup algorithms can be implemented in either software or hardware. Some of them have the flexibility of being implemented in different ways, such as in an ASIC, in a network processor, or on a general-purpose processor.

2.2 TRIE-BASED ALGORITHMS

2.2.1 Binary Trie

A trie structure is a multi-way tree in which each node contains zero or more pointers to its child nodes, allowing the organization of prefixes on a digital basis by using the bits of the prefixes to direct the branching. In the binary trie (or 1-bit trie) structure [5], each node contains two pointers, the 0-pointer and the 1-pointer.

Data Structure. A node X at level h of the trie represents the set of all addresses that begin with the sequence of h bits labeling the path from the root to that node. Depending on the value of the (h + 1)th bit, 0 or 1, each pointer of node X points to the corresponding subtree (if it exists), which represents the set of all route prefixes that share the same first (h + 1) bits. The data structure of each node (i.e., the entry in a memory) is shown in Figure 2.6; it includes the next hop address (if the node is a prefix node), a left pointer to the left child location (for an address bit 0), and a right pointer to the right child location (for an address bit 1).

A prefix database is defined as the collection of all prefixes in a forwarding table. A prefix database example is shown in Figure 2.6 [6], where the prefixes are arranged in ascending order of their lengths for easy illustration.

To add a route prefix, say 10111*, simply follow the pointers to where 10111 would be in the trie, adding nodes along the path where none exist. The node for the prefix is then marked with a label (for example, Pi) as being in the forwarding table; the nodes in gray are prefix nodes. When deleting a route prefix that has no children, the node and the pointer pointing to it are deleted and the parent node is examined. If the parent node has another child or is itself a gray node, it is left alone. Otherwise, that node is also deleted and its parent node is examined. The deletion process is repeated up the trie until a node that has another child or a gray node is found.
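A minimal sketch of the 1-bit trie node and prefix insertion just described is given below. The class and field names are ours; a node is 'gray' when its next_hop field is set, and we assume P1 is the default prefix * so that the root itself is a prefix node.

```python
# Sketch of the 1-bit (binary) trie node and prefix insertion described above.

class BinaryTrieNode:
    def __init__(self):
        self.left = None      # child for address bit 0
        self.right = None     # child for address bit 1
        self.next_hop = None  # set only for prefix (gray) nodes

def insert(root: BinaryTrieNode, prefix: str, next_hop) -> None:
    """Insert a prefix given as a bit string, e.g. '10111', into the trie."""
    node = root
    for bit in prefix:
        if bit == "0":
            if node.left is None:
                node.left = BinaryTrieNode()
            node = node.left
        else:
            if node.right is None:
                node.right = BinaryTrieNode()
            node = node.right
    node.next_hop = next_hop   # mark the node as a prefix (gray) node

# Build the trie for the prefix database of Figure 2.6
# (P1 assumed to be the default prefix *).
PREFIXES = {"P1": "", "P2": "1", "P3": "00", "P4": "101", "P5": "111",
            "P6": "1000", "P7": "11101", "P8": "111001", "P9": "1000011"}
root = BinaryTrieNode()
for name, bits in PREFIXES.items():
    insert(root, bits, name)
```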
Figure 2.6 Data structure of a 1-bit binary trie.

Route Lookup. Each IP lookup starts at the root node of the trie. Based on the value of each bit of the destination address of the packet, the lookup algorithm determines whether the left or the right node is to be visited. The next hop of the longest matching prefix found along the path is maintained while the trie is traversed. An example is shown in Figure 2.6. Suppose that a destination address 11101000 is given. The IP lookup starts at the root, traverses the path indicated by the destination address, and remembers the last time a gray node was visited. The first bit of 11101000 is 1, so we go to the right and get to the node 1*, which is a gray node, the longest prefix match so far. The 2nd-5th bits of the key are 1, 1, 0, and 1, respectively. So, we turn right, right, left, and right in sequence, and come to the leaf node P7. It is a prefix node, and its associated next hop information is returned.

Performance. The drawback of using the binary trie structure for IP route lookup is that the number of memory accesses in the worst case is 32 for IPv4. To add a prefix, in the worst case 32 nodes must be added to the trie; in this case, the storage required is 32N · S, where N denotes the number of prefixes in the forwarding table and S denotes the memory space required for each node. In summary, the lookup complexity is O(W), as is the update complexity, where W is the maximum prefix length. The storage complexity is O(NW).

Variants of Binary Tries. The 1-bit binary trie in Figure 2.6 can be expanded to a complete trie in which every bottom leaf node is a prefix. Since the longest prefix in the database is 7 bits long, there will be 2^7 = 128 leaf nodes. The data structure is then a memory with 128 entries, each storing a prefix and directly accessible by a memory lookup using the first seven bits of the destination address. One drawback is that the memory size becomes too big to be practical when the address has 32 bits, requiring a memory with 2^32 entries.

One way to avoid the use of the longest prefix match rule and still find the most specific forwarding information is to transform a given set of prefixes into a set of disjoint prefixes. Disjoint prefixes do not overlap, and thus no prefix is itself a prefix of another. A trie representing a set of disjoint prefixes will have prefixes at the leaves but not at internal nodes. To obtain a disjoint-prefix binary trie, we simply add leaves to nodes that have only one child. These new leaves are new prefixes that inherit the forwarding information of the closest ancestor marked as a prefix. Finally, internal nodes marked as prefixes are unmarked.

For example, Figure 2.7 shows the disjoint-prefix binary trie that corresponds to the trie in Figure 2.6. Prefixes P2a and P2b have inherited the forwarding information of the original prefix P2, and similarly for other nodes such as P1a, P5b, P6a, P6b, and P6c. Since prefixes at internal nodes are expanded or pushed down to the leaves of the trie, this technique is called 'leaf pushing' by Srinivasan and Varghese [7].

Figure 2.7 Disjoint-prefix binary trie.
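Continuing the sketch above, the route lookup walks the destination address bit by bit and remembers the next hop of the last gray node visited, exactly as in the walk-through for 11101000. The function name is ours.

```python
# Sketch of the 1-bit trie route lookup: follow the address bits and remember
# the last prefix (gray) node visited.

def longest_prefix_match(root, destination_bits: str):
    node, best = root, None
    for bit in destination_bits:
        if node.next_hop is not None:
            best = node.next_hop          # longest match seen so far
        node = node.left if bit == "0" else node.right
        if node is None:                  # no further branch to follow
            return best
    return node.next_hop if node.next_hop is not None else best

print(longest_prefix_match(root, "11101000"))   # 'P7', as in the walk-through above
```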
2.2.2 Path-Compressed Trie

The path compression technique was first adopted in Patricia trees [8]. A path-compressed trie is based on the observation that any internal node that does not contain a route prefix and has only one child can be removed in order to shorten the path from the root.

Data Structure. Because some internal nodes are removed, the technique requires a mechanism to record which nodes are missing. A simple mechanism is to store in each node:

• A number, the skip value, that indicates how many bits have been skipped on the path.
• A variable-length bit string, the segment, that records the bits skipped in the last skip operation.

The path-compressed version of the binary trie in Figure 2.6 is shown in Figure 2.8. The node structure has two more fields: the skip value and the segment. Note that some gray nodes have a skip value of 1 or more. For instance, node P9 has a skip value of 2 and the segment '11'. Compared with P9 in Figure 2.6, the P9 node in Figure 2.8 has moved up two levels and misses the examination of the two bits '11'. Therefore, when we traverse the trie in Figure 2.8 and reach P9, the next two bits of the key need to be checked against the 2-bit segment.

Figure 2.8 Path-compressed trie example.
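A node of a path-compressed trie can be sketched as below; the class and field names are ours and follow the description above (a skip count plus the skipped segment, in addition to the 1-bit trie fields).

```python
# Sketch of a path-compressed trie node: in addition to the 1-bit trie fields,
# each node records how many bits were skipped on the way to it and which bits.

class PathCompressedNode:
    def __init__(self, skip: int = 0, segment: str = ""):
        self.skip = skip          # number of bits skipped on the path to this node
        self.segment = segment    # the skipped bits themselves (len(segment) == skip)
        self.left = None          # child for the next address bit 0
        self.right = None         # child for the next address bit 1
        self.next_hop = None      # set only for prefix (gray) nodes

# For example, node P9 of Figure 2.8 carries skip=2 and segment='11',
# the two bits that are no longer examined on the way to it.
p9 = PathCompressedNode(skip=2, segment="11")
p9.next_hop = "P9"
```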
Route Lookup. Suppose that a destination address 11101000 (i.e., the key) is given. The route lookup starts at the root and traverses the path based on the destination address bits, recording the last gray node that was visited. The first bit of 11101000 is 1, so we go to the right and get to the prefix node P2. As the second bit of the key is 1, we go right again and reach node P5. This node has a skip value of 1, meaning that a node was skipped on the path. We then compare the 3rd bit of the key with the segment field '1' to verify that we have arrived at the correct node in the path. The match confirms that we have arrived at P5 correctly. As the 4th bit of the key is 0, we turn left; no skip value is found, so we move on. With the 5th bit a 1, we again turn right and get to node P7. Here we reach a leaf and no skip value is found, so the lookup stops and the next hop information corresponding to P7 is returned.

Performance. Path compression reduces the height of a sparse binary trie. When the trie is full and no compression is possible, a path-compressed trie looks the same as a regular binary trie; thus, its worst-case lookup and update complexity is the same as that of a binary trie, O(W). Viewing a path-compressed trie as a full binary trie with N leaves, there can be at most N − 1 internal nodes in total (including the root node), as shown in Figure 2.9. Since the path can be significantly compressed to remove internal nodes, the space complexity becomes O(N), independent of W.

Figure 2.9 Example of path-compressed trie with N leaves.
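A lookup routine for the PathCompressedNode sketch above might look as follows; it mirrors the walk-through: at each node the skipped segment is compared against the key before branching, and the last gray node seen is remembered. This is our reconstruction of the procedure, not code from the text.

```python
# Sketch of lookup in a path-compressed trie (using PathCompressedNode above):
# verify the skipped segment at each node, remember the last prefix node seen,
# then branch on the next bit of the key.

def pc_lookup(root, key: str):
    node, pos, best = root, 0, None
    while node is not None:
        # compare the skipped bits with the corresponding bits of the key
        if node.skip and key[pos:pos + node.skip] != node.segment:
            break                     # mismatch: fall back to the best match so far
        pos += node.skip
        if node.next_hop is not None:
            best = node.next_hop      # longest match seen so far
        if pos >= len(key):
            break
        node = node.left if key[pos] == "0" else node.right
        pos += 1
    return best
```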
2.2.3 Multi-Bit Trie

The lookup performance can be improved by using a multi-bit trie structure [7]. The multi-bit trie structure examines several bits at a time, called the lookup stride, while the standard binary trie examines only one bit at a time.

Data Structure. A multi-bit trie example is shown in Figure 2.10. Its prefix database is the same as the one in Figure 2.6. Suppose we examine the destination address three bits at a time, that is, the lookup stride is 3. A problem then arises for prefixes like P2 = 1* that do not fit evenly into multiples of three bits. One solution is to expand a prefix like 1* into all possible 3-bit extensions (100, 101, 110, and 111). However, for the extensions 101 and 111, the prefixes P4 = 101* and P5 = 111* are selected instead, because they are longer matches than the expansions of P2. In other words, prefixes whose lengths do not fit the stride are expanded into a number of prefixes that do, and expanded prefixes that collide with an existing longer prefix are discarded.

Figure 2.10 shows an expanded trie with a fixed stride length of 3 (i.e., each trie node examines three bits). Notice that each trie node holds a table of eight entries, and each entry has two fields: one for the next hop information of a prefix and one for a pointer to a child node. Consider, for example, entry 100 in the root node. It contains a prefix (100, from the expansion of P2) and a pointer to the trie node containing P6; that node in turn contains a pointer to the trie node containing P9.

Figure 2.10 Multi-bit trie structure example.
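The expansion step can be sketched as follows for the prefix database of Figure 2.6 with a stride of 3: prefixes are padded to the next multiple of the stride, and an expansion is discarded when a longer original prefix already claims the same value. The function and variable names are ours, and P1 (the default prefix) is left out for brevity; the sketch illustrates only the expansion rule, not the full trie construction.

```python
# Sketch: controlled prefix expansion to a fixed stride of 3 bits.
STRIDE = 3
PREFIXES = {"P2": "1", "P3": "00", "P4": "101", "P5": "111",
            "P6": "1000", "P7": "11101", "P8": "111001", "P9": "1000011"}

def expand(prefixes, stride):
    expanded = {}   # expanded bit string -> (prefix name, original length)
    for name, bits in prefixes.items():
        target = ((len(bits) + stride - 1) // stride) * stride   # round up to a stride multiple
        pad = target - len(bits)
        for i in range(2 ** pad):
            ext = bits + (format(i, "b").zfill(pad) if pad else "")
            # an expansion is kept only if no longer original prefix claims it
            if ext not in expanded or expanded[ext][1] < len(bits):
                expanded[ext] = (name, len(bits))
    return expanded

table = expand(PREFIXES, STRIDE)
print(table["100"])   # ('P2', 1): an expansion of P2 = 1*
print(table["101"])   # ('P4', 3): P4 overrides the expansion of P2
print(table["111"])   # ('P5', 3): P5 overrides the expansion of P2
```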
Route Lookup. Let us use the example in Figure 2.10 and again suppose that the destination address is 11101000. The IP lookup starts at the root and traverses the path using three address bits at a time while remembering the last prefix visited. The first three bits of the key, 111, are used as the offset address in the root node to check whether the corresponding entry contains a prefix (in this case, it contains P5). We then follow the pointer and move to the next node, where the 4th to 6th bits of the key, 010, are used as the offset. The corresponding entry contains P7's next hop information and a NULL pointer, indicating the end of the lookup.

Performance. The advantage of the k-bit trie structure is that it improves the lookup time by a factor of k, since only ⌈W/k⌉ memory accesses are needed in the worst case. The disadvantage is that a larger memory space is required. One way to reduce the memory space is to use the 'leaf pushing' scheme described in Section 2.2.1. Leaf pushing can cut the memory requirement of the expanded trie in half by making each node entry contain either a pointer or next hop information, but not both (as shown in Figure 2.11, versus the structure shown in Figure 2.10). The trick is to push the next hop information down to the leaf nodes of the trie. The leaf nodes need only the next hop information, since they have no pointers. For instance, P6's next hop information at entry 001 of the top middle node is pushed down to the vacant locations of its child node (i.e., the rightmost node).

Figure 2.11 Multi-bit trie structure example with each entry containing either a prefix or a pointer, to save memory space.
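A stride-3 lookup over such a structure can be sketched as follows. Each node is modelled as a table mapping a 3-bit index to a (next hop, child pointer) pair, and only the entries needed for the 11101000 walk-through are filled in (the pointer from entry 100 to P6's node is omitted). The representation and names are ours.

```python
# Sketch of a stride-3 lookup in the spirit of Figure 2.10: each node maps a
# 3-bit index to (next_hop, child); missing entries act as empty slots.

node2 = {"001": ("P8", None), "010": ("P7", None), "011": ("P7", None)}
root  = {"100": ("P2", None),          # expansion of P2 = 1*; pointer to P6's node omitted here
         "101": ("P4", None),
         "110": ("P2", None),
         "111": ("P5", node2)}         # P5 overrides the expansion of P2; pointer to node2

def multibit_lookup(node, key: str, stride: int = 3):
    pos, best = 0, None
    while node is not None and pos < len(key):
        next_hop, child = node.get(key[pos:pos + stride], (None, None))
        if next_hop is not None:
            best = next_hop            # remember the last prefix seen
        node = child                   # a NULL pointer ends the lookup
        pos += stride
    return best

print(multibit_lookup(root, "11101000"))   # 'P7', as in the walk-through above
```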