we explain in Sec. III-A, the standard taint analysis algorithm merges the sets L1 and L2 of the two branches when we reach the end of the if statement. We generate the dependent label instead, which is a more refined view: the label set depends on the value of the branch condition B, which contains information in L. If B holds, the label set is L1; otherwise it is L2.

As an example, the if statement at line 8 in Fig. 3 assigns different values to r. We define the label set of r as Lr1 = {lr1}, where lr1 is a dependent label defined as Lport@8 ? Lport : {inport}, and Lport is the label set for port at line 8, i.e. Lport = {dst, env(mac2port)}. Similarly, the if statement at line 16 leaves r untouched in one branch, and assigns 0 to it in the other. So the label set of r is either the same as before (i.e. Lr1), or the empty set (if it is reset to 0, which contains no source information). We define the label set as Lr2 = {lr2}, where the dependent label lr2 is defined as {test(ttl)}@16 ? Lr1 : {}.

We can also give an intuitive graphical view of dependent labels. Fig. 8 shows the dependent label assigned to r at the end of the if statement at line 14 (which is also the final label for r).

Fig. 8. Dependent label of r (a decision tree over the branch conditions {test(type)}@14, {test(ttl)}@16, and {dst, env(mac2port)}@8, with Y/N edges and leaf label sets {}, {dst, env(mac2port)}, and {inport}).

In the learning switches example, the extension of taint analysis with dependent labels affects only the labeling of r, whose value depends on the if statement branches taken at runtime. Although it gives us a more refined view of information flow, the dependent label does not affect our algorithm to generate pipelines shown in Sec. III, because statically we do not know which branch will be taken and have to maintain a conservative view as before. The generated pipeline is the same as we have shown before.

However, once we know which branch is taken at runtime, we can derive more refined label sets from the dependent label. This means we can find more precise sub-traces for different flow tables. To take advantage of dependent labels, we need to instrument the code and the traces to remember which if statement branch is taken at runtime.

C. Code and trace instrumentation

We instrument the if statement in the code of the network policy f to generate events showing which branch is taken at runtime.
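The dependent labels and branch events described above can be sketched in a few lines of Python. This is only an illustrative model, not POMP's implementation: the names `Dep`, `merged`, and `resolve` are hypothetical, a plain dictionary from line number to branch outcome stands in for the instrumented events, and the singleton sets Lr1 = {lr1} and Lr2 = {lr2} are collapsed into the dependent labels themselves. The sketch also assumes that `True` means the branch condition holds and that resolution returns only the label set at the end of the path, not the labels of the conditions along it.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union

Labels = FrozenSet[str]


@dataclass(frozen=True)
class Dep:
    """Dependent label written `cond@line ? then : other` in the text."""
    cond: Labels    # label set of the branch condition B
    line: int       # source line of the if statement
    then: "Label"   # label set if B holds
    other: "Label"  # label set otherwise


Label = Union[Labels, Dep]


def merged(label: Label) -> Labels:
    """Static, conservative view: merge the label sets of both branches."""
    if isinstance(label, Dep):
        return merged(label.then) | merged(label.other)
    return label


def resolve(label: Label, branches: Dict[int, bool]) -> Labels:
    """Runtime view: follow the branch outcomes recorded in the trace."""
    while isinstance(label, Dep):
        label = label.then if branches[label.line] else label.other
    return label


# Labels from the learning switches example (assumed encoding).
L_PORT = frozenset({"dst", "env(mac2port)"})
lr1 = Dep(cond=L_PORT, line=8, then=L_PORT, other=frozenset({"inport"}))
lr2 = Dep(cond=frozenset({"test(ttl)"}), line=16, then=lr1, other=frozenset())

# Statically, every source that might flow into r has to be kept.
print(sorted(merged(lr2)))
# With recorded branch outcomes, only one path through the label remains.
print(sorted(resolve(lr2, {16: True, 8: False})))
```

Under these assumptions, the merged view always keeps dst, env(mac2port), and inport, whereas a trace that records the outcomes at lines 16 and 8 narrows the label set of r to a single leaf.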
For the example execution trace shown in Table II, we have three more events after the instrumentation, shown in the right column.

With this instrumented trace and the dependent labels, we can derive more compact sub-traces. In the learning switches example, the label set for the return command is the same as the set for the variable r, as shown in Fig. 8. From the instrumented trace we know the branches taken for the if statements at lines 14 and 16, so we know that the relevant labels are only those shown on the grey path in Fig. 8. This allows us to remove the events labeled with “*” in the middle column of Table II. With this more compact sub-trace, we can fill “*” into the first two fields of the first row of Flow Table 2 in Fig. 7, which yields not only more compact flow tables but also far fewer mismatches and PacketIn messages.

V. PACKET PARSING

POMP supports protocol-oblivious programming. It uses dotted paths of packet header fields to access packets. For instance, the API call read_packet(pkt, "eth.ipv4.ttl") in Fig. 3 accesses the “ttl” field of the ipv4 packet. In the dotted string path, “eth” and “ipv4” represent protocol names and the last substring “ttl” is the field name. The string path can be arbitrary for any user-customized protocol. It is not limited to a predefined set of protocols.

To achieve this, the programmers provide protocol specifications, from which a parse graph is generated. The parse graph serves both as a P4 parser, which can be compiled and deployed on P4 switches, and as an input to the runtime system for parsing packets at the control plane.

Fig. 9. A parse graph (start at eth, with fields dst:48, src:48, type:16; type 0x86dd leads to ipv6, 0x0800 to ipv4, and 0x0806 to arp).

A. Protocol specification

As shown in the learning switches example in Fig. 2, a protocol specification consists of a set of header definitions and a starting protocol declaration. A header definition has the following parts.
• header name: name of the protocol, e.g. eth and ipv4.
• fields: the layout of the header, consisting of a sequence of field names and the corresponding lengths in bits.
• next headers: the name of the next header, which could be conditional upon the value of a certain field. For instance, if the type of eth is 0x0800, the next header is ipv4; if it is 0x86dd, the next header is ipv6.

The starting protocol declaration shows which protocol is the outermost one, i.e. the one that needs to be parsed first. It is eth in our example.
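
To make the three parts of a header definition concrete, the following Python sketch encodes a small specification and walks the resulting parse graph to resolve dotted paths. It is a minimal illustration under stated assumptions, not POMP's specification syntax or runtime: the `SPEC` dictionary format, the functions `parse` and `read_field`, and the ipv4 field names are all hypothetical (the ipv4 widths follow the standard 20-byte header without options), and headers that the specification names but does not define, such as ipv6 here, simply end the walk.

```python
from typing import Dict

# Hypothetical specification format: fields are (name, width in bits);
# "next" selects the following header by the value of one field.
SPEC = {
    "eth": {
        "fields": [("dst", 48), ("src", 48), ("type", 16)],
        "next": ("type", {0x0800: "ipv4", 0x86DD: "ipv6"}),
    },
    "ipv4": {
        "fields": [("version", 4), ("ihl", 4), ("tos", 8), ("len", 16),
                   ("id", 16), ("flags", 3), ("frag", 13), ("ttl", 8),
                   ("proto", 8), ("csum", 16), ("src", 32), ("dst", 32)],
        "next": None,
    },
}
START = "eth"  # starting protocol declaration


def parse(pkt: bytes) -> Dict[str, int]:
    """Walk the parse graph from START; return dotted path -> field value."""
    bits = int.from_bytes(pkt, "big")
    total = len(pkt) * 8
    offset = 0
    path = []
    out: Dict[str, int] = {}
    proto = START
    while proto is not None:
        path.append(proto)
        header = SPEC[proto]
        values = {}
        for name, width in header["fields"]:
            shift = total - offset - width
            if shift < 0:
                raise ValueError("packet too short for header " + proto)
            values[name] = (bits >> shift) & ((1 << width) - 1)
            out[".".join(path + [name])] = values[name]
            offset += width
        proto = None
        if header["next"] is not None:
            selector, table = header["next"]
            candidate = table.get(values[selector])
            # Stop at headers that are named but not defined in SPEC.
            proto = candidate if candidate in SPEC else None
    return out


def read_field(pkt: bytes, dotted_path: str) -> int:
    """Look up one field by its dotted path, e.g. "eth.ipv4.ttl"."""
    return parse(pkt)[dotted_path]


# Example: an Ethernet frame carrying an ipv4 header with ttl = 64.
PKT = bytes.fromhex(
    "ffffffffffff" "001122334455" "0800"               # eth
    "45" "00" "0014" "0000" "0000" "40" "06" "0000"    # ipv4 up to csum
    "0a000001" "0a000002"                              # ipv4 src, dst
)
print(read_field(PKT, "eth.ipv4.ttl"))
```

Because the field layout and the next-header table are data rather than code, adding a user-defined protocol in this sketch only requires a new entry in `SPEC`, which mirrors the protocol-oblivious design described above.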