CS143 Handout #29 Autumn 200 TAC Handout written by maggie Johnson and revised by me Three address code Three-address code(tac) will be the intermediate representation used in our Decaf compiler. It is essentially a ably language that falls in the lower-end of the mid-level IRs. Some variant of 2 3 or 4 address code is fairly commonly used as an iR, since it maps well to most assembly languages Our TAC is a sequence of instructions, each of which can have at most three operands. The operands could be two operands to a binary arithmetic operator and the third the result location, or an operand to compare to zero and a second location to branch to, and so on. For example, below on the left is an arithmetic expression and on the right, is a translation into TAC t1= b t2 =b* d: down to three. Of course, it's a little more complicated than the above example, because we have to translate branching and looping instructions, as well as function and method calls. Here is an example of the TAC branching instructions used to translate an if-statement if(a<b+ c) tl =b+ci t1 2 Goto L0: ftt z3=4 And here is an example of the TAC translation for a function call and array access Int ni n LCall ReadIntegeri n ReadInteger()i Binky(arr[n]) t2=t1*n; rr t2 t4=*(t3); PushParam LCall Bink
CS143 Handout #29 Autumn 2001 TAC Handout written by Maggie Johnson and revised by me. Three address code Three-address code (TAC) will be the intermediate representation used in our Decaf compiler. It is essentially a generic assembly language that falls in the lower-end of the mid-level IRs. Some variant of 2, 3 or 4 address code is fairly commonly used as an IR, since it maps well to most assembly languages. Our TAC is a sequence of instructions, each of which can have at most three operands. The operands could be two operands to a binary arithmetic operator and the third the result location, or an operand to compare to zero and a second location to branch to, and so on. For example, below on the left is an arithmetic expression and on the right, is a translation into TAC: a = b * c + b * d _t1 = b * c; _t2 = b * d; _t3 = _t1 + _t2; a = _t3; Notice the use of temp variables created by the compiler as needed to keep the number of operands down to three. Of course, it's a little more complicated than the above example, because we have to translate branching and looping instructions, as well as function and method calls. Here is an example of the TAC branching instructions used to translate an if-statement: if (a < b + c) a = a - c; c = b * c; _t1 = b + c; _t2 = a < _t1; IfZ _t2 Goto _L0; _t3 = a - c; a = _3; _L0: _t4 = b * c; c = _t4; And here is an example of the TAC translation for a function call and array access: int n; n = ReadInteger(); Binky(arr[n]); n = LCall _ReadInteger; _t1 = 4; _t2 = _t1 * n; _t3 = arr + _t2; _t4 = *(_t3); PushParam _t4; LCall _Binky;
Decaf TAc instruction formats The convention followed in the examples below is that tl, t2, and so on refer to variables(either from source program or temporaries)and Ll, L2, etc are used for labels Labels are attached to the instruction that serve as targets for goto/branch instructions and are used to identify function/method definitions and vtables Assignment Function definitions: tl ="abcdefg" EndFunci Return t1 (rvalue can be variable, or string/int Return constant) Memory references Arithmetic (t2 t3=t2+t1 t1=*(t2+8); t3=t2-t1; t3=t2*t1; (t1+-4) t3=t2/t (optional offset must be integer constant, t3=t28t1 can be positive or negative (not all arithmetic operators are present, must synthesize others using the primitives Array indexi available) To access arr5], add offset multiplied by elem size to base and deref Relational/equality/logical: Object fields, method dispatch To access ivars. add offset to base deref t3=t2&&t1; To call method. retrieve function address t3=t2 (must synthesize other ops as necessary) from vtable, invoke using ACall Data specification: Labels and branches VTable ClassName Ll, L2,...i Goto L1: (take branch if value of tl is zer Function/method calls (before call, params are individually pushed right to left) LCall L1 1=Lca11L1; ACall tli to= ACall t1 (LCall used for a function label known at compile-time, ACall for a computed function address most likely from vtable Each has two forms that differ in void vs non-void return value) The built-in functions from Decaf standard library(Alloc, ReadLine, PrintInt, Halt, etc )are invoked via LCal
2 Decaf TAC instruction formats The convention followed in the examples below is that t1, t2, and so on refer to variables (either from source program or temporaries) and L1, L2, etc. are used for labels. Labels are attached to the instruction that serve as targets for goto/branch instructions and are used to identify function/method definitions and vtables. Assignment: t2 = t1; t1 = "abcdefg"; t1 = 8; (rvalue can be variable, or string/int constant) Arithmetic: t3 = t2 + t1; t3 = t2 - t1; t3 = t2 * t1; t3 = t2 / t1; t3 = t2 % t1; (not all arithmetic operators are present, must synthesize others using the primitives available) Relational/equality/logical: t3 = t2 == t1; t3 = t2 < t1; t3 = t2 && t1; t3 = t2 || t1; (must synthesize other ops as necessary) Labels and branches: L1: Goto L1; IfZ t1 Goto L1; (take branch if value of t1 is zero) Function/method calls: PushParam t1; (before call, params are individually pushed right to left) LCall L1; t1 = LCall L1; ACall t1; t0 = ACall t1; (LCall used for a function label known at compile-time, ACall for a computed function address, most likely from vtable. Each has two forms that differ in void vs non-void return value) The built-in functions from Decaf standard library (Alloc, ReadLine, PrintInt, Halt, etc.) are invoked via LCall Function definitions: BeginFunc; EndFunc; Return t1; Return; Memory references: t1 = *(t2); t1 = *(t2 + 8); *(t1) = t2; *(t1 + -4) = t2; (optional offset must be integer constant, can be positive or negative) Array indexing: To access arr[5], add offset multiplied by elem size to base and deref Object fields, method dispatch: To access ivars, add offset to base, deref To call method, retrieve function address from vtable, invoke using ACall Data specification: VTable ClassName = L1, L2, ...;
Here is an example of a simple Decaf program and its tac translation void maino[ mmain t ai int b t0=1; int C t1=2; b b=2; c= a+ b Print(c)i PushParam c LCall PrintInt Endfunc What we have to do is figure out how to get from one to the other as we This includes not only generating the TAC, but figuring out the use of temp variables, creating labels, calling functions, etc. Since we have a lot to do, we will make the mechanics of generating the tac as easy as possible. In our parser, we will create the TAC instructions one at a time. We can immediately print them out or store them for further processing. We will simplify the Decaf language a little by cluding doubles for code generation and internally treating bools as 4-byte integers. Classes arrays, and strings will be implemented with 4-byte pointers. This means we only ever need to deal with 4-byte integer/pointer variables As each production is reduced, we will create the necessary instructions. This strategy makes our code-generation a bit limited--particularly for the way we would have to do switch statements-but we can translate more-or-less any language structure into an executable program in a single pass without needing to go back and edit anything, which is pretty convenient To see how a syntax-directed translation can generate TAC, we need to look at the derivation, and figure out where the different TAC statements should be generated as the productions are reduced Lets start with a trivial program void main() main Print(hello world") Beginfunc; to =hello world"; LCall Printstringi Endfunc Notice that we call the built-in library function labelled PrintString to do the actual printing. The library functions are called like any ordinary global function, but the code is provided by the compiler as part of linking with the standard library. Here is the derivation of the source program The trick is to identify where and what processing occurs as these productions are reduced to generate the given TAC Type - void Stmtlist - Constant - stringConstant Expr - Constant ExprLlst - Expr Printstmt - Print( Exprlist Stmt - Printstmt i Stmtlist - stmtlist stmt StmtBlock - stmtlist FunctionDefn - Type identifier( Formals StmtBlock
3 Here is an example of a simple Decaf program and its TAC translation: void main() { int a; int b; int c; a = 1; b = 2; c = a + b; Print(c); } main: BeginFunc; _t0 = 1; a = _t0; _t1 = 2; b = _t1; _t2 = a + b; c = _t2; PushParam c; LCall _PrintInt; EndFunc; What we have to do is figure out how to get from one to the other as we parse. This includes not only generating the TAC, but figuring out the use of temp variables, creating labels, calling functions, etc. Since we have a lot to do, we will make the mechanics of generating the TAC as easy as possible. In our parser, we will create the TAC instructions one at a time. We can immediately print them out or store them for further processing. We will simplify the Decaf language a little by excluding doubles for code generation and internally treating bools as 4-byte integers. Classes, arrays, and strings will be implemented with 4-byte pointers. This means we only ever need to deal with 4-byte integer/pointer variables. As each production is reduced, we will create the necessary instructions. This strategy makes our code-generation a bit limited—particularly for the way we would have to do switch statements—but we can translate more-or-less any language structure into an executable program in a single pass, without needing to go back and edit anything, which is pretty convenient. To see how a syntax-directed translation can generate TAC, we need to look at the derivation, and figure out where the different TAC statements should be generated as the productions are reduced. Let’s start with a trivial program: void main() { Print("hello world"); } main: BeginFunc; _t0 = "hello world"; PushParam _t0; LCall _PrintString; EndFunc; Notice that we call the built-in library function labelled _PrintString to do the actual printing. The library functions are called like any ordinary global function, but the code is provided by the compiler as part of linking with the standard library. Here is the derivation of the source program. The trick is to identify where and what processing occurs as these productions are reduced to generate the given TAC: Type -> void Formals -> StmtList -> Constant -> stringConstant Expr -> Constant ExprList -> Expr PrintStmt -> Print ( ExprList ) Stmt -> PrintStmt ; StmtList -> StmtList Stmt StmtBlock -> { StmtList } FunctionDefn -> Type identifier ( Formals ) StmtBlock
Decllist - Decl Program -> Decllist Here is another simple program with a bit more complex expression void main()( main nt a Print (a)i t1 t0+ a 1, Pushpa LCall PrintInti Endfunc Here is the derivation. Again, consider where the instructions above must be emitted relative to the parsing Formals - Stmtlist - int Variable - Type identifier VariableDecl-> variable Stmt - variableDecl Stmtlist - stmtlist stmt OptReceiver -> iver identif Value ->OptReceiver identi p了 Simplestmt-> tmt - Simplestmt titlist - stmtlist stmt Recei LVal iver identifi Expr - LValue ExprLlst - Expr intstmt - Print( ExprList Stmt - Prints stmtlist - StmtBlock - Stmtlist tionDefn - Type identifier( Formals StmtBlock tion def cIList - Decl
4 Decl -> FunctionDefn DeclList -> Decl Program -> DeclList Here is another simple program with a bit more complex expression: void main() { int a; a = 2 + a; Print(a); } main: BeginFunc; _t0 = 2; _t1 = _t0 + a; a = _t1; PushParam a; LCall _PrintInt; EndFunc; Here is the derivation. Again, consider where the instructions above must be emitted relative to the parsing activity: Type -> void Formals -> StmtList -> Type -> int Variable -> Type identifier VariableDecl -> Variable ; Stmt -> VariableDecl StmtList -> StmtList Stmt OptReceiver -> LValue -> OptReceiver identifier Constant -> intConstant Expr -> Constant OptReceiver -> LValue -> OptReceiver identifier Expr -> LValue Expr -> Expr + Expr SimpleStmt -> LValue = Expr Stmt -> SimpleStmt ; StmtList -> StmtList Stmt OptReceiver -> LValue -> OptReceiver identifier Expr -> LValue ExprList -> Expr PrintStmt -> Print ( ExprList ) Stmt -> PrintStmt ; StmtList -> StmtList Stmt StmtBlock -> { StmtList } FunctionDefn -> Type identifier ( Formals ) StmtBlock Decl -> FunctionDefn DeclList -> Decl Program -> DeclList
What additional processing would need to be added for a program with a complex expression like void main()( Beginfunc nt a b=3 a=(b+2)-(a*3)/6 = b+ t2 t6=6 t7=t5 Endfunci let's consider what needs to be done to deal with arrays(note the TAC code below doesn't do ay bounds checking, that will be your job to implement void Binky (int[] arr)i Binl axr[1]=arr[0]*2; BeginFunc; rr t2 t8 ⊥t9=2 t10=t8*t9 Endfunc Before we deal with classes, we should look at how function calls are implemented. This will facilitate our study of methods as they are used in classes. a program with a simple function call int foo (int a, int b)t eturn a bi void main(t eturn int c Endfunc t d main Beginfunc PushParam ci t1 lCall fooi Endfunc;
5 What additional processing would need to be added for a program with a complex expression like: void main() { int b; int a; b = 3; a = 12; a = (b + 2)-(a*3)/6; } main: BeginFunc; _t0 = 3; b = _t0; _t1 = 12; a = _t1; _t2 = 2; _t3 = b + _t2; _t4 = 3; _t5 = a * _t4; _t6 = 6; _t7 = _t5 / _t6; _t8 = _t3 - _t7; a = _t8; EndFunc; Now let’s consider what needs to be done to deal with arrays (note the TAC code below doesn't do array bounds checking, that will be your job to implement!) void Binky(int[] arr) { arr[1] = arr[0] * 2; } _Binky: BeginFunc; _t0 = 1; _t1 = 4; _t2 = _t1 * _t0; _t3 = arr + _t2; _t4 = 0; _t5 = 4; _t6 = _t5 * _t4; _t7 = arr + _t6; _t8 = *(_t7); _t9 = 2; _t10 = _t8 * _t9; *(_t3) = _t10; EndFunc; Before we deal with classes, we should look at how function calls are implemented. This will facilitate our study of methods as they are used in classes. A program with a simple function call: int foo(int a, int b) { return a + b; } void main() { int c; int d; foo(c, d); } _foo: BeginFunc; _t0 = a + b; Return _t0; EndFunc; main: BeginFunc; PushParam d; PushParam c; _t1 = LCall _foo; EndFunc;
Now for a class example with both fields and methods(notice how this is passed as a secret first arameter to a method call) class Animal Animal. InitAnimal int height Beginfunc; void InitAnimal(int h)( *(this 4)=h this height =h Endure class Cow extends Animal Beginfunc void InitCow(int h)i InitAnimal(h) 1=*(_t0 P1 Push void binky (class Cow Endfunc betsy. Initcow(5) VTable Cow Cow. Initcow, Beginfunc; t2=5 t3 =*(betsy) PushParam tli PushParam betsy Ende
6 Now for a class example with both fields and methods (notice how this is passed as a secret first parameter to a method call) class Animal { int height; void InitAnimal(int h) { this.height = h; } } class Cow extends Animal { void InitCow(int h) { InitAnimal(h); } } void Binky(class Cow betsy) { betsy.InitCow(5); } _Animal.InitAnimal: BeginFunc; *(this + 4) = h; EndFunc; VTable Animal = _Animal.InitAnimal, ; _Cow.InitCow: BeginFunc; _t0 = *(this); _t1 = *(_t0); PushParam h; PushParam this; ACall _t1; EndFunc; VTable Cow = _Animal.InitAnimal, _Cow.InitCow, ; _Binky: BeginFunc; _t2 = 5; _t3 = *(betsy); _t4 = *(_t3 + 4); PushParam _t1; PushParam betsy; ACall _t4; EndFunc;
How about some TAC that implements control structures, for example, such the if statement below? void main()( main a=23; u20 if 23 t1=23; e⊥se Ifz t2 Goto Lo t3=10 t4=19 t4 Endfunc Or the even snazzier while loop(for loops are left an exercise for the reader) void maino( int ai BeginFunc; t0=0 while (a< 10)i Print(a暑2 0); t1=10 +1 t1 工 z34 2 Goto Lli 5=0 t6 t4==t5; Pushparam t6 LCall PrintBool t8 = a+ t7 a L1
7 How about some TAC that implements control structures, for example, such the if statement below? void main() { int a; a = 23; if (a == 23) a = 10; else a = 19; } main: BeginFunc; _t0 = 23; a = _t0; _t1 = 23; _t2 = a == _t1; IfZ _t2 Goto _L0; _t3 = 10; a = _t3; Goto _L1; _L0: _t4 = 19; a = _t4; _L1: EndFunc; Or the even snazzier while loop (for loops are left an exercise for the reader): void main() { int a; a = 0; while (a < 10) { Print(a % 2 == 0); a = a + 1; } } main: BeginFunc; _t0 = 0; a = _t0; _L0: _t1 = 10; _t2 = a < _t1; IfZ _t2 Goto _L1; _t3 = 2; _t4 = a % _t3; _t5 = 0; _t6 = _t4 == _t5; PushParam _t6; LCall _PrintBool; _t7 = 1; _t8 = a + _t7; a = _t8; Goto _L0; _L1: EndFunc;
Using TAC with other languages The TAC generation that we have been looking at is fairly generic. Although we have talked about in the context of Decaf, a TAC generator for any programming language would generate a similar sequence of statements. For example, in the dragon book, the following format is used to define the TAC generation for a while loop. (P. 469 Aho/Sethi/Ullman) s -> while e do s1 ( s begin newlabel s. after newlabeli de gen(s begin:' E. code gen('if E place =!'0'goto safter gen('goto s begin) gen(safter One last idea before we finish.. A nice enhancement to a TAC generator is re-using temp variable names. For example, if we have the following expression E->E1+E2 Our usual steps would be to evaluate El into tl, evaluate E2 into t2, and then set t3 to their sum Will tl and t2 be used anywhere else in the program? How do we know when we can reuse these temp names? Here is a method from Aho/Sethi/Ullman(p. 480)for reusing temp names Keep a count c initialized to o 2)Whenever a temp name is used as an operand, decrement c by 3)Whenever a new temp is created, use this new temp and increase c by one x= a *b+c* (C (c=0)T0=T0+T1 TO Note that this algorithm expects that each temporary name will be assigned and used exactly once, which is true in the majority of cases Bibliography J. P. Bennett, Introduction to Compiling Techniques. Berkshire, England: McGraw-Hill, 1990 S Muchnick, Advanced Compiler Design and Implementation. San Francisco, CA: Morgan Kaufmann, 1997 A Pyster, Compiler design and Construction. New york, NY: Van Nostrand reinhold, 1988
8 Using TAC with other languages The TAC generation that we have been looking at is fairly generic. Although we have talked about it in the context of Decaf, a TAC generator for any programming language would generate a similar sequence of statements. For example, in the dragon book, the following format is used to define the TAC generation for a while loop. (P. 469 Aho/Sethi/Ullman) S -> while E do S1 { S.begin = newlabel; S.after = newlabel; S.code = gen(S.begin ':') E.code gen('if' E.place '=' '0' 'goto' S.after) S1.code gen('goto' S.begin) gen(S.after ':') } One last idea before we finish... A nice enhancement to a TAC generator is re-using temp variable names. For example, if we have the following expression: E -> E1 + E2 Our usual steps would be to evaluate E1 into t1, evaluate E2 into t2, and then set t3 to their sum. Will t1 and t2 be used anywhere else in the program? How do we know when we can reuse these temp names? Here is a method from Aho/Sethi/Ullman (p. 480) for reusing temp names: 1) Keep a count c initialized to 0. 2) Whenever a temp name is used as an operand, decrement c by 1 3) Whenever a new temp is created, use this new temp and increase c by one. x = a * b + c * d - e * f (c = 0) T0 = a * b (c = 1) T1 = c * d (c = 2) (c = 0) T0 = T0 + T1 (c = 1) T1 = e * f (c = 2) (c = 0) T0 = T0 - T1 x = T0 Note that this algorithm expects that each temporary name will be assigned and used exactly once, which is true in the majority of cases. Bibliography J.P. Bennett, Introduction to Compiling Techniques. Berkshire, England: McGraw-Hill, 1990. S. Muchnick, Advanced Compiler Design and Implementation. San Francisco, CA: Morgan Kaufmann, 1997. A. Pyster, Compiler Design and Construction. New York, NY: Van Nostrand Reinhold, 1988