A Simple Syntax-Directed Translator CS308 Compiler Theory
A Simple Syntax-Directed Translator CS308 Compiler Theory 1
Lecture Outline We shall look at a simple programming language and describe the initial phases of compilation. We start off by creating a simple'syntax directed translator that maps infix arithmetic to postfix arithmetic. This translator is then extended to cater for more elaborate programs such as (check page 39 Aho) -While (true){x=a[i];a[i]=a[j];a[j]=x; Which generates simplified intermediate code (as on pg40 Aho) CS308 Compiler Theory 2
Lecture Outline • We shall look at a simple programming language and describe the initial ph f il i ases of compilation. • We start off by creating a ‘simple’ syntax directed translator that maps infix arithmetic to postfix arithmetic. • This translator is then extended to cater for more elaborate programs such as (check page 39 Aho) such as (check page 39 Aho) – While (true) { x=a[i]; a[i]=a[j]; a[j]=x; } • Which generates simplified intermediate code (as on pg40 Aho) CS308 Compiler Theory 2
Two Main Phases (Analysis and Synthesis) Analysis Phase :-Breaks up a source program into constituent pieces and produces an internal representation of it called intermediate code. Synthesis Phase :-translates the intermediate code into the target program. During this lecture we shall focus on the analysis phase (compiler front end...see figure next slide) CS308 Compiler Theory 3
Two Main Phases (Analysis and Synthesis) • Analysis Phase :- Breaks up a source program into constituent pieces and d l f ll d d pro duces an interna l representation of it call e d intermed d iate co de. • Synthesis Phase :- translates the intermediate code into the target program. • During this lecture we shall focus on the analysis phase (compiler front end see figure next slide) end … see figure next slide) CS308 Compiler Theory 3
A Model of A Compiler Font End source Lexical tokens syntax Intermediate three-address Parser Code program Analyzer tree Generator code Symbol Table Figure 2.3:A model of a compiler front end CS308 Compiler Theory 4
A Model of A Compiler Font End CS308 Compiler Theory 4
Syntax vs.Semantics The syntax of a programming language describes the proper form ofits programs The semantics of the language defines what its programs mean. CS308 Compiler Theory 5
Syntax vs. Semantics • The syntax of a programming language describes the proper form of its programs • The semantics of the language defines what its programs mean semantics of the language defines what its programs mean. CS308 Compiler Theory 5
A note on Grammars (Context-free) A formal grammar is used to specify the syntax of a formal language (for example a programming language like C,Java) 。 Grammar describes the structure(usually hierarchical)of programming languages. For e.g.in Java an IF statement should fit in if(expression )statement else statement statement->if expression statement else statement Note the recursive nature of statement. CS308 Compiler Theory 6
A note on Grammars (Context-free) • A formal grammar is used to specify the syntax of a formal language (f l i l lik ) (for examp le a programm ing language like C, Java ) • Grammar describes the structure (usually hierarchical) of programming Grammar describes the structure (usually hierarchical) of programming languages. – For e.g. in Java an IF statement should fit in • if ( expression ) statement else statement – statement -> if ( expression ) statement else statement – Note the recursive nature of statement. CS308 Compiler Theory 6
A CFG has four components A set of terminal symbols,sometimes referred to as 'tokens'. A set of non-terminals,sometimes called syntactic variables'. ·A set of productions. A designation of one of the non-terminals as the start symbol. CS308 Compiler Theory 7
A CFG has four components • A set of terminal symbols, sometimes referred to as ‘tokens’. • A set of non-terminals, sometimes called ‘syntactic variables’. • A set of productions. • A designation of one of the non-terminals as the start symbol . CS308 Compiler Theory 7
A Grammar for 'list of digits separated by or-' list->list digit list->list-digit list->digit digit->01...19 Accepts strings such as 9-5+2,3-1,or 7. list and digit are non-terminals 0 1...9,+are the terminal symbols CS308 Compiler Theory
A Grammar for ‘list of digits separated by + or -’ list -> list + digit list -> list – digit list -> digit digit -> 0|1|… |9 • A t ti h 9 Accep ts s t r ings suc h as 9 -5+2 3, 3 -1 7 , or 7. • list and digit are non-terminals • 0|1| |9 0 | 1 | … | 9, +, - are th l bl e termina l sym b o ls CS308 Compiler Theory 8
Parsing and derivations Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production If it cannot be derived from the start symbol then reporting syntax errors within the string. CS308 Compiler Theory
Parsing and derivations • Parsing is the problem of taking a string of terminals and figuring out h d i if h bl fh how to der ive it from t he start sym b o l o f t he grammar • A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production • If it cannot be derived from the start symbol then reporting syntax errors within the strin g. CS308 Compiler Theory 9
Parse Trees A parse tree pictorially shows how the start symbol of a grammar derives a string in the language A grammar can have more than one parse tree generating a given string of terminals(thus making it ambiguous) CS308 Compiler Theory 10
Parse Trees • A parse tree pictorially shows how the start symbol of a grammar di i i hl derives a string in the language • A grammar can have more than one parse tree generating a given string of terminals (thus making it ambiguous) CS308 Compiler Theory 10