CS308 Compiler Theory
Grading
• Homework + Pop-quizzes: 10%
• Programming assignments: 20% ↓
• Final exam: 70% ↑
Course Outline
• Introduction to Compiling
• A Simple Syntax-Directed Translator
• Lexical Analysis
• Syntax Analysis
  - Context-Free Grammars
  - Top-Down Parsing, LL Parsing
  - Bottom-Up Parsing, LR Parsing
• Syntax-Directed Translation
  - Attribute Definitions
  - Evaluation of Attribute Definitions
• Semantic Analysis, Type Checking
• Run-Time Organization
• Intermediate Code Generation
• Code Generation
• Machine-Independent Optimizations
Phases of a Compiler
Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program
• Each phase transforms the source program from one representation into another representation.
• They communicate with error handlers.
• They communicate with the symbol table.
Lexical Analyzer
• The lexical analyzer reads the source program character by character and returns the tokens of the source program.
• A token describes a pattern of characters that have the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
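The character-by-character scan described above can be sketched as a small tokenizer. This is a minimal illustration, not the course's reference implementation: the token classes, the keyword list, and the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are all assumptions made for the example, and Python's `re` module stands in for the pattern matcher.

```python
import re

# Hypothetical token classes for illustration; a real language defines its own.
# Order matters: KEYWORD is listed before IDENTIFIER so that "if" is a keyword.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:if|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("DELIMITER",  r"[();{}]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Scan source left to right and return (token_class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":   # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("if x1 = 42;"))
```

Each returned pair names the token class and the lexeme (the actual characters matched), which is the information a parser would consume next.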
Terminology of Languages
• Alphabet: a finite set of symbols (for example, the ASCII characters)
• String:
  - A finite sequence of symbols over an alphabet
  - The terms sentence and word are also used to mean string
  - $\varepsilon$ is the empty string
  - $|s|$ is the length of string $s$
• Language: a set of strings over some fixed alphabet
  - $\emptyset$, the empty set, is a language.
  - $\{\varepsilon\}$, the set containing only the empty string, is a language.
  - The set of well-formed C programs is a language.
  - The set of all possible identifiers is a language.
• Operators on strings:
  - Concatenation: $xy$ represents the concatenation of strings $x$ and $y$, with $s\varepsilon = s$ and $\varepsilon s = s$.
  - Exponentiation: $s^n = s s \ldots s$ ($n$ times), with $s^0 = \varepsilon$.
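As a quick check of the string operators, the identities above can be tried with ordinary Python strings. This is only a sketch: `EPSILON` and `string_power` are names introduced for the example, and Python's `""` plays the role of the empty string.

```python
EPSILON = ""  # the empty string

def string_power(s, n):
    """Return s concatenated with itself n times; zero copies give the empty string."""
    result = EPSILON
    for _ in range(n):
        result = result + s
    return result

x, y = "ab", "c"
print(x + y)                              # concatenation xy
print(x + EPSILON == x == EPSILON + x)    # the empty string is the identity for concatenation
print(len(EPSILON))                       # length of the empty string
print(string_power(x, 3))                 # third power of x
print(string_power(x, 0) == EPSILON)      # zeroth power is the empty string
```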
Operations on Languages
• Concatenation:
  - $L_1 L_2 = \{ s_1 s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \}$
• Union:
  - $L_1 \cup L_2 = \{ s \mid s \in L_1 \text{ or } s \in L_2 \}$
• Exponentiation:
  - $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = LL$
• Kleene closure:
  - $L^* = \bigcup_{i=0}^{\infty} L^i$
• Positive closure:
  - $L^+ = \bigcup_{i=1}^{\infty} L^i$
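For finite languages, these operations map directly onto Python sets of strings. The sketch below is illustrative only: the function names are assumptions, and because the Kleene and positive closures are infinite unions, the code computes a finite approximation that stops at a caller-chosen `max_power`.

```python
def concat(l1, l2):
    """All strings s1 s2 with s1 taken from l1 and s2 taken from l2."""
    return {s1 + s2 for s1 in l1 for s2 in l2}

def union(l1, l2):
    """All strings that are in l1 or in l2."""
    return l1 | l2

def lang_power(lang, n):
    """The n-th power of lang; the zeroth power is the set holding only the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def kleene_closure(lang, max_power):
    """Finite approximation: union of the powers 0 through max_power."""
    result = set()
    for i in range(max_power + 1):
        result |= lang_power(lang, i)
    return result

def positive_closure(lang, max_power):
    """Finite approximation: union of the powers 1 through max_power."""
    result = set()
    for i in range(1, max_power + 1):
        result |= lang_power(lang, i)
    return result

L1, L2 = {"a", "b"}, {"0", "1"}
print(sorted(concat(L1, L2)))
print(sorted(union(L1, L2)))
print(sorted(lang_power(L1, 2)))
print(sorted(kleene_closure({"a"}, 3)))
print(sorted(positive_closure({"a"}, 3)))
```

The only difference between the two closures is whether the zeroth power, and therefore the empty string, is included.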
Regular Expressions (Rules)
Regular expressions over alphabet $\Sigma$:

Reg. Expr              Language it denotes
$\varepsilon$          $\{\varepsilon\}$
$a \in \Sigma$         $\{a\}$
$(r_1) \mid (r_2)$     $L(r_1) \cup L(r_2)$
$(r_1)(r_2)$           $L(r_1) L(r_2)$
$(r)^*$                $(L(r))^*$
$(r)$                  $L(r)$

• $(r)^+ = (r)(r)^*$
• $(r)? = (r) \mid \varepsilon$
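The two shorthand identities can be spot-checked with Python's `re` module, whose syntax includes the rules above as a subset. This is a sanity check on a handful of sample strings, not a proof; the helper `same_language` and the sample list are assumptions made for the example, and the empty alternative `(?:)` stands in for the empty-string expression.

```python
import re

def same_language(pattern_a, pattern_b, samples):
    """True if both patterns accept exactly the same strings among the samples."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )

samples = ["", "a", "b", "ab", "ba", "aab", "abab", "c"]

# One-or-more versus "one, then zero or more", with r = a|b
print(same_language(r"(a|b)+", r"(a|b)(a|b)*", samples))

# Optional versus "r or the empty string", with r = ab
print(same_language(r"(ab)?", r"(?:ab)|(?:)", samples))

# The star accepts the empty string, the plus does not.
print(re.fullmatch(r"(a|b)*", "") is not None)
print(re.fullmatch(r"(a|b)+", "") is None)
```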
Finite Automata
• A recognizer for a language is a program that takes a string x and answers "yes" if x is a sentence of that language, and "no" otherwise.
• We call the recognizer of the tokens a finite automaton.
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
• This means that we may use either a deterministic or a non-deterministic automaton as a lexical analyzer.
• Both deterministic and non-deterministic finite automata recognize regular sets.
• Which one?
  - Deterministic: faster recognizer, but it may take more space
  - Non-deterministic: slower, but it may take less space
  - Deterministic automata are widely used in lexical analyzers.
• First, we define regular expressions for tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.
  - Algorithm 1: Regular Expression → NFA → DFA (two steps: first to NFA, then to DFA)
  - Algorithm 2: Regular Expression → DFA (directly convert a regular expression into a DFA)
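To make the "yes"/"no" recognizer concrete, here is a minimal sketch of a table-driven DFA for identifiers of the form letter (letter | digit)*. The state names, the character classes, and the choice to treat `_` as a letter are assumptions for the example. The table is written by hand here rather than produced by either conversion algorithm.

```python
def char_class(ch):
    """Map an input character to the symbol class used in the transition table."""
    if ch == "_" or (ch.isascii() and ch.isalpha()):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# Deterministic: each (state, symbol class) pair has at most one next state.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"):  "in_id",
}
ACCEPTING = {"in_id"}

def dfa_accepts(text):
    """Answer True ("yes") if text is an identifier, False ("no") otherwise."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:        # no transition defined: reject immediately
            return False
    return state in ACCEPTING

for candidate in ["x1", "a_b2", "1x", "a-b", ""]:
    print(repr(candidate), dfa_accepts(candidate))
```

The recognizer does one table lookup per input character, which is why a DFA is fast. The cost is that the table for a realistic token set can be large.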
Non-Deterministic Finite Automaton (NFA)
• A non-deterministic finite automaton (NFA) is a mathematical model that consists of:
  - $S$: a set of states
  - $\Sigma$: a set of input symbols (the alphabet)
  - move: a transition function that maps state-symbol pairs to sets of states
  - $s_0$: a start (initial) state
  - $F$: a set of accepting (final) states
• $\varepsilon$-transitions are allowed in NFAs. In other words, we can move from one state to another without consuming any input symbol.
• An NFA accepts a string x if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out x.
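The acceptance condition can be sketched by tracking the set of all states the NFA could be in after each input symbol. The example NFA below is an assumption chosen for illustration: it accepts one or more a's, or one or more b's, and uses two empty-string transitions out of the start state. The names `MOVE`, `epsilon_closure`, and `nfa_accepts` are likewise introduced only for this sketch.

```python
EPSILON = None  # label used for transitions that consume no input

# move: (state, symbol) -> set of next states
MOVE = {
    (0, EPSILON): {1, 3},
    (1, "a"): {2},
    (2, "a"): {2},
    (3, "b"): {4},
    (4, "b"): {4},
}
START = 0
ACCEPTING = {2, 4}

def epsilon_closure(states):
    """All states reachable from the given states without consuming input."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in MOVE.get((state, EPSILON), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def nfa_accepts(text):
    """True if some path from START to an accepting state spells out text."""
    current = epsilon_closure({START})
    for symbol in text:
        reached = set()
        for state in current:
            reached |= MOVE.get((state, symbol), set())
        current = epsilon_closure(reached)
    return bool(current & ACCEPTING)

for candidate in ["aaa", "b", "", "ab", "c"]:
    print(repr(candidate), nfa_accepts(candidate))
```

Following every possible path at once avoids backtracking. The per-symbol work grows with the number of states, which matches the earlier remark that an NFA is slower to run but can be more compact than a DFA.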