COMPILER CONSTRUCTION Principles and Practice Kenneth C. Louden
2. Scanning (Lexical Analysis) PART ONE
Contents
PART ONE
2.1 The Scanning Process
2.2 Regular Expressions
2.3 Finite Automata
PART TWO
2.4 From Regular Expressions to DFAs
2.5 Implementation of a TINY Scanner
2.6 Use of Lex to Generate a Scanner Automatically
2.1 The Scanning Process
The Function of a Scanner
• Reads characters from the source code and forms them into logical units called tokens
• Tokens are logical entities defined as an enumerated type
– typedef enum {IF, THEN, ELSE, PLUS, MINUS, NUM, ID, …} TokenType;
The Categories of Tokens
• RESERVED WORDS
– Such as IF and THEN, which represent the strings of characters “if” and “then”
• SPECIAL SYMBOLS
– Such as PLUS and MINUS, which represent the characters “+” and “-”
• OTHER TOKENS
– Such as NUM and ID, which represent numbers and identifiers
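The three categories can be told apart from the lexeme alone. The following is a minimal sketch, not the scanner developed later in the chapter: the helper `classifyLexeme` and the extra `ERROR` value are hypothetical, and the rule that an identifier is a letter followed by letters or digits is an assumption made for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Token set from the slide, plus a hypothetical ERROR value for unknown input. */
typedef enum { IF, THEN, ELSE, PLUS, MINUS, NUM, ID, ERROR } TokenType;

/* Hypothetical helper: map one complete lexeme to its token. */
TokenType classifyLexeme(const char *s)
{
    size_t i;

    if (s[0] == '\0') return ERROR;

    /* Reserved words: each token has exactly one lexeme. */
    if (strcmp(s, "if") == 0)   return IF;
    if (strcmp(s, "then") == 0) return THEN;
    if (strcmp(s, "else") == 0) return ELSE;

    /* Special symbols: single characters. */
    if (strcmp(s, "+") == 0) return PLUS;
    if (strcmp(s, "-") == 0) return MINUS;

    /* Other tokens: NUM is a run of digits. */
    if (isdigit((unsigned char)s[0])) {
        for (i = 1; s[i] != '\0'; i++)
            if (!isdigit((unsigned char)s[i])) return ERROR;
        return NUM;
    }

    /* Assumed rule: ID is a letter followed by letters or digits. */
    if (isalpha((unsigned char)s[0])) {
        for (i = 1; s[i] != '\0'; i++)
            if (!isalnum((unsigned char)s[i])) return ERROR;
        return ID;
    }
    return ERROR;
}
```

Because the reserved words are checked first, `classifyLexeme("if")` gives IF while `classifyLexeme("iffy")` falls through to ID.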
Relationship between a Token and Its String
• The string of characters that a token represents is called its STRING VALUE or LEXEME
• Some tokens have only one lexeme, such as reserved words
• A token may have infinitely many lexemes, such as the token ID
Relationship between a Token and Its String
• Any value associated with a token is called an attribute of the token
– The string value is an example of an attribute
– A NUM token may have a string value such as “32767” and an actual numeric value of 32767
– A PLUS token has the string value “+” as well as the arithmetic operation +
• The token can be viewed as the collection of all of its attributes
– The scanner needs to compute only as many attributes as are necessary to allow further processing
– The numeric value of a NUM token need not be computed immediately
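As one way to picture a token as a collection of attributes, the sketch below gives PLUS and MINUS both a string value and an arithmetic meaning. The function names `lexemeOf` and `applyOperator` are hypothetical and serve only to illustrate the idea.

```c
#include <stddef.h>

typedef enum { IF, THEN, ELSE, PLUS, MINUS, NUM, ID } TokenType;

/* Attribute 1: the string value (lexeme) of an operator token.
   Returns NULL for tokens this sketch does not cover. */
const char *lexemeOf(TokenType t)
{
    switch (t) {
    case PLUS:  return "+";
    case MINUS: return "-";
    default:    return NULL;
    }
}

/* Attribute 2: the arithmetic operation the token stands for.
   Returns 0 for non-operator tokens (an arbitrary choice for this sketch). */
int applyOperator(TokenType t, int left, int right)
{
    switch (t) {
    case PLUS:  return left + right;
    case MINUS: return left - right;
    default:    return 0;
    }
}
```

A scanner would normally need only the first attribute; the second matters to later phases, which is why it need not be computed during scanning.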
Some Practical Issues of the Scanner
• One structured data type collects all the attributes of a token; it is called a token record
– typedef struct { TokenType tokenval; char *stringval; int numval; } TokenRecord;
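A minimal sketch of filling such a record follows. The helpers `makeToken` and `numericValue` are hypothetical, and converting the string only when the value is requested is just one way to avoid computing the numeric attribute immediately.

```c
#include <stdlib.h>

typedef enum { IF, THEN, ELSE, PLUS, MINUS, NUM, ID } TokenType;

typedef struct {
    TokenType tokenval;
    char *stringval;
    int numval;
} TokenRecord;

/* Hypothetical helper: store the token kind and lexeme only.
   numval stays 0 until someone asks for it. */
TokenRecord makeToken(TokenType kind, char *lexeme)
{
    TokenRecord t;
    t.tokenval = kind;
    t.stringval = lexeme;
    t.numval = 0;
    return t;
}

/* Hypothetical helper: compute the numeric attribute on demand.
   Only meaningful for NUM tokens; other tokens keep numval unchanged. */
int numericValue(TokenRecord *t)
{
    if (t->tokenval == NUM)
        t->numval = (int)strtol(t->stringval, NULL, 10);
    return t->numval;
}
```

The record does not copy the lexeme, so the caller must keep the string alive for as long as the record is in use.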
Some Practical Issues of the Scanner
• The scanner returns the token value only and places the other attributes in variables
– TokenType getToken(void);
• As an example of the operation of getToken, consider the following line of C code:
– a[index] = 4+2
[Figure: the input buffer a [ i n d e x ] = 4 + 2 RET, one character per cell, shown twice]
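The sketch below illustrates this calling convention on the example line. It is an assumption-laden illustration, not the TINY scanner: the global input pointer `nextChar`, the global `tokenString` buffer, the constant `MAXTOKENLEN`, and the tokens `LBRACKET`, `RBRACKET`, `ASSIGN`, `ENDFILE`, and `ERROR` are all hypothetical additions, and reserved words are left out for brevity.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { PLUS, MINUS, NUM, ID,
               LBRACKET, RBRACKET, ASSIGN, ENDFILE, ERROR } TokenType;

#define MAXTOKENLEN 40

const char *nextChar = "";          /* assumed: current position in the input buffer */
char tokenString[MAXTOKENLEN + 1];  /* assumed: lexeme of the most recent token */

/* Returns the next token value; the lexeme is left in tokenString. */
TokenType getToken(void)
{
    size_t len = 0;
    TokenType result;

    /* Skip blanks and line ends in front of the token. */
    while (*nextChar == ' ' || *nextChar == '\t' || *nextChar == '\n')
        nextChar++;

    if (*nextChar == '\0') {
        tokenString[0] = '\0';
        return ENDFILE;
    }

    if (isalpha((unsigned char)*nextChar)) {
        /* Identifier: take the longest run of letters and digits. */
        while (isalnum((unsigned char)*nextChar)) {
            if (len < MAXTOKENLEN) tokenString[len++] = *nextChar;
            nextChar++;
        }
        result = ID;
    } else if (isdigit((unsigned char)*nextChar)) {
        /* Number: take the longest run of digits. */
        while (isdigit((unsigned char)*nextChar)) {
            if (len < MAXTOKENLEN) tokenString[len++] = *nextChar;
            nextChar++;
        }
        result = NUM;
    } else {
        /* Single-character special symbols. */
        char c = *nextChar++;
        tokenString[len++] = c;
        switch (c) {
        case '+': result = PLUS;     break;
        case '-': result = MINUS;    break;
        case '[': result = LBRACKET; break;
        case ']': result = RBRACKET; break;
        case '=': result = ASSIGN;   break;
        default:  result = ERROR;    break;
        }
    }
    tokenString[len] = '\0';
    return result;
}
```

With `nextChar` pointing at `a[index] = 4+2`, successive calls return ID, LBRACKET, ID, RBRACKET, ASSIGN, NUM, PLUS, NUM, and then ENDFILE, with `tokenString` holding the lexeme after each call.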