$30
Your assignment is to build the parser for the μGo language that supports print IO, arithmetic operations and some basic constructs for μGo. You will have to de ne the token classes and the grammar for μGo using the given Lex and Yacc codes, respectively, to produce the parser. You are welcome to make any changes of the given codes to meet your expectations. In addition to the μGo syntax, your produced parser will do simple checking for semantic correctness of the testing cases.
In the previous assignment, you have built the Lex code to split the input text stream into tokens that will be accepted by Yacc. For this assignment, you must build the code to analyze these tokens and check the syntax validity based on the given grammar rules.
Speci cally, you will do the following three tasks in this assignment.
1. De ne tokens and types (Section 1.1)
2. Design μGo grammar and implement the related actions (Section 1.2)
3. Handle semantic errors (Section 1.3)
1.1 De ne Tokens and Types
1.1.1 Tokens
The tokens need to be de ned in both Lex and Yacc code. Lex recognizes a token when it gets one, and Lex forwards the occurrence of the token to Yacc. You should make sure the consistency of the token de nitions in both Lex and Yacc code. You are welcome to add/modify the token de nitions in the given Lex code.
Some tips for token de nition (in Yacc) are listed below:
Declare tokens using %token .
The name of grammar rule, which is not declared as a token, is assumed to be a nonterminal.
1.1.2 Types
Type refers to one of the μGo data types: integer, oat, string and boolean. Useful tips for de ning a type are listed below.
De ne a type for yylval using %union by yourself. For example,
%union { int i_val; } means yylval is able to be accessed via the int type using the i_val variable.
De ne a type for token using %type and give the type name within the less/greater than symbols, < / > ; for example, %type <i_val> INT_LIT means the token INT_LIT has
The concept of CFG (context-free grammar) that you learned in the courses should be used to design the grammar for print IO, arithmetic operations and basic constructs. The conversion from the productions of a CFG to the corresponding Yacc rules is illustrated as below.
Grammar productions for A:
A → B1B2...Bm
A → C1C2...Cn A → D1D2...Dk
Yacc rules:
A
: B1B2...Bm
| C1C2...Cn
| D1D2...Dk
;
Hint: The link is ANSI C grammar rules, you could design your parser grammar base on it.
1.2.2 Actions
An action is C statement(s) that should be performed as soon as the parser recognizes the production rule from the input stream. The C code surrounded by { and } is able to handle input/output, call sub-routines, and update the program states. Occasionally, it is useful to put an action in the middle of a rule. The following code snippet shows that integer literal will be printed out after token INT_LIT is recognized.
Your Yacc code needs to detect semantic errors during parsing the given μGo code. When errors occur, your parser should detect and display error messages upon the termination of the parsing procedure. The messages will include the type of the semantic error and the line number of the code that causes the error.
To be precise, in this assignment, you should at least handle the following four cases:
1. Variable errors:
Operate on any undeclared variable
Re-de ne any existed variable
2. Type errors:
Handle modulo operation ( % ) involving any oating point number or variable
Handle simple type checking for the mismatching (e.g., 3 + 3.14 ) and the condition of
"if" and "for" statements that the values must be the boolean type
Hint: Your %union may need to use struct with some elds to record the types of the value.
Symbol table needs to be built in the Yacc program so as to perform the following tasks.
1. Create a symbol table when entering a new scope. create_symbol
2. Insert an entry for a variable declaration. insert_symbol
3. Look up an entry in the symbol table. lookup_symbol
4. Dump all contents in the symbol table of current scope and its entries when exiting a scope. dump_symbol
Hint: You may add some data elds in the table to facilitate semantic error handling or scoping check.
Hint: You may need to link and organize multiple tables as the operation of a stack.
2.2 Scope Level
The global scope level is zero and is increased by one when entering a new block. When the program leaving a block, you need to dump the symbol table of current level then decrease the level by one. You can nd the example at section 2.4 below.
Note that the level indecate the depth of the scope; however the scopes with the same level cannot communicate with each other. For example, in example 2.4, the scope levels of A and C are 2, but we cannot access the oating-point veriable width in C.
2.3 Table Fields
The structure of the example symbol table is listed below:
Index: the variable index in attached symbol table, and should be unique in that symbol table.
Name: the name of the variable.
Type: the type of the variable.
Addr: abbreviation of address, it should be unique in the whole program. Note that the Addr of function is always -1 .
Lineno: the line number where de ne the variable.
Func_sig: the function signature contains return type and type of parameters. For example, the prototype func foo(x int32, y float32, g int32) int32 should be recorded as (IFI)I in Func_sig.
Index
Name
Type
Addr
Lineno
Func_sig
0 x int32 0 1 -
1 y oat32 1 2 -
1. Handle arithmetic operations, where brackets and precedence should be considered. (20pt, in01 , in02 )
2. Implement the scoping check function in your parser. To get the full credits for this feature, your parser is expected to correctly handle the scope of the variables de ned by the μGo language. (10pt, in03 )
3. Support the variants of the assignment operators. (i.e., = , += , -= , *= , /= , %= ) (10pt, in04 )
4. Handle the type conversion between integer and oating-point. (10pt, in05 )
5. Support if statements. (10pt, in06 )
6. Support for statements. (10pt, in07 )
7. Detect semantic error(s) and display the error message(s). The parser should display at least the error type and the line number. (20pt, in08 , in09 )
8. Support function de ne and function call. (10pt, in10 )
9. Support switch statements. (10pt, in11 )
3.1 Example
Example input code and the expected output from your parser.
!!! Incorrect format wi
l l lose 10pt. !!!
5. Appendix: μGo Speci cation
In this speci cation, the syntax is speci ed using Extended Backus-Naur Form (EBNF). The following
A type determines a set of values together with operations and methods speci c to those values.
"int32": the set of all signed 32-bit integers (-2147483648 to 2147483647)
" oat32": the set of all IEEE-754 32-bit oating-point numbers
"string": a (possibly empty) sequence of bytes
"bool": the set of Boolean truth values denoted by the predeclared constants true and false
Note : Arithmetic operations are written in in x notation, and the precedence for the operators is de ned as below (the smaller number has the higher precedence).
Category
Operators
Precedence
Unary +
-
!
1
Multiplication *
/
%
2
Addition +
-
3
Comparison <
>
<= >= == !=
4
Logical AND &&
5
Logical OR ||
6
Note: 1. The expression within () needs to be evaluate rst. 2. ++ -- = += -= *=
/= %= are invalid in an expression. 3. ! , && , and || are only for boolean type and boolean type does not support arithmetic operation, e.g., multiplication, addition, and comparison. 4. % is only for integer numbers.
A conversion changes the type of an expression to the type speci ed by the conversion.
A variable declaration creates one variables, binds corresponding identi ers to them, and gives a type and an initial value.
Example:
Assignments statements
Each left-hand side operand must be addressable.
In μGo, an expression statement is composed of an Expression, which consists of several expressions with operators (Section 5.2).
Example:
The "++" and "--" statements increment or decrement their operands by the untyped constant 1.
As with an assignment, the operand must be addressable. You can assume that the Expression in
i-- , and 3++ .
Example:
Block
A block is a possibly empty sequence of declarations and statements within matching brace brackets.
"If" statements specify the conditional execution of two branches according to the value of a boolean expression. If the Condition evaluates to true, the "if" branch is executed, otherwise, if present, the "else" branch is executed.
Example:
A "for" statement speci es repeated execution of a block. There are two forms: the iteration may be controlled by a single condition or a "for" clause.
Example:
A "switch" statement is a shorter way to write a sequence of if-else statements. It runs the rst case whose value is equal to the condition expression.
SwitchStmt = "switch" Expression Block
CaseStmt = ( "case" INT_LIT | DEFAULT ) ":" Block
Example:
A declaration begins with the func keyword, followed by the name you want the function to have, a pair of parentheses (it takes zero or more arguments), and then a block containing the function's code.
Implementation restriction: "print" and "println" need not to accept arbitrary argument types, but printing of boolean, numeric, and string types must be supported.