NCKU-Lexical Definition Solved

Your assignment is to write a scanner for the μGo language with lex. This document gives the lexical definition of the language, while the syntactic definition and code generation will follow in subsequent assignments.
Your programming assignments are based around this division and later assignments will use the parts of the system you have built in the earlier assignments. That is, in the first assignment you will implement the scanner using lex, in the second assignment you will implement the syntactic definition in yacc, and in the last assignment you will generate assembly code for the Java Virtual Machine by augmenting your yacc parser.
This definition is subject to modification as the semester progresses. Take care that the code you write is well structured and easy to revise.
1. μGo Language Features
We highlight the features of μGo by comparing it with C language. It is very important to note that
 
 
Tokens are divided into two classes:
     tokens that will be passed to the parser, and
     tokens that will be discarded by the scanner (i.e., recognized but not passed to the parser).
2.1 Tokens that will be passed to the parser
The following tokens will be recognized by the scanner and will be eventually passed to the parser.
2.1.1 Delimiters
Each of these delimiters should be passed back to the parser as a token.
Delimiters     Symbols
Parentheses    ( ) { } [ ]
Semicolon      ;
Comma          ,
Quotation      " "
Newline        \n
2.1.2 Arithmetic, Relational, and Logical Operators
Each of these operators should be passed back to the parser as a token.
Operators     Symbols
Arithmetic    + - * / % ++ --
Relational    < > <= >= == !=
Assignment    = += -= *= /= %=
Logical       && || !
2.1.3 Keywords
Each of these keywords should be passed back to the parser as a token.
The following keywords are reserved words of μGo:
Types                   Keywords
Data type               int32 float32 bool string
Conditional             if else for
Variable declaration    var
Built-in functions      print println
Function                func return package
Switch                  switch case default
2.1.4 Identifiers
An identifier is a string of letters (a ~ z, A ~ Z, _) and digits (0 ~ 9) that begins with a letter or underscore. Identifiers are case-sensitive; for example, ident, Ident, and IDENT are not the same identifier. Note that keywords are not identifiers.
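The identifier rule above can be checked with a small hand-written sketch in C. This is only an illustration, not the lex rule you are expected to submit. The helper names is_keyword and is_identifier are hypothetical, and the keyword list is taken only from Section 2.1.3.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reserved words as listed in Section 2.1.3. */
static const char *const KEYWORDS[] = {
    "int32", "float32", "bool", "string", "if", "else", "for", "var",
    "print", "println", "func", "return", "package", "switch", "case",
    "default",
};

/* Hypothetical helper: true if s is exactly one of the reserved words. */
bool is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strcmp(s, KEYWORDS[i]) == 0) return true;
    }
    return false;
}

/* Hypothetical helper: true if s begins with a letter or underscore,
   continues with letters, digits, or underscores, and is not a keyword. */
bool is_identifier(const char *s) {
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_')) return false;
    }
    return !is_keyword(s);
}
```

Because the comparison is case-sensitive, is_identifier("For") holds while is_identifier("for") does not. In a lex specification the same effect is usually obtained by listing the keyword rules before the identifier rule.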
2.1.5 Integer Literals and Floating-Point Literals
Integer literals: a sequence of one or more digits, such as 1, 23, and 666.
Floating-point literals: numbers that contain a decimal point, such as 0.2 and 3.141.
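As a rough sketch of the two literal forms, the C function below classifies a complete string as an integer literal, a floating-point literal, or neither. The names NumKind and classify_number are hypothetical. The sketch also assumes that a floating-point literal needs at least one digit on each side of the decimal point, which the examples suggest but the definition does not state.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { KIND_NONE, KIND_INT, KIND_FLOAT } NumKind;

/* Hypothetical helper: classify the whole string s.
   digits            -> KIND_INT
   digits '.' digits -> KIND_FLOAT (assumption: digits on both sides)
   anything else     -> KIND_NONE */
NumKind classify_number(const char *s) {
    size_t i = 0;
    if (!isdigit((unsigned char)s[i])) return KIND_NONE;
    while (isdigit((unsigned char)s[i])) i++;
    if (s[i] == '\0') return KIND_INT;
    if (s[i] != '.') return KIND_NONE;
    i++;  /* skip the decimal point */
    if (!isdigit((unsigned char)s[i])) return KIND_NONE;
    while (isdigit((unsigned char)s[i])) i++;
    return s[i] == '\0' ? KIND_FLOAT : KIND_NONE;
}
```

Under these assumptions, classify_number("666") yields KIND_INT and classify_number("3.141") yields KIND_FLOAT, while inputs such as "1." or ".5" are rejected.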
2.1.6 String Literals
A string literal is a sequence of zero or more ASCII characters appearing between double-quote ( " ) delimiters. A double-quote appearing with a string must be written after a " , e.g., "abc" and "Hello world".
2.2 Tokens that will be discarded
The following tokens will be recognized by the scanner but should be discarded rather than returned to the parser.
2.2.1 Whitespace
A sequence of blanks (spaces), tabs, and newlines.
2.2.2 Comments
Comments can be added in several ways:
     C-style comments are text surrounded by /* and */ delimiters, which may span more than one line;
     C++-style comments are text following a // delimiter and running up to the end of the line.
Whichever comment style is encountered first remains in effect until the appropriate comment close is encountered. For example,
     // this is a comment // line */ /* with /* delimiters */ before the end
and
     /* this is a comment // line with some /* and C delimiters */
are both valid comments.
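The "first style wins" rule can be illustrated with a short C sketch that measures how many characters a comment covers. The function name comment_length is hypothetical, and returning 0 for an unterminated C-style comment is an assumption, since the definition does not say how that case should be handled.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: number of characters covered by the comment that
// starts at s, or 0 if s does not start a comment.
size_t comment_length(const char *s) {
    if (s[0] != '/') return 0;
    if (s[1] == '/') {
        // Line comment: everything up to, but not including, the newline.
        // Any block-comment delimiters inside it are plain text.
        return strcspn(s, "\n");
    }
    if (s[1] == '*') {
        // Block comment: ends at the first closing delimiter after the
        // opener, so nested openers and line-comment markers are ignored.
        const char *end = strstr(s + 2, "*/");
        if (end == NULL) return 0;  // assumption: unterminated, not a comment
        return (size_t)(end - s) + 2;
    }
    return 0;
}
```

For both example comments above, the returned length covers the entire comment text, because the delimiters of the other style are treated as ordinary characters once a comment has begun.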
2.2.3 Other characters
Any undefined characters or strings should be discarded by your scanner.
3. What Should Your Scanner Do?
3.1 Assignment Requirements
We have prepared 11 μGo programs, which are used to test the functionalities of your scanner.
     Each test program is worth 10pt, for a total of 110pt. You will get 110pt if your scanner generates the correct answers for all eleven programs. Note that the TA will prepare hidden test cases to verify that your scanner is not hardcoded to the attached inputs and outputs; a hardcoded scanner will get 0pt.
     Run the judge program to get the testing score by typing judge in your terminal.
     The output messages generated by your scanner must use the given names of token classes listed below.
Symbol  Token         Symbol          Token         Symbol    Token
+       ADD           &&              LAND          print     PRINT
-       SUB           ||              LOR           println   PRINTLN
*       MUL           !               NOT           if        IF
/       QUO           (               LPAREN        else      ELSE
%       REM           )               RPAREN        for       FOR
++      INC           [               LBRACK        int32     INT
--      DEC           ]               RBRACK        float32   FLOAT
>       GTR           {               LBRACE        string    STRING
<       LSS           }               RBRACE        bool      BOOL
>=      GEQ           ;               SEMICOLON     true      TRUE
<=      LEQ           ,               COMMA         false     FALSE
==      EQL           "               QUOTA         var       VAR
!=      NEQ           \n              NEWLINE
=       ASSIGN        :               COLON         func      FUNC
+=      ADD_ASSIGN    Int Number      INT_LIT       package   PACKAGE
-=      SUB_ASSIGN    Float Number    FLOAT_LIT     return    RETURN
*=      MUL_ASSIGN    String Literal  STRING_LIT    switch    SWITCH
/=      QUO_ASSIGN    Identifier      IDENT         case      CASE
%=      REM_ASSIGN    Comment         COMMENT       default   DEFAULT
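Several operators share a prefix (for example + , ++ , and += ), so the scanner has to pick the longest symbol that matches. lex does this automatically by preferring the longest match. The C sketch below imitates that behaviour for the operator rows of the table; the names OpEntry, OPERATORS, and match_operator are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *symbol;
    const char *token;
} OpEntry;

/* Two-character operators are listed first, so the first entry that
   matches a prefix of the input is also the longest one. */
static const OpEntry OPERATORS[] = {
    {"++", "INC"},        {"--", "DEC"},        {">=", "GEQ"},
    {"<=", "LEQ"},        {"==", "EQL"},        {"!=", "NEQ"},
    {"+=", "ADD_ASSIGN"}, {"-=", "SUB_ASSIGN"}, {"*=", "MUL_ASSIGN"},
    {"/=", "QUO_ASSIGN"}, {"%=", "REM_ASSIGN"}, {"&&", "LAND"},
    {"||", "LOR"},        {"+", "ADD"},         {"-", "SUB"},
    {"*", "MUL"},         {"/", "QUO"},         {"%", "REM"},
    {">", "GTR"},         {"<", "LSS"},         {"=", "ASSIGN"},
    {"!", "NOT"},
};

/* Hypothetical helper: returns the token class name of the operator at the
   start of input and stores its length in *length; returns NULL and stores
   0 if no operator starts there. */
const char *match_operator(const char *input, size_t *length) {
    for (size_t i = 0; i < sizeof OPERATORS / sizeof OPERATORS[0]; i++) {
        size_t n = strlen(OPERATORS[i].symbol);
        if (strncmp(input, OPERATORS[i].symbol, n) == 0) {
            *length = n;
            return OPERATORS[i].token;
        }
    }
    *length = 0;
    return NULL;
}
```

With this ordering, the input "+= 1" is reported as ADD_ASSIGN with length 2 rather than as ADD followed by ASSIGN.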
3.2 Example of Your Scanner Output
The example input code and the corresponding output that we expect your scanner to generate are as follows.
 
3.3 How to debug
Compile source code and feed the input to your program, then compare with the ground truth.
