$25
The goal of this project is to give you experience in writing a top-down recursive descent parser and to get introduced to the basics of symbol tables for nested scopes.
We begin by introducing the grammar of our language. Then we will discuss the semantics of our language that involves lexical scoping rules and name resolution. Finally we will go over a few examples and formalize the expected output.
NOTE: This project is significantly more involved than the first project. You should start on it immediately. For Monday, I expect that you have read the project description completely at least a couple times and maybe started on your parser.
1. Lexical Specification
Comments and Space
In addition to these tokens, our input programs might have comments that should be ignored by the lexical analyzer. A comment starts with // and continues until a newline character is encountered. The regular expressions for comments is: // (any)* \n in which any is defined to be any character except \n . Also, like in the first project, your lexical analyzer should skip space between tokens.
2. Grammar
Here is the grammar for our input language:
about comments, but our lexical analyzer would deal with comments. This is similar to handling of spaces by the lexer, the lexer skips the spaces. In a similar fashion, your lexer should skip comments.
We highlight some of the syntactical elements of the language:
Global variables are optional
The scopes have optional public and private variables
Every scope has a body which is a list of statements
A statement can be either a simple assignment or another scope (a nested scope)
3. Scoping and Resolving References
Here are the scoping rules for our language:
The public variables of a scope are accessible to its nested scopes
The private variables of a scope are not accessible to its nested scopes
Lexical scoping rules are used to resolve name references
Global variables are accessible to all scopes
Every reference to a variable is resolved to a specific declaration by specifying the variable's defining scope. We will use the following notation to specify declarations:
If variable a is declared in the global variables list, we use ::a to refer to it
If variable a is declared in scope b , we use b.a to refer to it
And if reference to name a cannot be resolved, we denote that by ?.a
Here is the example program from the previous section, with all name references resolved (look at the comments):
4. Examples
The simplest possible program would be:
Let's add a global variable:
5. Expected Output
message and it's case-sensitive.
In case the input follows the grammar:
For every assignment statement in the input program in order of their appearance in the program, output the following information:
The resolved left-hand-side of the assignment
The resolved right-hand-side of the assignment in the following format:
NOTE: You can assume that scopes have unique names and variable names in a single scope (public and private) are not repeated.
For example, given the following input program:
The expected output is:
6. Implementation
Start by modifying the lexical analyzer from previous project to make it recognize the tokens required for parsing this grammar. It should also be able to handle comments
Next, write a parser for the given grammar. You would need one function per each nonterminal of the grammar to handle parsing of that non-terminal. I suggest you use the following signature for these functions:
Where X would be replaced by the target non-terminal. The lexical analyzer object needs to be accessible to these functions so that they can use the lexer to get and unget tokens. These functions can be member functions of a class, and the lexer object can be
producing the required output.
Write a function that resolves the left-hand-side and right-hand-side of all assignments and produces the required output. Call this function in your main() function after successfully parsing the input.
NOTE: you might need more time to finish the last step compared to previous steps.
7. Requirements
You should use C/C++, no other programming languages are allowed.
You should test your code on CentOS 6.7
You should submit your code on the course submission website, no other submission forms will be accepted.
8. Evaluation
The submissions are evaluated based on the automated test cases on the submission website. Your grade will be proportional to the number of test cases passing. If your code does not compile on the submission website, you will not receive any points.
Here is the breakdown of points for tests in different categories:
Parsing (including inputs with syntax errors and error-free inputs): 50 points
Name resolution: 50 points
Test cases containing comments: 5 points extra credit
The parsing test cases contain cases that are syntactically correct and cases that have syntax errors. If a syntax test case has no syntax error, your program passes the test case if the output is not Syntax Error . If a syntax test case has syntax error, your program passes the test case if the output is Syntax Error .
Note that if your program prints the syntax error message independently of the input, for example: