$25
For this project you will be writing Datalog queries and will be developing a non-recursive Datalog interpreter. The EBNF rules that will be used for the dialect of Datalog are in the following section. For this project Datalog queries will be constructed of a set of rules that will each have a rule head. Each rule head specifies the rule name and a set of variables. Each rule that does not have a rule body is considered a fact (also known as external rule); these facts will be backed by CSV data files that have the tuples that specify when the rule is true. Any other tuples do not satisfy the fact rules. Rules that have a rule body are to be calculated and attempt to find the values that will satisfy all of the subgoals. If multiple rules exist with the same name, then rule invocations of that name are considered to be a union of the multiple rules. The goal of the Datalog interpreter is to find all tuples that satisfy all of the rules specified, with the base values coming from the fact rules.
Non-Recursive Datalog Programming Language
The following EBNF describes the non-recursive Datalog language. The Non-terminals on the right are identified by italics. Literal values are specified in bold. The operators not in bold or italics describe the options of the EBNF. These operators are {} for repetition, [] for optional, () for grouping, and | for or.
EBNF Rules:
Query := Rule { Rule }
Rule := [ RuleHead [ := RuleBody ] ] NewLine
RuleHead := RuleName ( HeadVariableList )
RuleBody := SubGoal { AND SubGoal }
RuleName := Identifier
HeadVariableList := Identifier { , Identifier }
SubGoal := RuleInvocation | NegatedRuleInvocation | EqualityRelation
RuleInvocation := RuleName ( BodyVariableList )
NegatedRuleInvocation := NOT RuleInvocation
EqualityRelation := InequalityRelation { EQOperator InequalityRelation }
BodyVariableList := InvocationVariable { , InvocationVariable }
InequalityRelation := Term { IEQOperator Term }
EQOperator := != | =
InvocationVariable := Identifier | EmptyIdentifier
Term := SimpleTerm { AddOperator SimpleTerm }
IEQOperator := < | | <= | =
SimpleTerm := UnaryExpression { MultOperator UnaryExpression }
AddOperator := + | -
UnaryExpression := ( UnaryOperator UnaryExpression ) | PrimaryExpression
MultOperator := * | / | %
UnaryOperator := ! | - | +
PrimaryExpression:= ( EqualityRelation ) | Constant | Identifier
Constant := IntConstant | FloatConstant | StringConstant
Token Rules:
Identifier := Alpha { ( Digit | Alpha ) }
EmptyIdentifier := _
IntConstant := Digit { Digit }
FloatConstant := Digit { Digit } [ . Digit { Digit } ]
StringConstant := " { ( CharacterLiteral | EscapedCharacter ) } "
Comment := # { ( WhiteSpace | Printable ) } NewLine
Digit := 0 – 9
Alpha := A – Z | a - z
WhiteSpace := Tab | CarriageReturn | \ NewLine | Space
Printable := Space - ~
CharacterLiteral := Space - ! | # - [ | ] - ~ EscapedCharacter := \b | \n | \r | \t | \\ | \' | \" Semantics:
There are several requirements for the non-recursive Datalog queries that cannot be expressed in the EBNF. The following must be true of all valid queries:
• All subgoals must be defined prior to their use
• All repeated rule names must be defined contiguously
• All repeated rule names must have the same number of variables
• Any variable in the rule head or rule body must appear in a non-negated rule invocation
• A rule invocation may not appear in its own rule definition (i.e. recursive definition)
• Fact rules may only appear once
• Every rule invocation must provide the same number of variables as its definition
CSV Data Files
The facts can be stored in a CSV files. The name of the file without .csv extension will be the name of the facts loaded into the environment. Each column is a different attribute, with the header row containing the attribute name. Four data types are allowed in the CSV files: string, float, integer, and boolean. Rows two and on, each have one tuple in them.
The following is an example of fact R that would be in the file R.csv:
"a"
"b"
"c"
"d"
3
"Hello"
3.4
true
4
"World"
1.1
false
6
"Goodbye"
8.8
false
7
"None"
9.3
true
Datalog Examples
The following are a few Datalog query examples using the grammar described in the previous sections using the R fact described.
Simple fact query:
R(a,b,c,d)
This will result in the following output:
a b c d 3 Hello 3.4 true
4 World 1.1 false
6 Goodbye 8.8 false 7 None 9.3 true
Simple filtering for a being greater than 5:
R(a,b,c,d)
S(x) := R(x,_,_,_) AND x 5
This will result in the following output:
x 6
7
Simple filtering for c being greater than 8.0 or less than 3.0:
R(a,b,c,d)
S(x) := R(_,_,x,_) AND x 8.0
S(x) := R(_,_,x,_) AND x < 3.0
This will result in the following output:
x 1.1 8.8
9.3
Find all b associated with c not being the smallest in R:
R(a,b,c,d)
S(x) := R(_,x,c1,_) AND R(_,_,c2,_) AND c1 c2
This will result in the following output:
x Hello
Goodbye
None
Datalog Queries
Write Datalog queries for the questions posed below. Name your query files query_X.nrdl where X is the number of the question below. You may assume the following facts will be available:
Product(maker, model, year)
Car(model, city, highway, style, passengers, trunk, msrp)
Pickup(model, city, highway, passengers, cargo, towing, msrp)
EV(model, range, battery, passengers, msrp)
1) What Car models have a highway less than 35miles? The final rule head should be Answer(model).
2) Find all of the Pickup models that have a cargo capacity of at least 75cu ft. and a highway fuel economy less than 25MPG. The final rule head should be Answer(model).
3) Find all automakers that sell at least one vehicle that msrp less than $27,000 and at least one vehicle greater than $55,000. The final rule head should be Answer(maker).
4) Find the passenger capacities that exist for three or more vehicles. That is three or more different vehicles should have that passenger capacity. The final rule head should be Answer(passengers).
5) Find the automaker(s) of the highest combined fuel economy (55% city, 45% highway) of conventional vehicles (cars and pickups). The final rule head should be Answer(maker).
6) Find the vehicle model with the highest miles per gallon gasoline equivalent (MPGGE). For this problem assume combined fuel economy formula from above, and that a gallon of gasoline is equivalent to 33.1kWh. The final rule head should be Answer(model).
7) Find automaker(s) that sell a pickup with a highway fuel economy lower than all the cars it sells. The final rule head should be Answer(maker).
Datalog Interpreter
The Datalog interpreter will be constructed from several classes. A given NRDatalog class provides a class that can parse command line arguments and instantiate and call the appropriate classes. The following classes should be created with at least the specified interface.
// Class that parses the non-recursive Datalog language public class NRDatalogParser{ // Constructor for the parser
public NRDatalogParser(PeekableCharacterStream stream); // Parses a query and returns true if valid query public boolean parseQuery();
// Prints out the error if one has occurred public void printError(PrintStream ostream);
}
// Class that will construct the parse tree from public class NRDatalogParseTree extends NRDatalogParser{
// Constructor for the parse tree
public NRDatalogParseTree(PeekableCharacterStream stream); // Parses a query and returns true if valid query public boolean parseQuery();
// Prints out the error if one has occurred public void printError(PrintStream ostream); // Outputs a tree of the parsed query
public void outputParseTree(PrintStream ostream);
}
// Class that will construct and execution tree from the parse
// tree. Executes the query and displays the results. public class NRDatalogExecutionTree extends NRDatalogParseTree{
// Constructor for the execution tree
public NRDatalogExecutionTree(PeekableCharacterStream stream);
// Parses a query and returns true if valid query public boolean parseQuery();
// Prints out the error if one has occurred public void printError(PrintStream ostream);
// Outputs a tree of how execution would occur for queries public void outputExecutionTree(PrintStream ostream);
// Sets the verbosity setting for execution public void setVerbose(boolean verb); // Sets the number of threads during execution public void setThreadCount(int threadcount);
// Sets the path to the data files public void setDataPath(String datapath);
// Executes the parsed query public boolean executeQuery();
}