In this competition project, you need to build a recommendation system (e.g., a hybrid recommendation system) that provides accurate predictions.
2. Competition Requirements
2.1 Programming Language and Library Requirements
a. You must use Python to implement the competition project. You can use the Python libraries that are available on Vocareum.
b. You can re-use your own code from Assignment 3. If you want to use Spark, please specify the following environment in your code:
2.2 Programming Environment
Python 3.6 and Spark 2.3.0
We will use Vocareum to automatically run and grade your submission. You must test your scripts both on your local machine and in the Vocareum terminal before submission.
2.3 Write your own code
Do not share code with other students!!
For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code!
TAs will combine all the code we can find from the web (e.g., Github) as well as other students’ code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism to the university.
3. Yelp Data
In this assignment, we generated the review data from the original Yelp datasets with some filters, such as the condition: “state” == “CA”. We randomly took 80% of the data for training, 10% of the data for testing, and 10% of the data as the blind dataset. We do not share the blind dataset.
You can access the files (a-e) under the fixed directory on Vocareum: resource/asnlib/publicdata/
a. train_review.json
b. user.json – user metadata
c. business.json – business metadata, including locations, attributes, and categories
d. user_avg.json – containing the average stars for the users in the train dataset
e. business_avg.json – containing the average stars for the businesses in the train dataset
In addition, the following Google Drive folder provides the files (a-e) above and the testing files (f and g) below (USC email only):
https://drive.google.com/open?id=1ss6Tq-hxeRfyst8u-n8Tx8Ykn1jD8GB8
f. test_review.json – containing only the target user and business pairs for the prediction task
g. test_review_ratings.json – containing the ground truth rating for the testing pairs
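As a starting point for working with these files, the sketch below reads a file one JSON object per line and combines the per-user and per-business averages into a simple baseline prediction. It rests on assumptions this section does not state: that the review files are line-delimited JSON, and that the contents of user_avg.json and business_avg.json can be loaded into plain id-to-average dictionaries. The helper names and the fallback value of 3.0 are hypothetical.

```python
import json


def read_json_lines(path):
    """Yield one dict per non-empty line (assumes line-delimited JSON)."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def baseline_prediction(user_id, business_id, user_avg, business_avg, default=3.0):
    """Average whichever of the two means is known for this pair.

    user_avg and business_avg are assumed to be dicts mapping an id to its
    average stars. default is an arbitrary fallback for unseen pairs.
    """
    known = []
    if user_id in user_avg:
        known.append(user_avg[user_id])
    if business_id in business_avg:
        known.append(business_avg[business_id])
    return sum(known) / len(known) if known else default
```

A baseline like this is useful mainly as a reference point: a hybrid model should beat it on test_review_ratings.json before it is worth submitting.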
4. Task (5 points)
You need to submit the following files on Vocareum: (all lowercase)
a. [REQUIRED] Two Python scripts: train.py, predict.py
b. [REQUIRED] Model files/folders (you can name them yourself)
c. [REQUIRED] One PDF file: model.pdf (describing your model in 200 words)
d. You can include other Python scripts to support your programs (e.g., callable functions).
4.1 Task description
In the competition project, you will build a recommendation system with the datasets provided on Vocareum and use the model(s) to predict the rating for a given user and business pair.
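Because training and prediction run as separate scripts, whatever train.py learns has to be written to a model file that predict.py can load later. The sketch below illustrates that hand-off with a deliberately trivial model (per-business mean rating). It is not the expected solution: the "stars" field name in the training reviews is an assumption, and the function names are hypothetical.

```python
import json
from collections import defaultdict


def fit_business_means(reviews):
    """Compute the mean rating per business.

    reviews: iterable of dicts with 'business_id' and 'stars' keys
    (the 'stars' field name is an assumption about train_review.json).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for review in reviews:
        business_id = review["business_id"]
        sums[business_id] += review["stars"]
        counts[business_id] += 1
    return {b: sums[b] / counts[b] for b in sums}


def save_model(model, path):
    """Persist the model dict as JSON so a later process can reload it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path):
    """Reload a model dict written by save_model."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

In this layout, train.py would call fit_business_means and save_model, and predict.py would call load_model before scoring the target pairs.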
4.2 Execution commands
Training commands: $ python3 train.py
Predicting commands: $ python3 predict.py <test_file> <output_file>
Parameters:
<test_file>: containing the target pairs for prediction, e.g., test_review.json
<output_file>: the prediction results
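One way predict.py might read its two positional parameters is sketched below. The function name parse_args and the usage message are hypothetical; only the order of the arguments comes from the command above.

```python
import sys


def parse_args(argv):
    """Return (test_file, output_file) from a list shaped like sys.argv.

    argv[0] is the script name, so exactly three entries are expected.
    """
    if len(argv) != 3:
        raise SystemExit("usage: python3 predict.py <test_file> <output_file>")
    return argv[1], argv[2]


# Inside predict.py this would typically be called as:
#     test_file, output_file = parse_args(sys.argv)
```

Failing fast with a usage message makes a wrong invocation easy to spot when testing in the Vocareum terminal.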
4.3 Output format:
You must write each target pair and its prediction in JSON format, using exactly the same tags as the example in Figure 1. Each line represents one predicted pair of (“user_id”, “business_id”).
Figure 1: An example output in JSON format
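A minimal sketch of writing one JSON object per line follows. The "user_id" and "business_id" tags come from the text above, but the tag that holds the predicted rating is defined only in Figure 1, so it is passed in here as the hypothetical parameter rating_key rather than guessed. The function names are also hypothetical.

```python
import json


def format_prediction(user_id, business_id, rating, rating_key):
    """Serialize one prediction as a single-line JSON string.

    rating_key must be the prediction tag shown in Figure 1;
    it is a parameter here because this sketch does not assume its name.
    """
    record = {"user_id": user_id, "business_id": business_id, rating_key: rating}
    return json.dumps(record)


def write_predictions(path, predictions, rating_key):
    """Write (user_id, business_id, rating) tuples, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for user_id, business_id, rating in predictions:
            f.write(format_prediction(user_id, business_id, rating, rating_key) + "\n")
```

Using json.dumps rather than manual string formatting avoids quoting mistakes that could break automated grading.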