
 Data Science (ITE4005) 
Programming Assignment #4
 

1. Environment
- OS: Windows, Mac OS, or Linux
- Languages: C, C++, C#, Java, or Python (any version is OK)

 

2. Goal: Predict the movie ratings in the test data by using the given training data, which contains users' movie ratings. You can choose any prediction algorithm (e.g., content-based or collaborative filtering algorithms). For a content-based algorithm, you can refer to the following web page to get the content related to our training and test data (http://grouplens.org/datasets/movielens/).

(Note) This assignment only requires predicting ratings for the user-item pairs that appear in the test data.
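
As a starting point, the goal above can be illustrated with a very simple predictor. The following is only a sketch of one possible baseline (global mean plus per-user and per-item offsets); the assignment leaves the choice of algorithm open. The function name `fit_baseline`, the tuple layout, and the rating bounds of 1.0 to 5.0 are assumptions made for illustration.

```python
from collections import defaultdict

def fit_baseline(ratings, low=1.0, high=5.0):
    """Fit global mean + user offset + item offset; return a predict(user, item) function.

    ratings: list of (user_id, item_id, rating) tuples from the training data.
    low/high: assumed rating bounds used to clip predictions (an assumption).
    """
    ratings = list(ratings)
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # User offset: how far a user's ratings sit from the global mean on average.
    user_residuals = defaultdict(list)
    for user, _, r in ratings:
        user_residuals[user].append(r - global_mean)
    user_offset = {u: sum(v) / len(v) for u, v in user_residuals.items()}

    # Item offset: what remains after removing the global mean and the user offset.
    item_residuals = defaultdict(list)
    for user, item, r in ratings:
        item_residuals[item].append(r - global_mean - user_offset[user])
    item_offset = {i: sum(v) / len(v) for i, v in item_residuals.items()}

    def predict(user, item):
        # Unseen users or items contribute an offset of zero.
        estimate = global_mean + user_offset.get(user, 0.0) + item_offset.get(item, 0.0)
        return min(high, max(low, estimate))

    return predict

# Made-up training rows, for illustration only.
predict = fit_baseline([("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0)])
print(predict("u2", "i2"))
```

A user or item that never appears in the training data falls back to the global mean, so every user-item pair in the test data still receives a prediction.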

 

3. Requirements
The program must meet the following requirements:

- Execution file name: recommender.exe or recommender.py

- Execute the program with two arguments: training data name, test data name
    - Example: training data name = ‘u1.base’, test data name = ‘u1.test’

- File format for training data

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

...

    - Row: a rating that a user has already given to an item
    - Example:

Figure 1. An example of training data.

    - Five training data files are provided: ‘u1.base’, ‘u2.base’, ‘u3.base’, ‘u4.base’, and ‘u5.base’

- File format for test data

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

[user_id]\t[item_id]\t[rating]\t[time_stamp]\n

...

    - Row: a user-item pair whose rating must be predicted by your algorithm
    - Example:

Figure 2. An example of test data.

    - Five test data files are provided: ‘u1.test’, ‘u2.test’, ‘u3.test’, ‘u4.test’, and ‘u5.test’

- Output file format
    - You must write one output file for each test data file
    - File format for the output of ‘u#.test’
        - ‘u#.base_prediction.txt’

[user_id]\t[item_id]\t[rating]\n

[user_id]\t[item_id]\t[rating]\n

[user_id]\t[item_id]\t[rating]\n

[user_id]\t[item_id]\t[rating]\n

...

    - ‘u#.base_prediction.txt’ must contain every user-item pair in the test data, together with the rating your algorithm predicted for that pair
    - The output file name must follow the naming scheme above
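
The input and output requirements above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the helper names (`parse_args`, `read_ratings`, `prediction_name`, `write_predictions`) are hypothetical, the sample rows are made up, and the constant rating 3.0 is only a placeholder for a real prediction. IDs are kept as strings so they are written back exactly as they were read.

```python
import io

def parse_args(argv):
    """Return (training_name, test_name) from e.g. ['recommender.py', 'u1.base', 'u1.test']."""
    if len(argv) != 3:
        raise ValueError("usage: recommender.py <training data> <test data>")
    return argv[1], argv[2]

def read_ratings(lines):
    """Parse tab-separated user_id, item_id, rating, time_stamp rows."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        user_id, item_id, rating, _time_stamp = fields
        rows.append((user_id, item_id, float(rating)))
    return rows

def prediction_name(training_name):
    """Build the output name, e.g. 'u1.base' -> 'u1.base_prediction.txt'."""
    return training_name + "_prediction.txt"

def write_predictions(out, predictions):
    """Write one tab-separated user_id, item_id, rating row per prediction."""
    for user_id, item_id, rating in predictions:
        out.write(f"{user_id}\t{item_id}\t{rating}\n")

# Demonstration with in-memory data instead of real files.
train_name, test_name = parse_args(["recommender.py", "u1.base", "u1.test"])
test_rows = read_ratings(io.StringIO("1\t10\t4\t100\n2\t20\t3\t200\n"))  # made-up rows
out = io.StringIO()
write_predictions(out, [(u, i, 3.0) for u, i, _ in test_rows])  # placeholder prediction
print(prediction_name(train_name))
print(out.getvalue(), end="")
```

In an actual program, `parse_args` would receive `sys.argv`, and the `io.StringIO` objects would be replaced by files opened with `open(train_name)` and `open(prediction_name(train_name), "w")`.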

 

4. Evaluation measure
- Compute the difference between each prediction file (u1~u5.base_prediction.txt) and the corresponding test data (u1~u5.test)

- Test method
    - For testing, we will use a measure called RMSE (Root Mean Square Error), defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - a_i)^2}{n}}$$

($p_i$: predicted rating for item i, $a_i$: original rating for item i, $n$: the number of ratings)

    - Because RMSE measures error, a larger value means that the ratings were predicted less accurately
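
The RMSE definition above translates directly into a few lines of Python. This is a sketch for checking your own results; the function name `rmse` and the two-list interface are assumptions, and it says nothing about how the provided testing program is implemented.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Two predictions that are each off by exactly one rating point.
print(rmse([4.0, 2.0], [3.0, 3.0]))  # prints 1.0
```

Because the errors are squared before averaging, a few large mistakes raise the RMSE more than many small ones.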

 

5. Note
- This is a competition project.

- The higher the accuracy of your model, the higher your score.
    - We will first give a minimum score of 80 if (1) you submit your program before the deadline, (2) your program runs correctly without any errors, and (3) all requirements for this project are satisfied.
    - Then, we will assign additional points from 0 to 20 based on your rank.

 

6. Submission
- Please submit the program files and the report to GitLab.
    - Report
        - File format must be *.docx, *.doc, *.hwp, *.pdf, or *.odt.
        - Guideline
            - Summary of your algorithm
            - Detailed description of your code (for each function)
            - Instructions for compiling your source code on the TA's computer (e.g., screenshots) (Important!!)
            - Any other specification of your implementation and testing
    - Program and code
        - An executable file
            - If either of the following two cases applies to you, please submit alternative files (e.g., .py file, makefile):
                1. You cannot meet the requirement (.exe file) of the programming assignment due to your computing environment (e.g., Mac OS or Linux)
                2. You are using Python to implement your program
            - You MUST SUBMIT instructions for compiling your source code. If the TAs follow your instructions but cannot compile your program, you will get a penalty. Please write the instructions carefully.
        - All source files

 

7. Testing program
- Please put the following files in the same directory: the testing program (PA4.exe), your output file (e.g., u1.base_prediction.txt), and the given test file (e.g., u1.test)

- Execute the testing program with one argument (input file name)

- Check your RMSE for each input file

8. Penalty
- This assignment does not allow late submission!!

- A significant penalty of up to 30% will be applied if the requirements are not satisfied

 
