1. Predict the ratings of movies in the test data using the given training data, which contains users' movie ratings. You may choose any prediction algorithm (e.g., content-based or collaborative filtering). For a content-based algorithm, you can refer to the MovieLens web page for content related to our training and test data (http://grouplens.org/datasets/movielens/).
(Note) This assignment only requires predicting ratings for the user-item pairs that appear in the test data.
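As one possible starting point, the sketch below fits a simple bias baseline: the global mean rating plus a per-user and a per-item offset. It is not the required method, only an illustration of what "any algorithm" might look like at its simplest. The function names `fit_baseline` and `predict` are hypothetical, and the 1 to 5 rating range used for clipping is an assumption that should be checked against the training data.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit global mean plus average user and item deviations.

    ratings: list of (user_id, item_id, rating) tuples from the training data.
    """
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for user, item, rating in ratings:
        deviation = rating - global_mean
        user_sum[user] += deviation
        user_cnt[user] += 1
        item_sum[item] += deviation
        item_cnt[item] += 1
    user_bias = {u: user_sum[u] / user_cnt[u] for u in user_sum}
    item_bias = {i: item_sum[i] / item_cnt[i] for i in item_sum}
    return global_mean, user_bias, item_bias


def predict(model, user, item, lo=1.0, hi=5.0):
    """Predict one rating; unseen users or items fall back to a zero offset.

    lo and hi are the assumed rating bounds (an assumption, not from the spec).
    """
    global_mean, user_bias, item_bias = model
    raw = global_mean + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)
    return min(hi, max(lo, raw))
```

A collaborative filtering or content-based model would replace `fit_baseline` and `predict` while keeping the same inputs and outputs.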
3. Requirements
The program must meet the following requirements:
• Execution file name: recommender.exe
• Execute the program with two arguments: training data name, test data name
  ▪ Example:
    - training data name = ‘u1.base’, test data name = ‘u1.test’
• File format for the training data
  [user_id]\t[item_id]\t[rating]\t[time_stamp]\n
  [user_id]\t[item_id]\t[rating]\t[time_stamp]\n
  ▪ Row: a record of a rating that a user has already given to an item
  ▪ Example:
Figure 1. An example of training data.
  ▪ Five training data files will be provided: ‘u1.base’, ‘u2.base’, ‘u3.base’, ‘u4.base’, and ‘u5.base’
• File format for the test data
  [user_id]\t[item_id]\t[rating]\t[time_stamp]\n
  [user_id]\t[item_id]\t[rating]\t[time_stamp]\n
  ▪ Row: a record whose rating must be predicted by your algorithm
  ▪ Example:
Figure 2. An example of test data.
  ▪ Five test data files will be provided: ‘u1.test’, ‘u2.test’, ‘u3.test’, ‘u4.test’, and ‘u5.test’
• Output file format
  ▪ You must write one output file for each test data file
  ▪ File format for the output of ‘u#.test’
    - ‘u#.base_prediction.txt’
  [user_id]\t[item_id]\t[rating]\n
  [user_id]\t[item_id]\t[rating]\n
  ...
  ▪ ‘u#.base_prediction.txt’ should contain all user-item pairs in the test data, together with the ratings your algorithm predicted for those pairs
  ▪ The output file must follow the naming scheme shown above
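The input and output requirements above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names (`read_ratings`, `output_name`, `write_predictions`, `main`) are hypothetical, the predictor is a placeholder that assigns the global mean of the training ratings to every pair, and how the script is packaged as recommender.exe is left open.

```python
import sys


def read_ratings(path):
    """Read tab-separated rows of user_id, item_id, rating, time_stamp."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            user, item, rating, _timestamp = line.split("\t")
            rows.append((int(user), int(item), float(rating)))
    return rows


def output_name(training_name):
    """Map e.g. 'u1.base' to 'u1.base_prediction.txt'."""
    return training_name + "_prediction.txt"


def write_predictions(path, predictions):
    """Write one user_id, item_id, rating row per test pair, tab-separated."""
    with open(path, "w", encoding="utf-8") as f:
        for user, item, rating in predictions:
            f.write(f"{user}\t{item}\t{rating}\n")


def main(argv):
    if len(argv) != 3:
        sys.exit("usage: recommender.exe <training data name> <test data name>")
    train_name, test_name = argv[1], argv[2]
    train = read_ratings(train_name)
    test = read_ratings(test_name)
    # Placeholder predictor (assumption): global mean of the training ratings.
    mean = sum(r for _, _, r in train) / len(train)
    predictions = [(user, item, mean) for user, item, _ in test]
    write_predictions(output_name(train_name), predictions)
```

Calling `main(sys.argv)` under an `if __name__ == "__main__":` guard would connect this to the two command-line arguments. The rating column of the test file is read but ignored during prediction, since it is only needed for evaluation.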
4. Evaluation measure
• Compute the difference between each prediction file (u1~u5.base_prediction.txt) and the corresponding test data (u1~u5.test)
• Test method
  ▪ For testing, we will use a measure called RMSE (Root Mean Square Error), defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - a_i)^2}{n}}$$
($p_i$: predicted rating for item i, $a_i$: original rating for item i, n: the number of ratings)
  ▪ Because RMSE measures prediction error, a larger value means the ratings were predicted less accurately
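For checking results locally, RMSE can be computed with a few lines of code. The sketch below assumes the predicted and original ratings have already been loaded into two lists in the same order; the function name `rmse` is hypothetical and this is not the grading script.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))
```

For example, predictions of 4 and 2 against original ratings of 3 and 3 are each off by 1, so the RMSE is 1.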