EE4211-Project 2- Deliverables & Naming Conventions Solution

In this project, you are given a dataset collected by an actual IoT system (see description below) and asked to use the dataset to build a forecasting model. You have to answer a set of questions (there are no fixed answers), as well as propose your own interesting questions.
1. Form groups of 4 students and register your group under NUS Luminus → EE4211/TEE4211 → Class & Groups → Class Groups. Take note of your group number. If you face any difficulties, please contact the teaching team.
2. For each of the 3 sub-parts below, use an IPython notebook (ipynb file) to do the analysis and answer all parts of the question. Use markdown in the ipynb file itself to elaborate and provide your answers to the questions. The notebook should form your report (i.e., your report should not be a separate document file).
3. For each of the 3 sub-parts below, submit a single zip file containing (i) a PDF file/print preview of your Jupyter notebook, (ii) the original Jupyter notebook with all your code (ipynb file), and (iii) any additional data files required to run the notebook.
5. In summary, the project carries a total of 40 marks. There are 4 deliverables: Question 1 including group project proposal (10 marks), Question 2 (10 marks), Question 3 (10 marks), and Presentation (10 marks).
Target Data (Predicted/Output/Response Variable) Description:

An example of using Python to make a data.gov.sg API call for a single time instance is shown in the provided sample code: “EE4211-ExampleAPI.ipynb”. You will have to modify the provided sample code (or write your own code) to collate data from multiple time instances together.
Note that the data.gov.sg API returns the data as a JSON (JavaScript Object Notation) object. The provided sample code transforms this JSON object into a pandas dataframe. An example of the data from the provided sample code is shown below:
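The sample notebook's output is not reproduced in this text. Separately, as a rough sketch of what flattening a nested JSON response into table rows might look like, the snippet below uses only the standard library. The payload shape, the field names (`items`, `carpark_data`, `carpark_info`, and so on) and all values are assumptions made for illustration; check them against the actual API response and the provided sample code.

```python
import json

# Hypothetical payload: field names and values are assumptions for illustration,
# not output copied from the sample notebook or the live API.
raw = """
{
  "items": [
    {
      "timestamp": "2022-09-01T09:00:27+08:00",
      "carpark_data": [
        {
          "carpark_number": "A1",
          "update_datetime": "2022-09-01T08:59:10",
          "carpark_info": [
            {"total_lots": "100", "lot_type": "C", "lots_available": "40"}
          ]
        },
        {
          "carpark_number": "B2",
          "update_datetime": "2022-09-01T08:58:45",
          "carpark_info": [
            {"total_lots": "50", "lot_type": "C", "lots_available": "5"},
            {"total_lots": "10", "lot_type": "Y", "lots_available": "3"}
          ]
        }
      ]
    }
  ]
}
"""


def flatten(payload):
    """Return one flat dict per (carpark, lot type) reading."""
    rows = []
    for item in payload["items"]:
        for carpark in item["carpark_data"]:
            for info in carpark["carpark_info"]:
                rows.append({
                    "timestamp": item["timestamp"],
                    "carpark_number": carpark["carpark_number"],
                    "lot_type": info["lot_type"],
                    # Counts are assumed to arrive as strings, so convert them.
                    "total_lots": int(info["total_lots"]),
                    "lots_available": int(info["lots_available"]),
                })
    return rows


rows = flatten(json.loads(raw))
for row in rows:
    print(row)
```

Rows collected from several time instances can be appended to one list and passed to `pandas.DataFrame(rows)` to obtain a single dataframe.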

Questions:

1. Data Cleaning & Exploring the Data (10 marks)
1.1 Look at the features in the dataset. What does lot type mean? Hint: Note that data.gov.sg gets its data from the Land Transport Authority (LTA). Try searching for the LTA Datamall API documentation.
1.4 Generate hourly readings from the raw data. Select a one month interval and plot the hourly data (time-series) for that interval (aggregate results instead of plotting for each location individually). Identify any patterns in the visualization. Note: You will have to decide what to do if there are no carpark readings for a certain hour, for example, should you impute the missing data or ignore it.
1.5 Intuitively, we expect carpark availability across certain carparks to be correlated. For example, many housing carparks would experience higher carpark availability during working hours. Using the same interval chosen in 1.4, write a function that, given a carpark, finds the five other carparks whose availability is most highly correlated with it. Demonstrate an example of this function call using a randomly selected carpark.
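As a minimal sketch of the ideas in 1.4 and 1.5, the snippet below averages raw readings into hourly values per carpark and then ranks other carparks by Pearson correlation with a chosen one. The function names, the `(timestamp, carpark, lots_available)` tuple layout and the readings themselves are hypothetical. Hours with no reading are simply left out and the correlation is computed over shared hours only, which is one possible way to handle missing data rather than a recommended one. In a pandas workflow the same steps would typically go through `resample` and `corr`.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt
from statistics import mean


def hourly_means(readings):
    """Average (timestamp, carpark, lots_available) readings per carpark and hour."""
    buckets = defaultdict(list)
    for ts, carpark, available in readings:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[(carpark, hour)].append(available)
    series = defaultdict(dict)
    for (carpark, hour), values in buckets.items():
        series[carpark][hour] = mean(values)
    return dict(series)


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def top_correlated(series, target, k=5):
    """Rank the other carparks by correlation with `target` over shared hours."""
    scores = []
    for other, values in series.items():
        if other == target:
            continue
        shared = sorted(set(series[target]) & set(values))
        if len(shared) < 2:
            continue  # not enough overlapping hours to correlate
        r = pearson([series[target][h] for h in shared],
                    [values[h] for h in shared])
        if r is not None:
            scores.append((other, r))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up readings for four hypothetical carparks over three hours.
readings = [
    ("2022-09-01T08:10:00", "A", 8), ("2022-09-01T08:40:00", "A", 12),
    ("2022-09-01T09:05:00", "A", 20), ("2022-09-01T10:05:00", "A", 30),
    ("2022-09-01T08:20:00", "B", 5), ("2022-09-01T09:20:00", "B", 10),
    ("2022-09-01T10:20:00", "B", 15),
    ("2022-09-01T08:30:00", "C", 30), ("2022-09-01T09:30:00", "C", 20),
    ("2022-09-01T10:30:00", "C", 10),
    ("2022-09-01T08:45:00", "D", 10), ("2022-09-01T09:45:00", "D", 30),
    ("2022-09-01T10:45:00", "D", 20),
]

series = hourly_means(readings)
top = top_correlated(series, "A", k=2)
print(top)
```

With real data, `k` would stay at its default of 5 and `target` would be a randomly selected carpark number.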
2. Forecasting (10 marks)
2.3 Do the same as Question 2.2 above but use a support vector regressor (SVR).
2.4 Do the same as Question 2.2 above but use a decision tree (DT) regressor.
2.5 Make a final recommendation for the best regression model (out of the 3 methods above) by choosing a suitable performance metric. To ensure a fair comparison, carry out hyperparameter tuning for all 3 methods. Then, make a final recommendation selecting only one model. Include both quantitative and qualitative arguments for your choice.
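To illustrate the quantitative side of 2.5, the following sketch compares three sets of predictions on the same held-out values using mean absolute error (MAE) and root mean squared error (RMSE). The numbers are made up, and `model_2_2` is only a placeholder name for whichever model Question 2.2 uses. Which metric is "suitable" is a judgment the group has to justify; this merely shows how the two metrics can rank models differently.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up held-out values and predictions, purely for illustration.
y_test = [100, 120, 90, 110]
predictions = {
    "model_2_2": [98, 125, 95, 108],       # placeholder for the Question 2.2 model
    "svr": [101, 118, 92, 111],
    "decision_tree": [100, 120, 70, 110],  # exact three times, one large miss
}

scores = {
    name: {"mae": mae(y_test, pred), "rmse": rmse(y_test, pred)}
    for name, pred in predictions.items()
}
best = min(scores, key=lambda name: scores[name]["rmse"])

for name, s in scores.items():
    print(f"{name}: MAE={s['mae']:.2f} RMSE={s['rmse']:.2f}")
print("lowest RMSE:", best)
```

In this toy data the decision tree's single large miss gives it the worst MAE by a modest margin but a far worse RMSE, which is the kind of trade-off worth discussing in the qualitative argument.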
3. Group Proposed Project (10 marks)
3.2 Based on the insights derived from the analysis, suggest a practical action that can be taken (i.e., an action that can be taken to benefit society. Do not suggest actions such as hyperparameter tuning here).
4. Presentation (10 marks)
4.1 Prepare slides and a video presentation on your group’s contribution to Question 3: Group Proposed Project. The presentation should cover the analysis done in Question 3. Note: Do not cover Questions 1 and 2 in the presentation.
4.2 Slides: Limit the deck to a maximum of 15 slides.
4.3 Video: Make a 10-12 minute video for your group’s presentation. Each group member must present in the video. Please convert your video to mp4 format with a minimum resolution of 480p.
For your benefit, here are some pointers from former students in the class:
1. The project can be an opportunity to talk about data science in interviews, so use the project wisely.
2. Keep the explanations concise and use subplots to compress the plots if there are too many plots.
3. To increase the reproducibility of results, Google Colab is a good option (it allows collaboration too).
4. Showing the versions of the Python packages that are used can increase the reproducibility score of the project.
5. Setting random seeds when using randomized operations like a train-test split will also help with reproducing the results.
6. Start the projects early because it is too much work to rush last-minute.
7. Do the projects yourself, experiment and understand because you will learn a lot.
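Pointers 4 and 5 above can be sketched with the standard library alone. The package list and the seed value below are arbitrary choices for illustration, and `package_versions` is a hypothetical helper name.

```python
import random
import sys
from importlib.metadata import PackageNotFoundError, version

SEED = 42  # arbitrary fixed value; any constant works as long as it is recorded


def package_versions(names):
    """Map each package name to its installed version, or a note if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


# Record the interpreter and package versions at the top of the notebook.
print("python", sys.version.split()[0])
for name, ver in package_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(name, ver)

# Re-seeding with the same value reproduces the same random sequence.
random.seed(SEED)
first = [random.random() for _ in range(3)]
random.seed(SEED)
second = [random.random() for _ in range(3)]
print(first == second)
```

`random.seed` only covers Python's built-in generator. Libraries with their own randomness need their own seed, for example the `random_state` argument of scikit-learn's `train_test_split`.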
