
INFO411 Project 8: Data Mining and Knowledge Discovery Solution

AIR POLLUTION PREDICTION IN THE UNITED STATES:
THE CROSS-INDUSTRY STANDARD PROCESS FOR DATA MINING
(CRISP-DM) FRAMEWORK
Instructions:
This task is a real-world data mining problem. You are required to prepare a set of presentation slides that must include (1) the full name, student number and percentage contribution of each group member; (2) your proposed data mining approach and methodology; (3) the strengths and weaknesses of your proposed approach; (4) the performance measures used to evaluate your data mining results; and (5) the results and a brief discussion. Below is the recommended structure of your slides:
• Introduction (define the problem and the goal)
• Methods (propose approaches, and discuss their strengths and weaknesses)
• Results (Figures and tables of data analysis)
• Discussion (discovered knowledge from data mining)
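Item (4) asks for performance measures. The target here is a count, such as a number of days per county. Three common regression measures for it are the mean absolute error (MAE), the root mean squared error (RMSE) and the coefficient of determination (R²). Below is a minimal, dependency-free Python sketch of all three. The function name `regression_metrics` is illustrative and not part of any library.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and R^2 for paired observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
        # R^2 is undefined when every observed value is identical.
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Usage with made-up numbers (observed vs predicted days), not EPA data:
metrics = regression_metrics([10, 20, 30], [12, 18, 33])
```

Each measure tells you something different:
- MAE is in the same units as the target (days), so it is easy to explain on a slide.
- RMSE penalises large misses more heavily.
- R² compares the model against always predicting the mean of the observed values.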
Task: Air pollution prediction in the United States
Background: The US records daily ozone, SO2, CO and NO2 levels in several counties of every state. The data set for this task contains yearly summaries of these readings, together with associated meteorological data such as the air quality index (AQI) and the particulate matter (PM) index. The data are available from https://aqs.epa.gov/aqsweb/airdata/download_files.html#Annual.
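For a first look at the yearly summaries, the sketch below reads one county-level annual AQI file and tallies it by state, using only the Python standard library. The column names (`State`, `Days with AQI`, `Median AQI`, `Unhealthy Days`, `Very Unhealthy Days`, `Hazardous Days`) and the file name are assumptions based on the EPA `annual_aqi_by_county` downloads. Check them against the header of the file you actually fetch. Counting "Unhealthy" or worse days as "bad" is one illustrative criterion, not the only reasonable one.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Assumed column names from the EPA annual AQI by county files;
# verify them against the header of your downloaded file.
STATE = "State"
DAYS = "Days with AQI"
MEDIAN = "Median AQI"
BAD = ("Unhealthy Days", "Very Unhealthy Days", "Hazardous Days")  # one possible "bad day" definition


def summarise_by_state(fh):
    """Return {state: (median of county median AQI, share of bad days)}."""
    medians = defaultdict(list)
    bad = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(fh):
        s = row[STATE]
        medians[s].append(float(row[MEDIAN]))
        bad[s] += sum(int(row[c]) for c in BAD)
        total[s] += int(row[DAYS])
    return {s: (median(medians[s]), bad[s] / total[s] if total[s] else 0.0)
            for s in medians}


# Usage on a tiny made-up sample (not real EPA figures). For a real file, pass
# open("annual_aqi_by_county_2020.csv", newline="") (assumed file name) instead.
sample = io.StringIO(
    "State,County,Days with AQI,Median AQI,Unhealthy Days,Very Unhealthy Days,Hazardous Days\n"
    "Alpha,A1,100,40,5,1,0\n"
    "Alpha,A2,200,50,2,0,0\n"
    "Beta,B1,300,30,0,0,0\n"
)
summary = summarise_by_state(sample)
```

To try a different pollution criterion, edit the `BAD` tuple, for example by adding a sensitive-groups column if your file has one. The rest of the code stays unchanged.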
Requirements:
1. Explore the relationships between air pollution (for example, what you judge to be “bad” AQI days, a high median or maximum AQI, or another criterion you define), the meteorological variables and the states.
2. Present relevant visualisations of the data that help to illustrate the relationships, trends and differences found in the previous item.
3. Develop models to predict the number of days of PM > 2.5 concentrations using the remaining meteorological data. Two of the models you develop should be a standard linear model and a random forest.
4. Evaluate the performance of every fitted model, including details of the cross-validation scheme or of the split into training, validation and/or test sets.
5. Present your interpretations and conclusions.
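Items 3 and 4 can be combined: fit the standard linear model on k-1 folds and score it on the held-out fold. The sketch below does this with ordinary least squares, solved through the normal equations, using only the standard library. The helper names (`k_fold_indices`, `fit_ols`, `cv_rmse`) are hypothetical, and the demo data are synthetic rather than EPA readings. In practice a library such as scikit-learn (`LinearRegression`, `KFold`) would normally be used instead.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = List[float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices once, then return (train, test) index lists per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]


def fit_ols(X: Sequence[Row], y: Sequence[float]) -> Row:
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    A = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(A[0])
    # Augmented matrix [X^T X | X^T y], solved by Gauss-Jordan elimination.
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        if abs(M[c][c]) < 1e-12:
            raise ValueError("singular X^T X: are some predictors collinear?")
        for r in range(p):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_ols(beta: Row, X: Sequence[Row]) -> List[float]:
    """Linear predictions beta0 + sum(b_j * x_j) for each row."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def cv_rmse(X: Sequence[Row], y: Sequence[float], k: int = 5, seed: int = 0) -> List[float]:
    """Held-out RMSE of the linear model on each of the k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        pred = predict_ols(beta, [X[i] for i in test])
        sq = [(y[i] - yhat) ** 2 for i, yhat in zip(test, pred)]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return scores


# Usage on synthetic, noise-free data (y = 3 + 2*x1 - x2), not EPA data:
X_demo = [[float(i), float((i * 7) % 5)] for i in range(20)]
y_demo = [3 + 2 * a - b for a, b in X_demo]
fold_rmse = cv_rmse(X_demo, y_demo, k=5)
```

With noise-free synthetic data, every fold RMSE is essentially zero. On the real county data, report the mean and spread of the fold scores rather than the result of a single split.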
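For the random forest in item 3, the sketch below shows the two ingredients that define the method:
- Each tree is grown on a bootstrap sample of the rows.
- Each split considers only a random subset of the predictors.

Predictions are then averaged over the trees. This is a plain-Python illustration built on several assumptions: variance-reduction splits, `p // 3` features per split and hypothetical function names. It is far slower than a library implementation such as scikit-learn's `RandomForestRegressor`, which is what you would normally use on the full EPA data.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared deviations from the mean (the split criterion)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, depth, max_depth, n_feats, min_leaf, rng):
    """Grow a regression tree. A leaf is a number; a split is
    (feature, threshold, left_subtree, right_subtree)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None  # (sse, feature, threshold)
    # A random feature subset at each split is what separates a random
    # forest from plain bagged trees.
    for f in rng.sample(range(len(X[0])), n_feats):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return mean(y)
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]

    def grow(ids):
        return build_tree([X[i] for i in ids], [y[i] for i in ids],
                          depth + 1, max_depth, n_feats, min_leaf, rng)
    return (f, thr, grow(li), grow(ri))


def predict_tree(node, row):
    """Follow splits down to a leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=6, n_feats=None, min_leaf=2, seed=0):
    """Bootstrap-aggregated regression trees with random feature subsets."""
    rng = random.Random(seed)
    k = n_feats or max(1, len(X[0]) // 3)  # p/3 mirrors R randomForest's regression default
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(y)) for _ in range(len(y))]  # rows drawn with replacement
        trees.append(build_tree([X[i] for i in boot], [y[i] for i in boot],
                                0, max_depth, k, min_leaf, rng))
    return trees


def predict_forest(trees, row):
    """Average the per-tree predictions."""
    return mean(predict_tree(t, row) for t in trees)


# Usage on a made-up step function (target jumps from 0 to 10 at x = 10):
X_demo = [[float(i)] for i in range(20)]
y_demo = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(X_demo, y_demo, n_trees=25, max_depth=3)
```

The trees are constrained only by `max_depth` and `min_leaf`. Tune those settings and the number of features per split on the training data only, then score on held-out data as item 4 requires. This keeps the comparison with the linear model fair.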
