Task A
Before you start the statistical analysis, you have to define the hypotheses that will be tested. You should state at least different hypotheses, each testing different data (so the hypotheses should not all check the same statement on different variables). Remember that there are different types of tests, and you should use as many as you can (provided they are valid and make sense). Your ultimate goal is to report some findings. You should also prove that these findings are statistically correct. Take the points below as hints, but do not limit yourself to them:
Look at the different plots you have created during exploratory analysis. What conclusions can be drawn based on these? These could become your hypotheses.
If you focus on one attribute, what is your intuition about the distribution that could explain such results? You can check and measure how well the data fits some distribution.
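As an illustration of the second hint, the sketch below measures how well a normal distribution fits an attribute before and after a log transform, using the Kolmogorov-Smirnov distance (the largest gap between the empirical and the fitted cumulative distribution). The data is synthetic, and the name `prices` and the choice of a log-normal shape are assumptions made for the example, not properties of the assignment dataset.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF of `sample` and `dist.cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Synthetic stand-in for a right-skewed attribute (assumption: not real data).
rng = random.Random(1)
prices = [rng.lognormvariate(8.0, 0.9) for _ in range(500)]
log_prices = [math.log(p) for p in prices]

# Fit a normal distribution to each version using the sample mean and stdev.
d_raw = ks_statistic(prices, NormalDist(mean(prices), stdev(prices)))
d_log = ks_statistic(log_prices, NormalDist(mean(log_prices), stdev(log_prices)))

print(f"KS distance, normal fit to raw values: {d_raw:.3f}")
print(f"KS distance, normal fit to log values: {d_log:.3f}")
```

A smaller distance means a closer fit. Because the mean and standard deviation are estimated from the same sample, the standard Kolmogorov-Smirnov p-value is only approximate here, so treat the distance as a measure of fit rather than a formal test result.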
For each valid hypothesis test you will get marks. This section consists of marks in total.
Remember that data analysis is not only about finding and proving hypotheses but also about summarising data and communicating it.
It is not a failure if you do not get "significant" results; you still have to report them. If your analysis makes sense (i.e. it is valid from the statistical point of view), there is no such thing as a bad result. Present your analysis in the form of a report. Each hypothesis should be described, and you should state what you want to prove. If you are claiming that groups have different characteristics, first show these on plots and comment on them. The report should be written in such a way that a person without prior knowledge of the data is able to follow it.
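One possible way to back up a claim that two groups differ is a permutation test on the difference in means, sketched below. The groups, their sizes and their values are synthetic placeholders (assumptions for illustration only); the null hypothesis is that both groups come from the same distribution, and the alternative is that their means differ.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-ins for one attribute measured in two groups (assumption).
rng = random.Random(42)
group_a = [rng.gauss(61.0, 1.5) for _ in range(50)]
group_b = [rng.gauss(63.0, 1.5) for _ in range(50)]

p_value = permutation_test(group_a, group_b)
print(f"mean A = {mean(group_a):.2f}, mean B = {mean(group_b):.2f}, p = {p_value:.4f}")
```

In a report, the p-value would be stated together with the hypothesis, the plot that motivated it and the significance level chosen in advance. A non-significant p-value is reported in exactly the same way.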
Task B
Divide the dataset into training and test data. Use / split.
Perform Linear Regression with Multiple Variables to predict the diamond price.
Report adjusted R squared (on training data). Use RMSE and correlation to report the prediction accuracy of the test data.
Normalize the data and repeat the process of performing Linear Regression with Multiple Variables on normalized data to predict the diamond price.
Highlight the difference in prediction accuracy of both models.
Write your findings in this section. Each valid iteration of linear regression will get you marks. This section consists of marks in total.
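The Task B steps can be sketched end to end in plain Python as follows. Everything specific here is an assumption made for illustration: the data is synthetic, the two feature names (carat, depth) are placeholders, the 80/20 split ratio is a stand-in for the ratio required by the task, and min-max scaling is only one possible reading of "normalize". In practice a statistics library would normally replace the hand-written solver.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xb = [[1.0] + row for row in X]
    k = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def adjusted_r2(y, yhat, p):
    """Adjusted R squared for a model with p predictors."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)


def min_max_scale(train, test):
    """Scale each column to [0, 1] using the training minimum and range only."""
    cols = list(zip(*train))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]

    def scale(rows):
        return [[(v - l) / s for v, l, s in zip(r, lo, span)] for r in rows]

    return scale(train), scale(test)


# Synthetic data (assumption): two placeholder features and a noisy linear price.
rng = random.Random(0)
X = [[rng.uniform(0.2, 3.0), rng.uniform(55, 70)] for _ in range(200)]
y = [4000 * carat - 20 * depth + rng.gauss(0, 300) for carat, depth in X]

# The rows are already in random order; real data should be shuffled first.
split = int(0.8 * len(X))  # ASSUMED ratio, replace with the required split
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

beta = fit_ols(X_train, y_train)
adj_r2 = adjusted_r2(y_train, predict(beta, X_train), p=2)
test_pred = predict(beta, X_test)
rmse_raw, corr_raw = rmse(y_test, test_pred), pearson(y_test, test_pred)

Xs_train, Xs_test = min_max_scale(X_train, X_test)
beta_scaled = fit_ols(Xs_train, y_train)
rmse_scaled = rmse(y_test, predict(beta_scaled, Xs_test))

print(f"adjusted R^2 (train) = {adj_r2:.4f}")
print(f"test RMSE = {rmse_raw:.2f}, test correlation = {corr_raw:.4f}")
print(f"test RMSE after min-max scaling of the features = {rmse_scaled:.2f}")
```

Note that for ordinary least squares with an intercept, rescaling the features alone changes the coefficients but not the predictions, so the two RMSE values in this sketch agree up to floating-point rounding. If the target is normalized as well, RMSE is expressed in different units and has to be converted back before the two models are compared.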
Task C
Divide the dataset into training and test data. Use / split.
Use kNN to classify diamond cuts into appropriate types based on their features.
Use C. to classify diamond cuts into appropriate types based on their features.
Use ANN (hidden) to classify diamond cuts into appropriate types based on their features.
Compare the (best) performance of each classifier.
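For the kNN step, a minimal sketch of the split, classify and compare workflow is given below. It uses synthetic two-feature data, three placeholder class labels, an assumed 80/20 split and a small set of candidate values of k; none of these come from the assignment itself. The other classifiers would be evaluated on the same split so that their accuracies are comparable.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)


# Synthetic data (assumption): three placeholder classes around separate centres.
rng = random.Random(7)
centres = {"Fair": (0.0, 0.0), "Good": (3.0, 0.0), "Ideal": (0.0, 3.0)}
data = [([rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)], label)
        for label, (cx, cy) in centres.items() for _ in range(60)]
rng.shuffle(data)

split = int(0.8 * len(data))  # ASSUMED ratio, replace with the required split
train, test = data[:split], data[split:]
train_X = [x for x, _ in train]
train_y = [label for _, label in train]
test_X = [x for x, _ in test]
test_y = [label for _, label in test]

# Try several neighbourhood sizes and keep the best-performing one.
results = {k: accuracy(train_X, train_y, test_X, test_y, k) for k in (1, 3, 5, 7)}
best_k = max(results, key=results.get)

for k, acc in results.items():
    print(f"k = {k}: test accuracy = {acc:.3f}")
print(f"best k = {best_k}")
```

Because kNN relies on distances, features measured on very different scales should be normalized first. Choosing k on the test set, as done here for brevity, gives an optimistic accuracy estimate; a separate validation set or cross-validation is the safer choice.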
Write your findings in this section. Each valid classification technique