
Big Data-Assignment 2 Solved

3.1 Task A
Before you start the statistical analysis, you have to define the hypotheses that will be tested. You should state at least 2 different hypotheses, each testing different data (so the hypotheses should not all check the same statement on different variables). Remember that there are different types of tests, and you should use as many as you can (provided they are valid and make sense). Your ultimate goal is to report some findings. You should also demonstrate that these findings are statistically sound. Take the points below as hints, but do not limit yourself to them:

Look at the different plots you created during exploratory analysis. What conclusions can be drawn from them? These could become your hypotheses.
If you focus on one attribute, what is your intuition about the distribution that could explain such results? You can check and measure how well the data fits some distribution.
For each valid hypothesis test you will get 15 marks. This section consists of 30 marks in total.
Remember that data analysis is not only about finding and proving hypotheses but also about summarising data and communicating it.
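
For the distribution-fit hint above, one possible way to measure how well an attribute fits a candidate distribution is the Kolmogorov-Smirnov statistic, the largest gap between the empirical and the theoretical cumulative distribution. The sketch below is only an illustration under stated assumptions: the data are synthetic, the names `ks_statistic`, `normal_like` and `skewed` are hypothetical, and the normal distribution is just one candidate. Note that when the distribution's parameters are estimated from the same sample, as here, the standard Kolmogorov-Smirnov critical values are too lenient, so the statistic is shown without a p-value.

```python
import random
from statistics import NormalDist

def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF of sample and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

# Synthetic stand-ins for two attributes (assumption, not the real data)
rng = random.Random(1)
normal_like = [rng.gauss(61.7, 1.4) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]

# Fit a normal distribution to each sample, then measure the misfit
d_normal = ks_statistic(normal_like, NormalDist.from_samples(normal_like))
d_skewed = ks_statistic(skewed, NormalDist.from_samples(skewed))
print(f"normal-like: D = {d_normal:.3f}, skewed: D = {d_skewed:.3f}")
```

A smaller statistic means a closer fit, so the skewed sample should score noticeably worse against a normal distribution than the normal-like one.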

It is not a failure if you do not get "significant" results; you still have to report that. If your analysis makes sense (e.g. it is valid from the statistical point of view), there is no such thing as a bad result. Present your analysis in the form of a report. Each hypothesis should be described, and you should state what you want to prove. If you are claiming that groups have different characteristics, first show these on plots and comment on them. The report should be written in such a way that a person without prior knowledge of the data is able to follow it.
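
As an illustration of testing a claim that two groups differ, the sketch below runs a two-sided permutation test on the difference in group means. This is one possible test among many, not a prescribed one. The two groups are synthetic, and the names `permutation_test`, `group_a` and `group_b` are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (extreme + 1) / (n_resamples + 1)

# Synthetic stand-ins for the prices of two groups (assumption)
rng = random.Random(3)
group_a = [rng.gauss(4000, 500) for _ in range(40)]
group_b = [rng.gauss(3000, 500) for _ in range(40)]

diff, p_diff = permutation_test(group_a, group_b)
_, p_same = permutation_test(group_a, group_a)
print(f"difference in means = {diff:.1f}, p = {p_diff:.4f}")
print(f"same group against itself, p = {p_same:.4f}")
```

A non-significant p-value, as in the second call, is still a result to report, in line with the guidance above.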

3.2 Task B
Divide the dataset into training and test data. Use a 75/25 split.
Perform Linear Regression with Multiple Variables to predict the diamond price.
Report adjusted R squared (on the training data). Use RMSE and correlation to report the prediction accuracy on the test data.
Normalize the data and repeat the process of performing Linear Regression with Multiple Variables on the normalized data to predict the diamond price.
Highlight the difference in prediction accuracy of both models.
Write your findings in this section. Each valid iteration of linear regression will get you 15 marks. This section consists of 30 marks in total.
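
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the data are synthetic (two columns standing in for carat and depth), all function names are hypothetical, the fit uses the normal equations in plain Python, and only the predictors are min-max scaled while price is left in its original units.

```python
import math
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def adjusted_r2(y, y_hat, p):
    """Adjusted R squared for a model with p predictors."""
    n, mean = len(y), sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)

def min_max(train, test):
    """Scale each column to [0, 1] using training minima and maxima only."""
    lo = [min(c) for c in zip(*train)]
    hi = [max(c) for c in zip(*train)]
    def scale(rows):
        return [[(v - l) / (h - l) for v, l, h in zip(r, lo, hi)] for r in rows]
    return scale(train), scale(test)

# Synthetic stand-in data (assumption): columns play the role of carat, depth
rng = random.Random(42)
X = [[rng.uniform(0.2, 2.5), rng.uniform(55, 70)] for _ in range(400)]
y = [4000 * c + 20 * d + rng.gauss(0, 300) for c, d in X]

# 75/25 split on shuffled row indices
idx = list(range(len(X)))
rng.shuffle(idx)
cut = int(0.75 * len(idx))
train, test = idx[:cut], idx[cut:]
X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
X_te, y_te = [X[i] for i in test], [y[i] for i in test]

# Model 1: raw predictors
beta = fit(X_tr, y_tr)
adj_r2 = adjusted_r2(y_tr, predict(beta, X_tr), p=2)
pred = predict(beta, X_te)
raw_rmse, raw_corr = rmse(y_te, pred), pearson(y_te, pred)

# Model 2: min-max normalized predictors
Xn_tr, Xn_te = min_max(X_tr, X_te)
pred_n = predict(fit(Xn_tr, y_tr), Xn_te)
norm_rmse, norm_corr = rmse(y_te, pred_n), pearson(y_te, pred_n)
print(adj_r2, raw_rmse, raw_corr, norm_rmse, norm_corr)
```

One design point worth keeping in mind: for least squares with an intercept, linearly rescaling the predictors changes the coefficients but not the fitted predictions, so in this sketch the two models should agree up to rounding. If the price itself is normalized as well, RMSE is expressed in different units and is not directly comparable between the two models.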
3.3 Task C
Divide the dataset into training and test data. Use an 80/20 split.
Use kNN to classify diamond cuts into appropriate types based on their features.
Use C5.0 to classify diamond cuts into appropriate types based on their features.
Use ANN (hidden=5) to classify diamond cuts into appropriate types based on their features.
Compare the (best) performance of each classifier.
Write your findings in this section. Each valid classification technique
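
As a rough sketch of the kNN step only (C5.0 and the ANN are not covered here), the example below performs an 80/20 split and classifies test rows by majority vote among the 5 nearest training rows. The data are synthetic, the three cut labels and the names `knn_predict` and `centres` are hypothetical, and `k=5` is an arbitrary choice rather than a tuned value.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training rows closest to query (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Synthetic stand-in (assumption): two features, three hypothetical cut labels
rng = random.Random(7)
centres = {"Fair": (0.0, 0.0), "Good": (4.0, 0.0), "Ideal": (2.0, 4.0)}
rows = [((rng.gauss(cx, 0.6), rng.gauss(cy, 0.6)), label)
        for label, (cx, cy) in centres.items() for _ in range(100)]

# 80/20 split on shuffled rows
rng.shuffle(rows)
cut = int(0.8 * len(rows))
train, test = rows[:cut], rows[cut:]
train_X = [features for features, _ in train]
train_y = [label for _, label in train]

# Accuracy on the held-out 20%
correct = sum(knn_predict(train_X, train_y, features) == label
              for features, label in test)
accuracy = correct / len(test)
print(f"kNN accuracy on the test split: {accuracy:.2%}")
```

Because kNN relies on distances, features on very different scales can dominate the vote, so scaling the features before classification is usually worth considering. Trying several values of `k` is one way to look for the best performance of this classifier.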
