Big Data - Assignment 1

Your task is to perform exploratory data analysis (EDA) on a dataset of diamonds and measure the strength of the relationships between its variables. Use the following steps as a guideline:

1.   Clean the dataset and prepare it for analysis, e.g. by removing or replacing NAs and incorrect values.

2.   Begin your analysis with a summary of the variables using basic statistical methods, and briefly describe what it tells you. Prepare four plots: a pie chart, a bar chart, a histogram and a scatter plot. Each plot should display different variables (do not use the price variable yet). Each plot must have a title and meaningful labels.

3.   Focus your analysis on the price variable:

(a)    Show a histogram of the price variable and describe it briefly. Include summary statistics such as the mean, median, and variance.

(b)    Group the diamonds into price ranges (e.g. low, medium, high) and summarise each group separately.

(c)    Explore prices for the different cut types. You might want to use a boxplot.

(d)    How are the different attributes correlated with price? Which three variables are most strongly correlated with price?

4.   Now focus your analysis on the carat, depth, table and dimension (x, y, z) variables:

(a)    Compute a volume variable from x, y and z and add it to the dataset. Plot it against price and describe your findings.

(b)    Are the carat and volume attributes correlated? Is the relationship strong? Draw a plot with a regression line.

(c)    Explore the relationship between the table and depth variables.

(d)    Now explore the relationships between table and the remaining variables. Compute the correlations and describe your findings.

5.   In your Markdown document, you should use proper headings and commentary for each task. You can get up to for style, clarity and quality of the report and the source code.
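For task 1, a minimal cleaning pass could look like the sketch below, written in plain Python with only the standard library. It assumes the rows have already been read as strings (for example with `csv.DictReader`), that "NA" or an empty cell marks a missing value, and that a non-positive carat, price or dimension counts as incorrect. `RAW_ROWS`, `MISSING` and `clean` are hypothetical names and the sample values are invented. Dropping a row is only one option; the task equally allows replacing the bad value.

```python
# Hypothetical raw rows as strings, as csv.DictReader would return them.
# The values are invented for illustration, not taken from the real dataset.
RAW_ROWS = [
    {"carat": "0.30", "cut": "Ideal", "depth": "61.8", "table": "56",
     "price": "600", "x": "4.30", "y": "4.31", "z": "2.66"},
    {"carat": "NA", "cut": "Premium", "depth": "60.2", "table": "59",
     "price": "900", "x": "4.60", "y": "4.58", "z": "2.76"},
    {"carat": "0.45", "cut": "Good", "depth": "62.5", "table": "57",
     "price": "1100", "x": "4.90", "y": "4.95", "z": "0"},
    {"carat": "1.00", "cut": "Good", "depth": "63.0", "table": "58",
     "price": "4500", "x": "6.30", "y": "6.35", "z": "3.99"},
    {"carat": "0.70", "cut": "", "depth": "61.0", "table": "57",
     "price": "2500", "x": "5.70", "y": "5.72", "z": "3.48"},
    {"carat": "0.80", "cut": "Very Good", "depth": "62.0", "table": "58",
     "price": "n/a", "x": "5.95", "y": "5.99", "z": "3.70"},
]

NUMERIC = ("carat", "depth", "table", "price", "x", "y", "z")
# Assumption: these quantities cannot be zero or negative for a real diamond.
MUST_BE_POSITIVE = ("carat", "price", "x", "y", "z")
MISSING = {"", "NA", "NaN", "nan"}


def clean(rows):
    """Return rows with numeric columns as floats, dropping missing or invalid rows."""
    cleaned = []
    for row in rows:
        if any(value.strip() in MISSING for value in row.values()):
            continue  # missing cell: drop the row (imputing a median is an alternative)
        try:
            parsed = {col: float(row[col]) for col in NUMERIC}
        except ValueError:
            continue  # non-numeric text in a numeric column, e.g. "n/a"
        if any(parsed[col] <= 0 for col in MUST_BE_POSITIVE):
            continue  # impossible value, e.g. a zero dimension
        parsed["cut"] = row["cut"].strip()
        cleaned.append(parsed)
    return cleaned


CLEANED = clean(RAW_ROWS)
print(f"kept {len(CLEANED)} of {len(RAW_ROWS)} rows")
```

Reporting how many rows each rule removed is a useful addition to the write-up, since it shows how much the cleaning changed the data.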
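For task 2, the numbers behind the bar chart, pie chart and histogram can be computed before any plotting. The sketch below is a standard-library illustration with invented cut and carat values; the names `CUTS`, `CARATS` and `histogram_counts` are hypothetical, and a plotting library would still be needed to draw the charts.

```python
import statistics
from collections import Counter

# Invented values for illustration; in practice take them from the cleaned dataset.
CUTS = ["Ideal", "Premium", "Ideal", "Good", "Very Good", "Ideal", "Fair", "Premium"]
CARATS = [0.23, 0.31, 0.52, 0.70, 0.90, 1.01, 1.20, 2.05]

# Category counts (bar heights) and shares (pie slices) for cut.
cut_counts = Counter(CUTS)
cut_shares = {cut: n / len(CUTS) for cut, n in cut_counts.items()}

# Basic numeric summary of carat; statistics.quantiles needs Python 3.8+.
carat_summary = {
    "min": min(CARATS),
    "q1_median_q3": statistics.quantiles(CARATS, n=4),
    "max": max(CARATS),
    "mean": statistics.mean(CARATS),
    "stdev": statistics.stdev(CARATS),
}


def histogram_counts(values, width):
    """Count values per bin [k*width, (k+1)*width); empty bins are omitted."""
    counts = Counter(int(v // width) for v in values)
    return {round(k * width, 10): counts[k] for k in sorted(counts)}


carat_hist = histogram_counts(CARATS, 0.5)
print(cut_counts, carat_hist)
```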
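For tasks 3(a) to 3(c), the sketch below computes the overall price summary, per-band summaries and per-cut medians. The band edges (1000 and 5000) and the (cut, price) pairs are invented for illustration, since the assignment leaves the ranges open; `price_band` and `summarise` are hypothetical helper names.

```python
import statistics
from collections import defaultdict

# Invented (cut, price) pairs; in practice use the cleaned dataset.
ROWS = [
    ("Ideal", 450.0), ("Premium", 1200.0), ("Good", 700.0), ("Ideal", 3100.0),
    ("Very Good", 5400.0), ("Premium", 9800.0), ("Fair", 2500.0), ("Ideal", 15000.0),
]

# Assumed band edges; the assignment leaves the ranges to the analyst.
BANDS = [(0.0, 1000.0, "low"), (1000.0, 5000.0, "medium"), (5000.0, float("inf"), "high")]


def price_band(price):
    """Name of the half-open band [low, high) that contains price."""
    for low, high, name in BANDS:
        if low <= price < high:
            return name
    raise ValueError(f"price {price} is outside every band")


def summarise(values):
    """Count, mean, median and sample variance of a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample variance is undefined for a single value.
        "variance": statistics.variance(values) if len(values) > 1 else None,
    }


overall = summarise([price for _, price in ROWS])

by_band = defaultdict(list)
by_cut = defaultdict(list)
for cut, price in ROWS:
    by_band[price_band(price)].append(price)
    by_cut[cut].append(price)

band_summary = {name: summarise(prices) for name, prices in by_band.items()}
# The median per cut is the centre line of each box in a price-by-cut boxplot.
cut_medians = {cut: statistics.median(prices) for cut, prices in by_cut.items()}
print(overall)
```

A mean well above the median, as in this invented sample, is the numerical sign of a right-skewed price histogram.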
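Tasks 3(d), 4(c) and 4(d) all reduce to correlating one variable with the others. The sketch assumes Pearson's r as the measure (the assignment does not name one) and uses invented column values; `rank_correlations` is a hypothetical helper that sorts the other columns by the absolute value of r. With price as the target, the first three entries answer 3(d); with table as the target, the same call covers 4(d) and includes the table-depth pair from 4(c).

```python
import math

# Invented column values for illustration only.
COLUMNS = {
    "carat": [0.3, 0.5, 0.7, 1.0, 1.2, 1.5],
    "depth": [61.5, 62.0, 60.8, 62.3, 61.1, 61.9],
    "table": [55.0, 57.0, 58.0, 56.0, 59.0, 57.0],
    "x": [4.3, 5.1, 5.7, 6.4, 6.8, 7.3],
    "price": [1200.0, 2000.0, 2800.0, 4000.0, 4800.0, 6000.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_correlations(columns, target):
    """Correlate every other column with target, strongest |r| first."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


PRICE_RANKING = rank_correlations(COLUMNS, "price")
TABLE_RANKING = rank_correlations(COLUMNS, "table")
print("top 3 with price:", PRICE_RANKING[:3])
```

Sorting by |r| rather than r keeps strong negative relationships near the top. Pearson's r only captures linear association, so a scatter plot of each pair is still worth checking.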
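For tasks 4(a) and 4(b), the sketch below adds a volume column as the product x * y * z (a bounding-box approximation of the stone, assuming x, y and z are lengths in the same unit), then fits an ordinary least-squares line of volume on carat and reports Pearson's r. The rows are invented and the helper names `add_volume`, `least_squares` and `pearson` are hypothetical.

```python
import math

# Invented rows (not real data); x, y, z assumed to be in the same length unit.
ROWS = [
    {"carat": 0.30, "x": 4.30, "y": 4.32, "z": 2.67, "price": 650.0},
    {"carat": 0.50, "x": 5.10, "y": 5.13, "z": 3.16, "price": 1500.0},
    {"carat": 0.70, "x": 5.70, "y": 5.74, "z": 3.53, "price": 2600.0},
    {"carat": 1.00, "x": 6.40, "y": 6.44, "z": 3.97, "price": 4800.0},
    {"carat": 1.50, "x": 7.30, "y": 7.35, "z": 4.52, "price": 9000.0},
]


def add_volume(rows):
    """Add a volume column: the x * y * z bounding box of each stone."""
    for row in rows:
        row["volume"] = row["x"] * row["y"] * row["z"]
    return rows


def least_squares(xs, ys):
    """Return (slope, intercept) of the OLS line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


add_volume(ROWS)
carats = [row["carat"] for row in ROWS]
volumes = [row["volume"] for row in ROWS]
prices = [row["price"] for row in ROWS]

slope, intercept = least_squares(carats, volumes)
r = pearson(carats, volumes)          # task 4(b): carat vs volume
r_price = pearson(volumes, prices)    # task 4(a): volume vs price
print(f"volume ~ {intercept:.1f} + {slope:.1f} * carat, r = {r:.3f}")
```

Plotting `volumes` against `prices`, and `carats` against `volumes` with the fitted line overlaid, would give the two figures the tasks ask for.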