$25
Practice the three basic tasks of visual data analytics use data from mini project #1 (or other), begin with IN12500, ID1210) client-server system: python for processing (server), D3 for VIS (client)
Task 1: data clustering and decimation (30 points) implement random sampling and stratified sampling (remove 75% of data) the latter includes the need for k-means clustering (optimize k usingTask 2: dimension reduction on both org and 2 types of reduced data (30) find the intrinsic dimensionality of the data using PCA produce scree plot visualization and mark the intrinsic dimensionality show the scree plots before/after sampling to assess the bias introduced
• obtain the three attributes with highest PCA loadings
Task 3: visualization of both original and 2 types of reduced data (40 points) visualize the data projected into the top two PCA vectors via 2D scatterplot visualize the data via MDS (Euclidian & correlation distance) in 2D scatterplots visualize the scatterplot matrix of the three highest PCA loaded attributes