Note 1: Your submission header must follow the format shown in the rounded rectangle above.
Note 4: All submitted materials must be legible. Figures and diagrams must be of good quality.
Note 5: Please use and check the Blackboard discussion for further instructions, questions, answers, and hints.
1. Consider the eight 2D data points below and do the following:
1st iteration
Instance A1 A2 A3 A4 A5 A6 A7 A8
C1 dist.
C2 dist.
C3 dist.
Solution format:
b. Calculate the SSE (Sum of Squared Errors) of the final clustering.
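Hand calculations for the k-means iterations and the SSE can be cross-checked with a short from-scratch sketch. It assumes Euclidean distance (the metric is not stated here) and uses made-up points and initial centroids, not the eight points of this exercise. The helper names `assign`, `update`, and `sse` are illustrative only.

```python
from math import dist

# Hypothetical points and initial centroids (NOT the assignment's data).
points = {"P1": (1.0, 1.0), "P2": (2.0, 1.0), "P3": (8.0, 8.0), "P4": (9.0, 8.0)}
centroids = [(1.0, 1.0), (9.0, 8.0)]


def assign(points, centroids):
    """Map each point name to the index of its nearest centroid (Euclidean)."""
    return {
        name: min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for name, p in points.items()
    }


def update(points, labels, k):
    """Recompute each centroid as the mean of its members (assumed non-empty)."""
    new_centroids = []
    for c in range(k):
        members = [points[n] for n, lab in labels.items() if lab == c]
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[labels[n]]) ** 2 for n, p in points.items())


# Alternate assignment and update until the assignment stops changing.
labels = None
while True:
    new_labels = assign(points, centroids)
    if new_labels == labels:
        break
    labels = new_labels
    centroids = update(points, labels, len(centroids))

final_sse = sse(points, labels, centroids)
print(labels, centroids, final_sse)
```

Replacing `points` and `centroids` with the exercise's values reproduces the per-iteration distance table if `dist(p, centroids[c])` is printed inside `assign`.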
2. Use the distance matrix below to perform the following operations:
a. Group the points using single link (MIN) hierarchical clustering. Show your results by reporting the updated distance matrix after each merging step and by drawing the corresponding dendrogram, which should clearly present the order in which the points are merged.
      p1    p2    p3    p4    p5
p1  0.00  0.10  0.41  0.55  0.35
p2  0.10  0.00  0.64  0.47  0.98
p3  0.41  0.64  0.00  0.44  0.85
p4  0.55  0.47  0.44  0.00  0.76
p5  0.35  0.98  0.85  0.76  0.00
Solution format:
p2 p4
b. [3 points] Show the clusters when k = 2, k = 3, and k = 4.
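A merge order worked out by hand can be verified with a minimal single-link sketch over the distance matrix given in this question. The function name `single_link` and its return structure are assumptions made for illustration; a library routine such as SciPy's hierarchical clustering could be used instead.

```python
names = ["p1", "p2", "p3", "p4", "p5"]
D = [
    [0.00, 0.10, 0.41, 0.55, 0.35],
    [0.10, 0.00, 0.64, 0.47, 0.98],
    [0.41, 0.64, 0.00, 0.44, 0.85],
    [0.55, 0.47, 0.44, 0.00, 0.76],
    [0.35, 0.98, 0.85, 0.76, 0.00],
]


def single_link(names, D):
    """Return (merges, partitions): the merge history and the clusters for each k."""
    index = {n: i for i, n in enumerate(names)}

    def gap(a, b):
        # MIN linkage: smallest pairwise distance between members of a and b.
        return min(D[index[x]][index[y]] for x in a for y in b)

    clusters = [(n,) for n in names]
    merges = []
    partitions = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        a, b = min(pairs, key=lambda ab: gap(*ab))
        merges.append((a, b, gap(a, b)))
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        partitions[len(clusters)] = list(clusters)
    return merges, partitions


merges, partitions = single_link(names, D)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")
for k in (2, 3, 4):
    print(k, partitions[k])
```

The merge distances are the heights at which the dendrogram joins, and `partitions[k]` gives the clusters obtained by cutting it at k clusters.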
3. [15 points] Complete the Python program (clustering.py) that will read the file training_data.csv to cluster the data. Your goal is to run k-means multiple times and check which k value maximizes the Silhouette coefficient. You also need to plot the values of k and their corresponding Silhouette coefficients so that we can visualize and confirm the best k value found. Next, you will calculate and print the Homogeneity score (the formula of this evaluation metric is provided in the template) of this clustering task by using the testing_data.csv, which is a file that includes ground truth data (clusters). Finally, you will use the same k value found before with k-means to run Agglomerative clustering a single time, checking and printing its Homogeneity score as well.
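To see what the two metrics measure, the sketch below computes them from scratch. It assumes the usual definitions: the silhouette of a point is (b - a) / max(a, b), and homogeneity is 1 - H(C|K) / H(C). The template's formula may be written differently, so treat this as a cross-check rather than the expected implementation. The points, the ground truth, and the candidate labelings (standing in for k-means output at k = 2 and k = 3) are all made up.

```python
from collections import Counter
from math import dist, log


def silhouette(X, labels):
    """Mean silhouette coefficient (needs >= 2 clusters; singletons score 0)."""
    scores = []
    for i, (p, li) in enumerate(zip(X, labels)):
        by_cluster = {}
        for j, (q, lj) in enumerate(zip(X, labels)):
            if i != j:
                by_cluster.setdefault(lj, []).append(dist(p, q))
        if li not in by_cluster:  # the point is alone in its cluster
            scores.append(0.0)
            continue
        a = sum(by_cluster[li]) / len(by_cluster[li])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != li)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def homogeneity(truth, pred):
    """1 - H(C|K) / H(C), with C the true classes and K the predicted clusters."""
    n = len(truth)
    h_c = -sum(c / n * log(c / n) for c in Counter(truth).values())
    if h_c == 0:
        return 1.0
    sizes = Counter(pred)
    joint = Counter(zip(truth, pred))
    h_c_given_k = -sum(c / n * log(c / sizes[k]) for (_, k), c in joint.items())
    return 1.0 - h_c_given_k / h_c


# Hypothetical data: three tight, well-separated groups.
X = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
truth = [0, 0, 1, 1, 2, 2]
candidates = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}

scores = {k: silhouette(X, lab) for k, lab in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k, homogeneity(truth, candidates[best_k]))
```

In the actual program the labelings would come from running k-means for each k, and `scores` is what would be plotted against k.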
4. [10 points] The dataset below presents the user ratings on a 1-3 scale for 6 different bands.
         Bon Jovi  Metallica  Scorpions  AC/DC  Kiss  Guns n’ Roses
Fred         1         3          -        3      1         3
Lillian      3         -          2        2      3         1
Cathy        2         2          2        3      -         2
John         3         2          2        2      ?         ?
a. [5 points] Apply user-based collaborative filtering on the dataset to decide about recommending the bands Kiss and Guns n’ Roses to John. You should make a recommendation when the predicted rating is greater than or equal to 2.0. Use cosine similarity, a neutral value (1.5) for missing values, and the top 2 similar neighbors to build your model.
b. [5 points] Now, apply item-based collaborative filtering to make the same decision. Use the same parameters defined before to build your model.
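The user-based procedure in part a can be sketched as follows. The ratings are made up (not the table above), and two choices are assumptions that your course material may define differently: cosine similarity is computed over every item except the one being predicted, and the prediction is a similarity-weighted average of the neighbors' ratings.

```python
from math import sqrt

NEUTRAL = 1.5  # neutral stand-in for a missing rating
TOP_K = 2      # number of most similar neighbors to use

# Hypothetical ratings; None marks a missing rating.
ratings = {
    "u1": [3, 1, None, 3],
    "u2": [1, 3, 2, 1],
    "u3": [3, 2, 1, 3],
}
active = [3, 1, 2, None]  # the last item is the one to predict
target = 3


def filled(row):
    """Replace missing ratings with the neutral value."""
    return [NEUTRAL if r is None else r for r in row]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def predict(active, ratings, target, k=TOP_K):
    """Return (predicted rating, names of the k neighbors used)."""
    known = [i for i in range(len(active)) if i != target]
    a = [filled(active)[i] for i in known]
    sims = {u: cosine(a, [filled(row)[i] for i in known]) for u, row in ratings.items()}
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    weighted = sum(sims[u] * filled(ratings[u])[target] for u in top)
    return weighted / sum(sims[u] for u in top), top


prediction, neighbors = predict(active, ratings, target)
print(neighbors, prediction, "recommend" if prediction >= 2.0 else "skip")
```

The item-based variant follows the same pattern with the matrix transposed, so that similarities are computed between item columns instead of user rows.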
5. [15 points] Complete the Python program (collaborative_filtering.py) that will read the file trip_advisor_data.csv to make user-based recommendations. Your goal is to predict the ratings of user 100 for the categories: galleries and restaurants. Follow the steps:
1. Weight all users with respect to cosine similarity with the active user
2. Select the top 10 similar users as predictors
3. Compute a prediction from a weighted combination of the selected neighbors’ ratings (use the formula that mitigates the bias)
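The three steps above can be sketched as follows. One common bias-mitigating form is the mean-centered prediction: the active user's mean rating plus the similarity-weighted average of each neighbor's deviation from their own mean. Whether this matches the formula in the template is an assumption. The users, the categories "parks" and "museums", and all ratings below are made up.

```python
from math import sqrt


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_mean_centered(active, others, target, top_n=10):
    """Mean-centered prediction of `target` for the active user.

    active: {category: rating} without the target category.
    others: {user: {category: rating}} including the target category.
    """
    shared = sorted(active)  # categories the active user has rated
    a = [active[c] for c in shared]
    # Step 1: weight every user by cosine similarity with the active user.
    sims = {u: cosine(a, [r[c] for c in shared]) for u, r in others.items()}
    # Step 2: keep the top_n most similar users.
    top = sorted(sims, key=sims.get, reverse=True)[:top_n]
    # Step 3: combine the neighbors' deviations from their own mean rating.
    num = sum(sims[u] * (others[u][target] - mean(others[u].values())) for u in top)
    den = sum(abs(sims[u]) for u in top)
    return mean(active.values()) + num / den


# Hypothetical data (NOT taken from trip_advisor_data.csv).
active = {"parks": 2.0, "museums": 4.0}
others = {
    "u1": {"parks": 2.0, "museums": 4.0, "galleries": 3.0},
    "u2": {"parks": 1.0, "museums": 2.0, "galleries": 3.0},
}
prediction = predict_mean_centered(active, others, "galleries")
print(prediction)
```

Here each neighbor's mean is taken over all of that neighbor's ratings, including the target category; taking it over the shared categories only is an equally plausible reading.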
6. [16 points] Consider the following transaction dataset.
Suppose that minimum support (minsup) is set to 30% and minimum confidence (minconf) is set to 60%.
a. [4 points] Rank all frequent itemsets according to their support (list their support values).
b. [4 points] For all frequent 3-itemsets, rank by confidence all association rules that satisfy the minimum support and minimum confidence requirements (list their confidence values).
c. [4 points] Show how the candidate 3-itemsets can be generated by the $F_{k-1} \times F_{k-1}$ method, and state whether each candidate will be pruned.
d. [4 points] Consider the lattice structure given below. Label each node with the following letter(s): M if the node is a maximal frequent itemset, C if it is a closed frequent itemset, N if it is frequent but neither maximal nor closed, and I if it is infrequent.
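Support, confidence, and candidate generation with pruning can be checked with a small from-scratch sketch. The transactions below are made up (not the dataset of this question), and the function names are illustrative. Candidate generation merges two sorted (k-1)-itemsets that share their first k-2 items, then prunes any candidate that has an infrequent (k-1)-subset.

```python
from itertools import combinations

# Hypothetical transactions (NOT the assignment's dataset).
transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "c"}, {"b", "c"}]
MINSUP, MINCONF = 0.3, 0.6


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """conf(X -> Y) = support(X and Y together) / support(X)."""
    return support(antecedent + consequent) / support(antecedent)


def generate(prev_frequent):
    """F_{k-1} x F_{k-1} candidate generation; returns (kept, pruned)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    kept, pruned = [], []
    for x, y in combinations(prev, 2):
        if x[:-1] == y[:-1]:  # identical first k-2 items
            cand = x + (y[-1],)
            subsets_ok = all(s in prev_set for s in combinations(cand, len(cand) - 1))
            (kept if subsets_ok else pruned).append(cand)
    return kept, pruned


items = sorted(set().union(*transactions))
f1 = [(i,) for i in items if support((i,)) >= MINSUP]
c2, _ = generate(f1)
f2 = [c for c in c2 if support(c) >= MINSUP]
kept3, pruned3 = generate(f2)

print("F1:", f1)
print("F2:", f2)
print("3-itemset candidates kept:", kept3, "pruned:", pruned3)
for x, y in [(("a",), ("b",)), (("b",), ("a",))]:
    conf = confidence(x, y)
    print(x, "->", y, round(conf, 3), "strong" if conf >= MINCONF else "weak")
```

In this toy data the only candidate 3-itemset is pruned without counting its support, because one of its 2-item subsets is infrequent.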
4. [15 points] Complete the Python program (association_rule_mining.py) that will read the file retail_dataset.csv to find strong rules related to supermarket products. You will need to install a Python library this time: in your terminal, run pip install mlxtend. Your goal is to output the rules that satisfy minsup = 0.2 and minconf = 0.6, as well as the priors and probability gains of the rule consequents. The formulas to compute priors and probability gains are given in the template.
7. [10 points] Deep Learning (bonus point – not required). Complete the Python program (deep_learning.py) that will learn how to classify fashion items. You will use the Fashion MNIST dataset, which includes 70,000 grayscale images of 28×28 pixels each across 10 classes, each class representing a fashion item as illustrated below. You will use Keras to load the dataset, which is split into 60,000 images for training and 10,000 for testing. Your goal is to train and test multiple deep neural networks and compare their performance, always keeping track of the highest accuracy found. This time you will use a separate function named build_model() to define the architectures of your networks. Finally, the weights of the best model will be printed, together with its architecture and learning curves. To install TensorFlow, use: python -m pip install --upgrade tensorflow.