2. Attachments should be named in the format A4 itsc stuid.zip, which includes:
• A4 itscstuid report.pdf/.docx: Please put all your reports in this file. (Attachments should be original .pdf or .docx files, NOT compressed.)
• A4 itscstuid code.zip: This zip file contains all your source code for the assignment.
• A4 itscstuid Q1 code: this is a folder that should contain all your source code for Q1.
• A4 itscstuid Q2 code: same as above.
4. For the programming language, Python is preferred in principle.
5. Your grade will be based on correctness, efficiency, and clarity.
6. Please check carefully before submitting to avoid multiple submissions.
8. The email for Q&A: hlicg@connect.ust.hk.
(Please read the guidelines carefully)
1 Fuzzy Clustering using EM (50 marks)
Given the training data EM Points.mat, you should implement fuzzy clustering using the EM algorithm.
1.1 Data Description
The dataset contains 400 2D points in total, belonging to 2 clusters. Each point is in the format [X-coordinate, Y-coordinate, label].
1.2 Implementation
You are required to implement Fuzzy Clustering using the EM algorithm.
1. You are NOT allowed to use any existing EM library. You need to implement it manually and submit your code.
2. Report the updated centers and SSE for the first two iterations. (If you set any hyperparameter when computing SSE, please state it clearly in the report.)
3. Report the final converged centers for each cluster.
4. In your report, draw the clustering results of your implemented algorithm and compare them with the original labels in the dataset. You need to discuss the result briefly.
Hint: For the termination condition, you can consider the change in parameters or a maximum number of iterations.
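As a rough illustration of the steps above, the following is a minimal sketch of one common textbook formulation of fuzzy clustering with EM, not a reference solution. It assumes memberships proportional to 1 / distance^2 in the E-step, centers weighted by membership^p in the M-step, and SSE weighted the same way with p = 2. The function names, the synthetic points, and the initial centers are all hypothetical; the real data would come from EM Points.mat.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def e_step(points, centers):
    """E-step: membership of each point in each cluster, proportional to 1 / d^2."""
    weights = []
    for pt in points:
        d2 = [sq_dist(pt, c) for c in centers]
        zero = [d == 0.0 for d in d2]
        if any(zero):
            # The point sits exactly on a center: give it full membership there.
            row = [1.0 / sum(zero) if z else 0.0 for z in zero]
        else:
            inv = [1.0 / d for d in d2]
            total = sum(inv)
            row = [v / total for v in inv]
        weights.append(row)
    return weights


def m_step(points, weights, k, p=2):
    """M-step: each center is the mean of the points weighted by membership^p."""
    centers = []
    for j in range(k):
        wp = [row[j] ** p for row in weights]
        total = sum(wp)
        cx = sum(w * pt[0] for w, pt in zip(wp, points)) / total
        cy = sum(w * pt[1] for w, pt in zip(wp, points)) / total
        centers.append((cx, cy))
    return centers


def sse(points, weights, centers, p=2):
    """Weighted SSE; p is the hyperparameter assumed here (p = 2)."""
    return sum(
        row[j] ** p * sq_dist(pt, c)
        for pt, row in zip(points, weights)
        for j, c in enumerate(centers)
    )


def fuzzy_em(points, init_centers, p=2, tol=1e-6, max_iter=100):
    """Alternate E and M steps until the centers move less than tol."""
    centers = list(init_centers)
    history = []  # (centers, SSE) after each iteration
    for _ in range(max_iter):
        weights = e_step(points, centers)
        new_centers = m_step(points, weights, len(centers), p)
        history.append((new_centers, sse(points, weights, new_centers, p)))
        shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # termination: change of parameters
            break
    return centers, e_step(points, centers), history


# Hypothetical synthetic data standing in for the real dataset.
random.seed(0)
demo_points = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)]
demo_points += [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(50)]
demo_centers, demo_weights, demo_history = fuzzy_em(demo_points, [(1.0, 1.0), (4.0, 4.0)])
for it, (cs, err) in enumerate(demo_history[:2], start=1):
    print("iteration", it, "centers", cs, "SSE", err)
print("final centers", demo_centers)
```

A hard assignment for plotting can be obtained by taking, for each point, the cluster with the largest membership. Both stopping rules from the hint appear here: `tol` bounds the change in centers and `max_iter` caps the number of iterations.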
2 DBSCAN (50 marks)
Given the dataset DBSCAN.mat with 500 2D points, you should apply the DBSCAN algorithm to cluster the dataset and find outliers under the following settings:
2.1 Parameter Setting
1. Set ε = 5, Minpoints = 5.
2. Set ε = 5, Minpoints = 10.
3. Set ε = 10, Minpoints = 5.
4. Set ε = 10, Minpoints = 10.
2.2 Implementation
1. Draw a picture of your clustering results and outliers for each parameter setting in your report. For clarity, the color of outliers should be BLUE in each picture.
2. Add a table to your report showing how many clusters and outliers you find in each parameter setting.
3. Discuss the results of the different parameter settings, report the setting that you think is best, and state your reasons clearly.
4. Note that you are NOT allowed to use any existing DBSCAN library. You need to submit your code.
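For orientation, here is a minimal sketch of the standard DBSCAN procedure in plain Python, under a few stated assumptions: Euclidean distance, a neighborhood that includes the point itself, and a brute-force neighbor search. The names `dbscan`, `region_query`, and `summarize`, the label -1 for outliers, and the synthetic points are hypothetical choices made for illustration; the real data would come from DBSCAN.mat.

```python
import math
from collections import deque

NOISE = -1  # label assumed here for outliers


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    px, py = points[i]
    return [
        j for j, (qx, qy) in enumerate(points)
        if math.hypot(px - qx, py - qy) <= eps
    ]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: grow a new cluster from it.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels


def summarize(labels):
    """Number of clusters and number of outliers, e.g. for the report table."""
    n_clusters = len({lab for lab in labels if lab != NOISE})
    n_outliers = sum(1 for lab in labels if lab == NOISE)
    return n_clusters, n_outliers


# Hypothetical synthetic data: two 3x3 grids and one isolated point.
demo_points = [(float(x), float(y)) for x in range(3) for y in range(3)]
demo_points += [(50.0 + x, 50.0 + y) for x in range(3) for y in range(3)]
demo_points.append((100.0, 0.0))

# The four parameter settings listed in the assignment.
for eps, min_pts in [(5, 5), (5, 10), (10, 5), (10, 10)]:
    n_clusters, n_outliers = summarize(dbscan(demo_points, eps, min_pts))
    print("eps", eps, "min_pts", min_pts, "clusters", n_clusters, "outliers", n_outliers)
```

The brute-force `region_query` costs O(n) per call, which should be acceptable for 500 points; a spatial index would be the usual choice for larger datasets.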
3 Note
One way to draw the clustering results is shown below.
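As one possible illustration, the sketch below colors each point by its cluster label and draws outliers in blue. It assumes matplotlib is installed, that outliers carry the label -1, and that cluster labels are non-negative integers. The palette and the names `label_colors` and `plot_clusters` are hypothetical.

```python
OUTLIER_LABEL = -1       # assumed label for outliers
OUTLIER_COLOR = "blue"   # outliers are drawn in blue
# Palette for clusters; blue is deliberately left out.
CLUSTER_COLORS = ["red", "green", "orange", "purple", "brown", "magenta"]


def label_colors(labels):
    """Map each label to a color name, cycling through the palette."""
    return [
        OUTLIER_COLOR if lab == OUTLIER_LABEL
        else CLUSTER_COLORS[lab % len(CLUSTER_COLORS)]
        for lab in labels
    ]


def plot_clusters(points, labels, path, title=""):
    """Scatter-plot 2D points colored by label and save the figure to path."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no dependency

    fig, ax = plt.subplots()
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    ax.scatter(xs, ys, c=label_colors(labels), s=12)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_title(title)
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_clusters(points, labels, "result.png", "eps=5, Minpoints=5")` would then write one picture per parameter setting; the file name and title here are placeholders.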