$30
In this part, you are required to implement the k-means algorithm and apply your implementation on the given dataset, which contains a set of 2-D points. You are required to implement two different strategies for choosing the initial cluster centers.
Strategy 1: randomly pick the initial centers from the given samples.
Strategy 2: pick the first center randomly; for the i-th center (i1), choose a sample (among all possible samples) such that the average distance of this chosen one to all previous (i-1) centers is maximal.
You need to test your implementation on the given data, with the number k of clusters ranging from 2-10. Plot the objective function value vs. the number of clusters k. Under each strategy, plot the objective function twice, each start from a different initialization.
(Referring to the course notes: When clustering the samples into k clusters/sets Di, with respective center/mean vectors 𝜇1, 𝜇2, … 𝜇k, the objective function is defined as
)
Algorithms:
k-Means Clustering
Resources:
A 2-D dataset to be downloaded from this link: Dataset.
Workspace:
Any Python programming environment.
Software:
Python environment.
Language(s):
Python. (MATLAB is equally fine, if you have access to it.)
https://asu.instructure.com/courses/31489/assignments/686560?module_item_id=1905559 1/2 1/13/2020 Project Part 2: Unsupervised Learning (K-means)
Required Tasks:
1. Write code to implement the k-means algorithm with Strategy 1.
2. Use your code to do clustering on the given data; compute the objective function as a function of k (k = 2, 3, …, 10).
3. Repeat the above step with another initialization.
4. Write code to implement the k-means algorithm with Strategy 2.
5. Use your code to do clustering on the given data; compute the objective function as a function of k (k = 2, 3, …, 10).
6. Repeat the above step with another initialization.
7. Submit a short report summarizing the results, including the plots for the objective function values underdifferent settings described above.
Optional Tasks:
1. Repeat the experiments for different pairs of digits.
2. Consider doing multi-class classification.