1. Perform clustering on a given data set by using DBSCAN.
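For orientation, here is a minimal sketch of DBSCAN under stated assumptions; it is not a reference solution. It uses a brute-force neighborhood search, Euclidean distance, and counts a point as a member of its own Eps-neighborhood. The names `region_query`, `dbscan`, `NOISE`, and `UNVISITED` are hypothetical, and `eps` and `min_pts` correspond to the Eps and MinPts parameters described under Requirements.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
NOISE = -1       # hypothetical label for outliers
UNVISITED = -2   # hypothetical label for points not processed yet


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if math.hypot(x - xi, y - yi) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels
```

The brute-force search costs O(N^2) distance computations in total; a spatial index such as a grid or k-d tree would be a reasonable substitute for larger inputs.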
3. Requirements
The program must meet the following requirements:
• Execution file name: clustering.exe
▪ Execute the program with four arguments: input data file name, n, Eps, and MinPts
- Three input data files will be provided: ‘input1.txt’, ‘input2.txt’, and ‘input3.txt’
- n: number of clusters for the corresponding input data
- Eps: maximum radius of the neighborhood
- MinPts: minimum number of points in an Eps-neighborhood of a given point
- We suggest the following parameters (n, Eps, MinPts) for each input data file:
• For ‘input1.txt’, n=8, Eps=15, MinPts=22
• For ‘input2.txt’, n=5, Eps=2, MinPts=7
• For ‘input3.txt’, n=4, Eps=5, MinPts=5
▪ Example:
- Input data file name = ‘input1.txt’, n = 8, Eps = 15, MinPts = 22
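The four arguments could be parsed along the following lines. This is only a sketch: the names `Params` and `parse_args` are hypothetical, and treating Eps as a floating-point value (rather than an integer) is an assumption the specification does not state.

```python
from typing import NamedTuple, Sequence


class Params(NamedTuple):
    input_path: str  # input data file name, e.g. "input1.txt"
    n: int           # number of clusters for the corresponding input data
    eps: float       # maximum radius of the neighborhood (assumed to allow decimals)
    min_pts: int     # minimum number of points in an Eps-neighborhood


def parse_args(argv: Sequence[str]) -> Params:
    """Parse [input file name, n, Eps, MinPts] from a list of argument strings."""
    if len(argv) != 4:
        raise ValueError("expected 4 arguments: <input file> <n> <Eps> <MinPts>")
    path, n, eps, min_pts = argv
    return Params(path, int(n), float(eps), int(min_pts))


# The example invocation above, expressed as an argument list.
params = parse_args(["input1.txt", "8", "15", "22"])
print(params)
```

In a real entry point the list would come from `sys.argv[1:]`.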
• File format for the input data
[object_id_1]\t[x_coordinate]\t[y_coordinate]\n
[object_id_2]\t[x_coordinate]\t[y_coordinate]\n
[object_id_3]\t[x_coordinate]\t[y_coordinate]\n
[object_id_4]\t[x_coordinate]\t[y_coordinate]\n
...
▪ Each row: information about one object
- [object_id_i]: identifier of the ith object
- [x_coordinate], [y_coordinate]: the location of the corresponding object in 2-dimensional space
▪ Example:
Figure 1. An example of input data.
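Reading this format might look like the sketch below. The function name `parse_points` is hypothetical, object ids are kept as strings (an assumption; the specification does not say whether they are numeric), and the sample rows are made up for illustration rather than taken from the provided files.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def parse_points(lines: Iterable[str]) -> Dict[str, Point]:
    """Map each object id to its (x, y) location; blank lines are skipped."""
    points: Dict[str, Point] = {}
    for line in lines:
        fields = line.split()  # splits on tabs (and any other whitespace)
        if not fields:
            continue
        object_id, x, y = fields  # raises ValueError on a malformed row
        points[object_id] = (float(x), float(y))
    return points


# Hypothetical sample rows in the [object_id]\t[x]\t[y]\n format.
sample = ["0\t1.5\t2.0\n", "1\t3.0\t4.0\n"]
print(parse_points(sample))
```

Because an open file iterates over its lines, the same function can be applied to a file directly, for example `with open("input1.txt") as f: points = parse_points(f)`.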
• Output files
▪ You must write n output files for each input data file
- (Optional) If your algorithm finds m clusters for an input data file and m is greater than n (n = the number of clusters given), you can remove (m-n) clusters based on the number of objects in each cluster. For example, you can remove the (m-n) smallest clusters, selected in ascending order of size
- You can remove outliers. In other words, you do not need to assign outliers to any cluster
▪ File format for the output of ‘input#.txt’
- ‘input#_cluster_0.txt’
[object_id]\n
[object_id]\n
...
- ‘input#_cluster_1.txt’
[object_id]\n
[object_id]\n
...
- ‘input#_cluster_n-1.txt’
[object_id]\n
[object_id]\n
...
▪ ‘input#_cluster_i.txt’ should contain all the ids of the objects assigned to cluster i by your algorithm
▪ You must follow the output file naming scheme shown above
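The output rules can be sketched as follows, assuming each object already carries a cluster label and outliers are labeled -1. The helper names `group_by_cluster`, `keep_largest`, and `write_clusters` are hypothetical, and renumbering the kept clusters from 0 in descending order of size is one possible choice, not something the specification requires.

```python
import os
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_cluster(ids: Sequence[str], labels: Sequence[int],
                     noise: int = -1) -> Dict[int, List[str]]:
    """Collect object ids per cluster label, leaving outliers out."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for object_id, label in zip(ids, labels):
        if label != noise:
            clusters[label].append(object_id)
    return dict(clusters)


def keep_largest(clusters: Dict[int, List[str]], n: int) -> List[List[str]]:
    """Drop the (m-n) smallest clusters; the result is indexed 0..n-1."""
    ordered = sorted(clusters.values(), key=len, reverse=True)
    return ordered[:n]


def write_clusters(input_path: str, clusters: List[List[str]],
                   out_dir: str = ".") -> List[str]:
    """Write one '<input stem>_cluster_<i>.txt' file per cluster, one id per line."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    paths = []
    for i, members in enumerate(clusters):
        path = os.path.join(out_dir, f"{stem}_cluster_{i}.txt")
        # newline="\n" keeps line endings as "\n" on every platform.
        with open(path, "w", newline="\n") as f:
            for object_id in members:
                f.write(f"{object_id}\n")
        paths.append(path)
    return paths
```

If fewer than n clusters are found, this sketch simply writes fewer files; how to handle that case is left open by the requirements.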