CSE4063 Fundamentals of Data Mining
Project #2
1) Dataset
a) Click the “View All Data Sets” link at http://archive.ics.uci.edu/ml to find the dataset assigned to your group:
| No | Group Members | Dataset [Rows x Columns] | Presentation Slot |
|---|---|---|---|
| 1 | Ayberk Ömer Altuntabak *, Abdulhalik Şensin, Amela Karmaj | Absenteeism at work [740x21] | |
| 2 | Emin Kağan Kadıoğlu *, Ayşenur Yılmaz, Mert Mengü | Anuran Calls (MFCCs) [7,195x22] | |
| 3 | Mehmet Nusret Odabaşı *, Abbas Kutay, Orhan Fatih Bayazıt | Apartment for rent classified [10,000x22] | |
| 4 | Ahmet Enes Gündüz *, Hakan Yalçın, Muhammed Fethullah Eroğlu | BLE RSSI Dataset for Indoor localization and Navigation [6,611x15] | |
| 5 | Furkan Akman *, Burak Fidan, Mustafa Sertaç Öztürk | Codon usage [13,028x69] | |
| 6 | Ferihan Çabuk *, Ali Berat Çetin, Muhammed İsa Akbaba | Estimation of obesity levels based on eating habits and physical condition [2,111x17] | |
| 7 | Halid Seyfullah Sert *, Mert İlik | Facebook Live Sellers in Thailand [7,051x12] | |
| 8 | Sedanur Kara *, Berke Şahin, Sinem Onal | Geo-Magnetic field and WLAN dataset for indoor localisation from wristband and smartphone [153,540x25] | |
| 9 | Diala Jassem M.B.J. *, Münevver Sueda Kocatürk, Nurhande Akyüz | Mice Protein Expression [1,080x82] | |
| 10 | Ahmet Hakan Ekşi *, Belgin Taştan, Kevser İldeş | Motion Capture Hand Postures [78,095x38] | |
| 11 | Deniz Arda Gürhizin *, Can Berk Durmuş, Tarkan Batar | Online Shoppers Purchasing Intention Dataset [12,330x18] | |
| 12 | Zahide Gür Taştan *, Merve Ayer, Zeynep Naz Akyokuş | Sales_Transactions_Dataset_Weekly [811x53] | |
| 13 | Cem Güleç *, Buğra Akdeniz, Kadir Hızarcı | Shill Bidding Dataset [6,321x13] | |
| 14 | İlker Fener *, Doğukan Deniz, Halil İbrahim Şimşek | South German Credit (UPDATE) [1,000x21] | |
| 15 | Osman Mantıcı *, Buse Batman, Fatmanur Özdemir | Turkiye Student Evaluation [5,820x33] | |
b) The first student in each group, marked with *, is the group representative.
c) Study your dataset and gather information about it.
2) Python Platform & Environment
a) If you do not already have a Python platform/environment to work in, get one and install it on your computer.
b) You may use any libraries you want; however, you must understand them completely, since you will need to use and explain them in the demo sessions.
c) Implement as much of your work as possible with your own code.
3) Model Construction: Frequent Pattern Mining & Clustering Analysis
a) Perform any data preprocessing steps that are required.
b) Use your dataset to construct 3 frequent pattern mining models as follows:
i) Apriori.
ii) FP-Growth.
iii) ECLAT.
c) Use your dataset to construct 3 clustering analysis models as follows:
i) K-Means.
ii) AGNES.
iii) DBSCAN.
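Apriori, FP-Growth and ECLAT (item b) operate on transactions, i.e. sets of items, whereas most of the assigned datasets are tables with numeric columns, so preprocessing for frequent pattern mining usually includes discretization. The sketch below shows one possible approach, assuming equal-width binning into `k` bins and a `column=value` item naming scheme; the helper names and the sample rows are hypothetical and not part of the assignment.

```python
# Hypothetical sample rows; real rows would be parsed from the dataset file.
rows = [
    {"age": 23, "smoker": "no", "bmi": 21.5},
    {"age": 47, "smoker": "yes", "bmi": 31.0},
    {"age": 35, "smoker": "no", "bmi": 26.4},
]


def equal_width_cuts(values, k):
    """Return the k - 1 interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def bin_index(value, cuts):
    """Index (0 .. len(cuts)) of the bin that value falls into."""
    return sum(value > c for c in cuts)


def to_transactions(rows, numeric_cols, k=3):
    """Turn each row into a list of 'column=value' items, binning the numeric columns."""
    cuts = {col: equal_width_cuts([r[col] for r in rows], k) for col in numeric_cols}
    transactions = []
    for r in rows:
        items = []
        for col, val in r.items():
            if col in cuts:
                items.append(f"{col}=bin{bin_index(val, cuts[col])}")
            else:
                items.append(f"{col}={val}")  # categorical values become items directly
        transactions.append(items)
    return transactions


for t in to_transactions(rows, ["age", "bmi"]):
    print(t)
```

Equal-frequency binning or domain-specific thresholds are common alternatives; whichever is chosen is worth reporting in the data preprocessing part of the presentation.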
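For the Apriori model in item b), a minimal level-wise sketch (join, prune, count) might look as follows. It assumes transactions are lists of hashable items and that `min_support` is a fraction of the number of transactions; it returns absolute support counts. The function name and example data are illustrative, and because it rescans the data once per level it is a starting point rather than an optimized implementation for the larger datasets in the table.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is a fraction of the number of transactions (e.g. 0.5).
    """
    db = [frozenset(t) for t in transactions]
    min_count = min_support * len(db)
    # Level 1: count single items
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset of a candidate must be frequent
                if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count step: one pass over the database per level
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


transactions = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
for itemset, count in apriori(transactions, 0.5).items():
    print(sorted(itemset), count)
```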
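For the K-Means model in item c), a minimal sketch of Lloyd's algorithm (alternate an assignment step and a centroid update step until no point changes cluster) is shown below, together with the within-cluster sum of squared errors (SSE). It assumes points are equal-length numeric tuples and uses random initial centroids; the names are hypothetical. Library versions such as scikit-learn's `KMeans` default to k-means++ initialization instead, which is worth keeping in mind when comparing results.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-Means on a list of equal-length numeric tuples.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialization: k data points sampled at random (without replacement)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def sse(points, labels, centroids):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids, sse(points, labels, centroids))
```

Running it for several values of `k` and plotting the SSE (the "elbow" method) is one common way to choose `k`.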
4) Implementation & Model Evaluation
a) Implement the six algorithms above on your dataset using Python.
b) Compare the performance and results of the three frequent pattern mining algorithms on your dataset, and discuss the results.
c) Compare the performance and results of the three clustering analysis algorithms on your dataset, and discuss the results.
d) Compare the performance of your models with the results reported in the relevant papers listed on the UCI site.
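Of the six algorithms, FP-Growth is usually the least obvious one to write from scratch. A minimal sketch of the classic approach: compress the transactions into an FP-tree whose header lists link all nodes of each item, then recursively mine each item's conditional pattern base. Assumptions: the threshold `min_count` is an absolute count (for example `ceil(min_support * n)`), items are mutually comparable strings (names are used to break frequency ties), and all class and function names are hypothetical. Recursion depth grows with the length of the longest frequent itemset.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, and links to its parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the list of tree
    nodes holding it; frequent maps each frequent item to its support count.
    """
    counts = defaultdict(int)
    for items, c in weighted_transactions:
        for item in set(items):
            counts[item] += c
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, c in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name)
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += c
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset(), result=None):
    """Mine all frequent itemsets; returns {frozenset(itemset): support_count}."""
    if result is None:
        result = {}
    header, frequent = build_fp_tree(weighted_transactions, min_count)
    for item, support in frequent.items():
        itemset = suffix | {item}
        result[itemset] = support
        # Conditional pattern base: the prefix path above every node holding item
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            fp_growth(base, min_count, itemset, result)
    return result


transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "c"]]
patterns = fp_growth([(t, 1) for t in transactions], min_count=2)
for itemset, count in patterns.items():
    print(sorted(itemset), count)
```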
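Apriori, FP-Growth and ECLAT are exact algorithms, so at the same minimum support they should return identical frequent itemsets with identical counts; a comparison between them should therefore focus on runtime and memory, and any difference in the itemsets found points to a bug. The sketch below illustrates this check with ECLAT (depth-first search over vertical tid-sets), validated against a brute-force enumeration and timed with `time.perf_counter`. The function names and data are hypothetical, and the brute-force reference is exponential, so it is only suitable for tiny samples.

```python
import time
from itertools import combinations


def eclat(transactions, min_count):
    """ECLAT: depth-first search over vertical tid-sets.

    Returns {frozenset(itemset): support_count}.
    """
    # Vertical layout: item -> set of ids of the transactions containing it
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    result = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Intersect only with later candidates so each itemset is built once
            next_level = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_level.append((other, common))
            if next_level:
                extend(itemset, next_level)

    frequent_items = [(item, tids) for item, tids in tidsets.items() if len(tids) >= min_count]
    extend(frozenset(), sorted(frequent_items, key=lambda pair: pair[0]))
    return result


def brute_force(transactions, min_count):
    """Reference result by enumerating every itemset (exponential; tiny data only)."""
    db = [frozenset(t) for t in transactions]
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(frozenset(combo) <= t for t in db)
            if count >= min_count:
                result[frozenset(combo)] = count
    return result


transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "c"], ["a", "b", "d"]]
start = time.perf_counter()
found = eclat(transactions, min_count=2)
elapsed = time.perf_counter() - start
assert found == brute_force(transactions, min_count=2)  # exact algorithms must agree
print(f"{len(found)} frequent itemsets in {elapsed * 1000:.3f} ms")
```

On the real dataset, the same equality check can be run between the three implementations, with runtimes measured over several minimum support values.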
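K-Means, AGNES and DBSCAN produce different kinds of clusterings, so an internal measure that needs no ground-truth labels is a convenient common yardstick. One standard choice is the mean silhouette coefficient, $s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$, where $a(i)$ is the mean distance from point $i$ to the other members of its cluster and $b(i)$ is the smallest mean distance from $i$ to the members of another cluster. A minimal pure-Python sketch with Euclidean distance follows; the function name is hypothetical, and scikit-learn's `sklearn.metrics.silhouette_score` can be used to cross-check it. DBSCAN noise points (labelled -1 in scikit-learn) should be excluded or handled explicitly before scoring.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance).

    Points in singleton clusters get s(i) = 0, the usual convention.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a(i): the sum includes p itself (distance 0), hence the len - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # compact, well-separated clusters: close to 1
print(silhouette(points, [0, 1, 0, 1]))  # poor grouping: negative
```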
5) Presentation
a) You will present your work online in 12 minutes, in the time slot reserved for your group (see the table above). All group members should participate equally in the presentation.
b) Prepare a presentation file discussing the details of your work and the results of your models.
c) Your presentation should contain at least the following parts:
i) Problem definition
ii) Dataset
(1) Information about the dataset.
(2) Number of instances, columns, etc.
iii) Data preprocessing and cleaning
(1) Missing values and how you handled them.
(2) Transformations and normalizations.
iv) Python implementation of each of the 6 algorithms
(1) IDE/environment used.
(2) Implementation details.
(3) Libraries used.
v) Model evaluation & performance results
(1) Performance measures.
(2) Comparison of all 6 algorithms.
vi) Conclusion
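For the data preprocessing part of the presentation (missing values, transformations and normalizations), the sketch below shows one common combination: count missing values per column, fill them with the column mean (numeric) or mode (categorical), and rescale numeric features. Normalization matters here because K-Means, AGNES and DBSCAN are distance based, so a feature with a large range dominates Euclidean distance, and DBSCAN's `eps` only makes sense on a known scale. Assumptions: rows are dicts with numeric values already parsed, the missing-value markers in `MISSING` are a guess that must be checked against your dataset, and all names are hypothetical. Dropping incomplete rows is an equally valid choice when few rows are affected. Requires Python 3.8+ (for `statistics.mode` on multimodal data).

```python
from statistics import mean, mode, pstdev

# Assumption: these markers denote missing values in the raw data.
MISSING = {None, "", "?"}


def missing_report(rows):
    """Number of missing values per column."""
    return {col: sum(r[col] in MISSING for r in rows) for col in rows[0]}


def impute(rows, numeric_cols):
    """Return new rows with gaps filled: column mean if numeric, column mode otherwise."""
    fill = {}
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] not in MISSING]
        fill[col] = mean(observed) if col in numeric_cols else mode(observed)
    return [{col: fill[col] if v in MISSING else v for col, v in r.items()} for r in rows]


def min_max(column):
    """Rescale linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def z_score(column):
    """Shift to mean 0 and scale to population standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in column]


rows = [
    {"age": 20, "city": "A"},
    {"age": "?", "city": "B"},
    {"age": 40, "city": None},
    {"age": 30, "city": "A"},
]
print(missing_report(rows))
filled = impute(rows, {"age"})
print(filled)
print(min_max([r["age"] for r in filled]))
```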
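The assignment leaves the choice of performance measures open. For frequent pattern mining, runtime and the number of itemsets found are natural comparison points, and for association rules derived from the itemsets, support, confidence and lift are the standard quality measures. A minimal sketch computing them for one rule, assuming transactions are lists of items; the function name and data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the association rule antecedent -> consequent.

    support    = count(A and C) / n
    confidence = count(A and C) / count(A)
    lift       = confidence / (count(C) / n)
    """
    db = [set(t) for t in transactions]
    n = len(db)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in db)
    n_c = sum(c <= t for t in db)
    n_ac = sum((a | c) <= t for t in db)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


transactions = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the two sides co-occur more often than independence would predict; here the lift is below 1, so the rule is weaker than its confidence alone suggests.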
6) Demo with Presentation
a) You will demonstrate your work online in 5 minutes, right after your presentation (see the table above).
b) Your group’s session will last 17 minutes in total (12 minutes for the presentation and 5 minutes for the demo).
c) Please keep in mind that all presentation and demo sessions will be recorded.
d) All students should attend all sessions.
7) Related Questions & Answers
a) Prepare 5 questions and answers related to your topic. These questions may be asked of other students.
b) Question types can be multiple choice (single or multiple selection), fill in the blanks, matching, essay, etc.
c) Prepare a presentation file of 11 slides containing these 5 questions and answers: the first slide gives your topic and the group members’ information, followed by 1 slide for each question and 1 slide for each answer (1 + 5 + 5 = 11 slides).
8) Evaluation
a) Project #2 will count for at least 10% of your total grade; this share may increase depending on coronavirus-related circumstances.
b) Evaluation will be done out of 100 points:
i) [4 pts] Data set understanding.
ii) [4 pts] Data preprocessing.
iii) [20 pts] Implementation of frequent pattern mining algorithms.
iv) [20 pts] Implementation of clustering analysis algorithms.
v) [14 pts] Results, comparison, discussion & conclusion.
vi) [20 pts] Presentation quality.
vii) [8 pts] Demo quality.
viii) [10 pts] Questions & answers quality.
9) Submission
a) You will submit the following:
i) Python codes implemented.
ii) Presentation file.
iii) Questions & answers presentation file.
b) Write the following sentence in a text file: “We hereby swear that the work done on this project is totally our own; and on our honor, we have neither given nor received any unauthorized and/or inappropriate assistance for this project. We understand that by the school code, violation of these principles will lead to a zero grade and is subject to harsh discipline issues.” Name the file “we_swear.txt” and include it in your zip submission file.
c) Only one group member, the group representative (“GrRep” for short), will submit the project, always using the GrRep’s information. However, all group members should have a complete and comprehensive understanding of all the work done for all tasks and steps of the project.
d) Zip all your documents into a single file named GrRepStudentNumber_P2.zip (e.g. 150118123_P2.zip) and submit it to http://ues.marmara.edu.tr before the deadline.
e) You must submit your own work. Any form of copying or cheating will result in ZERO points for all parts; the giver and the receiver are equally culpable and receive equal penalties. All types of plagiarism will result in zero points for the project.
f) If you use handwriting, it must be readable, clear and neat. If possible, do not use any handwriting.
g) Do not send project submissions through e-mail. E-mail attachments will not be accepted as valid submissions.
h) You are responsible for making sure that you turn in the right file and that it is not corrupted in any way. We will not allow resubmissions if you turn in the wrong file, even if you can prove that you have not modified the file after the deadline.
i) Grade evaluation may be done on selected parts of the project, so try to complete all parts of your project successfully.
j) No late submissions will be accepted.
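Since a misnamed or corrupted archive cannot be resubmitted, a small packaging helper can catch mistakes before upload. A sketch using Python's standard `zipfile` module, assuming all deliverables are local files; the function name and the file names in the usage line are hypothetical, and the upload itself still happens through http://ues.marmara.edu.tr.

```python
import zipfile
from pathlib import Path


def make_submission(student_number, files, out_dir="."):
    """Zip files into <student_number>_P2.zip, verify the archive, and return its path."""
    archive = Path(out_dir) / f"{student_number}_P2.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in map(Path, files):
            zf.write(f, arcname=f.name)  # store files without their directory paths
    # Re-open the archive: testzip() returns the first corrupt member name, or None
    with zipfile.ZipFile(archive) as zf:
        if zf.testzip() is not None:
            raise RuntimeError(f"corrupt member in {archive}")
        if "we_swear.txt" not in zf.namelist():
            raise RuntimeError("we_swear.txt is missing from the archive")
    return archive


# Hypothetical usage with the example student number from item d):
# make_submission("150118123", ["we_swear.txt", "project2.py", "slides.pdf", "qa.pdf"])
```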