This document contains instructions for the project for Programming and Scripting. You are not expected to know how to do the whole project from the beginning. Rather, we expect you to research ways to tackle the project and formulate your own submission based on your investigations. Remember, all students are bound by GMIT’s Quality Assurance Framework [2], including the Code of Student Conduct and the Policy on Plagiarism.
Problem statement
This project concerns the well-known Fisher’s Iris data set [3]. You must research the data set and write documentation and code (in Python [1]) to investigate it. An online search for information on the data set will convince you that many people have investigated it previously. You are expected to be able to break this project into several smaller tasks that are easier to solve, and to plug these together after they have been completed.
You might do that for this project as follows:
1. Research the data set online and write a summary about it in your README.
2. Download the data set and add it to your repository.
3. Write a program called analysis.py that:
• outputs a summary of each variable to a single text file,
• saves a histogram of each variable to png files, and
• outputs a scatter plot of each pair of variables.
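As a starting point only, the three outputs listed for analysis.py could be sketched along the following lines. This is one possible approach under stated assumptions, not a model answer. It assumes the data set has been saved in the repository as a headerless CSV file in the UCI layout (four measurements followed by the class label). The column names, the output file names, and the function names load_iris, write_summary and save_plots are all hypothetical choices made for illustration. The summary uses only the standard library, and the plots assume the third-party matplotlib library is installed.

```python
import csv
import itertools
import statistics

# Assumed names for the four numeric columns of the headerless UCI file.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def load_iris(path):
    """Read the CSV into a dict mapping each column name to a list of floats."""
    data = {name: [] for name in COLUMNS}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < len(COLUMNS):  # skip blank lines, e.g. at end of file
                continue
            # zip stops after the four measurements, so the class label is ignored.
            for name, value in zip(COLUMNS, row):
                data[name].append(float(value))
    return data


def write_summary(data, out_path):
    """Write basic statistics for every variable to a single text file."""
    with open(out_path, "w") as f:
        for name, values in data.items():
            f.write(f"{name}\n")
            f.write(f"  count: {len(values)}\n")
            f.write(f"  min:   {min(values):.2f}\n")
            f.write(f"  max:   {max(values):.2f}\n")
            f.write(f"  mean:  {statistics.mean(values):.2f}\n")
            f.write(f"  stdev: {statistics.stdev(values):.2f}\n\n")


def save_plots(data, out_dir="."):
    """Save one histogram per variable and one scatter plot per pair of variables."""
    # Imported here so the summary step still works without matplotlib installed.
    import os
    import matplotlib
    matplotlib.use("Agg")  # write image files without needing a display
    import matplotlib.pyplot as plt

    for name, values in data.items():
        fig, ax = plt.subplots()
        ax.hist(values, bins=10)
        ax.set_xlabel(name)
        ax.set_ylabel("frequency")
        fig.savefig(os.path.join(out_dir, f"hist_{name}.png"))
        plt.close(fig)

    # Every unordered pair of the four variables gives six scatter plots.
    for x, y in itertools.combinations(COLUMNS, 2):
        fig, ax = plt.subplots()
        ax.scatter(data[x], data[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        fig.savefig(os.path.join(out_dir, f"scatter_{x}_vs_{y}.png"))
        plt.close(fig)
```

With these definitions, a script might call load_iris on the downloaded file, pass the result to write_summary with a name such as summary.txt, and then call save_plots. Keeping each task in its own function mirrors the advice above to break the project into smaller pieces and plug them together afterwards. Libraries such as pandas offer shorter ways to produce similar summaries, and you are free to use them instead.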
It might help to suppose that your manager has asked you to investigate the data set, with a view to explaining it to your colleagues. Imagine that you are to give a presentation on the data set in a few weeks’ time, where you explain what investigating a data set entails and how Python can be used to do it. You have not been asked to create a deck of presentation slides, but rather to present your code and its output to them.
Minimum Viable Project
The minimum standard is a GitHub repository containing a README, a Python script, a generated summary text file, and images. The README should contain a summary of the data set and your investigations into it. It should also clearly document how to run the Python code and what that code does. Furthermore, it should list all references used in completing the project.
A better project will be well organised and contain detailed explanations. The analysis will be well conceived, and examples of interesting analyses that others have pursued based on the data set will be discussed. Note that the point of this project is to use Python. You may use any Python libraries that you wish, whether they have been discussed in class or not.
You should not be thinking of using spreadsheet software like Excel to do your calculations.
GitHub must be used to manage your project submission. Your GitHub repository will form the main submission of the project. You must submit the URL of your GitHub repository using the link on the course Moodle page before the deadline. You can do this at any time, as the last commit before the deadline will be used as your submission for this project.
References
[1] Python Software Foundation. Welcome to python.org. https://www.python.org/.
[2] GMIT. Quality assurance framework. https://www.gmit.ie/general/quality-assurance-framework.
[3] UC Irvine Machine Learning Repository. Iris data set. http://archive.ics.uci.edu/ml/datasets/Iris.