Programming and Scripting
This document contains the instructions for Project 2021 for Programming and Scripting. You are not expected to know how to do the whole project from the beginning. Rather, we expect you to research ways to tackle the project and to formulate your own submission based on your investigations. Remember, all students are bound by GMIT’s Quality Assurance Framework [2], including the Code of Student Conduct and the Policy on Plagiarism.
Problem statement
This project concerns the well-known Fisher’s Iris data set [3]. You must research the data set and write documentation and code (in Python [1]) to investigate it. An online search for information on the data set will convince you that many people have investigated it previously. You are expected to be able to break this project into several smaller tasks that are easier to solve, and to plug these together after they have been completed.
You might do that for this project as follows:
1. Research the data set online and write a summary about it in your README.
2. Download the data set and add it to your repository.
3. Write a program called analysis.py that:
• outputs a summary of each variable to a single text file,
• saves a histogram of each variable to png files, and
• outputs a scatter plot of each pair of variables.
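The three tasks for analysis.py could be sketched along the following lines. This is only one possible starting point, not a model answer. It assumes the pandas and matplotlib libraries are installed, that the data set is the headerless comma-separated file from [3], and that scatter plots are saved to png files rather than shown on screen. The column names, function names and output file names are all illustrative choices, not requirements of the project.

```python
from itertools import combinations
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: figures are saved, not displayed
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names: the file from the UCI repository has no header row.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_iris(path):
    """Read the headerless CSV file into a DataFrame with named columns."""
    return pd.read_csv(path, header=None, names=COLUMNS)


def write_summary(df, out_path):
    """Write descriptive statistics for every variable to a single text file."""
    numeric = df.select_dtypes("number")
    with open(out_path, "w") as f:
        f.write(numeric.describe().to_string())
        f.write("\n\nObservations per species:\n")
        f.write(df["species"].value_counts().to_string())
        f.write("\n")


def save_histograms(df, out_dir):
    """Save one png histogram per numeric variable and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for col in df.select_dtypes("number").columns:
        fig, ax = plt.subplots()
        ax.hist(df[col], bins=20)
        ax.set_xlabel(col)
        ax.set_ylabel("count")
        ax.set_title(f"Histogram of {col}")
        path = out_dir / f"hist_{col}.png"
        fig.savefig(path)
        plt.close(fig)  # release the figure so memory use stays flat
        paths.append(path)
    return paths


def save_scatter_plots(df, out_dir):
    """Save one png scatter plot per pair of numeric variables, coloured by species."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    numeric_cols = df.select_dtypes("number").columns
    for x, y in combinations(numeric_cols, 2):
        fig, ax = plt.subplots()
        for species, group in df.groupby("species"):
            ax.scatter(group[x], group[y], label=species)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        ax.legend()
        path = out_dir / f"scatter_{x}_vs_{y}.png"
        fig.savefig(path)
        plt.close(fig)
        paths.append(path)
    return paths
```

With the data file saved in your repository as, say, iris.data, the pieces might be plugged together by calling load_iris("iris.data") and passing the result to write_summary, save_histograms and save_scatter_plots in turn. Four numeric variables give four histograms and six scatter plots.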
It might help to suppose that your manager has asked you to investigate the data set, with a view to explaining it to your colleagues. Imagine that you are to give a presentation on the data set in a few weeks’ time, where you explain what investigating a data set entails and how Python can be used to do it. You have not been asked to create a deck of presentation slides, but rather to present your code and its output to them.
Minimum Viable Project
The minimum standard is a GitHub repository containing a README, a Python script, a generated summary text file, and images. The README should contain a summary of the data set and your investigations into it. It should also clearly document how to run the Python code and what that code does. Furthermore, it should list all references used in completing the project.
A better project will be well organised and contain detailed explanations. The analysis will be well conceived, and examples of interesting analyses that others have pursued based on the data set will be discussed. Note that the point of this project is to use Python. You may use any Python libraries that you wish, whether they have been discussed in class or not.
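As one example of going beyond the minimum, you might compare the variables across the three species, which is a common theme in published investigations of the data set. The sketch below assumes the pandas library and a DataFrame with a column named species alongside the numeric measurements. Both the function name and the column name are illustrative assumptions.

```python
import pandas as pd


def summarise_by_species(df):
    """Return the mean of every numeric variable, with one row per species."""
    # "species" is an assumed column name holding the class label.
    return df.groupby("species").mean(numeric_only=True)
```

The resulting table could be appended to your summary text file with its to_string() method and discussed in your README.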
You should not use spreadsheet software such as Excel to do your calculations.
References
[1] Python Software Foundation. Welcome to python.org. https://www.python.org/.
[2] GMIT. Quality assurance framework. https://www.gmit.ie/general/quality-assurance-framework.
[3] UC Irvine Machine Learning Repository. Iris data set. http://archive.ics.uci.edu/ml/datasets/Iris.