Concretely, you should use two algorithms from scikit-learn and compare their performance on the dataset. You should also compare the performance of your chosen models against a baseline, i.e. a simple model that more complex models should be able to beat. scikit-learn provides the module sklearn.dummy, which may be useful in generating a baseline. You should use techniques to assess the ability of the models to generalise to unseen data and to ensure that your assessment of the models' performance is robust.
Material from worksheets 13, 14, 16, and 17 will be helpful here.
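As an illustration only, the comparison described above might be sketched as follows. The sketch assumes a classification task; the synthetic data from make_classification, the two algorithms (logistic regression and k-nearest neighbours), and the accuracy metric are placeholders chosen for the example, not requirements of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data; replace with the coursework dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# A baseline from sklearn.dummy plus two candidate algorithms (placeholders).
models = {
    "baseline (most frequent class)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}

# The same shuffled, stratified 5-fold split is used for every model,
# so the scores are directly comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean gives some indication of how robust each estimate is. For a regression task, DummyRegressor and a regression metric would take the place of DummyClassifier and accuracy.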
Your answer to this question should take the form of a short report (maximum 4 pages), together with commented code, detailing the approach you have taken. Make sure you address all the bullet points above, and explain your decisions. For example: 'I chose to use algorithm X because Y', or 'Because of Z, I used metric M'. You should use plots and figures as appropriate to illustrate your decisions.
Q1 mark scheme (40 pts)
At least 2 algorithms should be tested. If only 1 is tested then the maximum points for the question is 20. You can obtain full marks using 2 algorithms plus the baseline.
(5 pts) Overall presentation of the report, including use of appropriate sections, plots, diagrams, or tables to make your point. Do not include code snippets in the report. Instead, describe in words or equations what you are implementing. Format equations correctly.
(3pts) Picking a suitable type of algorithm (classification/regression/clustering) and justifying this choice. The lectures and worksheet from week 13 will be helpful here.
(3 pts) An appropriate choice of performance metric (e.g. accuracy, precision, or mean squared error) and justification. The lectures and worksheet from week 13 will be helpful here.
(4 pts) Discussion of the kind of baseline to compare against. (sklearn has a module sklearn.dummy which may be useful in generating a baseline).
(15 pts) Use of an appropriate method to select the hyperparameters of the chosen algorithms. The explanation of which hyperparameters are selected should be backed up with, for example, tables and plots showing which hyperparameter values were chosen and why. Please choose at least one model that uses hyperparameters so that you can show your knowledge in this area. If one of your chosen models has no hyperparameters, please explain in a couple of sentences what the benefits of choosing such a model are. The lectures and worksheet from week 13 will be helpful here.
Breakdown
• 3 pts: Show that you understand what hyperparameters are and how they can be selected.
• 5 pts: Look at the effects of different hyperparameter choices on the performance of your models.
• 5 pts: Present the effects of the different hyperparameter choices on the performance of your models using tables, plots, or other presentation.
• 2 pts: State what hyperparameter choices you make and why.
(10 pts) Training and testing the performance of the models in a way that shows whether the models are able to generalise to unseen data and that ensures that the performance of the models is robust. The lectures and worksheet from week 13 will be helpful here.
• 4 pts: Train models and select hyperparameters in a way that gives robust performance
• 3 pts: Test the performance of your models and compare their performance
• 3 pts: Make sure your models are tested in a way that shows whether they are able to generalise to unseen data
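One possible way to combine hyperparameter selection with a test of generalisation is sketched below. It is an illustration under stated assumptions, not a prescribed method: the synthetic data, the k-nearest neighbours model, the grid of n_neighbors values, and the 75/25 split are all placeholders. Hyperparameters are chosen by cross-validation on the training portion only, and the held-out test set is used once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data; replace with the coursework dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

# Hold out a test set that plays no part in hyperparameter selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is re-fitted on each training fold.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 9, 15]}

# 5-fold cross-validated grid search over the training data only.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Per-value cross-validation scores: the raw material for a table or plot.
results = search.cv_results_
for k, mean, std in zip(
    results["param_kneighborsclassifier__n_neighbors"],
    results["mean_test_score"],
    results["std_test_score"],
):
    print(f"n_neighbors={k}: {mean:.3f} +/- {std:.3f}")

print("selected:", search.best_params_)

# Final estimate of generalisation, computed once on the untouched test set.
test_accuracy = search.score(X_test, y_test)
print(f"held-out test accuracy: {test_accuracy:.3f}")
```

Keeping the test set out of the search means the final score is not biased by the choice of hyperparameters. The per-value scores in cv_results_ can be tabulated or plotted to show the effect of each choice.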
Recommended structure of the short report
The short report should be no more than 4 pages. Shorter is fine. You should use LaTeX, MS Word, or a similar text editor to prepare the report and submit it as a PDF document.
• Introduction: State what the problem is. State what kind of algorithm needs to be used (classification/regression/clustering) and explain why that kind of algorithm needs to be used.
• Methods: State which specific algorithms you will use. State which performance metric you will use and why. Describe the baseline that you will measure your algorithms against. Describe how you will choose the hyperparameters of the algorithms. Explain which hyperparameters you have selected for each model using tables or plots to illustrate your decision.
• Results: Report the results of your models. Use tables or plots as appropriate to illustrate your results.
Question 2: 10 pts
The Flickr-Faces-HQ (FFHQ) dataset is available at https://github.com/NVlabs/ffhq-dataset and is described in Appendix A of Karras et al. [2019]. NB: You do not need to read the whole paper. I have provided a template with a selection of the datasheet questions from sections 3.2 (Composition), 3.3 (Collection Process) and 3.5 (Uses) of Gebru et al. [2021]. Please provide answers to the questions in the template.
Page guide: The template is 2 pages long. The completed template with your answers should be about 3 pages long. Most questions need only a one- or two-sentence answer; some may need longer or shorter answers.