COMP551 MiniProject 4: Reproducibility in ML

Background
One goal of publishing scientific work is to enable future readers to build upon it. Reproducibility is central to achieving this, yet it remains one of the biggest challenges of machine learning research. Everyone is encouraged to follow the reproducibility checklist when publishing scientific research, to make the results reliable and reproducible. In addition, a challenge is organized every year to measure the progress of our reproducibility effort. Participants select a published paper from one of the listed conferences and attempt to reproduce its central claims. The objective is to assess whether the conclusions reached in the original paper are reproducible. The focus of this challenge is to follow the process described in the paper and attempt to reach the same conclusions. We have designed this miniproject in the spirit of the reproducibility challenge.

Problem definition
The goal of this assignment is to select a paper and reproduce its results by following the exact methods it describes. You can choose one of the example papers listed here or find one of your choice that meets the criteria mentioned below. For this miniproject, you are not expected to implement anything from scratch. You are encouraged to use any code repository published with the paper or any other implementation you might have found online.

Paper selection guidelines
•  To minimize the overlap between this miniproject and the previous ones, we have decided on a few broad categories the paper must belong to:

1.   Vision/Image Processing - The paper should have a vision/image-processing component (Convolutional Neural Network (CNN), ResNet, etc.). It can be a combination of vision and text data, but since we have not covered state-of-the-art text-processing elements (Recurrent Neural Network (RNN)/LSTM/transformers, etc.), we are not expecting you to use them. It is perfectly fine if you pick such a paper and choose to use them, though.

2.   Clustering

3.   Dimensionality reduction

4.   Ensemble Methods

5.   Random Forest

6.   Reinforcement Learning

•  You should be able to access the data or environment you will need to reproduce the paper’s experiments.

•  In many cases a codebase might be available directly from the authors or from another source (if the paper is old). You should definitely check whether you can work with the code before deciding on the paper.

•  You should estimate the computational requirements for reproducing the paper and take into account the resources available to you for the project. Some authors may have had access to infrastructure far beyond your budget; you might not want to choose such a paper.

•  You are free to choose any paper from the current pool of papers of the reproducibility challenge, or any classic paper such as the examples mentioned below. Just make sure the chosen paper overlaps significantly with at least one of the broad categories above. Given the advanced state of the art, choosing the former might require more computational resources, but it also presents an opportunity to submit to the ongoing reproducibility challenge, which is peer reviewed. Another great place to look for a relevant paper is Papers with Code.

A few example papers:

–   CNN+SVM paper: Deep Learning using Linear Support Vector Machines

–   AlexNet paper: ImageNet Classification with Deep Convolutional Neural Networks

–   t-SNE paper: Visualizing Data using t-SNE

–   VGG paper: Very Deep Convolutional Networks for Large-scale Image Recognition

–   ResNet paper: Deep Residual Learning for Image Recognition

–   Dropout paper: Dropout: A Simple Way to Prevent Neural Networks from Overfitting

–   Kernel SVM paper: Online Learning with Kernels

Experiments
You don’t need to reproduce all the experiments of your selected paper. You can choose a subset of the experiments that is feasible for you to reproduce given your computational resources.

Some state-of-the-art models can demand more computational power than you have access to. In such cases, you might want to reproduce only the baseline model described in the paper. Often the hyper-parameter search on the baseline models was not performed thoroughly, and there may be a better configuration than the one reported in the paper. You can implement the models from scratch or use the code provided by the authors, but make sure to cite all the resources you have used in your references.
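As an illustration, here is a minimal sketch of re-running such a hyper-parameter search in Python. The scikit-learn random forest, the digits dataset, and the parameter grid are placeholder assumptions standing in for whatever baseline your chosen paper uses; this is not any particular paper's setup.

# Minimal hyper-parameter search sketch (placeholder baseline and data;
# substitute the model and dataset from your chosen paper).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "n_estimators": [100, 300],   # hypothetical grid, not from any paper
    "max_depth": [None, 10, 20],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))

Reporting the full grid and the selected configuration in your write-up makes the search itself reproducible.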

Several of the models above also have pretrained weights available to download. Since these have been trained on huge datasets, you are encouraged to code up the models and directly import these weights instead of training from scratch. You can then use the pretrained model for experimentation and ablation studies, as well as fine-tune the weights on new data.
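For example, here is a minimal fine-tuning sketch assuming a recent version of torchvision (which ships ImageNet-pretrained ResNet weights). The class count and the random batch are hypothetical stand-ins for your own dataset and data loader.

# Minimal sketch: import pretrained ResNet-18 weights instead of training
# from scratch, then fine-tune only a new final layer.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained backbone; only the new head will be trained.
for param in model.parameters():
    param.requires_grad = False

num_classes = 10  # placeholder: number of classes in your target dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (stand-in for real data).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("fine-tuning step loss:", loss.item())

Freezing the backbone keeps the compute cost of fine-tuning small; unfreezing more layers trades compute for accuracy on the new data.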

•  You will first reproduce the results reported in the paper by running the code provided by the authors or, if no code is available, by implementing it on your own.

•  You will try to modify the model and perform ablation studies to understand the model’s robustness and evaluate the importance of its various components. (In this context, “ablation” refers to removing different model components to see how doing so impacts performance; see the sketch after this list.)

•  You should do a thorough analysis of the model through an extensive set of experiments.

•  Note that some experiments will be difficult to replicate due to computational resources. It is fine to reproduce only a subset of the original paper’s results or, if necessary, to work on a smaller variant of the data.

•  At a minimum, you should use the authors’ code to reproduce a non-trivial subset of their results and explore how the model performs after you make minor modifications (e.g., changes to hyperparameters).

•  An outstanding project would perform a detailed ablation study and/or implement significant/meaningful extensions of the model.
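To make the ablation idea concrete, here is a minimal sketch that trains the same placeholder network with and without one component (dropout) under otherwise identical settings. The MLP architecture, the random data, and the training budget are all hypothetical stand-ins; in your project, the component you ablate should come from your chosen paper.

# Minimal ablation sketch: compare identical training runs with and
# without one component (dropout). All data/architecture is placeholder.
import torch
import torch.nn as nn

def make_model(use_dropout: bool) -> nn.Sequential:
    layers = [nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU()]
    if use_dropout:
        layers.append(nn.Dropout(p=0.5))  # the ablated component
    layers.append(nn.Linear(256, 10))
    return nn.Sequential(*layers)

# Stand-in data: random images/labels in place of the paper's dataset.
X = torch.randn(256, 1, 28, 28)
y = torch.randint(0, 10, (256,))

for use_dropout in (True, False):
    torch.manual_seed(0)  # identical initialization across variants
    model = make_model(use_dropout)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(50):  # tiny training budget, illustration only
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    model.eval()  # disables dropout for evaluation
    with torch.no_grad():
        acc = (model(X).argmax(1) == y).float().mean().item()
    print(f"dropout={use_dropout}: train accuracy={acc:.3f}")

Keeping the seed, data, and training budget fixed across variants is what lets you attribute any performance difference to the removed component.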
