1 Introduction
This worksheet is a little different from the previous ones: there is no programming! Instead, the idea is for you to read about a dataset and create a datasheet for it, using the ideas from Gebru et al. [2021].
We will apply the datasheet questions from Gebru et al. [2021] to the ImageNet dataset [Deng et al., 2009]. If you cannot find answers to all of the questions, don’t worry! The aim is for you to start thinking about these ideas and to see how much of the information needed for a datasheet is publicly available.
2 Task
First of all, open the paper ‘Datasheets for Datasets’, available at https://arxiv.org/abs/1803.09010.
The questions that the authors propose are listed in Sections 3.1-3.7. From each section, pick two or three questions and try to answer them for the ImageNet dataset (paper here: http://www.image-net.org/papers/imagenet_cvpr09.pdf, website here: http://www.image-net.org/).
At the end of the ‘Datasheets for Datasets’ paper there are two example datasheets that you can look at to get ideas. Feel free to work together if it is helpful!
As you work through the questions, consider the following:
• How easy is it to answer the questions?
• Is one category of questions easier to answer than another?
• How easy would it be to find out whether the dataset is suitable for a project you want to carry out?
• Are there any aspects of the dataset that you think should be altered?
• Any other points you want to make?
References
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. Communications of the ACM, 64(12):86–92, 2021. URL https://arxiv.org/abs/1803.09010.