Starting from:

$30

NSYSU-Image Captioning Solved

Overview
•   In the previous assignments, we implemented multiple image classification tasks.

•   In this assignment, you will design and train a neural network which combines CNN and RNN to process an input image, then output a sequence that describe the image. 

•   You are free to use pre-trained models like ResNet or LSTM as your backbone structure.

Image Captioning
Image captioning is an interdisciplinary research problem that stands between computer vision and natural language processing.

 

Flickr8k  Dataset
•      Flickr8k-Images-Captions

•     Collected by Alexander Mamaev.

•     Sentence-based image description and search

•     Consisting of 8,091 images that are each paired with five different captions

A child in a pink dress is climbing up a set of stairs in an entry way .

Assignment #5 Dataset
8091 imags

 

captions

Your task
•   We have code skeleton for you guys.

•   https://colab.research.google.com/drive/1E96yjndJyBTAEEcgSthRyAVqd4H1WcW?usp=sharing

•   Design a convolutional neural network to do image captioning. 

•   The images provided are of different resolutions. You’ll need to resize the images into a fixed size of your own choice.

•   To get a high accuracy, you’ll need to experiment with different filter sizes, different number of layers, and other design principles discussed in class to figure out a network architecture that works best.

•   You’ll also need to try data augmentation, dropout, batch normalization as well as different optimizers and other tricks to boost performance.

Things you cannot do
•   You cannot copy trained models from others.

•   You cannot copy a whole page of code from the Internet.

Any violation will result in no points!

More products