CS520 - Assignment 4 - Colorization

The purpose of this assignment is to demonstrate and explore some basic techniques in supervised learning and computer vision.

The Problem: Consider the problem of converting a picture to black and white.

 

Figure 1: Training Data - A color image and its corresponding greyscale image.

Typically, a color image is represented by a matrix of 3-component vectors, where Image[x][y] = (r,g,b) indicates that the pixel at position (x,y) has color (r,g,b) where r represents the level of red, g of green, and b blue respectively, as values between 0 and 255. A classical color to gray conversion formula is given by

    Gray(r,g,b) = 0.21r + 0.72g + 0.07b,    (1)

where the resulting value Gray(r,g,b) is between 0 and 255, representing the corresponding shade of gray (from totally black to completely white).
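As an illustration, formula (1) can be applied to a whole image at once. The following is a minimal sketch, assuming the image is stored as a NumPy array of shape (height, width, 3); the function name to_gray and the tiny demo image are hypothetical and not part of the assignment.

```python
import numpy as np

def to_gray(image):
    """Apply Gray(r,g,b) = 0.21r + 0.72g + 0.07b to an (H, W, 3) RGB array."""
    weights = np.array([0.21, 0.72, 0.07])
    # The matrix product takes the weighted sum over the last (color) axis.
    return image.astype(np.float64) @ weights

# A tiny 1x3 demo image: pure red, pure green, pure white.
demo = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)
gray = to_gray(demo)  # one gray value per pixel, shape (1, 3)
```

Because the three weights sum to 1, white (255,255,255) stays at 255 and black (0,0,0) stays at 0.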

Note that converting from color to grayscale (with some exceptions) loses information. For most shades of gray, there will be many (r,g,b) values that correspond to that same shade.
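To get a feel for how much information is lost, one can count how many 8-bit colors land on a given shade. This is a rough sketch, assuming gray values are rounded to the nearest integer (the assignment does not specify a rounding rule); colors_with_gray is a hypothetical helper name.

```python
import numpy as np

def colors_with_gray(target):
    """Count 8-bit (r,g,b) triples whose Gray value rounds to `target`."""
    g, b = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
    partial = 0.72 * g + 0.07 * b  # green and blue contribution, shape (256, 256)
    count = 0
    for r in range(256):
        # Assumption: round to the nearest integer shade.
        shades = np.rint(0.21 * r + partial)
        count += int(np.count_nonzero(shades == target))
    return count

count_mid = colors_with_gray(128)   # a mid gray
count_black = colors_with_gray(0)   # the darkest shade
```

Under this rounding assumption, a mid gray is shared by far more colors than the extreme shades are, which is why a single gray value is a weak clue to the original color.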

However, by training a model on similar images, we can make contextually-informed guesses at what the shades of grey ought to correspond to. In an extreme case, if a program recognized a black and white image as containing a tiger (and had experience with the coloring of tigers), that would give a lot of information about how to color it realistically.

 

Figure 2: Trained on the color/grayscale image pair in Fig. 1, the model recovers some of the green of the trees and distinguishes the blues of the sea and the sky. But there are definitely some obvious mistakes as well.

You have a lot of freedom in your approach to this, but carefully formulate each of the following in outlining your solution to the problem, expressing your design choices, the math, and the algorithms behind your solution:

Computer Science Department - Rutgers University                                                                                                     Fall 2018
•    Representing the Process: How can you represent the coloring process in a way that a computer can handle? What spaces are you mapping between? What maps do you want to consider? Note that when mapping from a single grayscale value gray to a corresponding color (r,g,b) on a pixel-by-pixel basis, a single gray value (usually) does not carry enough information to reconstruct the correct color.

•    Data: Where are you getting your data from to train/build your model? What kind of pre-processing might you consider doing?

•    Evaluating the Model: Given a model for moving from grayscale images to color images (whatever spaces you are mapping between), how can you evaluate how good your model is? How can you assess the error of your model (hopefully in a way that can be learned from)? Note there are at least two things to consider when thinking about the error in this situation: numerical/quantified error (in terms of deviation between predicted and actual) and perceptual error (how good do humans find the result of your program).

•    Training the Model: Representing the problem is one thing, but can you train your model in a computationally tractable manner? What algorithms did you draw on? How did you determine convergence? How did you avoid overfitting?

•    Assessing the Final Project: How good is your final program, and how can you determine that? How did you validate it? What is your program good at, and what could use improvement? Do your program’s mistakes ‘make sense’? What happens if you try to color images unlike anything the program was trained on? What kind of models and approaches, potential improvements and fixes, might you consider if you had more time and resources?
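For the numerical side of the evaluation question, one simple candidate is the average distance in RGB space between predicted and true pixel colors. The sketch below is only one possible choice, assuming both images are NumPy arrays of shape (height, width, 3); mean_color_error is a hypothetical name, and the measure says nothing about perceptual error.

```python
import numpy as np

def mean_color_error(predicted, actual):
    """Average Euclidean distance in RGB space between two (H, W, 3) images."""
    diff = predicted.astype(np.float64) - actual.astype(np.float64)
    # Norm over the color axis gives one distance per pixel.
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

# Demo: a 2x2 black image where one predicted pixel is off by (3, 4, 0).
actual = np.zeros((2, 2, 3), dtype=np.uint8)
predicted = actual.copy()
predicted[0, 0] = (3, 4, 0)
error = mean_color_error(predicted, actual)  # one distance of 5 averaged over 4 pixels
```

Computing this error on pixels held out from training, rather than on the training pixels themselves, is one way to check for overfitting.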

Some Possible Approaches
Some possible approaches you might take to the problem include the following (which were used to generate the small example above):

•    While mapping from gray ↦ (r,g,b) cannot reliably reconstruct the true color of a pixel, since a single gray value does not carry enough information, consider looking at a small 3 × 3 pixel window of gray values, and mapping this set of nine gray values to a single (r,g,b) color vector, which could for instance be the color of the middle pixel in this window. In this case, the surrounding eight gray values give additional context and information to build a color for the central pixel. With such a map, a grayscale image could be colored by simply taking every 3 × 3 pixel patch, and determining what color the central pixel should be.

•    To further simplify things, the problem can be shifted from a regression problem to a discrete classification problem in the following way: consider building an initial palette of K representative colors, and instead of trying to reconstruct the true color of a pixel, determine which of these K colors should best be applied to a given pixel. How can you determine which K colors are best to use, however? And be careful as well - how should you assess error and the quality of a model when coloring in this way?

•    It may also be useful to reduce the input space as well as the output space - consider for instance the set of all possible 3 × 3 pixel patches that occur in a given image, much like overlapping jigsaw puzzle pieces. Do all possible jigsaw puzzle pieces occur in representing a given image, or could the overall space be reduced to consider only a set of ‘representative’ puzzle pieces?
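The first two ideas above can be sketched together: collect every 3 × 3 gray patch, build a palette of K colors by clustering, and color a patch by looking up the most similar training patch. This is a minimal sketch under stated assumptions, not the method used for Fig. 2: k-means for the palette and a one-nearest-neighbor lookup are choices made here for illustration, the names gray_patches, kmeans, and color_patch are hypothetical, and random data stands in for a real training image.

```python
import numpy as np

def gray_patches(gray):
    """Return every interior 3x3 window of an (H, W) array as a row of 9 values."""
    h, w = gray.shape
    cols = [gray[i:i + h - 2, j:j + w - 2].reshape(-1)
            for i in range(3) for j in range(3)]
    return np.stack(cols, axis=1)  # shape ((H-2)*(W-2), 9); column 4 is the center

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centers, labels) for an (N, D) array of points."""
    rng = np.random.default_rng(seed)
    points = points.astype(np.float64)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    # Final assignment so labels match the returned centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Stand-in training data (assumption: random pixels instead of a real photo).
rng = np.random.default_rng(1)
color = rng.integers(0, 256, size=(12, 12, 3))
gray = color @ np.array([0.21, 0.72, 0.07])        # formula (1)
patches = gray_patches(gray)                       # (100, 9)
center_colors = color[1:-1, 1:-1].reshape(-1, 3)   # true color of each patch center
palette, labels = kmeans(center_colors, k=5)       # K = 5 representative colors

def color_patch(patch):
    """Palette color of the training patch closest to `patch` (1-nearest-neighbor)."""
    nearest = np.linalg.norm(patches - patch, axis=1).argmin()
    return palette[labels[nearest]]
```

The same kmeans helper could also be run on the nine-dimensional patches themselves to obtain a reduced set of representative patches, in the spirit of the third idea. Note that once colors are restricted to a palette, even a perfect classifier cannot reach zero color error, which matters when assessing the model.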

