The goal of this project is to implement the four simple motion detection algorithms described in Lecture 24 and run them on short video sequences. As described (and pseudocoded) in Lecture 24, the four algorithms are:
• Simple Background Subtraction
• Simple Frame Differencing
• Adaptive Background Subtraction
• Persistent Frame Differencing
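The detailed pseudocode for these is in the Lecture 24 slides, but as a rough per-pixel guide, one common formulation of the four updates is sketched below. The sketch assumes double-precision greyscale images and uses placeholder parameter names tau (motion threshold), alpha (background learning rate), and gamma (motion-history decay); these names and the exact update order should be checked against the lecture pseudocode rather than taken as definitive.

    function [M_bg, M_fd, M_ad, B, H, Iprev] = update_motion(I, B0, B, Iprev, H, tau, alpha, gamma)
    % One common per-pixel formulation of the four updates (verify against the
    % Lecture 24 pseudocode). All image arguments are same-sized double greyscale
    % images; tau, alpha, and gamma are placeholder parameter names.
    M_bg  = abs(I - B0)    > tau;           % simple background subtraction (fixed background B0)
    M_fd  = abs(I - Iprev) > tau;           % simple frame differencing (previous frame Iprev)
    M_ad  = abs(I - B)     > tau;           % adaptive background subtraction
    B     = alpha * I + (1 - alpha) * B;    % blend current frame into the adaptive background
    H     = max(255 * M_fd, H - gamma);     % persistent frame differencing (decaying motion history)
    Iprev = I;                              % current frame becomes the previous frame next time
    end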
You are to implement all four in one program and generate, as output, a four-panel frame showing the results of each algorithm on each video frame (see Figure 1), writing the resulting images out to numbered files, i.e. out0001.jpg, out0002.jpg, etc. Note: you don’t have to label the images with text as in Figure 1, but the results of the four algorithms should be displayed in the order shown in Figure 1.
Figure 1: Example output frame showing results of the four motion detection algorithms run on the same input sequence. Simple Background Subtraction (top left), Simple Frame Differencing (top right), Adaptive Background Subtraction (bottom left), Persistent Frame Differencing (bottom right)
The specific project outcomes include:
• Experience in efficient and effective Matlab programming
• Understanding background subtraction and temporal frame differencing algorithms
• Generating videos visualizing the temporal performance of implemented algorithms
• Implementing efficient code to ensure timely operation and testing of algorithms
2 Detailed Description
Refer to the Lecture 24 slides for detailed descriptions of each of the four motion/change detection algorithms. The steps of your program should be roughly the following:
1) For a given video sequence, read in each image in a loop and convert it to greyscale using whatever method you are familiar with (for example by taking the green channel).
2) After reading each image, compute an output frame using each of the four motion detection algorithms and concatenate the results into a single four-panel (four-quadrant) frame. Concatenation is easy to do in Matlab, since images are arrays: for example, if the four motion detection result image arrays are A, B, C and D (you will want to give them more descriptive names), the output quad image can be generated as “outimage = [A B; C D];”. Caution: the persistent frame differencing image has a different greyscale range (0-255) than the other three methods (0-1), so when generating the output quad image you should convert it to the range 0-1 by dividing by 255, so that all four panels have compatible brightness ranges. Alternatively, you could scale up the intensity range of the other three by multiplying by 255. (One way to put this together is shown in the loop sketch after this list.)
3) Generate a numbered filename for your output image (e.g. if you are currently processing frame 0036 of the input sequence, your output image will be numbered 0036) and save the image as either a JPG or a PNG. If you save as a PNG, your results will look better in the final movie, since PNG compression is lossless and does not introduce artifacts when the image is written to a file. Continue the loop until all input images have been processed.
4) After your program finishes generating the quad image result files, generate a video of those results offline. How you do this is totally up to you. One program that can take numbered images and combine them into a movie file is “ffmpeg”, which is open-source software with versions that work on Windows, Linux, and Mac. You may alternatively use Windows Movie Maker, Mac iMovie, Matlab's VideoWriter (a minimal sketch appears at the end of this section), or whatever video creation tool you like. Make your movie in some obviously playable format, like MP4.
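Putting steps 1-3 together, a rough skeleton of the processing loop might look like the following, reusing the update_motion sketch from earlier in this handout. The input file names, frame count, and parameter values are placeholders to adapt to your own sequence.

    % Skeleton of the main loop (placeholder file names, frame count, and parameters).
    nFrames = 500;                          % however many frames your input sequence has
    tau = 25; alpha = 0.05; gamma = 10;     % placeholder parameter values

    % Initialize the per-algorithm state from frame 1 (kept separate, per the Pro Tips below).
    first = double(imread('in0001.jpg'));   % assumes RGB input frames named in0001.jpg, in0002.jpg, ...
    first = first(:,:,2);                   % greyscale via the green channel
    B0    = first;                          % fixed background for simple background subtraction
    B     = first;                          % adaptive background
    Iprev = first;                          % previous frame for frame differencing
    H     = zeros(size(first));             % motion history (range 0-255), initially empty

    for t = 2:nFrames                       % start at 2, since frame 1 initialized the state
        I = double(imread(sprintf('in%04d.jpg', t)));
        I = I(:,:,2);                       % green channel as greyscale

        [M_bg, M_fd, M_ad, B, H, Iprev] = update_motion(I, B0, B, Iprev, H, tau, alpha, gamma);

        % Quad layout from Figure 1; divide H by 255 so all four panels are in the 0-1 range.
        outimage = [M_bg, M_fd; M_ad, H/255];
        imwrite(outimage, sprintf('out%04d.png', t));
    end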
Pro Tip: Although you are running all four algorithms at the same time, it is a good idea to keep their background image data structures independent from each other. For example, in the lecture pseudocode, variable B or B(t) is used to denote the current background frame. However, this image will be different for each algorithm, so you really should be keeping four current background frames, one computed by each algorithm.
Pro Tip 2: You don’t want to be explicitly keeping indexed arrays of images around, even if that is what the pseudocode looks like it is doing. For example, when the pseudocode says B(t-1) and B(t), you really just have one background image B (per algorithm): it is interpreted as B(t-1) when you start processing incoming frame I(t), and after you update it you can consider it to be B(t). The same goes for M(t) and H(t) in the various pseudocodes, and even for the incoming image I(t) itself: you can just read the image file for frame t when you need it in the loop. By not keeping all images in memory at once, you can in principle process real-time streams of frames coming in from a web camera, with no upper bound on the number of frames you may eventually see.
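Finally, for step 4, if you go the Matlab route, a minimal VideoWriter sketch along these lines can assemble the numbered output images into an MP4. It assumes the out####.png files written by the loop above are in the current folder; the output file name and frame rate are arbitrary choices.

    % Offline assembly of the numbered quad images into an MP4 movie.
    v = VideoWriter('motion_results.mp4', 'MPEG-4');   % 'MPEG-4' works on Windows/Mac; use 'Motion JPEG AVI' otherwise
    v.FrameRate = 15;                                  % pick whatever playback rate looks good
    open(v);
    files = dir('out*.png');                           % numbered quad images from the main loop
    for k = 1:numel(files)
        frame = imread(files(k).name);
        if ismatrix(frame)                             % greyscale quad: replicate to 3 channels
            frame = repmat(frame, 1, 1, 3);
        end
        writeVideo(v, frame);
    end
    close(v);

If you prefer ffmpeg, an invocation along the lines of “ffmpeg -framerate 15 -i out%04d.png movie.mp4” is a common starting point; check the ffmpeg documentation for the options that suit your setup.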