$30
This is the first of three mini-projects to be demonstrated and assessed within the regular scheduled laboratory sessions. This project is due in the Week 9 laboratory session. The project is worth a nominal 10 marks, out of the 30 marks available for the laboratory component of Elec4622/Elec9722. However, there are optional elements which can attract a lot of potential bonus marks.
Get ready to put a lot of work into this project, but also to learn a huge amount from it! Success will build confidence, so invest the effort and use your remaining laboratory sessions wisely.
In this project, you will implement and explore the implications of various strategies for image resampling. To simplify matters, we consider only two cases here:
1. Reducing the size of an image by a factor of 5/3 in each direction — i.e., zoom-out.
2. Increasing the size of an image by a factor of 5/3 in each direction — i.e., zoom-in.
1 A Brief Review of the Theory
Let x[n] denote a discrete digital image, which we assume to arise from the sampling of an underlying bandlimited signal f (s). This continuous signal can then be recovered via
f (s) = [x[n]q (s−n)
n
where the interpolation kernel ideally satisfies q (s) = sinc(s1)sinc(s2).
1.1 Expansion
To expand the image by a factor of 5/3, we simply need to resample f (s) at the locations s = 35m, obtaining
(1)
n
Resizing is a separable operation. This can be deduced by recognizing that the q (s) interpolation kernel used in the above formula is separable. However, it is also intuitively obvious that we can first expand each image row and then expand each column of the result, or vice-versa. In view of this, we can simplify our analysis (and implementation) by considering the expansion in one dimension only. In this case, equation
(1) becomes
In the expanded result, those locations whose indices are multiples of 5 are found from
,
since q (s) = sinc(s) is equal to δ (n) at integer locations n. This just means that every second input sample aligns perfectly with every fifth output sample — no interpolation is required. The other locations are found from
where
That is, qσ [n] is the digital filter whose PSF is obtained by delaying (translating to the right) the basic sinc(s) function by σ and then sampling it at the integer locations. To obtain , we convolve x by and keep only the locations 3m+1 from the result. Similarly, we obtain by convolving x with and keeping only the locations 3m + 1, and so forth.
It is better computationally, and much simpler conceptually, to rewrite the above expressions as inner products. In this case we have
— noting that q (s) = q (−s)
and, similarly,
What this means is that you obtain for any given m by translating the function q−0.4 [n] across by 3m + 1 samples and taking the dot product with the input sequence — note that 3m + 1 is the nearest integer location to the desired location of . Similarly, you obtain by translating the function q0.2 [n] by 3m + 1 and taking its dot product with the input sequence — again, 3m + 1 is just the nearest integer to .
In the end, you need to evaluate and store 4 different inner product “kernels,” corresponding to q±0.2 [n] and q 0.4 [n]. Then, in every group of 5 output samples, one of them is a direct copy of the corresponding ± input sample, and each of the other four are obtained by taking inner products with the relevant q kernel, translated to the sample location which is closest to the one you want.
Of course, q±0.2 [n] and q±0.4 [n] are infinite extent sequences, so you will have to window them or apply some other approximation technique in practice. You are now in a position to develop an implementation that is closely related to what you have seen in Lab 2.
Before proceeding, it is very strongly recommended (pretty much mandatory) that you draw up a figure to convince yourself that you thoroughly understand exactly how each group of 5 output samples on a line is derived from input samples via inner products. Your figure should depict exactly what inner products are involved. The figure is useless unless you can actually put windowed sinc sample values into your figure and work out exactly what you would get for the interpolated value at a given location, based on a given set of inputs. If your program has bugs, you will need this figure, since it tells you exactly what to expect at a given location, based on the inputs that may be available. Debugging is all about comparing your expectations against what is actually happening at a given location, as reported live via the debugger. Actually using a debugger is the most efficient way to discover and fix problems, so if all you have ever done is use print statements, now is the time to make sure you know how to use the integrated debugger in the Visual Studio platform (or another platform if you are using something else).
1.2 Reduction
To reduce the image by a factor of , we need to resample f (s) at multiples of . However, to avoid aliasing, we must be sure first to reduce the bandwidth of f (s) so that fˆ(ω) ≈ 0 for ω∈/ [−3π/5,3π/5]2. Equivalently, we first shrink f (s) by a factor of , bandlimit it to [−π,π]2, and then sample at the integer locations. In this brief review, we shall adopt the former perspective, forming
where g (s) = (f ∗ hlpf)(s)
Since f (s) is just a linear combination of shifted copies of the q (s) interpolation kernel, we can obtain g = f ∗ hlpf by applying the low-pass anti-aliasing filter hlpf directly to the interpolation kernel. That is,
g (s) = [x[n]qlpf (s−n)
n
where
qlpf (s) = (q ∗ hlpf)(s)
In our case, q (s) is ideally sinc(s1)sinc(s2), while hlpf (s) is ideallyis equal to hlpf (s).[1]It follows that
n
Again, the simplest and most efficient way to implement this is to perform the reduction separably (rows then columns, or vice-versa) and adopt the inner products (output-driven) perspective, as follows. For the even locations of the output, we get
That is, we take the inner product between the input sequence and a copy of qlpf that is translated by 5m and sampled at the integer locations. Similarly, for the other locations of the output, we get
,qlpf,
That is, we need three sets of discrete kernels, and that are obtained by sampling the stretched sinc function qlpf, shifted by 0 and , then each output sample is obtained by translating the relevant kernel to the nearest integer location to the desired one — is closest to integer location 5m+2, and is closest to integer location 5m+3 — and taking the inner product between the input sequence and the shifted kernel.
It is worth sketching what is happening in the above equations on paper. In particular, you will find that everything is remarkably simple. All we are doing in every case is translating an appropriate sinc function so that it is centred at the location where we want to obtain an output sample, whereupon we take the inner product between the input samples and the samples of the translated sinc function.
Again, since sinc functions have infinite extent, you will need to window them or use some other approximation technique. Be sure to remember to use a window that has the same centre of symmetry as the underlying continuous sinc function that will be sampled.
2 How to go about the tasks
For demonstration purposes, you should create a separate project for each task, within a single workspace. You can do this by using the “File → New → Project” option in Visual C++, replacing the “Create new solution” option with “Add to solution”.
Be sure to arrange for all your executable programs to be placed in a single location (e.g., c:\elec4642\bin), which is referenced from the path environment variable. Also, please place all the images you are working with into a single directory (e.g., c:\elec4642\data). That way, it will be much easier to demonstrate your work in a time effective manner — it will also save you personally a lot of time. In any event, you will not be marked for your work unless it is organized in this way.
You should also rely much more heavily on viewing images using mi_viewer than the Windows previewer. There are lots of reasons for this, but the obvious ones are that you can launch multiple copies to view multiple images at once, and you can zoom in and inspect individual pixel values with ease. Ultimately, it is a great deal faster and less confusing to execute your programs and view your images directly from the DOS prompt, than by using mouse clicks — assuming you can type. Remember that the up and down arrows allow you to scroll through and edit previous commands easily. Remember also that you can use the tab key to expand file names, program names, etc., so you rarely have to type everything in full.
One thing you need to be prepared for is that it may be difficult at first to know whether your program is working correctly. This is a real world dilemma that you need to learn how to address. Just because your program compiles and produces something which looks like an image does not mean it is doing the right thing. So you need to have a good feeling for what the result should look like, based on your understanding of the fundamental concepts.
• To debug your program, you may find it useful to process very tiny images with only a few samples, so you can check that the output is as expected; you can use something like "mi_pipe2 -i image.bmp -o tiny.bmp -form bmp :: crop_n_shuffle -crop 0 0 16 16" to create a 16x16 image by cropping a much bigger one. You can also read the sample values directly within the “mi_viewer” application by zooming in (z accelerator) and holding the mouse button down.
• It is a good idea to rely upon the theory you have learned to verify your programs. For example, you know that a good resolution reduction algorithm should be shift invariant, or nearly so. If this is the case, you should not observe any kind of modulation or pattern in the output which did not exist in the input — pay special attention around oriented edges, which can be understood as a single feature that consistently shifts from row to row or column to column by a non-integer amount.
• You can also use apply your interpolation program to the output of your reduction program and vice-versa. In each case you will obtain an image of the same size as the original (or a little larger, in which case you can crop it down), so you can use the Media Interface tools to compare the resulting image with the original — this is a powerful way to check for any shift variance issues. One useful tool is the Media Interface “image_arith” module, which can be used to form the difference between two images as follows:
— mi_pipe2 -in1 image1.bmp -in2 image2.bmp :: image_arith -add 1 -1
It should be apparent that to verify your work, it may be helpful to collect sequences of command-line statements together into a script that you can run quickly. You can do this with a Windows batch file (i.e., any file with the ".bat" suffix), which can then be executed like any other program from the command-line. This will greatly facilitate the marking for demonstrators, who might insist upon it.
3 Tasks
Task 1: (5 marks + up to 3 possible bonus marks) Write a program which accepts a BMP image as input and writes out a reduced image, having dimensions . Notice that you should round the dimensions up to the nearest integer (ceiling function). Your program should be based upon the method of Section 2.2, with windowed sinc filters. You should adopt a Hanning window. Here are some additional specifics:
1. Your program should accept a command-line argument H (loosely interpreted as the window half-length) such that the filter PSFs (or inner product kernels) qlpf and all have their nonzero values falling within a (2H + 1)×(2H + 1) region of support. Your program should function correctly for values of H in the range 0 through to 14 at least, so that you can demonstrate the effect of different filter complexities. The special value H = 0 means, of course, that your inner products all involve exactly one coefficient — if your program is any good, this will be equivalent to nearest neighbour interpolation — check that it works correctly by looking at the numerical values with “mi_viewer.”
2. Windowing may impact the DC gain of the filter somewhat. Be sure to renormalize your filters so that the DC gain is always 1 in this application.
3. It is best to apply a continuous window function directly to the sinc function qlpf (s), then shift and sample the windowed result, so that you do not accidentally introduce further shifts due to a window that has a different centre of mass to the sinc function. Draw sketches and figure out what continuous window parameter τ should be selected for a given value of the integer parameter H that your program accepts on the command-line.
4. You are free to use either a floating-point or a fixed-point (integer) implementation, but you will probably find a floating-point implementation to be easier for this exercise.
5. Remember to apply a suitable boundary extension method prior to filtering.
6. Up to 3 bonus marks may be awarded for efficient implementation of this task. Ideally, your implementation should strive to perform as few computations as possible, while still correctly implementing the reduction filter for the specified window size. You should consider separability as an important basis for minimizing the complexity. You may also consider the option of fixed point processing with 16-bit sample and coefficient precisions. To gain these bonus marks you will need to demonstrate actual measured improvements in processing speed.
Task 2: (2 marks) Write a program which accepts a BMP image as input and writes out an expanded image, having times the number of rows and columns. The new dimensions should be . Your program in this part should be based on bi-linear interpolation (see Chapter 3 of your lecture notes). Note that bi-linear interpolation is similar to sinc-interplation, in that you are finding the nearest location in the input image to the desired one, and taking the inner product of a small neighbourhood of samples around that location with the coefficients of a kernel that depends on the shift amount. In this case, however, the kernel only has 4 coefficients and the formula is very simple. You might like to go back over the Week 5 lecture to understand what it means to interpolate an image using the bi-linear kernel, rather than a sinc kernel. You should look for artefacts in the interpolated result that might reveal the effects of aliasing, even though it will not be a very strong effect for most inputs.
Task 3: (3 marks + up to 2 possible bonus marks) Write a program which accepts a BMP image as input and writes out an expanded image, having times the number of rows and columns, as above. Your program in this part should be based on the method of Section 2.1, with windowed sincs. Here are some additional specifics:
1. The half-window size parameter H should be a command-line argument to your program, so that you can demonstrate the effect of different filter lengths. The meaning of H is, as before, that all inner products (if implemented non-separably) should involve kernels with a region of support that is limited to (2H + 1)×(2H + 1). Your program must work for values of H in the range 0 through 7, where H = 0 again corresponds to nearest neighbour interpolation — if you have implemented things correctly, this does not need to be treated as a special case.
2. Windowing may impact the DC gain of the interpolation kernels somewhat. Be sure to renormalize your coefficients so that the DC gain is always 1.
3. Remember to apply a suitable boundary extension method prior to filtering.
4. Up to 2 bonus marks may be awarded for efficient implementation of this task. Again, you should consider separability and you will need to demonstrate actual measured improvements in speed. You may also be rewarded for efficient memory utilization in the implementation of your expansion operator.
Task 4: (Up to three bonus marks) Write a program which accepts two BMP images x[n] and y [n] as input, generates a difference imageas output, and also computes the mean error and mean squared error (MSE) between the input images. If the images are each of size M ×N, these quantities are
Mean Error
and
MSE
From the MSE, you should derive the peak signal to noise ratio (PSNR), which is defined by
2552
PSNR = 10log10
MSE
for 8-bit sample values.
Note: While this task is simple, you will need to deal with images that do not have identical dimensions to perform all the investigations below. Where the input images have different dimensions, your program should ignore extra rows on the bottom or columns on the right of the larger image.
1. 1 bonus mark: Arrange for your program to print the Mean Error, MSE and PSNR for each colour plane, as well as generating the difference image.
2. 1 bonus mark: Use your program to investigate the difference between an original image and one obtained by first expanding it and then reducing it back to its original size. Compare the performance of the bi-linear and windowed sinc expansion algorithms represented by Tasks 2 and 3. Also, determine the impact of the window sizes. To receive these bonus marks you must be prepared to explain what you observe.
3. 1 bonus mark: Use your program to investigate the difference between an original image and one obtained by first reducing it and then expanding it back to its original size. Again, study the effect of window size and bi-linear vs. windowed sinc expansion. To receive these bonus marks you must be prepared to explain what you observe.
[1] If this is not obvious to you, consider what happens in the Fourier domain. qˆ(π,π]2 and 0 elsewhere. hˆlpf (ω) is equal to 1 inside [−3π/5,3π/5]2 and 0 elsewhere. Consequently,