1. In the basic stereo imaging setup below, the origin of the world coordinate system W is located at the lens center of the left camera. The distance between the lens centers of the two cameras is 12 cm. Both cameras have a focal length of 50 mm, and each sensor chip (real image plane) has a physical size of 1.2 cm × 1.2 cm. The cameras output a pair of digital stereo images, each of size 512 × 512 pixels. The tip of vertical pole #1 appears in the left image at pixel location (i, j) = (185, 125) and in the right image at (i, j) = (185, 115). The tip of vertical pole #2 appears in the left image at pixel location (i, j) = (185, 179) and in the right image at (i, j) = (185, 169). Compute the horizontal distance between the tips of the two poles in the world coordinate system (horizontal distance = distance in the x direction). Show all work to receive full credit. (The integer image plane uses the i-j coordinate system, with i running from top to bottom and j running from left to right.)
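A minimal sketch of the computation follows. It assumes the standard parallel-axis pinhole stereo model:

- Depth is Z = f·B/d, where d is the disparity measured on the sensor.
- Horizontal position is X = x·Z/f, where x is measured on the left sensor from the optical axis.

The pixel pitch (1.2 cm / 512) and the optical-axis column j = 256 are assumptions, not values from the question. Both poles have the same 10-pixel disparity, so the horizontal separation reduces to B·Δj / disparity = 12 · 54 / 10 = 64.8 cm. That result does not depend on the focal length, the pixel pitch, or the assumed center column.

```python
# Sketch for Question 1, assuming a parallel-axis pinhole stereo pair:
#   Z = f * B / d   (d = disparity on the sensor),   X = x * Z / f
F_CM = 5.0             # focal length: 50 mm
B_CM = 12.0            # baseline between the two lens centers
PITCH_CM = 1.2 / 512   # assumed physical width of one pixel (sensor width / pixels)
CENTER_J = 256         # assumed column of the optical axis; cancels out in the difference


def world_x(j_left, j_right):
    """Return (X, Z) in cm for a point seen at columns j_left / j_right."""
    disparity_cm = (j_left - j_right) * PITCH_CM
    z = F_CM * B_CM / disparity_cm
    x_sensor = (j_left - CENTER_J) * PITCH_CM   # left-image x, measured from the optical axis
    return x_sensor * z / F_CM, z


x1, z1 = world_x(125, 115)   # pole 1
x2, z2 = world_x(179, 169)   # pole 2
print(z1, z2, abs(x2 - x1))  # both depths are about 2560 cm; separation about 64.8 cm
```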
2. We would like to use a minimum-distance classifier formulated with linear discriminant functions D_i(X) to classify an input X into one of three classes. The prototype vectors for the three classes are given below. Find the equation of the decision boundary between classes 1 and 3, simplify it into an algebraic equation (not a matrix equation), and then plot the decision boundary as a graph.
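The prototypes themselves are not reproduced here, so the sketch below takes them as parameters. It assumes the usual minimum-distance discriminant D_i(X) = XᵀZ_i − ½ Z_iᵀZ_i with 2-D prototypes Z_i. The boundary D_1(X) = D_3(X) then simplifies to a·x1 + b·x2 + c = 0. The prototypes in the usage lines are hypothetical, chosen only to exercise the code.

```python
from fractions import Fraction


def discriminant(x, z):
    """Linear discriminant D(X) = X.Z - 0.5 * Z.Z for a 2-D prototype z."""
    return x[0] * z[0] + x[1] * z[1] - Fraction(1, 2) * (z[0] ** 2 + z[1] ** 2)


def boundary_coefficients(z_a, z_b):
    """Coefficients (a, b, c) of the line a*x1 + b*x2 + c = 0 where D_a(X) = D_b(X)."""
    a = z_a[0] - z_b[0]
    b = z_a[1] - z_b[1]
    c = -Fraction(1, 2) * ((z_a[0] ** 2 + z_a[1] ** 2) - (z_b[0] ** 2 + z_b[1] ** 2))
    return a, b, c


# Hypothetical prototypes Z1 = (1, 1) and Z3 = (3, 3), NOT the ones from the exam.
print(boundary_coefficients((1, 1), (3, 3)))   # -2*x1 - 2*x2 + 8 = 0, i.e. x1 + x2 = 4
```

To plot the line, solve for x2 = −(a·x1 + c)/b when b ≠ 0. The result is always the perpendicular bisector of the segment joining the two prototypes.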
3. Given an input grayscale image, we would like to use the Harris corner detector to detect interest points in the image. Write pseudocode to compute the local structure matrix A of the image at every pixel location. Do not write more than 10 lines of pseudocode.
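One way to realize such pseudocode is sketched below in NumPy. It is a hedged sketch, not the only valid answer. It uses central differences (`np.gradient`) for the image derivatives and an unweighted (2r+1) × (2r+1) window sum, whereas many Harris formulations use Sobel derivatives and a Gaussian weighting window. At each pixel, A = [[ΣIx², ΣIxIy], [ΣIxIy, ΣIy²]] over the window. The names `window_sum` and `structure_matrix` are illustrative.

```python
import numpy as np


def window_sum(img, r):
    """Sum of img over a (2r+1) x (2r+1) window centred on every pixel (zero padding)."""
    padded = np.pad(img, r)
    out = np.zeros(img.shape, dtype=float)
    rows, cols = img.shape
    for di in range(2 * r + 1):
        for dj in range(2 * r + 1):
            out += padded[di:di + rows, dj:dj + cols]
    return out


def structure_matrix(image, r=1):
    """Return the entries (A11, A12, A22) of the 2x2 local structure matrix A at every pixel."""
    img = np.asarray(image, dtype=float)
    Iy, Ix = np.gradient(img)        # derivative along rows (i), then along columns (j)
    A11 = window_sum(Ix * Ix, r)
    A12 = window_sum(Ix * Iy, r)     # A21 equals A12 because A is symmetric
    A22 = window_sum(Iy * Iy, r)
    return A11, A12, A22


# Usage on a small vertical step edge: only the Ix^2 entry is non-zero near the edge.
step = np.zeros((5, 5))
step[:, 3:] = 1.0
A11, A12, A22 = structure_matrix(step)
```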
4. We would like to use the signed representation of the Histogram of Oriented Gradients (HOG) descriptor to detect humans in images. In the signed representation, the histogram has 18 bins.
(a) What is the dimension of the descriptor under the following parameter settings?
detection window size = 296 × 168 pixels (rows × columns), cell size = 8 × 8 pixels, block size = 3 × 3 cells, and block overlap = 8 pixels.
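The descriptor length can be checked under one reading of the settings: "block overlap = 8 pixels" means that adjacent 24 × 24-pixel blocks share 8 pixels. That gives a block stride of 16 pixels, or 2 cells. The sketch below is built on that assumption, and `hog_dimension` is a hypothetical helper.

```python
def hog_dimension(win_rows, win_cols, cell=8, block_cells=3, overlap_px=8, bins=18):
    """Return (blocks per column, blocks per row, descriptor length) for a HOG window."""
    block_px = block_cells * cell              # 24-pixel blocks
    stride = block_px - overlap_px             # assumed: 16 pixels between block origins
    blocks_r = (win_rows - block_px) // stride + 1
    blocks_c = (win_cols - block_px) // stride + 1
    per_block = block_cells * block_cells * bins   # 3 x 3 cells x 18 bins = 162
    return blocks_r, blocks_c, blocks_r * blocks_c * per_block


print(hog_dimension(296, 168))   # (18, 10, 29160)
```

Under this reading there are 18 × 10 = 180 blocks of 162 values each, for 29,160 dimensions.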
(b) Given the bin centers of the 18 histogram bins and the gradient magnitudes and gradient angles of an 8 × 8 cell below, compute the histogram of the cell (before block normalization).
Bin #                   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
Bin center (degrees)    0   20   40   60   80  100  120  140  160  180  200  220  240  260  280  300  320  340

Gradient magnitudes:
    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0
    0    0  220    0    0    0    0    0
    0    0    0    0  180    0    0    0
    0  120    0    0    0    0    0    0
    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0

Gradient angles (degrees):
  200   45   23   98  130  260  255  250
  125  295   85   90  130  265  249  240
  123   35   85   95  125  260  250  240
  100   90   45   90  120  265  240  230
   95   99  105  106  355  120  100  110
   90  205  110  120  120  130  125  120
   85   90  100  110  110  120  120  110
   80   80  100  110  100  100  100  110
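The cell histogram can be sketched as follows, under two assumptions:

- The two 8 × 8 grids are read row by row.
- Each pixel splits its magnitude linearly between the two nearest bin centers, with 340° wrapping to 0° (360°).

Linear vote interpolation is the scheme commonly used for HOG, but a course may instead assign each vote to a single nearest bin. Only three pixels have non-zero magnitude: 220 at 45°, 180 at 355°, and 120 at 205°.

```python
# Gradient magnitudes and angles of the 8 x 8 cell, read row by row (assumed row-major).
MAGS = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 220, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 180, 0, 0, 0],
    [0, 120, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
ANGLES = [
    [200, 45, 23, 98, 130, 260, 255, 250],
    [125, 295, 85, 90, 130, 265, 249, 240],
    [123, 35, 85, 95, 125, 260, 250, 240],
    [100, 90, 45, 90, 120, 265, 240, 230],
    [95, 99, 105, 106, 355, 120, 100, 110],
    [90, 205, 110, 120, 120, 130, 125, 120],
    [85, 90, 100, 110, 110, 120, 120, 110],
    [80, 80, 100, 110, 100, 100, 100, 110],
]


def hog_cell_histogram(mags, angles, n_bins=18, bin_width=20.0):
    """Signed HOG cell histogram; votes are split linearly between the two
    nearest bin centres 0, 20, ..., 340 degrees (assumed interpolation scheme)."""
    hist = [0.0] * n_bins
    for mag_row, ang_row in zip(mags, angles):
        for m, a in zip(mag_row, ang_row):
            a = a % 360.0
            lower = int(a // bin_width) % n_bins   # centre at or just below a
            upper = (lower + 1) % n_bins           # next centre; 340 wraps to 0 (= 360)
            frac = (a - lower * bin_width) / bin_width
            hist[lower] += m * (1.0 - frac)
            hist[upper] += m * frac
    return hist


hist = hog_cell_histogram(MAGS, ANGLES)
# Index 0 is bin #1. Non-zero bins: #1 = 135, #3 = 165, #4 = 55, #11 = 90, #12 = 30, #18 = 45
```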
5. Suppose we have already computed the normalized co-occurrence matrix P[i, j] of an input image using displacement vector d. Can we obtain the normalized co-occurrence matrix P′[i, j] for the displacement vector d′ = −d (each component of d negated) without referring to the original input image? If so, how? Do not write more than six sentences. (Hint: displacement vector d′ has the same magnitude as d but points in the opposite direction.)
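Every pixel pair (p, p + d) counted for d is the same pair as (q, q − d) with q = p + d, only with its two gray levels swapped. The total pair count is also unchanged, which suggests that P′ is the transpose of P. The sketch below checks this on a small hypothetical image. It assumes that P[i][j] counts pairs in which the pixel at p has level i and the pixel at p + d has level j. The (row, column) ordering of d and the function names are illustrative.

```python
def cooccurrence(img, d, levels):
    """Normalized co-occurrence matrix: P[i][j] = fraction of pixel pairs (p, p + d)
    where the pixel at p has gray level i and the pixel at p + d has gray level j."""
    dr, dc = d                       # assumed (row offset, column offset)
    rows, cols = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Hypothetical 3-level test image.
img = [[0, 1, 2],
       [1, 2, 0],
       [2, 0, 1]]
P_pos = cooccurrence(img, (0, 1), 3)    # displacement d
P_neg = cooccurrence(img, (0, -1), 3)   # displacement d' = -d
```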
6. Consider the camera coordinate system C and the world coordinate system W shown in the figure below. The origin of the camera coordinate system is located at w(x, y, z) = w(6, 2, 0) with respect to the world coordinate system. The x axis of the camera coordinate system is parallel to the y axis of the world coordinate system. The y axis of the camera coordinate system is parallel to the x axis of the world coordinate system but points in the opposite direction. The z axis of the camera coordinate system is parallel to the z axis of the world coordinate system. The camera has a focal length of 45 mm, and its real image plane (x′, y′) is 1 cm × 1 cm. The real image plane is digitized into a digital image of 1024 × 1024 pixels. Derive the 3 × 4 camera transform that maps points in the world coordinate system to the pixel coordinate system of the camera.
Note: Assume that the real image plane has its origin at the lower left corner, with the x′ axis pointing right and the y′ axis pointing up. The digital image plane has its origin (0, 0) at the upper left corner, with the i axis pointing down and the j axis pointing right. Both i and j range over [0, 1023].
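The transform can be composed as an intrinsic matrix times the extrinsic [R | t]. The sketch below is hedged; several choices in it are assumptions rather than facts from the question:

- World coordinates are in cm.
- Projection is x′ = f·x_c/z_c, y′ = f·y_c/z_c, with no image inversion and x′, y′ parallel to x_c, y_c.
- The principal point is at the sensor center, so 0.5 cm is added to move the origin to the lower left corner.
- The pixel scale is 1023 pixels per cm, so that [0, 1 cm] maps onto [0, 1023].

Other projection conventions, such as an image plane placed at distance λ behind the lens, or a scale of 1024 pixels per cm, change the entries. The extrinsic part follows directly from the question: x_c = y_w − 2, y_c = −(x_w − 6), z_c = z_w.

```python
import numpy as np

F_CM = 4.5               # focal length: 45 mm
SENSOR_CM = 1.0          # 1 cm x 1 cm real image plane
S = 1023 / SENSOR_CM     # assumed pixels per cm, mapping [0, 1 cm] onto [0, 1023]

# Extrinsics: rows are the camera axes expressed in world coordinates.
# x_c = y_w - 2,  y_c = -(x_w - 6),  z_c = z_w
R = np.array([[0, 1, 0],
              [-1, 0, 0],
              [0, 0, 1]], dtype=float)
cam_origin_w = np.array([6.0, 2.0, 0.0])
t = -R @ cam_origin_w
Rt = np.hstack([R, t[:, None]])          # 3x4 [R | t]

# Intrinsics (assumed pinhole, principal point at the sensor centre):
#   x' = f*x_c/z_c + 0.5,  y' = f*y_c/z_c + 0.5   (lower-left origin, cm)
#   i = S*(1 - y'),        j = S*x'               (upper-left origin, pixels)
K = np.array([[0.0, -S * F_CM, S * SENSOR_CM / 2],
              [S * F_CM, 0.0, S * SENSOR_CM / 2],
              [0.0, 0.0, 1.0]])

M = K @ Rt   # 3x4 camera transform; output is homogeneous (i*w, j*w, w)


def project(p_world):
    """Map a world point (cm) to pixel coordinates (i, j) under the assumptions above."""
    h = M @ np.append(p_world, 1.0)
    return h[0] / h[2], h[1] / h[2]
```

Under these assumptions M = [[4603.5, 0, 511.5, −27621], [0, 4603.5, 511.5, −9207], [0, 0, 1, 0]]. A point on the optical axis, such as w(6, 2, 10), projects to the image center (511.5, 511.5).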
7. In the LeNet-5 convolutional neural network below, (a) what is the total number of links between the input layer and the C1 layer? (b) How many different parameters need to be trained for the links between the input layer and the C1 layer?
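The figure is not reproduced here, so the count below assumes the C1 configuration described in the original LeNet-5 paper:

- The input is 32 × 32.
- C1 has six 28 × 28 feature maps.
- Each C1 unit connects to a 5 × 5 input neighborhood plus one bias.
- Weights are shared within each map.

Counting the bias connections as links gives 122,304 links. Excluding them gives 117,600.

```python
# Assumed standard LeNet-5 C1 configuration.
IN_SIZE = 32     # input is 32 x 32
KERNEL = 5       # 5 x 5 receptive field per C1 unit
MAPS = 6         # six feature maps in C1
OUT = IN_SIZE - KERNEL + 1   # 28 x 28 units per map (no padding, stride 1)

# (a) Links: every C1 unit has 25 input links plus 1 bias link.
links = MAPS * OUT * OUT * (KERNEL * KERNEL + 1)
# (b) Trainable parameters: weights and bias are shared by all units of a map.
params = MAPS * (KERNEL * KERNEL + 1)

print(OUT, links, params)   # 28 122304 156
```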
8. A deep neural network has been designed to classify its input into one of five classes. The final output layer of the network is a Softmax layer. Suppose the input to the Softmax layer is [0 7 5 0 1]ᵀ. What are the final outputs of the neural network?
Hint: the formula for the Softmax function is: softmax(z)_k = e^(z_k) / Σ_{j=1}^{5} e^(z_j), for k = 1, …, 5, where z is the input vector.
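The formula can be evaluated directly. The sketch below subtracts the maximum input before exponentiating, a standard trick that avoids overflow and leaves the result unchanged because the common factor cancels.

```python
import math


def softmax(z):
    """Softmax of a list of scores; subtracting max(z) keeps exp() from overflowing."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


out = softmax([0, 7, 5, 0, 1])
print([round(p, 4) for p in out])   # [0.0008, 0.8775, 0.1188, 0.0008, 0.0022]
```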
9. In the Eigenface method for face recognition, we compute the distance between an input face and its reconstruction as d₀ = dist(I⃗_R, I⃗), where I⃗_R is the reconstruction of the input image vector I⃗. For a face image, this distance should be small. Explain why the distance will be large for a non-face input image. Do not write more than six sentences.
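The effect can be illustrated numerically. The reconstruction I⃗_R only contains components along the eigenfaces, so any part of the input lying outside the "face space" spanned by the eigenfaces is lost and shows up in d₀. The sketch below uses synthetic stand-in data, not real faces. Its "faces" lie exactly in a 5-D subspace of a 100-D pixel space, and its non-face is an arbitrary vector. The helper names are illustrative.

```python
import numpy as np


def eigenfaces(train, k):
    """train: (n_images, n_pixels) face vectors. Returns the mean face and top-k eigenfaces (rows)."""
    mean = train.mean(axis=0)
    # Right singular vectors of the centred data are eigenvectors of its covariance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_distance(image, mean, basis):
    """d0 = dist(I_R, I): distance between an image and its face-space reconstruction."""
    weights = basis @ (image - mean)            # project onto face space
    reconstruction = mean + basis.T @ weights   # I_R
    return np.linalg.norm(image - reconstruction)


# Synthetic (hypothetical) data: "faces" are combinations of 5 fixed directions.
rng = np.random.default_rng(0)
face_dirs = rng.normal(size=(5, 100))
train = rng.normal(size=(200, 5)) @ face_dirs
mean, basis = eigenfaces(train, k=5)

new_face = rng.normal(size=5) @ face_dirs   # lies in face space, so d0 is about 0
non_face = rng.normal(size=100) * 5         # mostly outside face space, so d0 is large
d_face = reconstruction_distance(new_face, mean, basis)
d_non_face = reconstruction_distance(non_face, mean, basis)
```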