1 Data Capture and Overview
Figure 1: You will capture your cellphone images to reconstruct camera pose and 3D points.
In this assignment, you will use your cellphone images (more than 5) to reconstruct 3D camera poses and points with full bundle adjustment. Make sure you have enough baseline (translation) between images for well conditioned fundamental matrix while retaining enough number of correspondences between image. Avoid a scene dominated by a planar surface, i.e., the images need to contain many 3D objects as shown in Figure 1.
You will write a full pipeline of the structure from motion algorithm including matching, camera pose estimation using fundamental matrix, PnP, triangulation, and bundle adjustment. A nonlinear optimization is always followed by the initial estimate by linear least squares solution. The pipeline is described in Algorithm 1.
Algorithm 1 Structure from Motion
1: [Mx, My] = GetMatches(I1, ···, IN)
2: Normalize coordinate in Mx and My, i.e., x = K−1x.
3: Select two images Ii1 and Ii2 for the initial pair reconstruction.
4: [R, C, X] = CameraPoseEstimation([Mx(:,i1) My(:,i1)], [Mx(:,i2) My(:,i2)])
5: P = {P1,P2} where P
6: R = {i1,i2}
7: while |R| < N do
8: i = GetBestFrame(Mx, My, R);
9: [Ri, Ci] = PnP RANSAC([Mx(:,i) My(:,i)], X)
10: [Ri, Ci] = PnP Nonlinear(Ri Ci, [Mx(:,i) My(:,i)], X)
11: P
12: for f = 1 : |R| do
13: U = FindUnreconstructedPoints(X, Rf, i, Mx, My)
14: for j = 1 : |U| do
15: u = [Mx(Uj, i), My(Uj, i)] and v = [Mx(Uj, Rf), My(Uj, Rf)]
16: x = LinearTriangulation(u, Pi, v, PRf)
17: x = NonlinearTriangulation(X, u, Ri, Ci, v, RRf, CRf)
18: X = X ∪ x
19: end for
20: end for
21: P = P ∪ Pi and R = R ∪ i.
22: [P, X] = BundleAdjustment(P, X, R, Mx, My)
23: end while
2 Matching
Given a set of images, I1,··· ,IN, you will find matches across all images where N is the number of images similar to HW #4. Pick a reference image, Iref, and match with other images using SIFT features from VLFeat, i.e., Iref ↔ I1,··· ,Iref ↔ IN (no need to match Iref ↔ Iref).
Your matches are outlier free, i.e., bidirectional knn match → ratio test → inliers from the fundamental matrix based RANSAC. Based on the matches, you will build a measurement matrix, Mx and My:
[Mx, My] = GetMatches(I1, ···, IN)
Mx: F×N matrix storing x coordinate of correspondences
My: F×N matrix storing y coordinate of correspondences
The fth feature point in image Ii corresponds to a point in image Ij. The x and y coordinates of the correspondence is stored at (f,i) and (f,j) elements in Mx and My, respectively. If (f,i) does not correspond to any point in image Ik, you set -1 to indicate no match as shown in Figure 2.
Important: For better numerical stability, you can transform the measurements to the normalized coordinate by multiplying K−1, i.e., x = K−1x where x is 2D measured points in homogeneous coordinate. You can run structure from motion in the normalized coordinate by factoring out K. When visualizing projection in the image, the coordinate needs to be transformed back to original coordinate by multiplying K.
f
x
x
f ,i f
Mx
, j −
1
f
y
y
f ,i
My
f , j −
1
N N
F
(xf ,i , y f ,i )↔(xf , j , y f , j )↔(xf ,k , y f ,k )
Figure 2: The fth feature point in image Ii corresponds to a point in image Ij. The x and y coordinates of the correspondence is stored at (f,i) and (f,j) elements in Mx and My, respectively. If (f,i) does not correspond to any point in image Ik, you set -1 to indicate no match.
3 Camera Pose Estimation
You will write a camera pose estimation code that takes correspondences between two images, Ii1 and Ii2 where i1 and i2 are the indices of the initial images to reconstruct selected manually.
[R, C, X] = CameraPoseEstimation(u1, u2)
R and C: the relative transformation of the i2 image u1 and u2: 2D-2D correspondences
As studied in HW #4, you will compute:
1. Fundamental matrix via RANSAC on correspondences, Mx(:,i1), My(:,i2)
2. Essential matrix from the fundamental matrix
3. Four configurations of camera poses given the essential matrix
4. Disambiguation via chierality (using 3D point linear triangulation): X = LinearTriangulation(u, Pi, v, Pj)
Write-up:
Figure 3: Camera pose estimation.
(1) Visualize inlier matches as shown in Figure 3(a).
(2) Visualize camera pose and 3D reconstructed points in 3D as shown in Figure 3(b).
4 Nonlinear 3D Point Refinement
You will write a nonlinear triangulation code. Given the linear estimate for the point
T triangulation, X, you will refine the 3D point Xto minimize geometric error (reprojection error) via iterative nonlinear least squares estimation,
T !−1 T
∂f(X) ∂f(X) ∂f(X)
∆X = (b − f(X)). (1) ∂X ∂X ∂X
Write-up:
(1) Derive the point Jacobian, i.e., ∂f ∂(XX)j and write the following code. df dX = JacobianX(K, R, C, X)
(2) Write a code to refine the 3D point by minimizing the reprojection error and visualize reprojection error reduction similar to Figure 5.
X = NonlinearTriangulation(X, u1, R1, C1, u2, R2, C2)
Algorithm 2 Nonlinear Point Refinement
T
1:
2: for j = 1 : nIters do
3: Build point Jacobian, 4: Compute f(X).
5:
6: X = X + ∆X
7: end for
.
5 Camera Registration
You will register an additional image, Ij using 2D-3D correspondences.
Write-up:
(1) (3D-2D correspondences) Given 3D triangulated points, find 2D-3D matches, X ↔ u.
(2) (Perspective-n-Point algorithm) Write a code that computes 3D camera pose from 3D-2D correspondences:
[R, C] = LinearPnP(u, X)
X: n × 3 matrix containing n 3D reconstructed points u: n × 2 matrix containing n 2D points in the additional image I3 R and C: rotation and translation for the additional image.
Hint: After the linear solve, rectify the rotation matrix such that det(R) = 1 and scale C according to the rectification.
(3) (RANSAC PnP) Write a RANSAC algorithm for the camera pose registration (PnP) given n matches using the following pseudo code:
Algorithm 3 PnP RANSAC
1: nInliers ← 0
2: for i = 1 : M do
3: Choose 6 correspondences, Xr and ur, randomly from X and u.
4: [Rr, tr] = LinearPnP(ur, Xr)
5: Compute the number of inliers, nr, with respect to Rr, tr.
6: if nr nInliers then 7: nInliers ← nr
8: R = Rr and t = tr
9: end if
10: end for
Visualize 3D registered pose as shown in Figure 4.
(a) Front view (b) Top view
Figure 4: Additional image registration.
(4) (Reprojection) Visualize measurement and reprojection to verify the solution.
6 Nonlinear Camera Refinement
Given the initial estimate Ri and ti, you will refine the camera pose to minimize geometric error (reprojection error) via iterative nonlinear least squares estimation,
T !−1 T
∂f(p) ∂f(p) ∂f(p)
∆p = (b − f(p)), ∂p ∂p ∂p
u1/w1
x1
v1/w1 y1
... ,, b = ... (2)
f(p) =
un/wn xn
vn/wn yn
where p CT qT T. C ∈ R3 is the camera optical center and q ∈ S3 is the
quaternion representation of the camera rotation.
It is possible to minimize the overshooting by adding damping, λ as follows:
T !−1 T
∂f(p) ∂f(p) ∂f(p)
∆p = + λI (b − f(p)),
(3)
∂p ∂p ∂p
where λ is the damping parameter. You can try λ ∈ [0,10].
Note that the conversion between quaternion and rotation matrix is given as follows:
R ,
q , where Write-up:
(1) Derive the quaternion Jacobian to rotation using Equation (4), i.e., ∂∂Rq and write the following code. Note: ignore the normalization kqk = 1. dR dq = JacobianQ(q)
(2) Derive the rotation Jacobian to projection using Equation (2), i.e., ∂f ∂(Rp)j where
T and write the following code. Note: use vectorized form of
the rotation matrix.
df dR = JacobianR(R, C, X)
(3) Derive the expression of ∂f∂(qp)j using the chain rule.
(4) Derive the camera center Jacobian to projection using Equation (2), i.e., ∂f ∂(Cp)j and write the following code.
df dC = JacobianC(R, C, X)
(5) Write a code to refine the camera pose by minimizing the reprojection error and visualize reprojection error reduction as shown in Figure 5:
[R, C] = PnP Nonlinear(R C, u, X)
Algorithm 4 Nonlinear Camera Pose Refinement
1: CT qT T
2: for j = 1 : nIters do
3: C = p1:3, R=Quaternion2Rotation(q), q = p4:7
4: Build camera pose Jacobian for all points, 5: Compute f(p).
6: )) using Equation (3).
7: p = p + ∆p
8: Normalize the quaternion scale, p4:7 = p4:7/kp4:7k.
9: end for
.
Figure 5: Nonlinear refinement reduces the reprojection error (0.19→0.11).
7 Bundle Adjustment
You will write a nonlinear refinement code that simultaneously optimizes camera poses and 3D points using the sparse nature of the Jacobian matrix. [P, X] = BundleAdjustient(P, X, R, Mx, My)
For example, consider 3 camera poses and 2 points. The Jacobian matrix can be written as follows:
02×7
2 ∂f 1
JJp JX (5)
02×7 ∂02×3
02×3
where Jp and JX are the Jacobian for camera and point, respectively, and λ ∈ [0,10].
The normal equation, JTJ∆x = JT(b − f(x)) can be decomposed into:
A Bep
= , (6)
BT D ∆X eX
where
A = JTpJp + λI, B = JTpJX, D = JTXJX + λI ep = JTp(b − f(x)), eX = JXT(b − f(x))
where and X where I and M are the number of images and points, respectively.
The decomposed normal equation in Equation (6) allows us to efficiently compute the inverse of JTJ using Schur complement of D:
∆pb = (A − BD−1BT)−1(ep − BD−1eX),
where D is a block diagonal matrix whose inverse can be efficiently computed by inverting small block matrix:
d1
D =
...
,
dM
−1 d1
D−1 =
...
−1
dM
(7)
The bundle adjustment algorithm is summarized in Algorithm 5. Note that not all points are visible from cameras. You need to reason about the visibility, i.e., if the point is not visible from the camera, the corresponding Jacobian and measurement from J and b will be omitted, respectively.
Algorithm 5 Bundle Adjustment
T
1:and
2: for iter = 1 : nIters do
3: Empty Jp, JX, b, f, Dinv.
4:
for i = 1 : M do
5:
d = 03×3
6:
for j = 1 : I do
7:
if the ith point is visible from the jth image then
8:
J1 = 02×7I and J2 = 02×3M
9:
J
10:
J
11:
Jp
JTp
J
T
and JX
JTX
T J
12:
d
13:
b = bT
u
14:
f fT
x
where
I
15:
end if
16:
end for
17:
d = d + λI
18:
Dinv = blkdiag(Dinv, d−1)
19:
end for
20:
ep = JTp(b − f)
21:
eX = JTX(b − f)
22:
A = JTpJp + λI, B JX, D−1 = Dinv
23:
∆p = (A − BD−1BT)−1(ep − BD−1eX) b
24:
Normalize quaternions.
25:BT∆p) b
26: end for
Write-up: You will first start with two images and 10 3D points to test your bundle adjustment program.
(1) Derive Jp and JX.
(2) Run Algorithm 5 and visualize the reprojection error similar to Figure 5.
8 Putting All Things Together
Write-up: You will run with all images and 3D points based on Algorithm 1.
(1) Visualize 3D camera pose and points as shown in Figure 6.
(2) Visualize reprojection for all images.
Figure 6: You will reconstruct all images and 3D points using structure from motion