Starting from:

$30

CSE527 Homework-estimating the 3D pose of a person given their 2D pose Solved

In this homework we are going to work on estimating the 3D pose of a person given their 2D pose. Turns out, just regressing the 3D pose coordinates using the 2D pose works pretty well [1] (you can find the paper here (https://arxiv.org/pdf/1705.03098.pdf)). In Part One, we are going to work on reproducing the results of the paper, in Part Two, we are going to try to find a way to handle noisy measurement.

Some Tutorials (PyTorch)
You will be using PyTorch for deep learning toolbox (follow the link (http://pytorch.org) for installation).

For PyTorch beginners, please read this tutorial

(http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) before doing your homework.

Feel free to study more tutorials at http://pytorch.org/tutorials/ (http://pytorch.org/tutorials/).

Find cool visualization here at http://playground.tensorflow.org (http://playground.tensorflow.org).

Starter Code


In the starter code, you are provided with a function that loads data into minibatches for training and testing in PyTorch.

Benchmark
Train for a least 30 epochs to get a least 44mm avg error. The test result(mm error) should be in the following sequence direct. discuss. eat. greet. phone photo pose purch. sit sitd. somke wait walkd. walk walkT avg

Problem 1:

{60 points} Let us first start by trying to reproduce the testing accuracy obtained by in the paper

((https://arxiv.org/pdf/1705.03098.pdf) above using PyTorch. The 2D pose of a person is represented as a set of 2D coordinates for each of their n = 32 joints i.e Pi2D= {(x1i , yi1), . . . , (xi32 , yi32)}, where (xji, yij) are the 2D coordinates of the j’th joint of the i’th sample. Similarly, the 3D pose of a person is Pi3D = {

(x1i , yi1, zi1), . . . , (x32i , yi32, zi32)}, where (xji, yij, zij) are the 3D coordinates of the j’th joint of the i’th sample.

The only data given to you is the ground truth 3D pose and the 2D pose calculated using the camera parameters. You are going to train a network fθ : R2n → R3n that takes as input the Pi2D and tries to regress the ground truth 3D pose Pi3D. The loss function to train this network would be the L2 loss between the ground truth and the predicted pose

M

                                  L ;            for a minibatch of size M           (2)

i=1

Download the Human3.6M Dataset here (https://www.dropbox.com/s/e35qv3n6zlkouki/h36m.zip).

Bonus: Every 1mm drop in test error from 44mm till 40mm gets you 2 extra points, and every 1mm drop below 40mm gets you 4 extra points.

In [1]: # Mount your google drive where you've saved your assignment folder from google.colab import drive drive.mount('/content/gdrive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client _id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com& redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=ema il%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww. googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdri ve.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.read only

Enter your authorization code:

··········

Mounted at /content/gdrive



Collecting pykalman

 Downloading https://files.pythonhosted.org/packages/2f/62/a4adc4516bd5974aa

5583090199dd4b78d1e87018d14e9279f72ccbf0b9b/pykalman-0.9.5.tar.gz (228kB)     |████████████████████████████████| 235kB 6.3MB/s 

Building wheels for collected packages: pykalman

 Building wheel for pykalman (setup.py) ... done

 Created wheel for pykalman: filename=pykalman-0.9.5-cp36-none-any.whl size=

48464 sha256=e8f4e21597f4f3f54c03d55d5b696eb469869345fbce8db09f77b9c9a121a9f5

 Stored in directory: /root/.cache/pip/wheels/d9/e8/6a/553d9832679cb74a8434f a597c3abdb07313e40054a0adf9ac

Successfully built pykalman

Installing collected packages: pykalman

Successfully installed pykalman-0.9.5

Report the test result(mm error) in the following sequence direct. discuss. eat. greet. phone photo pose purch. sit sitd. somke wait walkd. walk walkT avg

Problem 2:

{40 points} In this task, we’re going to tackle the situation of having a faulty 3D sensor. Since the sensor is quite old it’s joint detections are quite noisy:

x^ = xGT + ϵx y^ = yGT + ϵy

z^ = zGT + ϵz

Where, (xGT, yGT, zGT) are the ground truth joint locations, (x^, y^, z^) are the noisy measurements detected by our sensor and (ϵx, ϵy, ϵz) are the noise values. Being grad students, we’d much rather the department spend money for free coffee and doughnuts than on a new 3D sensor. Therefore, you’re going to denoise the noisy data using a linear Kalman filter.

Modelling the state using velocity and acceleration: We assume a simple, if unrealistic model, of our system - we’re only going to use the position, velocity and acceleration of the joints to denoise the data. The underlying equations representing our assumptions are:

∂x

                                                           xt+1 = xt + ∂tt δt + 0.5 ∗ ∂∂2tx2t δt2     (1)

                                                                                                 ∂y                        2

yt+1 = yt + ∂tt δt + 0.5 ∗ ∂∂ty2t δt2 (2) zt+1 = zt + ∂tt δt + 0.5 ∗ ∂∂2tz2t δt2 (3) ∂z

The only measurements/observations we have (i.e our 'observation space') are the noisy joint locations as recorded by the 3D sensors ot = (x^t, y^t, z^t). The corresponding state-space would be zt = (xt, yt, zt, .

Formallly, a linear Kalman filter assumes the underlying dynamics of the system to be a linear Gaussian model i.e.

z0 ∼ N(μ0, Σ0)

zt+1 = Azt

ot = Czt ϵ ϵ

where, A and C are the transition_matrix and observation_matrix respectively, that you are going to define based on equations (1), (2) and (3). The intitial estimates of other parameters can be assumed as:

initial_state_mean := μ0 = mean(given data)

initial_state_covariance := Σ0 = Cov(given data)

transition_offset := b = 0

observation_offset := d = 0

transition_covariance := Q = I observation_covariance := R = I

The covariance matrices Q and R are hyperparameters that we initalize as identity matrices. In the code below, you must define A and C and use pykalman (https://pykalman.github.io/), a dedicated library for kalman filtering in python, to filter out the noise in the data.

(Hint: \ Gradients could be calculated using np.gradient or manually using finite differences \ You can assume the frame rate to be 50Hz)

More products