task 2.1: least squares regression for missing value prediction
Download the file whData.dat, remove the outliers (as in the previous project), and collect the remaining height and weight data in two vectors

$\mathbf{x} = [x_1\; x_2\; \ldots\; x_n]^T$ and $\mathbf{y}$,

respectively.
Use the method of least squares to fit polynomial models

$y(x) = \sum_{j=0}^{d} w_j x^j$
to the data. In particular, fit models for d ∈ {1,5,10} and plot your results. Use each of your resulting models to predict weight values for the outliers (i.e. the data points in whData.dat where the weight value is −1).
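For illustration, here is a minimal sketch of such a fit using numpy and matplotlib; it assumes whData.dat stores weight, height, and gender per row (the layout from the previous project), with weight = −1 marking an outlier:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed file layout: weight, height, gender per row; usecols skips the
# non-numeric gender column.
w, h = np.loadtxt('whData.dat', usecols=(0, 1), unpack=True)

inlier = w > 0
x, y = h[inlier], w[inlier]   # training data (heights, weights)
x_out = h[~inlier]            # heights whose weights we must predict

for d in (1, 5, 10):
    # Least squares fit: minimize ||y - Phi c||^2 over polynomial coefficients.
    # np.polyfit returns coefficients from highest to lowest degree.
    coeffs = np.polyfit(x, y, d)
    print(f'd={d}: predicted weights', np.polyval(coeffs, x_out))

    xs = np.linspace(x.min() - 5, x.max() + 5, 200)
    plt.plot(xs, np.polyval(coeffs, xs), label=f'd={d}')

plt.scatter(x, y, c='k', s=10, label='data')
plt.xlabel('height'); plt.ylabel('weight'); plt.legend(); plt.show()
```

Note that for d = 10 the underlying Vandermonde system is poorly conditioned for raw height values; centering and scaling x before fitting stabilizes the solution.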
task 2.2: conditional expectation for missing value prediction
Fit a bi-variate Gaussian to the height and weight data in x and y to model the joint density p(x,y) of heights and weights.
Given your fitted bi-variate Gaussian, use the idea of conditional expectation to predict the weight values for the outliers. That is, let $x_o$ denote the available height value of an outlier and compute

$\mathbb{E}\bigl[ y \mid x = x_o \bigr] = \int y \, p(y \mid x = x_o) \, dy$.

Do this either analytically as discussed in the lecture or numerically and report your results.
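A sketch of the analytic route, assuming the vectors x, y and the outlier heights x_out from task 2.1 are in scope; for a bivariate Gaussian, the conditional expectation is linear in the conditioning value:

```python
import numpy as np

# Fit the bivariate Gaussian by maximum likelihood: sample mean and covariance.
mu = np.array([x.mean(), y.mean()])
Sigma = np.cov(x, y)   # 2x2 matrix [[s_xx, s_xy], [s_xy, s_yy]]

def cond_exp_weight(xo):
    # For a bivariate Gaussian: E[y | x = xo] = mu_y + (s_xy / s_xx)(xo - mu_x).
    return mu[1] + Sigma[0, 1] / Sigma[0, 0] * (xo - mu[0])

print('predicted weights:', cond_exp_weight(x_out))
```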
task 2.3: Bayesian regression for missing value prediction
Use the method of Bayesian regression to fit a fifth degree polynomial

$y(x) = \sum_{j=0}^{5} w_j x^j$

to the height and weight data in x and y. Assume a Gaussian prior

$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$

for the parameter vector w where $\boldsymbol{\mu}_0 = \mathbf{0}$ and $\boldsymbol{\Sigma}_0 = \sigma_0^2 \mathbf{I}$. Plot your resulting model and compare it to the corresponding model (d = 5) from task 2.1.
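Under such a prior, the posterior mean amounts to ridge-regularized least squares. A sketch, reusing x and y from task 2.1, with hypothetical values for the prior variance σ0² and the noise variance σ² (substitute the values from the lecture):

```python
import numpy as np

d = 5
sigma2_0 = 1.0   # prior variance sigma_0^2 -- hypothetical value
sigma2 = 1.0     # noise variance sigma^2   -- hypothetical value

# Design matrix with columns x^0, ..., x^d.
Phi = np.vander(x, d + 1, increasing=True)

# Posterior mean of w for prior N(0, sigma2_0 I) and Gaussian noise:
# solve (Phi^T Phi + (sigma2 / sigma2_0) I) w = Phi^T y.
A = Phi.T @ Phi + (sigma2 / sigma2_0) * np.eye(d + 1)
w_map = np.linalg.solve(A, Phi.T @ y)

xs = np.linspace(x.min() - 5, x.max() + 5, 200)
ys = np.vander(xs, d + 1, increasing=True) @ w_map
```

Because raw heights enter as x⁵, standardizing x first keeps the system well conditioned; plotting ys over xs against the d = 5 fit from task 2.1 gives the requested comparison.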
task 2.4: Boolean functions and the Boolean Fourier transform
In the supplementary material for lecture 07, we already briefly discussed elemental cellular automata.
A one-dimensional (or elemental) cellular automaton consists of
• an infinite sequence of cells {xi}i∈N in two possible states; typically these states are assumed to be 0 or 1; here, however, we consider xi ∈ {+1,−1}
• an update rule for how to update a cell based on its own and its two neighbors’ current states
At time t = 0, the xi are (randomly) initialized, and, at any (discrete) time step t, all cells are updated simultaneously. Here is an illustrative example where we use +1 = ■ and −1 = □:

[figure: a row of cells at time t and the updated row at time t + 1]
This update process continues forever and it is common practice to plot subsequent (finite-sized) state sequences below each other to visualize the evolution of an automaton under a certain rule. Here are some examples of possible evolutions starting from the same initial configuration:

[figure: evolutions under rule 32, rule 108, rule 126, and rule 110]
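For instance, here is a compact sketch that generates such evolution plots (using 0/1 states and periodic boundaries instead of an infinite sequence, purely for visualization):

```python
import numpy as np
import matplotlib.pyplot as plt

def evolve(rule, width=101, steps=50):
    """Evolve an elementary CA; rule is the Wolfram rule number."""
    # Output for each neighborhood value 0..7 is bit i of the rule number.
    table = [(rule >> i) & 1 for i in range(8)]
    rows = np.zeros((steps, width), dtype=int)
    rows[0] = np.random.randint(0, 2, width)
    for t in range(1, steps):
        prev = rows[t - 1]
        # Neighborhood index 4*left + 2*center + right (wrap-around boundary).
        idx = 4 * np.roll(prev, 1) + 2 * prev + np.roll(prev, -1)
        rows[t] = np.take(table, idx)
    return rows

plt.imshow(evolve(110), cmap='binary'); plt.show()
```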
From now on, we remove notational clutter. That is, instead of

$x_i^{t+1} = f\bigl(x_{i-1}^t, x_i^t, x_{i+1}^t\bigr)$,

we simply write y = f(x) where $\mathbf{x} \in \{+1,-1\}^3$ and $y \in \{+1,-1\}$.
The naming convention for the rules of elemental cellular automata is due to Wolfram (1984) and illustrated in the following tables.
rule 110:

x1  x2  x3 |  y | (y+1)/2
+1  +1  +1 | −1 | 0
+1  +1  −1 | +1 | 1
+1  −1  +1 | +1 | 1
+1  −1  −1 | −1 | 0
−1  +1  +1 | +1 | 1
−1  +1  −1 | +1 | 1
−1  −1  +1 | +1 | 1
−1  −1  −1 | −1 | 0

rule 126:

x1  x2  x3 |  y | (y+1)/2
+1  +1  +1 | −1 | 0
+1  +1  −1 | +1 | 1
+1  −1  +1 | +1 | 1
+1  −1  −1 | +1 | 1
−1  +1  +1 | +1 | 1
−1  +1  −1 | +1 | 1
−1  −1  +1 | +1 | 1
−1  −1  −1 | −1 | 0
For $\mathbf{x} \in \{+1,-1\}^3$, there are $2^3 = 8$ possible input patterns for each rule. For each input pattern, there are 2 possible outputs or target values. Hence, the total number of possible rules is $2^{2^3} = 256$.
Using the language of pattern recognition, we may collect the possible input patterns in an 8 × 3 design matrix X and the target values of each rule in an 8-dimensional target vector y. Converting a rule’s target vector from bipolar to binary yields the byte representation of a number between 0 and 255 and provides the rule’s name.
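To make the convention concrete, here is a small sketch that recovers a rule’s name from its bipolar target vector (row order as in the tables above, from input +1+1+1 down to −1−1−1):

```python
import numpy as np
from itertools import product

# Inputs ordered as in the tables: (+1,+1,+1), (+1,+1,-1), ..., (-1,-1,-1).
X = np.array(list(product([+1, -1], repeat=3)))

# Bipolar target vector of rule 110, copied from its table above.
y = np.array([-1, +1, +1, -1, +1, +1, +1, -1])

bits = (y + 1) // 2                      # bipolar -> binary
rule = int(''.join(map(str, bits)), 2)   # read the 8 bits as a byte
print(rule)                              # -> 110
```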
Here is your first sub-task: For rules 110 and 126, determine

$\mathbf{w}^* = \operatorname*{argmin}_{\mathbf{w}} \, \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert^2$

and then compute

$\hat{\mathbf{y}} = \mathbf{X}\mathbf{w}^*$.

Now, compare y and ŷ. What do you observe?
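A possible realization, reusing X and y from the previous snippet (rule 110 shown; rule 126 works the same way):

```python
import numpy as np

# Least squares solution of min_w ||y - Xw||^2 for the raw 8x3 design matrix.
w_star, *_ = np.linalg.lstsq(X.astype(float), y.astype(float), rcond=None)
y_hat = X @ w_star
print(np.round(y_hat, 3))   # compare with the bipolar targets y
```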
note:
Don’t be alarmed! What we did here hardly makes sense and was mainly intended to prepare ourselves for what comes next.
Each possible rule y = f(x) governing the behavior of a cellular automaton is an instance of a Boolean function

$f : \{-1,+1\}^m \to \{-1,+1\}$   (1)

where in our case m = 3. In what follows, we let I denote the index set of the elements xi of x, that is

$I = \{1, 2, \ldots, m\}$.

The power set $2^I$ of I is the set of all subsets of I. In other words,

$2^I = \{\, S \mid S \subseteq I \,\}$

and we note that $\lvert 2^I \rvert = 2^m$.
Every Boolean function f of the form in (1) can be uniquely expressed as a multilinear polynomial

$f(\mathbf{x}) = \sum_{S \in 2^I} w_S \prod_{i \in S} x_i$   (2)

which is known as the Boolean Fourier series expansion of f. In contrast to the complex exponentials which form the basis functions for conventional Fourier analysis, here the basis functions are the parity functions

$\varphi_S(\mathbf{x}) = \prod_{i \in S} x_i$

where by definition

$\varphi_\emptyset(\mathbf{x}) = \prod_{i \in \emptyset} x_i = 1$.

Given these definitions, we can rewrite (2) as follows

$f(\mathbf{x}) = \sum_{S \in 2^I} w_S \, \varphi_S(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\varphi}(\mathbf{x})$

and recognize that any Boolean function is an inner product between two vectors $\mathbf{w} \in \mathbb{R}^{2^m}$ and $\boldsymbol{\varphi} \in \{+1,-1\}^{2^m}$.
Here is your second sub-task: Implement a function

$\boldsymbol{\varphi} : \{+1,-1\}^m \to \{+1,-1\}^{2^m}$

such that $\boldsymbol{\varphi} = \boldsymbol{\varphi}(\mathbf{x})$. To be specific, for $\mathbf{x} = [x_1, x_2, x_3]^T \in \{+1,-1\}^3$, your function should produce

$\boldsymbol{\varphi} = [\,1,\; x_1,\; x_2,\; x_3,\; x_1 x_2,\; x_1 x_3,\; x_2 x_3,\; x_1 x_2 x_3\,]^T$.

However, implement it such that it works for arbitrary m ∈ N and not just for m = 3. This will come in handy in the next project.
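One possible sketch of such a feature map, enumerating subsets by size so that the order matches the vector above:

```python
import numpy as np
from itertools import combinations

def phi(x):
    """Map a bipolar vector x in {+1,-1}^m to all 2^m parity features.

    Subsets are enumerated by size and lexicographically, so for m = 3 the
    order is 1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3.
    """
    m = len(x)
    feats = []
    for size in range(m + 1):
        for S in combinations(range(m), size):
            feats.append(np.prod([x[i] for i in S]))  # empty product is 1
    return np.array(feats)

print(phi(np.array([+1, -1, +1])))   # 8 entries, each +1 or -1
```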
Here is your third sub-task: For rules 110 and 126, turn the design matrix X into a “feature” design matrix Φ, then determine

$\mathbf{w}^* = \operatorname*{argmin}_{\mathbf{w}} \, \lVert \mathbf{y} - \boldsymbol{\Phi}\mathbf{w} \rVert^2$

and finally compute

$\hat{\mathbf{y}} = \boldsymbol{\Phi}\mathbf{w}^*$.

Now, compare y and ŷ. What do you observe?
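A sketch reusing phi and the rule-110 data X, y from the earlier snippets; since the 8 × 8 matrix Φ contains all parity functions evaluated at all inputs, it should reproduce the targets exactly:

```python
import numpy as np

# Feature design matrix: one row of parity features per input pattern.
Phi = np.array([phi(row) for row in X])   # shape (8, 8)

# Least squares in feature space.
w_star, *_ = np.linalg.lstsq(Phi, y.astype(float), rcond=None)
y_hat = Phi @ w_star
print(np.round(y_hat, 3))   # compare with the bipolar targets y
```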
task 2.5: nearest neighbor classifier
Implement a function that realizes an n-nearest neighbor classifier, i.e. a function that decides the class label of a test data point from the majority of the labels among its n nearest training data points.
Download the files data2-train.dat and data2-test.dat of labeled 2D data and determine the recognition accuracy (percentage of correctly classified data points) of your classifier for n ∈ {1,3,5}.
Determine the overall run time for computing the 1-nearest neighbor of every data point in data2-test.dat.
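A brute-force sketch; it assumes each row of the .dat files holds the two coordinates followed by the class label (an assumption about the file layout):

```python
import numpy as np
from collections import Counter
import time

train = np.loadtxt('data2-train.dat')
test = np.loadtxt('data2-test.dat')
Xtr, ytr = train[:, :2], train[:, 2]
Xte, yte = test[:, :2], test[:, 2]

def knn_classify(x, n):
    # Squared Euclidean distances from x to all training points.
    d = np.sum((Xtr - x) ** 2, axis=1)
    nearest = np.argsort(d)[:n]
    # Majority vote among the n nearest labels.
    return Counter(ytr[nearest]).most_common(1)[0][0]

for n in (1, 3, 5):
    preds = np.array([knn_classify(x, n) for x in Xte])
    print(f'n={n}: accuracy {np.mean(preds == yte):.3f}')

t0 = time.perf_counter()
for x in Xte:
    knn_classify(x, 1)
print('1-NN total time:', time.perf_counter() - t0, 's')
```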
task 2.6: computing a kD-tree
Implement a function that computes and plots up to four different kinds of kD-trees of the data in data2-train.dat where k = 2.
Determine the overall run time for computing the 1-nearest neighbor of every data point in data2-test.dat by means of your kD-tree.
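A sketch of one of the possible variants (median split, cycling through the coordinates), together with a backtracking 1-nearest-neighbor search; midpoint or maximum-variance splits follow the same pattern:

```python
import numpy as np

class Node:
    def __init__(self, point, axis, left, right):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build_kdtree(points, depth=0):
    """Median-split kD-tree, cycling through the k = 2 axes."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[np.argsort(points[:, axis])]
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, q, best=None, best_d=np.inf):
    """1-nearest neighbor with backtracking across the splitting plane."""
    if node is None:
        return best, best_d
    d = np.sum((node.point - q) ** 2)
    if d < best_d:
        best, best_d = node.point, d
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best, best_d = nearest(near, q, best, best_d)
    if diff ** 2 < best_d:   # the far side may still contain a closer point
        best, best_d = nearest(far, q, best, best_d)
    return best, best_d

tree = build_kdtree(Xtr)   # training points from the previous snippet
```

Timing a loop of nearest(tree, q) over all test points gives the counterpart to the brute-force measurement from the previous task.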