This homework is intended to help you (1) learn (or refresh your understanding of) how to implement linear algebraic operations in Python using numpy (to which we refer in the code below as np); (2) practice implementing one of the simpler machine learning algorithms: step-wise classification.
1 Part 1: Python and numpy
For each of the problems below, write a method (e.g., problem1) that returns the answer for the corresponding problem. Put all your methods in one file called homework1 WPIUSERNAME.py
(e.g., homework1 jrwhitehill.py, or homework1 jrwhitehill lleshin.py for a team). See the starter file homework1 template.py. In all problems, you may assume that the the dimensions of the matrices and/or vectors are compatible for the requested mathematical operations.
Note 1: In mathematical notation we usually start indices with j = 1. However, in numpy (and many other programming settings), it is more natural to use 0-based array indexing. When answering the questions below, do not worry about “translating” from 1-based to 0-based indexes. For example, if the (i,j)th element of some matrix is requested, (where i,j ≥), you can simply write A[i,j].
Note 2: To represent and manipulate vectors and matrices, please use numpy’s array class (not the matrix class).
Note 3: While the difference between a row vector and a column vector is important when doing math, numpy does not care about this difference. Hence, please use a 1-dimensional numpy array for both cases.
1. Given matrices A and B, compute and return an expression for A + B. [ 2 pts ]
Answer (freebie!): While it is completely valid to use np.add(A, B), this is unnecessarily verbose; you really should make use of the “syntactic sugar” provided by Python’s/numpy’s operator overloading and just write: A + B. Similarly, you should use the more compact (and arguably more elegant) notation for the rest of the questions as well.
2. Given matrices A, B, and C, compute and return AB − C (i.e., right-multiply matrix A by matrix B, and then subtract C). Use dot or np.dot. [ 2 pts ]
3. Given matrices A, B, and C, return A , where represents the element-wise (Hadamard) product and represents matrix transpose. In numpy, the element-wise product is obtained simply with *. [ 2 pts ]
4. Given column vectors x and y, compute the inner product of x and y (i.e., xy). [ 2 pts ]
5. Given matrix A, return a matrix with the same dimensions as A but that contains all zeros. Use np.zeros. [ 2 pts ]
6. Given matrix A, return a vector with the same number of rows as A but that contains all ones. Use np.ones. [ 2 pts ]
7. Given square matrix A and (scalar) α, compute A + αI, where I is the identity matrix with the same dimensions as A. Use np.eye. [ 2 pts ]
8. Given matrix A and integers i,j, return the jth column of the ith row of A, i.e., Aij. [ 2 pts ]
9. Given matrix A and integer i, return the sum of all the entries in the ith row, i.e., Pj Aij. Do not use a loop, which in Python is very slow. Instead use the np.sum function. [ 4 pts ]
10. Given matrix A and scalars c,d, compute the arithmetic mean over all entries of A that are between c and d (inclusive). In other words, if S = {(i,j) : c ≤ Aij ≤ d}, then compute |S|1 P(i,j)∈S Aij. Use np.nonzero along with np.mean. [ 5 pts ]
11. Given an (n × n) matrix A and integer k, return an (n × k) matrix containing the right-eigenvectors of A corresponding to the k largest eigenvalues of A. Use np.linalg.eig to compute eigenvectors. [ 5 pts ]
12. Given square matrix A and column vector x, use np.linalg.solve to compute A−1x. Do not use np.linalg.inv or ** -1 to compute the inverse explicitly; this is numerically unstable and can, in some situations, give incorrect results. [ 5 pts ]
13. Given square matrix A and row vector x, use np.linalg.solve to compute xA−1. Hint: AB =
(BA). Do not use np.linalg.inv or ** -1 to compute the inverse explicitly. [ 5 pts ]
2 Part 2: Step-wise Classification
For the tasks below, write your code in a file called homework1 smile WPIUSERNAME.py, and put your experimental results in homework1 smile WPIUSERNAME.pdf.
In this part of the assignment you will train a very simple smile classifier that analyzes a grayscale image x ∈ R24×24 and outputs a prediction ˆy ∈ {0,1} indicating whether the image is smiling (1) or not (0). The classifier will make its decision based on very simple features of the input image consisting of binary comparisons between pixel values. Each feature is computed as
I[xr1,c1 xr2,c2]
where ri,ci ∈ {0,1,2,...,23} are the row and column indices, respectively, and I[·] is an indicator function whose value is 1 if the condition is true and 0 otherwise. In general, these features are not very good, but nonetheless they will enable the classifier to achieve an accuracy (fPC) much better than just guessing or just choosing the dominant class. Based on these features, you should train an ensemble smile classifier using step-wise classification for m = 5 features. The output of the ensemble (1 if it thinks the image is smiling, 0 otherwise) is determined by the average prediction across all m members of the ensemble. If more than half of the m ensemble predictors g(1),...,g(m) think that the image is smiling, then the ensemble says it’s a smile; otherwise, the ensemble says it’s not smiling.
Step-wise classification/regression is a greedy algorithm: at each round j, choose the jth feature (r1,c1,r2,c2) such that – when it is added to the set of j − 1 features that have already been selected – the accuracy (fPC) of the overall classifier on the training set is maximized. More specifically, at every round j, consider every possible tuple of pixel locations (r1,c1,r2,c2): if you construct an ensemble with j predictors (the j −1 you’ve already chosen, plus the current “candidate” (r1,c1,r2,c2)), is the resulting ensemble more accurate (in terms of PC on training data) than any other tuple of pixel locations during this round? If the current tuple is the best yet (for round j), then save it as your “best seen yet” for round j. If not, ignore it. Move on to the next possible tuple of pixel locations, and repeat until you’ve searched all of them. At the end of round j, you will have selected the best feature for that round, and you add it to your set of selected features. Once added, it stays in the set forever – it can never be removed. (Otherwise, it wouldn’t be a greedy algorithm at all.) Now you move on to round j + 1 until you’ve completed m = 5 rounds.
To measure the ensemble’s accuracy (fPC), you should run it on all the images in the training set, and then compare the output of the ensemble to the corresponding ground-truth labels. At the end of the entire training procedure, you should estimate how well your “machine” (ensemble smile classifier) works on a set of images not used for training, i.e., the test set.
Skeleton code: While how you write your code is up to you (subject to the vectorization constraint and also basic readability); however, to get you started, we sketched in a few functions:
• fPC (y, yhat): this takes in a vector of ground-truth labels and corresponding vector of guesses, and then computes the accuracy (PC). The implementation (in vectorized form) should only take 1-line.
• measureAccuracyOfPredictors (predictors, X, y): this takes in a set of predictors, a set of images to run it on, as well as the ground-truth labels of that set. For each image in the image set, it runs the ensemble to obtain a prediction. Then, it computes and returns the accuracy (PC) of the predictions w.r.t. the ground-truth labels.
• stepwiseRegression (trainingFaces, trainingLabels, testingFaces, testingLabels): I’ve included some visualization code, but otherwise it’s empty. You need to implement the step-wise classification described above.
Tasks:
1. Download from Canvas the starter Python file homework1 smile.py as well as the following data files: trainingFaces.npy,trainingLabels.npy,testingFaces.npy,testingLabels.npy.
2. Write code to train a step-wise classifier for m = 5 features of the binary comparison type described above; the greedy procedure should maximize fPC (which you will also need to implement). At each round, the code should examine every possible feature (r1,c1,r2,c2).. Make sure your code is vectorized to improve run-time performance (wall-clock time) [ 25 pts ].
3. Write code to analyze how training/testing accuracy changes as a function of number of examplesn ∈ {400,800,1200,1600,2000} (implement this in a for-loop)
(a) Run your step-wise classifier training code only on the first n examples of the training set.
(b) Measure and record the training accuracy of the trained classifier on the n examples.
(c) Measure and record the testing accuracy of the classifier on the (entire) test set.
Important: you must write code (a simple loop) to do this – do not just do it manually for each value of n. This is good experimental practice in general and is especially important in machine learning to ensure reproducibility of results. Using the entire training set, you should achieve a test accuracy of at least 75%. [ 8 pts ].
4. In a PDF document (you can use whatever tool you like – LaTeX, Word, Google Docs, etc. – but makesure you export to PDF), show the training accuracy and testing accuracy as a function of n. Please use the following simple format:
n trainingAccuracy testingAccuracy 400 ... ...
800 ... ...
1200 ... ...
1600 ... ...
2000 ... ...
Moreover, characterize in words how the training accuracy and testing accuracy changes as a function of n, and how the two curves relate to each other. What trends do you observe? [4 pts]
5. Visualizing the learned features: It is very important in empirical machine learning work to visualize what was actually learned during training. This can be useful for debugging to make sure that your training code is working as it should, and also to make sure your training and testing sets are selected wisely. For n = 2000, visualize the m = 5 features that were learned by (a) displaying any face image from the test set; and (b) drawing a square around the specific pixel locations ((r1,c1) and (r2,c2)) that are examined by the feature. You can use the example code in the homework1 smile.py template to render the image. Insert the graphic (just one showing all 5 features) into your PDF file. [ 3 pts].
Here’s an example that shows just one feature:
Tip on vectorization: Implement your training algorithm so that, for any particular feature (r1,c1,r2,c2), all the feature values (over all the n training images) are extracted at once using numpy – do not use a loop. Even after vectorizing my own code, running the experiments required in this assignment took about 1 hour (on a single CPU of a MacBook 2016 laptop).