Machine Learning HW2: Phoneme Classification

Task: Multiclass Classification

Framewise phoneme prediction from speech.

[Figure: framewise phoneme labels, e.g. M M M AH AH SH SH IH IH IH N N N N ...]

What is a phoneme?

A unit of speech sound in a language that can serve to distinguish one word from another.

Examples: bat / pat, bad / bed

Machine Learning → M AH SH IH N L ER N IH NG

Data Preprocessing
Acoustic Features - MFCCs (Mel Frequency Cepstral Coefficients)

[Figure: input of shape (11,39) and its label]

More Information About the Data
Since each frame only contains 25 ms of speech, a single frame is unlikely to represent a complete phoneme.

○     Usually, a phoneme will span several frames
○     Concatenate the neighboring frames for training

Hint: post-processing may help

In this HW, we concatenate the past and the future five frames for training (total 11 frames).

[Figure: prev frames and future frames around the center frame are flattened; reshape to (11,39) to recover them]

○     You may reshape the input (1,429) back to (11,39) to get the 11 separate frames
○     Just remember that the label corresponds to the center frame
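
The reshape described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the array `flat` is made-up stand-in data rather than a real TIMIT row, and the sketch assumes the 39 features of each frame are stored contiguously, which is what reshaping (1,429) to (11,39) implies.

```python
import numpy as np

N_FRAMES = 11             # 5 past + 1 center + 5 future frames
N_MFCC = 39               # MFCC dimensions per frame
CENTER = N_FRAMES // 2    # index 5: the frame that the label describes

# Stand-in for one flattened input row (assumption: not real data).
flat = np.arange(N_FRAMES * N_MFCC, dtype=np.float32).reshape(1, -1)  # shape (1, 429)

# Recover the 11 separate frames, one row per frame.
frames = flat.reshape(N_FRAMES, N_MFCC)   # shape (11, 39)

# The label belongs to this frame only.
center_frame = frames[CENTER]             # shape (39,)
```

Models that treat the input as a sequence would work on `frames`, while a plain fully connected network can consume the flat 429-dimensional row directly.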

Finding the testing labels or doing human labeling is strictly prohibited!
Introduction to Digital Speech Processing

Dataset & Data Format
Dataset: TIMIT Acoustic-Phonetic Continuous Speech Corpus
○     Phonetically balanced for English

Data Format (The TAs have already preprocessed the data)
timit_11/
●     npy → training data (# of training frames, 11 x feature dim)
●     npy → framewise phoneme label (0-38)
●     npy → testing data (# of testing frames, 11 x feature dim)

Acoustic features (39-dim MFCC)
○     Concatenate the past and the future five frames (feature dim = 11 x 39)
○     The phoneme label of each input corresponds to the center frame
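
For intuition, the concatenation of past and future frames could be produced roughly as below. The TAs' actual preprocessing is not shown in this handout, so this is a hedged sketch: the function name `add_context`, the toy `utterance` array, and the choice to pad utterance edges by repeating the first and last frame are all assumptions.

```python
import numpy as np

def add_context(frames: np.ndarray, n_context: int = 5) -> np.ndarray:
    """Stack each frame with its n_context past and future frames.

    frames: (T, feat_dim) array for one utterance.
    Returns a (T, (2 * n_context + 1) * feat_dim) array.
    """
    # Assumption: edges are padded by repeating the first/last frame.
    padded = np.pad(frames, ((n_context, n_context), (0, 0)), mode="edge")
    window = 2 * n_context + 1
    # Row t holds padded[t : t + window], flattened frame by frame,
    # so the original frame t sits in the middle of the window.
    return np.stack([padded[t:t + window].reshape(-1) for t in range(len(frames))])

# Toy utterance of 100 frames with 39-dim features (random, not TIMIT).
utterance = np.random.default_rng(0).random((100, 39)).astype(np.float32)
features = add_context(utterance)   # shape (100, 429)
```

Each row of `features` has 11 x 39 = 429 values, matching the feature dimension described above.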

Using additional data is prohibited. If you do, your final grade will be multiplied by 0.9!
 

Class  Phoneme  Example    Class  Phoneme  Example    Class  Phoneme  Example
0      iy       beet       13     l        lay        26     dx       muddy
1      ih       bit        14     r        ray        27     g        gay
2      eh       bet        15     y        yacht      28     p        pea
3      ae       bat        16     w        way        29     t        tea
4      ah       but        17     er       bird       30     k        key
5      uw       boot       18     m        mom        31     z        zone
6      uh       book       19     n        noon       32     v        van
7      aa       bob        20     ng       sing       33     f        fin
8      ey       bait       21     ch       choke      34     th       thin
9      ay       bite       22     jh       joke       35     s        sea
10     oy       boy        23     dh       then       36     sh       she
11     aw       bout       24     b        bee        37     hh       hay
12     ow       boat       25     d        day        38     sil      silence/closure sounds

Sample Code
Colab Link: https://colab.research.google.com/github/ga642381/ML2021-Spring/blob/main/HW02/HW02-1.ipynb

●      Simple baseline
○      You should be able to pass the simple baseline using the sample code provided.

●      Strong baseline
○      Model architecture (layers? dimension? activation function?)
○      Training (batch size? optimizer? learning rate? epoch?)
○      Tips (batch norm? dropout? regularization?)

2  Hessian Matrix
Task Introduction 
Task:  Hessian Matrix
Imagine we are training a neural network and want to find out whether the model has reached a “local minima like” point, a saddle point, or none of the above. We can make this decision by calculating the Hessian matrix.

What is the Hessian?

The Hessian is the matrix of second-order partial derivatives of the loss with respect to the model parameters. It is highly recommended to watch the lecture video before starting this part.
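
Written out, and assuming the derivatives are taken of a scalar loss with respect to the model parameters, the definition reads as follows. The symbols $L$ (loss), $\theta$ (parameter vector), and $n$ (number of parameters) are notation introduced here for illustration, not taken from the handout.

```latex
H_{ij} = \frac{\partial^2 L}{\partial \theta_i \, \partial \theta_j},
\qquad i, j = 1, \dots, n,
\qquad H \in \mathbb{R}^{n \times n}
```

When the second derivatives are continuous, $H$ is symmetric, so all of its eigenvalues are real.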

The target function in this task is a one-variable sinc function.

You will get

●      a model checkpoint trained by the TA,
●      a batch of training data,
●      a loss function.

You will calculate the Hessian matrix and make the decision accordingly.
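
To make "calculating the Hessian" concrete, here is a toy sketch that is independent of the provided sample code, which should be run unchanged. It approximates the Hessian of a hand-written two-parameter function by central finite differences, a technique swapped in purely for illustration. The names `numerical_hessian` and `toy_loss` are hypothetical.

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = eps
        for j in range(n):
            e_j = np.zeros(n)
            e_j[j] = eps
            # Second-order mixed difference for d^2 f / (dx_i dx_j).
            hess[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                          - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps ** 2)
    return hess

def toy_loss(w):
    # Curves upward along w[0] and downward along w[1]: a saddle at the origin.
    return w[0] ** 2 - w[1] ** 2

hessian = numerical_hessian(toy_loss, [0.0, 0.0])
eigenvalues = np.linalg.eigvalsh(hessian)   # real eigenvalues, ascending order
```

For this toy function the Hessian is diagonal with entries 2 and -2, so one eigenvalue is positive and one is negative, which is the signature of a saddle point.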

Gradient Norm / Minimum Ratio
1.  Gradient Norm

In a normal training process, the gradient is rarely exactly zero. In this homework, we regard a gradient norm less than 1e-3 as zero.

2.  Minimum Ratio
For an ideal local minimum, all the eigenvalues of the Hessian matrix are greater than zero. We define the proportion of positive eigenvalues as the minimum ratio.

In this homework, if the minimum ratio is greater than 0.5 and the gradient norm is less than 1e-3, then we assume that the model is at a “local minima like” point.

Gradient Norm / Minimum Ratio
In this homework, we assume that

●      gradient norm < 1e-3 and minimum ratio > 0.5 => local minima like,
●      gradient norm < 1e-3 and minimum ratio <= 0.5 => saddle point,
●      gradient norm >= 1e-3 => none of the above.
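
The three rules above can be written as a small decision function. This is a minimal sketch for checking your understanding, not part of the sample code. The function names `minimum_ratio` and `classify_point` are hypothetical, and the inputs are assumed to be a precomputed gradient norm and a list of Hessian eigenvalues.

```python
GRAD_NORM_THRESHOLD = 1e-3   # gradient norms below this count as zero
MIN_RATIO_THRESHOLD = 0.5    # share of positive eigenvalues that must be exceeded

def minimum_ratio(eigenvalues):
    """Proportion of eigenvalues that are strictly positive."""
    positive = sum(1 for value in eigenvalues if value > 0)
    return positive / len(eigenvalues)

def classify_point(grad_norm, eigenvalues):
    """Apply the homework's three rules in order."""
    if grad_norm >= GRAD_NORM_THRESHOLD:
        return "none of the above"
    if minimum_ratio(eigenvalues) > MIN_RATIO_THRESHOLD:
        return "local minima like"
    return "saddle point"

# Made-up inputs, one per rule.
example_minimum = classify_point(1e-4, [1.0, 2.0, -1.0])   # ratio 2/3
example_saddle = classify_point(1e-4, [1.0, -1.0])         # ratio exactly 0.5
example_neither = classify_point(1e-2, [1.0, 1.0])         # gradient too large
```

Note that a minimum ratio of exactly 0.5 falls under the saddle point rule, because the first rule requires the ratio to be strictly greater than 0.5.
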
Important Notice
You don’t need to and shouldn’t change any part of the code.
You can only use Colab to run the code. Otherwise, your result might differ due to differences in the environment.
You will get a different checkpoint according to your student ID, so please make sure to fill in your student ID in the sample code correctly.
Sample Code
Colab Link: https://colab.research.google.com/github/ga642381/ML2021-Spring/blob/main/HW02/HW02-2.ipynb

After executing the sample code, you should get a result like this.
Notice that each student will get a different answer, so your answer may differ from the example.
Choose your answer from “local minima like”, “saddle point”, or “none of the above”.
