
EE679 Assignment 2


Code file for the whole assignment: “assmt2.py”

 

Q1) Pre-emphasis of the input signal with the filter $H(z) = 1 - \alpha z^{-1}$, using α = 0.95.
  

 

Audio signal: “pre_emp.wav”
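The pre-emphasis filter $H(z) = 1 - \alpha z^{-1}$ corresponds to the difference equation $y[n] = x[n] - \alpha x[n-1]$. Below is a minimal plain-Python sketch of that equation. The function name `pre_emphasis` is hypothetical (it is not taken from assmt2.py), and the first sample is filtered assuming $x[-1] = 0$.

```python
def pre_emphasis(x, alpha=0.95):
    """Apply H(z) = 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n-1].

    The sample before the start of the signal is assumed to be 0.
    """
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


# A constant input is strongly attenuated after the first sample,
# which reflects the high-pass character of pre-emphasis.
print(pre_emphasis([1.0, 1.0, 1.0]))
```

With SciPy available, the same filter is `scipy.signal.lfilter([1, -alpha], [1], x)`.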

 

Q2) Narrowband magnitude spectrum slice, computed with a 30 ms Hamming window on a segment near the centre of the given audio file.
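Below is a minimal sketch of the Q2 computation. It takes a 30 ms slice at the centre of the signal, applies a Hamming window, and computes the dB magnitude of a zero-padded DFT. The sampling rate of 8000 Hz is taken from Q6 of this report. The 1024-point DFT size and the function names are assumptions for illustration. The DFT is written out directly so that the block needs only the standard library.

```python
import cmath
import math

FS = 8000  # sampling rate in Hz, as stated in Q6 of this report


def hamming(n_len):
    """Symmetric Hamming window of length n_len."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def centre_segment(x, fs=FS, win_ms=30):
    """Return a win_ms-long slice centred on the middle of x."""
    n_len = int(round(fs * win_ms / 1000))  # 240 samples for 30 ms at 8 kHz
    start = max(0, len(x) // 2 - n_len // 2)
    return x[start:start + n_len]


def magnitude_spectrum_db(segment, n_fft=1024):
    """dB magnitude of the zero-padded DFT of the Hamming-windowed segment.

    Returns bins 0..n_fft/2; bin k corresponds to k * FS / n_fft Hz.
    """
    w = hamming(len(segment))
    xw = [s * wi for s, wi in zip(segment, w)]
    spectrum_db = []
    for k in range(n_fft // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(xw))
        spectrum_db.append(20 * math.log10(abs(acc) + 1e-12))  # small floor avoids log(0)
    return spectrum_db
```

The window is 240 samples long, which spans several pitch periods. For that reason, the resulting slice shows individual pitch harmonics (a narrowband spectrum) rather than only the smooth envelope. In practice, the inner DFT loop would normally be replaced by `numpy.fft.rfft(xw, n_fft)`.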

  

Q3) Plot of the error signal energy (i.e. the square of the gain G) versus the number of poles p
  

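The error signal energy decreases as the number of poles in the model increases. This is in line with our expectation, as discussed in class: a (p+1)-pole predictor can reproduce any p-pole predictor by setting its last coefficient to zero, so the minimum prediction error can only decrease or stay the same as p grows.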
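Below is a minimal sketch of how the Q3 curve can be produced. It assumes the autocorrelation method solved with the Levinson-Durbin recursion; the report does not state which solver assmt2.py uses. It also assumes the sign convention $A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}$, with `a[0] = 1`. The recursion updates the error as $E_i = E_{i-1}(1 - k_i^2)$, so `errors[p]`, which equals $G^2$ for order p, can never increase with p.

```python
def autocorr(x, max_lag):
    """r[k] = sum_n x[n] * x[n+k] for k = 0..max_lag (autocorrelation method)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve the order-p normal equations from autocorrelation r[0..p].

    Returns (a, errors):
      a      -- [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k
      errors -- errors[i] is the prediction error energy E_i of order i
                (errors[0] = r[0]); for the order-p model, G^2 = errors[p].
    """
    a = [1.0] + [0.0] * p
    err = r[0]
    errors = [err]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient k_i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # E_i = E_{i-1} * (1 - k_i^2)
        errors.append(err)
    return a, errors
```

For a windowed voiced frame `x`, the Q3 plot is `errors[1:]` against `range(1, p_max + 1)`, with `_, errors = levinson_durbin(autocorr(x, p_max), p_max)`.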

 

 

 

Q4) Pole-zero plots for p = 6 and p = 10
  

 

 

 

It is evident from the pole-zero plots above that all poles of the model transfer function lie inside the unit circle; hence the model is stable.
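The pole-zero plots can be cross-checked numerically without finding roots. By the step-down (Schur-Cohn) test, all poles of $1/A(z)$ lie strictly inside the unit circle if and only if every reflection coefficient recovered from $A(z)$ satisfies $|k_i| < 1$. Below is a minimal sketch of that test. It assumes the convention $A(z) = 1 + a_1 z^{-1} + \dots + a_p z^{-p}$, and the function names are hypothetical.

```python
def reflection_coeffs(a):
    """Step-down recursion: recover k_1..k_p from A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Stops early once some |k_i| >= 1 (the filter is then already known to be unstable).
    """
    a = list(a)
    p = len(a) - 1
    ks = []
    for i in range(p, 0, -1):
        k = a[i]
        ks.append(k)
        if abs(k) >= 1.0:
            break
        denom = 1.0 - k * k
        # Invert a_j^(i) = a_j^(i-1) + k * a_{i-j}^(i-1) to get the order-(i-1) coefficients.
        a = [1.0] + [(a[j] - k * a[i - j]) / denom for j in range(1, i)]
    return ks[::-1]


def all_poles_inside_unit_circle(a):
    """True if 1/A(z) is stable, i.e. every reflection coefficient has magnitude < 1."""
    return all(abs(k) < 1.0 for k in reflection_coeffs(a))
```

For example, $A(z) = 1 - 0.9z^{-1} + 0.2z^{-2}$ has poles at 0.5 and 0.4 and passes the test. In contrast, $1 - 2.1z^{-1} + 0.2z^{-2}$ has a pole at 2 and fails it. With the autocorrelation method, the LP model is guaranteed to pass, which is consistent with the plots.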

 

 

Q5) LPC spectrum magnitude (i.e. the dB magnitude frequency response of the estimated all-pole filter) for each order "p". 

 

The following is the actual, unshifted version of the plot:
 

  

 

To make the responses of the filters of different orders easier to compare, I shifted each plot above vertically, giving the following plot:
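Below is a minimal sketch of the Q5 curves. It evaluates the dB magnitude response $20\log_{10}|G/A(e^{j\omega})|$ of the all-pole filter on a grid from 0 to π. It then shifts the i-th curve up by a fixed number of dB, mirroring the vertical shift described above. The 512-point grid, the 20 dB step and the function names are assumptions. The sign convention $A(z) = 1 + \sum_k a_k z^{-k}$ is also assumed.

```python
import cmath
import math


def lpc_spectrum_db(a, gain, n_points=512):
    """20*log10|G / A(e^{jw})| at n_points frequencies w from 0 to pi (inclusive).

    a = [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k.
    """
    spectrum_db = []
    for m in range(n_points):
        w = math.pi * m / (n_points - 1)
        a_w = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        spectrum_db.append(20 * math.log10(gain / abs(a_w)))
    return spectrum_db


def offset_curves(curves, step_db=20.0):
    """Shift the i-th curve up by i * step_db so several model orders can share one plot."""
    return [[v + i * step_db for v in curve] for i, curve in enumerate(curves)]
```

Grid point m corresponds to $m \cdot f_s / (2(N-1))$ Hz, where N is `n_points`. At $f_s = 8000$ Hz the axis therefore runs from 0 to 4000 Hz. The offsets only change the vertical position of each curve, so peak locations and relative peak heights within a curve are unaffected.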
  

  

    Comment on the characteristics of the spectral envelope estimates.

Our theoretical expectation is that a 2n-pole model has at most n peaks in its frequency response (one per complex-conjugate pole pair). The plots show broadly similar behaviour. As the number of poles in the model increases, we can extract more information about the true vocal tract filter characteristics, such as the peak positions, the relative gains at the peaks, and the width (sharpness) of the different peaks.

 

 

    Comment on their shapes with reference to the short-time magnitude spectrum computed in part 2.

  

The figure above shows the rough envelope structure.

  

Comparing the number, relative positions and relative amplitudes of the peaks in this envelope with the peaks in the frequency responses of the all-pole filters strongly suggests that the actual vocal tract is closely approximated by the 8-pole or 10-pole model obtained by LP analysis. The 2-, 4- and 6-pole filters are too low-order to match it.

  

Q6) Measuring the pitch period
 

 

$\mathrm{Acf}[k] = \sum_{n} e[n]\, e[n+k]$, where $e[n]$ is the residual error signal and $k$ is the shift (lag) in samples.
 

The peaks in the ACF of the residual error signal occur at shifts of 0, 60, 120, 180, ... samples.

 

Thus the pitch period of the original input signal is 60 samples.

The sampling frequency is 8000 samples/s, so the pitch period is

$T_0 = 60 / 8000\ \text{s} = 7.5\ \text{ms}$

Hence the fundamental frequency is $F_0 = 1/T_0 \approx 133.3\ \text{Hz}$.
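Below is a minimal sketch of the Q6 procedure. It inverse-filters the signal through $A(z)$ to obtain the residual $e[n]$, computes $\mathrm{Acf}[k] = \sum_n e[n]\,e[n+k]$, and picks the strongest peak away from lag 0. It then converts the lag into a period and a frequency at 8000 Hz. The search range of 20 to 160 samples (400 Hz down to 50 Hz) is a hypothetical choice. The sign convention $A(z) = 1 + \sum_k a_k z^{-k}$ and the function names are also assumptions.

```python
FS = 8000  # sampling rate in Hz, as stated in Q6


def residual(x, a):
    """Inverse-filter x through A(z) = 1 + sum_k a[k] z^-k: e[n] = sum_k a[k] * x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0) for n in range(len(x))]


def acf(e, max_lag):
    """Acf[k] = sum_n e[n] * e[n+k] for k = 0..max_lag."""
    return [sum(e[n] * e[n + k] for n in range(len(e) - k)) for k in range(max_lag + 1)]


def pitch_from_residual(e, fs=FS, min_lag=20, max_lag=160):
    """Return (lag in samples, period in ms, F0 in Hz) of the largest ACF peak in [min_lag, max_lag].

    min_lag excludes the trivial peak at lag 0; both limits are hypothetical defaults.
    """
    r = acf(e, max_lag)
    lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return lag, 1000.0 * lag / fs, fs / lag
```

For an idealised residual consisting of a unit impulse every 60 samples, the function returns a lag of 60, a period of 7.5 ms and an F0 of about 133.3 Hz. These values match the arithmetic above.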

 

 

Comparing the ACF of the original signal and of the residual error signal
 

  

The autocorrelations of the original signal and of the residual error signal have peaks at nearly the same positions, namely at multiples of the pitch period; this is expected.

 

However, the ACF of the residual error signal is a more convenient and reliable basis for estimating the pitch period than the ACF of the original signal. The ACF of the original signal has additional peaks around the pitch-period peak (possibly due to the formant structure), whereas in the ACF of the residual error these extra peaks are significantly suppressed.

Q7) LP re-synthesis
For the final step of re-synthesis (i.e. de-emphasis), I used the filter $H(z) = 1/(1 - \alpha z^{-1})$ with α = 0.95.

          

For the re-synthesis part, I generated outputs for several model orders; the corresponding plots are shown below.
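Below is a minimal sketch of an LP re-synthesis chain under stated assumptions; the report does not describe its excitation. The sketch assumes a unit impulse train at the pitch period from Q6, which is passed through the all-pole filter $G/A(z)$ and then de-emphasised with $1/(1 - \alpha z^{-1})$. The 300 ms duration, 8000 Hz rate and α = 0.95 come from the report. The sign convention $A(z) = 1 + \sum_k a_k z^{-k}$, the peak normalisation to 16-bit PCM, and all function names are assumptions.

```python
import struct
import wave


def impulse_train(n_samples, period):
    """Unit impulses every `period` samples (voiced excitation)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(u, a, gain):
    """H(z) = G / A(z): y[n] = G * u[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, un in enumerate(u):
        acc = gain * un
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def de_emphasis(x, alpha=0.95):
    """1 / (1 - alpha z^-1): y[n] = x[n] + alpha * y[n-1], undoing the Q1 pre-emphasis."""
    y, prev = [], 0.0
    for v in x:
        prev = v + alpha * prev
        y.append(prev)
    return y


def synthesize_vowel(a, gain, pitch_lag, fs=8000, dur_ms=300, alpha=0.95):
    """Impulse train -> G/A(z) -> de-emphasis, with the model held fixed for dur_ms."""
    n_samples = int(fs * dur_ms / 1000)
    return de_emphasis(all_pole_filter(impulse_train(n_samples, pitch_lag), a, gain), alpha)


def write_wav_16bit(path, samples, fs=8000):
    """Peak-normalise to 16-bit PCM and write a mono WAV file."""
    peak = max(abs(s) for s in samples) or 1.0
    ints = [int(round(32767 * s / peak)) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(fs)
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

Holding `a` and `gain` fixed for the full 300 ms matches the assumption described in the note on “aa.wav”. A frame-by-frame synthesis would update them every analysis frame instead.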

 

  

 

Both the 8-pole and the 10-pole models seem to give a close approximation of the original vocal tract.

 

Of the two, the 10-pole model gives the better approximation, as is evident from comparing the reconstructed and original signals in the plots above.

 

Note: The given input file “aa.wav” is only 90 ms long, but I synthesized the /a/ sound for 300 ms as asked in the question. I therefore assumed that the model parameters (the LP coefficients and the gain G) remain the same over these 300 ms.

 

In practice, this analysis is done over short durations such as 30 ms, within which the signal contains only one vowel, so the statistical parameters are not expected to change much over that duration.

 

Still, we achieved a very good approximation of the original signal, and the vowel is clearly recognizable in the sound synthesized with the 10-pole model.

This illustrates that speech signals can be encoded with fewer bits (only the LP coefficients, the gain G and the pitch period) than are needed to quantize the samples directly.

Audio files:
  For 6-pole: “synthesized_aa_6.wav”
  For 8-pole: “synthesized_aa_8.wav”
  For 10-pole: “synthesized_aa_10.wav”

 

The vowel in “synthesized_aa_10.wav” is clearly perceived as /a/.

 
