CSE597 Assignment 3

Introduction     
In this assignment you will implement and train a neural network LM instead of the Markov chain model used in HW1. Your Neural Language Model (NLM) will be evaluated on a downstream task: the text classification you did in HW2. The primary goal is to give you hands-on experience with neural n-gram language models. Understanding how these neural models work will help you understand not just language modeling, but also the pipeline of a common NLP task. This assignment is also more "from scratch" than the previous ones, which will help you prepare for the final project. Make sure you have installed PyTorch and NumPy before working on this assignment.

Neural Language Model
In this LM task, you need to build a vocabulary and learn a word embedding for every word in that vocabulary by training a neural network. You need to compute the loss function on some training data, then update the parameters and the word embeddings with backpropagation. Recall that in an n-gram language model such as the one in HW2, given a sequence of words w_1, ..., w_k, we want to compute the probability

    P(w_{k+1} | w_1, ..., w_k)

where w_k is the k-th word of the length-k context. In this model, you should maximize the probability of the correct word w_{k+1} given the latent representation of the context words. Note that your k-grams will come from a corpus where the first k words are the context features and the word w_{k+1} is the training label. Some useful background can be found in the class lecture notes, slides 58-64 of CSE597-Wk5-Mtg9-FFNs.pdf (and the associated readings).
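For concreteness, here is a minimal sketch of how 3-gram (context, label) pairs could be extracted from a tokenized review. The build_ngrams helper and its variable names are illustrative assumptions, not part of the provided code:

    # Sketch of a helper (assumed, not in NLM.py): build (context, target) pairs
    # for a 3-gram model, i.e. two context words predict the third.
    def build_ngrams(tokens, k=2):
        """Return a list of ([w_1, ..., w_k], w_{k+1}) pairs from a token list."""
        pairs = []
        for i in range(len(tokens) - k):
            context = tokens[i:i + k]   # the first k words are the context features
            target = tokens[i + k]      # the (k+1)-th word is the training label
            pairs.append((context, target))
        return pairs

    print(build_ngrams("the movie was really good".split()))
    # [(['the', 'movie'], 'was'), (['movie', 'was'], 'really'), (['was', 'really'], 'good')]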

Task: N-Gram Neural Language Model
  

Modify the code in NLM.py to implement a simple 3-gram language model. Make sure that your loss decreases during training, and that the learned word embeddings can be used in the downstream task (text classification), which you finished in HW1. These are the steps you need to complete for this task:

Step 1: Modify the class NgramLM(nn.Module), which defines the NLM model.
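One plausible shape for this class is sketched below, assuming an embedding layer followed by a single hidden layer; the layer sizes, activation, and hyperparameter defaults are assumptions, not the skeleton actually given in NLM.py:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NgramLM(nn.Module):
        """Feed-forward n-gram LM: embed the k context words, concatenate them,
        pass through a hidden layer, and predict a distribution over the vocabulary."""
        def __init__(self, vocab_size, embed_dim=50, context_size=2, hidden_dim=128):
            super().__init__()
            self.embeddings = nn.Embedding(vocab_size, embed_dim)
            self.fc1 = nn.Linear(context_size * embed_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, vocab_size)

        def forward(self, context_ids):
            # context_ids: (batch, context_size) word indices
            embeds = self.embeddings(context_ids).view(context_ids.size(0), -1)
            hidden = torch.tanh(self.fc1(embeds))
            return F.log_softmax(self.fc2(hidden), dim=-1)  # log-probabilities over the vocabulary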

  

Step 2: Use the NgramLM class to implement the def training() function; note that you should define the optimizer and the loss function first. You should experiment with different loss functions. You may need a DataLoader to enlarge the batch size.
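A minimal training loop along these lines might look as follows; the tensor inputs, batch size, learning rate, and epoch count are placeholder assumptions. NLLLoss pairs with the log-softmax output sketched above, and CrossEntropyLoss over raw logits is a natural alternative to compare against:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def training(model, context_ids, target_ids, epochs=10, batch_size=256, lr=1e-3):
        """context_ids: LongTensor (N, context_size); target_ids: LongTensor (N,)."""
        loader = DataLoader(TensorDataset(context_ids, target_ids),
                            batch_size=batch_size, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.NLLLoss()  # model returns log-probabilities

        for epoch in range(epochs):
            total = 0.0
            for contexts, targets in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(contexts), targets)
                loss.backward()           # backpropagate into weights and embeddings
                optimizer.step()
                total += loss.item()
            print(f"epoch {epoch}: loss {total / len(loader):.4f}")  # should decrease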

  

Step 3: Finish the def output(model, file1, file2) function. In this function, you need to write the embedding vocabulary to disk in the format of a GloVe embedding file, called embedding.txt. Then it can be used in the downstream classification task. You are to conduct a controlled experiment where you compare a condition where words are initialized with random embeddings that have the same dimensionality as the GloVe lexicon to a condition where you use the GloVe vectors. The random embedding vocabulary has the same words and the same embedding dimension as the n-gram embedding vocabulary, but its embedding vectors are randomly initialized; we call it random_embedding.txt.
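The GloVe file format is simply one word per line, followed by its vector components separated by spaces. A rough sketch of the export is below; the word_to_ix mapping passed in as an extra argument is an assumption about how the vocabulary is indexed, and the formatting details are illustrative:

    import numpy as np

    def output(model, file1, file2, word_to_ix):
        """Write the trained embeddings (file1) and a random baseline (file2)
        in GloVe format: '<word> v1 v2 ... vd' per line."""
        weights = model.embeddings.weight.detach().numpy()
        dim = weights.shape[1]
        with open(file1, "w") as f_trained, open(file2, "w") as f_random:
            for word, ix in word_to_ix.items():
                trained = " ".join(f"{v:.5f}" for v in weights[ix])
                random_vec = " ".join(f"{v:.5f}" for v in np.random.randn(dim))
                f_trained.write(f"{word} {trained}\n")
                f_random.write(f"{word} {random_vec}\n")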

  

Step 4: Conduct experiments on the text classification task. You are asked to test the performance differences between initializing with random_embedding.txt, embedding.txt, and glove.6B.50d.txt, using the model from HW1 (for which we provide a reference solution in this project).
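Since all three files share the GloVe format, a single loader can be reused across the conditions. The sketch below is a hedged illustration; the function name and the commented loop are assumptions, not the interface of classifier.py or run.py:

    import numpy as np

    def load_embeddings(path):
        """Read a GloVe-format file into a {word: numpy vector} dict."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
        return vectors

    # e.g. compare the three initializations with the HW1 classifier:
    # for path in ["random_embedding.txt", "embedding.txt", "glove/glove.6B.50d.txt"]:
    #     vecs = load_embeddings(path)
    #     ...  # feed vecs into the HW1 classifier and record accuracy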

  

The zip file associated with this homework has three *.py files (classifier.py, NLM.py, run.py), a glove folder with a *.txt file containing the GloVe vectors, and a data folder with three sizes of review texts and labels:

reviews_100.txt and labels_100.txt are small, so they can be used for development (dev set) and debugging.

reviews_500.txt and labels_500.txt are the dataset for this assignment (both the NLM part and the downstream part), and you will be graded based on them.

reviews.txt and labels.txt are much bigger files than the others; they will be used to evaluate the bonus part.

Bonus: 10 points extra credit
It is very time-consuming to train the NLM on a large dataset, so we only use 500 examples from the dataset to train the NLM and test the classification task. However, there are several methods or tricks to overcome the training demands, such as enlarging the training batch size or sampling. If you can use the whole dataset to train the NLM within the time limit (1 hour) on a common laptop machine (without a GPU) and get an accuracy > 85% in the classification task, you will get extra credit (10 points).
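One possible reading of the "sampling" trick is to train each epoch on a random subset of the n-gram pairs rather than all of them. The sketch below assumes the (context, target) tensors from the training step, and the subset size is an arbitrary assumption:

    import torch

    def sample_subset(context_ids, target_ids, n_samples=50_000):
        """Randomly sample n_samples (context, target) pairs per epoch to cut training time."""
        idx = torch.randperm(context_ids.size(0))[:n_samples]
        return context_ids[idx], target_ids[idx]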

  

Note that you can use PyTorch's saving and loading utilities to conduct your experiments. Feel free to implement any other helpful functions, but other packages are excluded.
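For example, a checkpoint between the LM training run and the classification experiments might be handled as follows, assuming the NgramLM sketch above; the file name is arbitrary:

    import torch

    torch.save(model.state_dict(), "nlm_checkpoint.pt")      # after training

    model = NgramLM(vocab_size)                               # vocab_size must match the trained model
    model.load_state_dict(torch.load("nlm_checkpoint.pt"))    # before re-running experiments
    model.eval()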

Questions  
1. How did the choice of initialization of word embeddings affect training of the LM and/or performance of the embeddings in the HW1 classifier?

2. Explain your choice of loss function, based on a comparison with at least one other loss function.
