CSE597-Project

Overview: The project gives you experience in applying neural machine translation (NMT) techniques to natural language generation (NLG), and with the novel issues involved in mapping a formal language to a natural language. The input to a machine translation system consists of sentences in one human language, and the output consists of translations of the same information into a different human language. In contrast, NLG takes a semantic representation in a structured semantic language as input and produces a human-language sentence as output.

The data for this NLG project is from the restaurant domain and consists of flat lists of attribute-value pairs from the E2E NLG Challenge (see the README-E2E NLG Challenge). The project has two phases. In phase 1, you will work independently to become familiar with the E2E challenge systems, train the E2E baseline system using an existing codebase, and compare the performance of your trained model with the E2E baseline and the challenge findings. In phase 2, the class will work as a team to implement one or more new NLG models, with the goal of achieving (or exceeding) the E2E challenge results. You must use the same train/test split used in the E2E challenge, and the same automated evaluation. If you want to report extra results on other train/test splits, or with additional data or modified versions of the data, you can do so if you provide a clear motivation in your presentation and report.
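To make the "flat list of attribute-value pairs" concrete, here is a minimal sketch of reading one meaning representation (MR) into a dictionary. It assumes the bracketed `attribute[value]` notation used in the public E2E data files; the function name `parse_mr` and the example MR string are illustrative and are not part of the assignment or of TGen.

```python
import re
from typing import Dict

# One slot is an attribute name followed by its value in square brackets,
# e.g. "eatType[coffee shop]". Slots are separated by commas.
# (Assumed format; check it against the dataset you download.)
SLOT_PATTERN = re.compile(r"\s*([^,\[\]]+?)\s*\[([^\]]*)\]")


def parse_mr(mr: str) -> Dict[str, str]:
    """Parse a flat E2E-style meaning representation into a dict."""
    return {name: value.strip() for name, value in SLOT_PATTERN.findall(mr)}


# Hypothetical example MR in the assumed format.
example_mr = "name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate]"
slots = parse_mr(example_mr)
print(slots)
```

A dictionary like this is only one convenient intermediate form; a sequence-to-sequence model would still need the slots linearized into a token sequence, and how to do that is a design choice left to you.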

Computing resources: Training neural models requires adequate computing resources. Students in this class have access to GPU compute resources and a large amount of data storage through Bridges, a part of the XSEDE system. There is very good documentation at the XSEDE site, including documentation for use of Bridges GPU nodes, which you will use. In addition, there is an XSEDE Quick Start Guide for the class.

Questions: Use Slack to ask questions of other students in the class, one of whom is already familiar with Bridges, and to engage in discussions that help you make an informed choice of implementation.

YOUR FIRST STEP, WHICH YOU SHOULD COMPLETE ASAP, IS TO SIGN UP FOR YOUR XSEDE ACCESS, AND BECOME FAMILIAR WITH USING BRIDGES. Read the documentation at the link above on Bridges GPU nodes. As an initial exercise, you could upload your data and code from homework 3, make the necessary modifications for it to access the GPU resources on Bridges, and retrain your classifier.

Phase 1: Working on your own, you will create a simple model to address the problem defined in the E2E Challenge. The README points you to the dataset and to the GitHub repository for TGen, the baseline system. You will submit a written technical report and make a brief class presentation. There will be a rubric for the technical report, which you should write in the style of a conference submission.

Class presentation #1: Each student will make a short presentation of about 5-10 minutes on phase 1 of their project. Students should use the time after presentations to ask each other questions about how results were achieved, etc.

Technical Report: Your report will describe the problem, data, your application of the TGen model, results, and conclusions. The expected format of the report will be defined in a rubric available in Canvas.

Phase 2: Working as a team, students in the class will consider other approaches that could improve upon the baseline model, improve the data preprocessing, and improve the model training. You are free to try anything. Experiments that did not perform well can be reported if they show how you arrived at conclusions that supported your later experiments. Performance is assessed on the test data as in Phase 1.

Final Class presentation: The final presentation will be judged on 4 criteria: a) your understanding of the original NMT model and the new model; b) strength of arguments presented in favor of the components of the improved NLG approach; c) lessons learned from the experiments, including insights derived from negative results; d) clarity and efficiency of the presentation.

Final Project Report: The team will submit a single written report in the form of a conference submission. A conference submission states the authors’ hypothesis, motivation for the approach, experimental materials (e.g., data), methods, results, discussion, and conclusion. Other details, such as false starts that led to useful knowledge applied towards improvements, can be included in appendices.

Due Dates and submissions:   

Phase 1 Report/Slide Pack due date 3/16: Report and presentation slides to be submitted through Canvas.

 

Phase 1 Class presentations 3/17: Each student will make a short presentation in class of about 5 minutes. The presentation should discuss the experience of learning about NLG through replicating the baseline, the performance of their model, and ideas for future work in Phase 2.

 

Phase 2 Report/Slide Pack due date 4/27: The class will work together as a team. Each member should suggest ideas and try experiments, but there should be a single final model whose results everyone can explain and present. Team report and slide pack to be submitted through Canvas.
