Machine Learning HW5: Sequence to Sequence


Introduction to sequence to sequence
Sequence to sequence

Generate a sequence from another sequence

Translation: text to text
ASR: speech to text
TTS: text to speech

and more...

Sequence to sequence

Often composed of encoder and decoder

Encoder: encodes input sequence into a vector or sequence of vectors
Decoder: decodes a sequence one token at a time, based on 1) encoder output and 2) previous decoded tokens
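To make these roles concrete, here is a minimal PyTorch sketch of an RNN encoder-decoder with greedy decoding. It is illustrative only (module names, sizes, and the assumption that token id 1 is <bos> are placeholders, not the HW's sample code):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encodes a source token sequence into hidden states."""
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len)
        outputs, hidden = self.rnn(self.embed(src))
        return outputs, hidden                    # hidden summarizes the source

class Decoder(nn.Module):
    """Predicts one target token at a time, conditioned on the encoder
    state and the previously decoded token."""
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token, hidden):        # prev_token: (batch, 1)
        output, hidden = self.rnn(self.embed(prev_token), hidden)
        return self.out(output), hidden           # logits over target vocab

# Greedy decoding sketch: feed the previously predicted token back in.
enc = Encoder(vocab_size=8000, hidden_dim=256)
dec = Decoder(vocab_size=8000, hidden_dim=256)
src = torch.randint(0, 8000, (1, 10))             # a dummy source sentence
_, hidden = enc(src)
token = torch.tensor([[1]])                       # assume id 1 is <bos>
for _ in range(20):
    logits, hidden = dec(token, hidden)
    token = logits.argmax(-1)                     # next token id, shape (1, 1)
```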

HW5: Machine Translation
Neural Machine Translation
We will translate from English to Traditional Chinese.

Cats are so cute. -> 貓咪真可愛。
A sentence is usually translated into a sentence of a different length in the other language.

Naturally, the seq2seq framework is applied to this task.

Training datasets
Paired data: TED2020, TED talks with transcripts translated by a global community of volunteers into more than 100 languages
    ○ We will use the (en, zh-tw) aligned pairs

Monolingual data: more TED talks in Traditional Chinese
Evaluation

Example:
    source: Cats are so cute.
    target: 貓咪真可愛。
    output: 貓好可愛。

BLEU
    ○ Modified[1] n-gram precision (n=1~4)
    ○ Brevity penalty: penalizes short hypotheses
        ■ c is the hypothesis length, r is the reference length
    ○ The BLEU score is the geometric mean of the n-gram precisions, multiplied by the brevity penalty
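Written out, this is the standard formulation with uniform weights:

```latex
% Brevity penalty, with c = hypothesis length and r = reference length
\mathrm{BP} =
\begin{cases}
  1 & \text{if } c > r\\[2pt]
  e^{\,1 - r/c} & \text{if } c \le r
\end{cases}

% BLEU = brevity penalty times the geometric mean of the modified
% n-gram precisions p_n for n = 1..4
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\!\Big(\sum_{n=1}^{4} w_n \log p_n\Big),
\qquad w_n = \tfrac{1}{4}
```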

Workflow

Preprocessing
    ○ download raw data
    ○ clean and normalize
    ○ remove bad data (too long/short)
    ○ tokenization

Training
    ○ initialize a model
    ○ train it with training data

Testing
    ○ generate translations of the test data
    ○ evaluate the performance
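As a concrete illustration of the "clean and remove bad data" steps, here is a small sketch; the thresholds and the word-versus-character length heuristic are placeholders, not the sample code's actual values:

```python
import re

def clean(line):
    """Normalize whitespace and trim the line."""
    return re.sub(r"\s+", " ", line).strip()

def keep_pair(src_en, tgt_zh, min_len=1, max_len=1000, max_ratio=9.0):
    """Drop pairs that are empty, too long, or badly mismatched in length."""
    s, t = len(src_en.split()), len(tgt_zh)   # words for en, characters for zh
    if not (min_len <= s <= max_len and min_len <= t <= max_len):
        return False
    return max(s, t) / max(min(s, t), 1) <= max_ratio
```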

Training tips
Tokenize data with sub-word units
Label smoothing regularization
Learning rate scheduling
Back-translation
 

Tokenize data with sub-word units
    ○ For one, we can reduce the vocabulary size (common prefixes/suffixes are shared)
    ○ For another, it alleviates the open-vocabulary problem
    ○ Example:
        ■ ▁new ▁ways ▁of ▁making ▁electric ▁trans port ation ▁.
        ■ new ways of making electric transportation.
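The ▁ marks in the example are characteristic of SentencePiece-style subword tokenization. A minimal sketch using the sentencepiece package (file names and vocabulary size are placeholders; the sample code's preprocessing may differ):

```python
import sentencepiece as spm

# Train a subword model on the cleaned training text (paths are placeholders).
spm.SentencePieceTrainer.train(
    input="train.clean.en",      # one sentence per line
    model_prefix="spm8000",
    vocab_size=8000,
    model_type="unigram",        # "bpe" is another common choice
)

# Load the trained model and split a sentence into subword units.
sp = spm.SentencePieceProcessor(model_file="spm8000.model")
pieces = sp.encode("new ways of making electric transportation.", out_type=str)
print(pieces)   # e.g. ['▁new', '▁ways', '▁of', '▁making', '▁electric', '▁trans', 'port', 'ation', ...]
```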

Label smoothing regularization
    ○ When calculating the loss, reserve some probability mass for incorrect labels
    ○ Avoids overfitting
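A quick illustration of the idea using PyTorch's built-in option (a sketch of the concept, not necessarily how the sample code implements it; fairseq, for instance, ships a label_smoothed_cross_entropy criterion):

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 1000)           # (batch, vocab_size) decoder outputs
targets = torch.randint(0, 1000, (8,))  # gold token ids

# Plain cross entropy pushes all probability mass onto the gold label.
hard_loss = nn.CrossEntropyLoss()(logits, targets)

# With label smoothing, 10% of the mass is spread over the other labels,
# so the model is discouraged from becoming overconfident (overfitting).
smooth_loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, targets)
print(hard_loss.item(), smooth_loss.item())
```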
Learning rate scheduling
    ○ Linearly increase the learning rate, then decay it by the inverse square root of the step number
    ○ Stabilizes training of transformers in the early stages
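A sketch of such an inverse square root schedule (the warmup length and peak learning rate are illustrative values; fairseq's inverse_sqrt scheduler follows the same shape):

```python
def inverse_sqrt_lr(step, warmup_steps=4000, peak_lr=5e-4):
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps ** 0.5) * (step ** -0.5)

# Applied once per optimization step, e.g. with a PyTorch optimizer:
# for g in optimizer.param_groups:
#     g["lr"] = inverse_sqrt_lr(current_step)
```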

 

Back-translation (BT)
Leverage monolingual data by creating synthetic translation data

Train a translation system in the opposite direction
Collect monolingual data on the target side and apply machine translation
Use translated and original monolingual data as additional parallel data to train stronger translation systems
[Diagram: target-side monolingual data is back-translated, and the translated monolingual data is combined with the original parallel data]

Back-translation
Some points to note about back-translation

Monolingual data should be in the same domain as the parallel corpus
The performance of the backward model is critical
You should increase model capacity (both forward and backward), since the amount of data is increased.

Requirements

You are encouraged to follow these tips to improve your performance in order to pass the 3 baselines.

Train a simple RNN seq2seq to achieve translation
Switch to transformer to boost performance
Apply back-translation to further boost performance
 

Train a simple RNN seq2seq to achieve translation
    ● Running the sample code should pass the baseline!

Switch to transformer to boost performance

Change the encoder/decoder architecture to transformer-based, according to the hints in the sample code
    ○ RNNEncoder -> TransformerEncoder
    ○ RNNDecoder -> TransformerDecoder

Change architecture configurations
    ○ encoder_ffn_embed_dim -> 1024
    ○ encoder_layers/decoder_layers -> 4
    ○ #add_transformer_args(arch_args) -> add_transformer_args(arch_args)
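A hedged sketch of what these changes might look like, assuming the sample code collects hyperparameters in an argparse.Namespace called arch_args (as the last hint suggests); the field names follow the hints above, while the remaining fields and values are placeholders:

```python
from argparse import Namespace

# Illustrative only: any fields or values not named in the hints are placeholders.
arch_args = Namespace(
    encoder_embed_dim=256,
    encoder_ffn_embed_dim=1024,   # per the hint: -> 1024
    encoder_layers=4,             # per the hint: -> 4
    decoder_embed_dim=256,
    decoder_ffn_embed_dim=1024,
    decoder_layers=4,             # per the hint: -> 4
    dropout=0.3,
)

# Un-comment the helper that fills in the remaining transformer defaults,
# as the last hint indicates (add_transformer_args is defined in the sample code):
# add_transformer_args(arch_args)
```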

Apply back-translation to further boost performance

Train a backward model by switching the languages
    ○ source_lang = "zh"
    ○ target_lang = "en"

Remember to change the architecture to transformer-base

Translate the monolingual data with the backward model to obtain synthetic data
    ○ Complete the TODOs in the sample code
    ○ All the TODOs can be completed by using commands from earlier cells

Train a stronger forward model with the new data
    ○ If done correctly, ~30 epochs on the new data should pass the baseline
 

Expected Run Time
(on Colab with a Tesla T4)

Baseline   Details                                      Total time
Simple     2m15s x 30 epochs                            1hr 8m
Medium     4m x 30 epochs                               2hr
Strong     8m x 30 epochs (backward)
           + 1hr (back-translation)
           + 15m x 30 epochs (forward)                  12hr 30m
TA’s training curve https://wandb.ai/george0828zhang/hw5.seq2seq.ne
Regulation
You should NOT plagiarize; if you use any other resource, you should cite it in the references. (*)
You should NOT modify your prediction files manually.
Do NOT share code or prediction files with any living creatures.
Do NOT use any approaches to submit your results more than 5 times a day.
Do NOT search or use additional data or pre-trained models.
Your final grade will be multiplied by 0.9 if you violate any of the above rules.
Lee & TAs reserve the right to change the rules & grades.
(*) Academic Ethics Guidelines for Researchers by the Ministry of Science and Technology
