CSCE689 HW2: Reproducing ChatGPT Solution

1. Create an account on HPRC (https://hprc.tamu.edu/apply/)
a. Apply for Basic Allocation on Grace (20,000 Service Units)
b. 20,000 Service Units (SUs) ~= 250 A100 (40G) GPU hours
2. Train a variant of GPT-2
a. Limit training time/resources to at most 24 hours with one A100 40G GPU
b. Follow instructions at https://github.com/parasol-aser/hw-reproduce-chatgpt
Your Goal: train the best GPT model from scratch within the resource budget.
- The top 10 submissions with the highest HellaSwag accuracy will each earn 1 bonus point.
- The top 3 will earn 4, 2, and 1 additional bonus points, respectively.
Your Strategies:
- Tune hyper-parameters guided by the scaling laws (a budgeting sketch follows this list)
- Try different architectures (minimal sketches of several options also follow this list), e.g.:
- Group Query Attention
- Replace LayerNorm by RMSNorm
- Replace absolute positional encoding by RoPE
- Replace GeLU activation function by SwiGLU
- Drop Positional Encoding
- Change KQV (e.g., merge K and Q)
- Eliminate or modify FFN layers
- Mixture of Experts (MoE)
- …
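
As a starting point for scaling-law-guided tuning, here is a back-of-the-envelope budgeting sketch using the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter. The model size and throughput figures below are illustrative assumptions, not values prescribed by the assignment; measure your own tokens/sec on Grace before committing to a configuration.

```python
# Back-of-the-envelope compute budgeting with the Chinchilla rule of thumb
# (~20 training tokens per model parameter). All concrete numbers below are
# illustrative assumptions, not assignment requirements.

n_params = 124e6                     # e.g., GPT-2 small, ~124M parameters
tokens_per_param = 20                # Chinchilla-style heuristic
optimal_tokens = n_params * tokens_per_param        # ~2.5B tokens

# Does that budget fit in 24 A100-hours at an assumed throughput?
assumed_throughput = 150_000         # tokens/sec on one A100 40G (assumption)
budget_seconds = 24 * 3600
trainable_tokens = assumed_throughput * budget_seconds  # ~13B tokens

print(f"Chinchilla-optimal tokens: {optimal_tokens:.2e}")
print(f"Tokens trainable in 24h:   {trainable_tokens:.2e}")
# If trainable_tokens >> optimal_tokens, a larger model may use the budget
# better; if it is smaller, shrink the model or accept under-training.
```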
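A minimal PyTorch sketch of Group Query Attention: fewer K/V heads than query heads shrinks the KV projections (and KV cache) while each group of query heads shares one K/V head. The class name `GQAttention` and the bias-free projections are illustrative choices, not part of the reference llm.c code; `n_kv_head == n_head` recovers standard multi-head attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQAttention(nn.Module):
    """Group Query Attention: n_kv_head < n_head K/V heads."""
    def __init__(self, dim: int, n_head: int, n_kv_head: int):
        super().__init__()
        assert n_head % n_kv_head == 0
        self.n_head, self.n_kv_head = n_head, n_kv_head
        self.head_dim = dim // n_head
        self.wq = nn.Linear(dim, n_head * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_head * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_head * self.head_dim, bias=False)
        self.wo = nn.Linear(n_head * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_head, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_head, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_head, self.head_dim).transpose(1, 2)
        # replicate each K/V head across its group of query heads
        g = self.n_head // self.n_kv_head
        k = k.repeat_interleave(g, dim=1)
        v = v.repeat_interleave(g, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(y.transpose(1, 2).reshape(b, t, -1))
```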
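RMSNorm drops LayerNorm's mean subtraction and bias, keeping only a learned scale, which makes it slightly cheaper and a drop-in replacement for `nn.LayerNorm` in a GPT-2 block. A minimal sketch (the `eps` value is a common default, not mandated):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the activations; no mean, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS over the last (feature) dimension
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight
```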
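RoPE rotates pairs of query/key channels by position-dependent angles, so relative position enters the attention dot products directly and the learned absolute position table can be dropped. A sketch using the split-half pairing convention (conventions vary between implementations, so match yours consistently for q and k):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to q or k.

    x: (batch, heads, seq_len, head_dim); head_dim must be even.
    """
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, device=x.device) / half)   # (half,)
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]  # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # standard 2D rotation of each (x1, x2) channel pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```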
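SwiGLU replaces the GeLU MLP with a gated variant: one projection, passed through SiLU, gates a second projection. A sketch where `hidden` is commonly set near (8/3)*dim so the parameter count roughly matches the standard 4*dim GeLU MLP; that ratio is a convention, not a requirement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """GPT-2 MLP block with the GeLU feed-forward replaced by SwiGLU."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = (SiLU(x W_gate) * (x W_up)) W_down
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```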
Submission (5pt):
- Your final model checkpoint and original logs stored on Grace (2pt)
- Need to share a folder with our grader
- Your training code (only a diff is required if it is based on karpathy/llm.c) (1pt)
- A report that describes your solution and results (including remaining challenges and failures, if any) (2pt)
- Limit your report to three pages with 10pt font size
