Machine Learning HW12: Lunar Lander Solved

Assignment Description
In this assignment, you will implement several Deep Reinforcement Learning methods:

Policy Gradient
Actor-Critic
The environment for this assignment is Lunar Lander from OpenAI's gym. For other implementation details, please refer to the sample code provided by the TAs.
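As a rough illustration of what the two methods compute, the sketch below shows the per-episode quantities that a Policy Gradient (REINFORCE-style) update typically relies on, and how an Actor-Critic variant can replace the raw return with an advantage. It is not the TAs' sample code: the function names, the discount factor `gamma=0.99`, and the use of plain Python lists instead of tensors are all assumptions made for illustration.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-9):
    """Standardize values to zero mean and unit variance (a common variance-reduction trick)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / (std + eps) for v in values]


def advantages(returns, state_values):
    """Actor-Critic style weight: return minus the critic's value estimate V(s_t)."""
    return [g - v for g, v in zip(returns, state_values)]


def policy_gradient_loss(log_probs, weights):
    """Surrogate loss: -sum_t log pi(a_t | s_t) * weight_t.

    `weights` are discounted returns for plain Policy Gradient,
    or advantages for an Actor-Critic variant.
    """
    return -sum(lp * w for lp, w in zip(log_probs, weights))
```

In a real implementation the log-probabilities would come from the policy network and the loss would be minimized with an autograd framework; the list-based version here only makes the arithmetic explicit.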
Demonstrations
Policy Gradient method
Actor-Critic method
Sample results
Submission Items and Grading Criteria
Python code (submit on NTU COOL): worth 4 points
Action list (submit on JudgeBoi; there is no private set, and the highest score is selected automatically)
More on a "valid submission":

After the last action in the action list is input, the agent should output done. Action lists that are too long or too short will be rejected by the system.
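The length rule can be checked locally before submitting. The sketch below replays an action list and accepts it only if the episode finishes exactly on the last action. It is an assumption about how such a check might look, not JudgeBoi's actual logic: `is_valid_action_list`, `step_done`, and the stand-in `ToyEnv` are hypothetical names, and a real check would replay the list in the Lunar Lander environment with the same seed used to generate it.

```python
def step_done(step_result):
    """Extract the episode-finished flag from an env.step() result."""
    if len(step_result) == 5:
        # Newer API: (obs, reward, terminated, truncated, info)
        return bool(step_result[2] or step_result[3])
    # Classic gym API: (obs, reward, done, info)
    return bool(step_result[2])


def is_valid_action_list(env, actions):
    """Return True only if `done` first becomes true on the final action."""
    if not actions:
        return False
    env.reset()
    for i, action in enumerate(actions):
        done = step_done(env.step(action))
        is_last = i == len(actions) - 1
        # done before the last action: list too long.
        # not done on the last action: list too short.
        if done != is_last:
            return False
    return True


class ToyEnv:
    """Hypothetical stand-in environment that ends after a fixed number of steps."""

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return 0, 0.0, self.t >= self.episode_length, {}
```

For example, with an episode that ends after three steps, `is_valid_action_list(ToyEnv(3), [0, 1, 2])` accepts the list, while a two-action or four-action list is rejected.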

Bonus
If you successfully get 10 pts:
○ Your code will be made public to students.

○ You can submit a report in PDF format briefly describing what you have done (in English, less than 100 words) for extra 0.5 pts.

○ Reports will also be made public to students.

○ Note that there is no private score, so omit it from the report.

Report template
Regulations
You should finish your homework on your own.
You should NOT modify your prediction files manually.
Do NOT share code or prediction files with any living creatures.
Do NOT use any approaches to submit your results more than 5 times a day.
Do NOT search or use additional data or pre-trained models.
Your final grade will be multiplied by 0.9 if you violate any of the above rules.
Prof. Lee & TAs reserve the right to change
