Doing efficient RL properly at foundation-model scale is still an open problem, in my opinion. It's especially pressing in agent and robotics applications, and figuring it out could bring significant benefits. This work is a step in that direction.
How can we train LLM agents to learn autonomously from their own experience?
Introducing ArCHer, a simple (i.e., a small change on top of standard RLHF) and effective way to do so with multi-turn RL 🧵⬇️
Paper: arxiv.org/abs/2402.19446
Website: yifeizhou02.github.io/archer…
Mar 2, 2024 · 5:15 PM UTC
