Rafael Rafailov @ NeurIPS · Mar 2, 2024 · 5:15 PM UTC

Rafael Rafailov @ NeurIPS

@rm_rafailov

2 Mar 2024

Doing efficient RL properly at Foundation Model scale is still an open problem in my opinion. It’s especially prominent in agent and robotics applications and we can get significant benefits from figuring this out. This work is a step in that direction.

Aviral Kumar

@aviral_kumar2

1 Mar 2024

How can we train LLM Agents to learn from their own experience autonomously? Introducing ArCHer, a simple (i.e., small change on top of standard RLHF) and effective way of doing so with multi-turn RL 🧵⬇️ Paper: arxiv.org/abs/2402.19446 Website: yifeizhou02.github.io/archer…
