Edward Beeching · Jan 28, 2025 · 4:11 PM UTC

Edward Beeching

28 Jan 2025

As part of our open reproduction of R1, we have roughly reproduced DeepSeek's MATH-500 eval numbers with Hugging Face's lighteval suite. We had to improve our LaTeX parser to get the last few %.
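
To illustrate why answer parsing matters for MATH-style evals, here is a minimal sketch of pulling the final `\boxed{...}` answer out of a model response and comparing it to the reference after light normalisation. This is not lighteval's parser; the function names (`extract_last_boxed`, `normalize`, `is_correct`) and the normalisation rules are assumptions chosen for the example, and a real parser has to handle far more LaTeX.

```python
import re
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Illustrative normalisation only: strip sizing commands, spacing, \\dfrac."""
    answer = answer.strip().rstrip(".")
    # Remove \left and \right, but leave commands such as \leftarrow untouched.
    answer = re.sub(r"\\(?:left|right)(?![A-Za-z])", "", answer)
    for token in ("\\!", "\\,", " "):
        answer = answer.replace(token, "")
    return answer.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")


def is_correct(model_output: str, gold: str) -> bool:
    """String-level match between the boxed prediction and the reference."""
    pred = extract_last_boxed(model_output)
    return pred is not None and normalize(pred) == normalize(gold)
```

With rules like these, a response ending in `\boxed{\dfrac{1}{2}}` matches a reference of `\frac{1}{2}`, whereas a plain string comparison would mark it wrong. Small mismatches of that kind are one plausible source of a "last few %" gap.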

Edward Beeching · Mar 28, 2022 · 2:51 PM UTC

Edward Beeching @edwardbeeching

28 Mar 2022

A month ago I joined🤗@huggingface as a Research Scientist. They're great: opening an office in Lyon, allowing me to work on open-source projects and trusting me to define my own schedule. I am proud to have added the Decision Transformer to🤗transformers. huggingface.co/blog/decision…

Edward Beeching · Sep 8, 2022 · 3:22 PM UTC

Edward Beeching @edwardbeeching

8 Sep 2022

Today @huggingface 🤗 releases a long-awaited tutorial on training Decision Transformer models as a blogpost and colab notebook. This is part of a series on the application of transformer models in Deep RL settings. 👉 huggingface.co/blog/train-de… #reinforcementlearning #transformers

Edward Beeching · Dec 15, 2022 · 3:18 PM UTC

Edward Beeching @edwardbeeching

15 Dec 2022

Sneak peek at a WIP of an upcoming FPS environment for my @godotengine Reinforcement Learning library. Agents trained using async PPO and population-based training with sample-factory. 👉 github.com/edbeeching/godot_… It will soon be available on the @huggingface hub!

Edward Beeching · Oct 10, 2023 · 2:54 PM UTC

Edward Beeching @edwardbeeching

10 Oct 2023

We will soon release the @huggingface LLM alignment handbook. Using these recipes you can build state-of-the-art chatbots such as Zephyr-7b, released today. Register your interest by starring the GitHub repo: github.com/huggingface/align… You can find out about Zephyr-7b in this thread:

Edward Beeching · Jul 10, 2024 · 8:05 PM UTC

Edward Beeching @edwardbeeching

10 Jul 2024

The winning AI Math Olympiad model is out! Using an approach we call Self-Consistency with Tool Integrated Reasoning. Constraints of Kaggle (T4 GPUs) required us to use activation aware quantization in order to not degrade model performance. Details and code to follow next week.

Lewis Tunstall

@_lewtun

10 Jul 2024

Introducing NuminaMath-7B-TIR, the small but mighty model that won the first progress prize of the AI Math Olympiad 🥇!
> Fine-tuned with iterative SFT on DeepSeekMath-7B from @deepseek_ai
> Stage 1: learn math with chain of thought samples
> Stage 2: learn code with tool-integrated reasoning (TIR)
> Inference: self-consistency decoding with tool-integrated reasoning to generate solutions
🤖 Model: huggingface.co/AI-MO/NuminaM…
♾️ Demo: huggingface.co/spaces/AI-MO/…
This has been quite a wild journey and I am grateful to have collaborated with a cracked team of researchers from Numina and Hugging Face - kudos to @edwardbeeching @JiaLi52524397 @ben_lipkin @vwxyzjn @krasul @AlbertQJiang and Roman Soletskyi for creating high-quality datasets & training kick ass models! #AIMO #Kaggle #AIMathOlympiad

Edward Beeching · Apr 22, 2024 · 7:36 PM UTC

Edward Beeching @edwardbeeching

22 Apr 2024

We are proud to release the first open-source multi-modal, multi-task and multi-domain model! Called JAT. A crucial step for generalist agents. What started out as an open reproduction of GATO with @QGallouedec, @ClementRomac and myself, has evolved into a far greater project.

Edward Beeching · Jul 22, 2024 · 7:07 AM UTC

Edward Beeching @edwardbeeching

22 Jul 2024

Our prize winning Math recipe is now released with datasets, training code and a new 72B math model. See thread for more details:

Lewis Tunstall

@_lewtun

21 Jul 2024

We have just released the ✨NuminaMath datasets: the largest collection of ~1M math competition problem-solution pairs, ranging in difficulty from junior challenge to Math Olympiad preselection. These datasets were used to win the 1st Progress Prize of the AI Math Olympiad and consist of two subsets:
⛓️ Chain of Thought (CoT): 860k problem-solution pairs templated with CoT to enhance mathematical reasoning in natural language
🛠️ Tool-integrated reasoning (TIR): 73k synthetic solutions derived from GPT-4 with code-execution feedback to decompose hard problems into simpler subproblems that can be solved with Python
Models trained on NuminaMath achieve best-in-class performance among open weight models and approach or surpass proprietary models on math competition benchmarks 🔥
Our datasets and models can be found on the 🤗 Hub: huggingface.co/collections/A…

Edward Beeching · Feb 2, 2024 · 3:15 PM UTC

Edward Beeching @edwardbeeching

2 Feb 2024

When I am not busy Aligning LLMs, I spend my free time developing Godot RL Agents, an RL library for the #Godot game engine. Today we released version 0.7.0 with a number of new features, bugfixes and examples. Thanks to all the contributors for creating cool example envs:

Edward Beeching · Dec 1, 2022 · 7:39 PM UTC

Edward Beeching @edwardbeeching

1 Dec 2022

Announcing the release of Sample Factory 2.0. A lightning fast production grade Deep RL library. Sample Factory 2.0 is a collaboration between @petrenko_ai from @uscresl and 🤗 @huggingface. 👉 github.com/alex-petrenko/sam… Find out more on this 🧵

Edward Beeching · Dec 21, 2022 · 8:50 PM UTC

Edward Beeching @edwardbeeching

21 Dec 2022

The last environment for my @godotengine Reinforcement Learning lib, a team FPS. The release is planned for tomorrow once I have updated the docs. 👉 github.com/edbeeching/godot_… Env source code and builds are already available on the @huggingface hub: 👉 huggingface.co/datasets?sort…

Edward Beeching · May 23, 2023 · 8:31 AM UTC

Edward Beeching @edwardbeeching

23 May 2023

We have a new leader on the Open LLM leaderboard. Congrats to ausboss/llama-30b-supercot! They combined chain-of-thought datasets, code explanations and instructions, snippets, logical deductions and Alpaca GPT-4 prompts. Check it out here: huggingface.co/spaces/Huggin…

Edward Beeching · Dec 22, 2022 · 4:13 PM UTC

Edward Beeching @edwardbeeching

22 Dec 2022

Godot RL Agents v0.4.0 has been released. 👉 github.com/edbeeching/godot_…
This includes:
‣ Godot 4 support
‣ 3 RL frameworks: Sample Factory, Stable Baselines 3 and rllib
‣ 2 Advanced Racing and FPS environments
‣ Updated docs (still WIP🙂)
Find out more in this thread 🧵

GitHub - edbeeching/godot_rl_agents: An Open Source package that allows video game creators, AI...

An Open Source package that allows video game creators, AI researchers and hobbyists the opportunity to learn complex behaviors for their Non Player Characters or agents - edbeeching/godot_rl_agents

github.com

Edward Beeching · Jan 26, 2024 · 10:06 AM UTC

Edward Beeching @edwardbeeching

26 Jan 2024

Thanks to the community for their feedback on DPO vs. IPO vs. KTO. In particular, we thank the authors of IPO, who have worked with us this week to improve TRL's IPO implementation. IPO now is comparable to DPO! Check out the updated blogpost. huggingface.co/blog/pref-tun…

Edward Beeching · Jan 28, 2025 · 4:11 PM UTC

Edward Beeching @edwardbeeching

28 Jan 2025

Follow us in our journey of open reproduction of the data and training methodology of the model at: github.com/huggingface/open-…

GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1

Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.

github.com

Edward Beeching · Dec 16, 2024 · 7:52 PM UTC

Edward Beeching @edwardbeeching

16 Dec 2024

Today we demonstrate how the performance of Llama 1B can be scaled to outperform Llama 8B with tree search, guided by a Process Reward Model. During our efforts to replicate DeepMind's Test Time Compute paper, we found that beam search resulted in poor diversity when n>16. 👇🧵

Edward Beeching · Aug 29, 2024 · 2:54 PM UTC

Edward Beeching @edwardbeeching

29 Aug 2024

We have added Online Direct Preference Optimization to TRL. We observe that online methods, while slower to optimize, outperform their offline counterparts at various model scales.

Edward Beeching · Nov 10, 2023 · 3:09 PM UTC

Edward Beeching @edwardbeeching

10 Nov 2023

After many requests, v0.1 of the LLM alignment handbook is now available. We've worked hard to make this as accessible as possible, so you can run:
🏋️‍♂️ Full fine-tuning with @MSFTDeepSpeed ZeRO-3 on A100s
🐭 LoRA or QLoRA fine-tuning on consumer GPUs
Code: github.com/huggingface/align…

GitHub - huggingface/alignment-handbook: Robust recipes to align language models with human and AI...

Robust recipes to align language models with human and AI preferences - huggingface/alignment-handbook

github.com

Edward Beeching · May 11, 2023 · 3:15 PM UTC

Edward Beeching @edwardbeeching

11 May 2023

It seems like every week there is a new LLM or chatbot being released. In order to keep track of the progress of the open-source community, I created the🤗open LLM leaderboard. It benchmarks against 4 key metrics from the @EleutherAI LM Harness. huggingface.co/spaces/Huggin…

Edward Beeching · May 28, 2024 · 9:45 AM UTC

Edward Beeching @edwardbeeching

28 May 2024

A year ago I created the Open LLM Leaderboard. Now it has over 10,000 likes and is the #2 Space. In the next month it will overtake Stable Diffusion and become the #1 Space on Hugging Face!

clem 🤗

@ClementDelangue

28 May 2024

The Open LLM leaderboard is now the #2 most liked space ever on @huggingface with 10,000+ likes (huggingface.co/spaces?sort=l…)! Also, there are now hundreds of leaderboards for tons of different tasks, domains, languages,... on spaces (huggingface.co/spaces?search…) Very cool to see HF becoming the place to be for AI evaluation!

Edward Beeching · Mar 11, 2025 · 9:38 PM UTC

Edward Beeching @edwardbeeching

11 Mar 2025

Over the past few weeks, we've been focused on pushing the boundaries of competitive programming models by reproducing key elements of DeepSeek-R1. Today, we're excited to release 3 open-source artifacts: 🧵

Edward Beeching · Jan 18, 2024 · 3:08 PM UTC

Edward Beeching @edwardbeeching

18 Jan 2024

In our latest blog post, we summarize our extensive evaluation of three state of the art alignment algorithms. DPO vs IPO vs KTO. The results demonstrate a complex interaction between key hyper-parameters, models and datasets. #RLHF #DPO huggingface.co/blog/pref-tun…

Preference Tuning LLMs with Direct Preference Optimization Methods

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co
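
For readers comparing the objectives, here is a minimal sketch of the per-example DPO and IPO losses as they are defined in their respective papers, written with plain floats for clarity. It is not TRL's implementation: the function names and argument names are assumptions made for the example, and each argument is taken to be the log-probability of a whole completion under the policy or the frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_margin(policy_chosen: float, policy_rejected: float,
                      ref_chosen: float, ref_rejected: float) -> float:
    """How much more the policy prefers the chosen completion than the reference does."""
    return (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO: negative log-sigmoid of the beta-scaled margin."""
    margin = preference_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
    return -log_sigmoid(beta * margin)


def ipo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """IPO: squared distance between the margin and the target 1 / (2 * beta)."""
    margin = preference_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
    return (margin - 1.0 / (2.0 * beta)) ** 2
```

The DPO loss keeps shrinking as the margin grows, while the IPO loss is minimised at a finite margin of `1 / (2 * beta)`. That difference is one reason the two methods can respond differently to the same `beta`.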

Edward Beeching · Oct 17, 2021 · 9:57 AM UTC

Edward Beeching @edwardbeeching

17 Oct 2021

Proud to release an RL interface for the @godotengine. Included are wrappers for both Ray RLlib @raydistributed and StableBaselines3 @araffin2. Find out more on the GitHub page: github.com/edbeeching/godot_…

Edward Beeching · Apr 12, 2024 · 12:09 PM UTC

Edward Beeching @edwardbeeching

12 Apr 2024

Does your LLM know what a pizza looks like? You need a Vision Language Model. Here at @huggingface we have just added VLM finetuning support to TRL's SFTTrainer.

Edward Beeching · Feb 20, 2024 · 9:28 PM UTC

Edward Beeching @edwardbeeching

20 Feb 2024

I've added a Fall Guys style environment to Godot RL Agents. The agent learned its behavior fairly quickly, 20 minutes / 2M steps of PPO and default hyperparameters. Check out the library and more examples here: github.com/edbeeching/godot_… The amazing assets are from @KayLousberg

Edward Beeching · Jan 28, 2025 · 7:47 PM UTC

Edward Beeching @edwardbeeching

28 Jan 2025

Replying to @jaxgriot

You can see our roadmap on the repo:

Edward Beeching · Jul 4, 2024 · 7:24 PM UTC

Edward Beeching @edwardbeeching

4 Jul 2024

One of my contributions was the Tree of Thoughts algorithm that interleaved generation with code execution and correction. The constraints of running on Kaggle required an optimized and elegant solution to scale up to majority voting with 48 candidate solutions per problem.
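
The majority-voting step can be sketched as follows. This is only an illustration, not the competition code: generation and code execution are left out, `majority_vote` is a hypothetical helper, and a candidate whose code failed to produce an answer is assumed to be represented by `None`.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(answers: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the most frequent non-None answer, or None if every candidate failed."""
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(valid).most_common(1)[0][0]
```

In a setup like the one described above, `answers` would hold the final answers parsed from the 48 candidate solutions for a single problem.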

Edward Beeching · Apr 12, 2024 · 12:09 PM UTC

Edward Beeching @edwardbeeching

12 Apr 2024

Fine-tune Vision Language Models in a few lines of code:

Edward Beeching · Feb 7, 2020 · 2:09 PM UTC

Edward Beeching @edwardbeeching

7 Feb 2020

Happy to announce that our recent work on augmenting a Deep RL agent with differentiable projective geometry and spatially structured memory is now available on arXiv. arxiv.org/abs/2002.02286

Edward Beeching · Mar 2, 2023 · 8:14 AM UTC

Edward Beeching @edwardbeeching

2 Mar 2023

To celebrate the release of #GodotEngine 4.0, I have added a tutorial on creating custom Godot RL envs in Godot RL Agents: github.com/edbeeching/godot_… The tutorial was created as part of the Hugging Face Deep RL course, check it out to learn about Deep RL! 👉huggingface.co/deep-rl-cours…

Edward Beeching · Jul 4, 2024 · 7:24 PM UTC

Edward Beeching @edwardbeeching

4 Jul 2024

@huggingface are proud to have teamed up with Numina and @MistralAI to win the first AI Math Olympiad. We will be sharing the details of our method over the coming weeks. This will include open source models, training code and evaluation pipelines.

Jia Li

@JiaLi52524397

4 Jul 2024

Six months ago, we launched Numina to lead open research in AI4Math. Today we are super excited to share that our Numina Math 7B model won the 1st progress prize of the AI Math Olympiad 🔥🔥🔥 kaggle.com/competitions/ai-m…

Edward Beeching · Apr 4, 2024 · 7:59 AM UTC

Edward Beeching @edwardbeeching

4 Apr 2024

Imitation Learning support has been added to Godot RL Agents: you can now learn complex behaviours from player demonstrations and then fine-tune with RL. Check out the trained agent (a Neural Network) from our example game.

Edward Beeching · May 2, 2022 · 7:54 AM UTC

Edward Beeching @edwardbeeching

2 May 2022

Tuesday the 3rd of May at 10am CEST I will be defending my PhD thesis "Large Scale Automatic Learning of Autonomous Agent Behavior with Structured Deep Reinforcement Learning". I will be livestreaming the defense, you are all welcome to come watch. piped.video/vHiEB5LDEho

PhD Defense: Edward Beeching

PhD Defense of "Large-scale Automatic Learning of Autonomous Agent ...

youtube.com

Edward Beeching · Dec 9, 2022 · 4:12 PM UTC

Edward Beeching @edwardbeeching

9 Dec 2022

I'm updating my @godotengine Reinforcement Learning library to Godot 4 and adding @huggingface integration. 👉 github.com/edbeeching/godot_… I am also adding a number of example games, such as this racing game. Are there any other example games people would like me to add?

Edward Beeching · Jan 28, 2025 · 4:11 PM UTC

Edward Beeching @edwardbeeching

28 Jan 2025

Commands to launch the evals in openr1:
sbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B math_500
sbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-7B math_500
sbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-14B math_500
...

Edward Beeching · Jul 13, 2020 · 8:03 PM UTC

Edward Beeching @edwardbeeching

13 Jul 2020

Happy to announce that a preprint of our ECCV 2020 spotlight paper "Learning to plan with uncertain topological maps" is now on arXiv: arxiv.org/abs/2007.05270. We approximate a classical path planning algorithm and learn to plan under uncertainty. @chriswolfvision @chroma_inria

Learning to plan with uncertain topological maps

We train an agent to navigate in 3D environments using a hierarchical strategy including a high-level graph based planner and a local policy. Our main contribution is a data driven learning based...

arxiv.org

Edward Beeching · Apr 22, 2024 · 7:36 PM UTC

Edward Beeching @edwardbeeching

22 Apr 2024

We open-source the datasets and codebase for training JAT. We look forward to the community's contribution to this project. Find out more in our blogpost: huggingface.co/blog/jat

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

huggingface.co

Edward Beeching · Apr 12, 2024 · 12:09 PM UTC

Edward Beeching @edwardbeeching

12 Apr 2024

Check out our demo, example model and blogpost for more details: huggingface.co/spaces/Huggin… huggingface.co/HuggingFaceH4… hf.co/blog/vlms

VLM Playground - a Hugging Face Space by HuggingFaceH4

Discover amazing ML apps made by the community

huggingface.co

Edward Beeching · May 23, 2023 · 3:00 PM UTC

Edward Beeching @edwardbeeching

23 May 2023

Today marks the 0.5.0 release of an RL interface I develop for the @GodotEngine. The release adds many new features, including ONNX support to export and run trained agents in Godot without the need for python. Check it out: github.com/edbeeching/godot_…

Edward Beeching · Jan 7, 2022 · 2:55 PM UTC

Edward Beeching @edwardbeeching

7 Jan 2022

@chriswolfvision your last day here at INSA Lyon / LIRIS. Best of luck at NaverLabs and enjoy the C64!

Edward Beeching · Jun 27, 2024 · 7:25 AM UTC

Edward Beeching @edwardbeeching

27 Jun 2024

After 3 months of hard work, we are at the top of the leaderboard for the first AI Math Olympiad!

Lewis Tunstall

@_lewtun

26 Jun 2024

Good data is all you need

Edward Beeching · Dec 22, 2022 · 4:13 PM UTC

Edward Beeching @edwardbeeching

22 Dec 2022

Sample Factory integration allows for the training of complex AI behaviors, such as this team FPS game. Pew Pew Pew!

Edward Beeching · Oct 13, 2022 · 1:40 PM UTC

Edward Beeching @edwardbeeching

13 Oct 2022

Christmas has come early @huggingface 🤗 #RTX4090

Edward Beeching · Oct 10, 2023 · 2:54 PM UTC

Edward Beeching @edwardbeeching

10 Oct 2023

You can try out Zephyr-7b here: huggingfaceh4-zephyr-chat.hf… This work was done with my H4 colleagues @_lewtun @natolambert @nazneenrajani and many others at 🤗!

Edward Beeching · Aug 29, 2024 · 4:16 PM UTC

Edward Beeching @edwardbeeching

29 Aug 2024

As part of TRL's v0.10.1 release I also added Liger kernel support to TRL's SFT Trainer. It works with DeepSpeed ZeRO-3 out of the box and enables a 4x larger batch size! Thanks to the amazing open source work from AI researchers @LinkedIn

Lewis Tunstall

@_lewtun

29 Aug 2024

TRL v0.10.1 is here and it's beefy 💪
🔁 Online DPO by @GoogleDeepMind for aligning better LLMs
🐯 Liger kernel integration from @LinkedIn to supercharge SFT
🖼️ DPO for VLMs: 🌋 LLaVa, ✨ PaliGemma, 🐶 Idefics2
👩‍⚖️ Use LLMs as a judge to compute win rates during training
🔍 Anchored Preference Optimization by @ContextualAI for fine-grained human/AI feedback
github.com/huggingface/trl/r…

Edward Beeching · Nov 27, 2020 · 8:55 AM UTC

Edward Beeching @edwardbeeching

27 Nov 2020

Replying to @chriswolfvision

Christmas has definitely come early!

Edward Beeching · Jul 11, 2024 · 7:39 PM UTC

Edward Beeching @edwardbeeching

11 Jul 2024

We have published a blog post with more details on how we trained and deployed our AIMO-winning model. Find out more about the Self-Consistency with Tool-Integrated Reasoning decoding algorithm (SC-TIR) that I implemented for the winning pipeline. huggingface.co/blog/winning-…

How NuminaMath Won the 1st AIMO Progress Prize

huggingface.co

Edward Beeching · Dec 16, 2024 · 7:52 PM UTC

Edward Beeching @edwardbeeching

16 Dec 2024

So I developed a novel method called Diverse Verifier Tree Search, which outperforms beam search at large n. We take a deep dive into the details in our blog post: huggingface.co/spaces/Huggin…

Scaling test-time compute - a Hugging Face Space by HuggingFaceH4

The toolkit lets you provide a text prompt (e.g., a math problem) to a language model and run it with test‑time search methods such as Best‑of‑N, beam search or Diverse Verifier Tree Search. It sco...

huggingface.co
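
As a rough illustration of the simplest of these verifier-guided strategies, here is a sketch of Best-of-N selection with reward-model scores. It does not show beam search or Diverse Verifier Tree Search, and it is not the released code: the function names are hypothetical, each candidate is assumed to be an `(answer, score)` pair with the score already produced by a reward model, and the weighted variant (summing scores over identical answers) is an assumption about how such a selection is commonly done.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (final answer, reward-model score)


def best_of_n(candidates: List[Candidate]) -> str:
    """Vanilla Best-of-N: return the answer of the single highest-scoring candidate."""
    return max(candidates, key=lambda c: c[1])[0]


def weighted_best_of_n(candidates: List[Candidate]) -> str:
    """Weighted Best-of-N: sum scores over identical answers, return the top total."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += score
    return max(totals, key=totals.get)
```

The two rules can disagree: with candidates `[("12", 0.9), ("15", 0.5), ("15", 0.6)]`, the vanilla rule picks `"12"` while the weighted rule picks `"15"`, because two moderately scored solutions agree on it.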

Edward Beeching · Oct 10, 2023 · 2:54 PM UTC

Edward Beeching @edwardbeeching

10 Oct 2023

For SFT we used UltraChat, which consists of ~1.6M dialogues generated by gpt-3.5. We originally trained on all the data, but found the resulting model had an annoying personality 😅. So we filtered this down to ~200k examples that focused on helpfulness: huggingface.co/datasets/stin…

openbmb/UltraChat · Datasets at Hugging Face

huggingface.co

Edward Beeching · Dec 22, 2022 · 4:13 PM UTC

Edward Beeching @edwardbeeching

22 Dec 2022

We also include an example Racing game. What environments or functionality would you like to see in the next version of Godot RL Agents?

Edward Beeching · Jun 20, 2024 · 1:55 PM UTC

Edward Beeching @edwardbeeching

20 Jun 2024

We have just released an Imitation Learning tutorial for Godot RL Agents as part of Hugging Face's Deep RL class. Learn how to train an agent to solve this complex RL environment. huggingface.co/learn/deep-rl…

Edward Beeching · Oct 10, 2023 · 2:54 PM UTC

Edward Beeching @edwardbeeching

10 Oct 2023

Training-wise, we used 🤗 TRL and DeepSpeed ZeRO-3 for all our experiments:
- SFTTrainer: huggingface.co/docs/trl/sft_…
- DPOTrainer: huggingface.co/docs/trl/dpo_…
Total compute cost: $500 or 8h on 16 x A100s
Kudos to @krasul for implementing DPO in TRL!

SFT Trainer · Hugging Face

huggingface.co

Edward Beeching · Mar 11, 2025 · 9:38 PM UTC

Edward Beeching @edwardbeeching

11 Mar 2025

Our OlympicCoder-32B model achieves top-tier performance, surpassing all open-weight models we tested—even some 100x larger! Learn more about how we built the dataset, benchmark, and models: huggingface.co/blog/open-r1/…

Open R1: Update #3

A Blog post by Open R1 on Hugging Face

huggingface.co

Edward Beeching · Jul 22, 2024 · 7:07 AM UTC

Edward Beeching @edwardbeeching

22 Jul 2024

We also wanted to share the winning recipe with the community, so we have also released the training code for those who want to take a deeper dive into LLMs for Mathematics! github.com/project-numina/ai…

GitHub - project-numina/aimo-progress-prize

Contribute to project-numina/aimo-progress-prize development by creating an account on GitHub.

github.com

Edward Beeching · Feb 7, 2024 · 8:39 AM UTC

Edward Beeching @edwardbeeching

7 Feb 2024

Tomorrow, February 8 at 11 AM Pacific Time (8PM CET) we will be presenting a workshop on aligning LLMs with DPO. We will discuss the theory behind it and get hands-on with the Hugging Face Transformer Reinforcement Learning (TRL) library. Register now: eventbrite.com/e/aligning-ll…

Aligning LLMs with Direct Preference Optimization

We will discuss a powerful alignment technique called Direct Preference Optimisation (DPO) which was used to train Zephyr.

eventbrite.com

Edward Beeching · Mar 11, 2025 · 9:38 PM UTC

Edward Beeching @edwardbeeching

11 Mar 2025

- CodeForces-CoTs – A dataset of 100k competitive programming samples in C++ and Python.
- The IOI Benchmark – A new set featuring 2024 International Olympiad in Informatics problems.
- OlympicCoder Models (7B & 32B) – Fine-tuned models that outperform closed-source models

Edward Beeching · Aug 29, 2024 · 2:54 PM UTC

Edward Beeching @edwardbeeching

29 Aug 2024

First introduced in a paper by @ShawnGuo13 at @GoogleDeepMind , Online DPO is a new alignment method to boost the performance of LLMs. The integration is the result of a fantastic collaboration between @ShawnGuo13 , @mnoukhov, @vwxyzjn , @QGallouedec, @_lewtun and myself.

Edward Beeching · Jul 22, 2024 · 7:07 AM UTC

Edward Beeching @edwardbeeching

22 Jul 2024

To build a strong math model, the team at projectnumina.ai led by @JiaLi52524397 built two datasets of math problems with 1M examples, comprising problems answered with Chain of Thought and Tool Integrated Reasoning: huggingface.co/datasets/AI-M… huggingface.co/datasets/AI-M…

Project Numina

A non-profit building open-source AI for mathematical collaboration.

projectnumina.ai

Edward Beeching · Mar 13, 2024 · 9:03 AM UTC

Edward Beeching @edwardbeeching

13 Mar 2024

@huggingface has released StarChat2, a programming assistant based on BigCode's StarCoder2. We used a variant of the Zephyr recipe to add chat to the strong math and code capabilities of StarCoder2. Demo: huggingface.co/spaces/Huggin… Training code: github.com/huggingface/align… MT Bench

Edward Beeching · Oct 10, 2023 · 2:54 PM UTC

Edward Beeching @edwardbeeching

10 Oct 2023

Zephyr is a mistral-7b finetune that outperforms llama2-70b on MT Bench and is the highest performing 7b model on the Open LLM Leaderboard. We used a combination of instruction fine-tuning and Direct Preference Optimization on publicly available datasets.

Edward Beeching · Oct 10, 2023 · 2:54 PM UTC

Edward Beeching @edwardbeeching

10 Oct 2023

For DPO we used UltraFeedback, which contains 64k prompts and completions spanning a wide range of open and closed access models. Each completion is ranked by GPT-4 according to criteria like helpfulness, and given a score to derive AI preferences from. hf.co/datasets/openbmb/Ultra…

openbmb/UltraFeedback · Datasets at Hugging Face

huggingface.co

Edward Beeching · Jan 26, 2024 · 10:06 AM UTC

Edward Beeching @edwardbeeching

26 Jan 2024

All the code is open source in the alignment handbook: github.com/huggingface/align…

Edward Beeching · Oct 10, 2023 · 2:54 PM UTC

Edward Beeching @edwardbeeching

10 Oct 2023

For evaluations we used the excellent MT Bench from @lmsysorg. This multi-turn benchmark evaluates chatbot capabilities across various domains like creative writing, code and math. It provides a much higher signal on chatbot perf than other leaderboards: huggingface.co/spaces/lmsys/…

MT Bench - a Hugging Face Space by lmsys

This app lets you choose a question category and a specific benchmark question, then view responses generated by selected language models. You can see a single model’s answer with its grading, or c...

huggingface.co

Edward Beeching · Jul 22, 2024 · 7:07 AM UTC

Edward Beeching @edwardbeeching

22 Jul 2024

@_lewtun and I used this data for two-stage fine-tuning. For the competition, we released a 7B model. We wanted to see how our recipe scales, so today we release a 72B model with comparable performance to GPT-o when evaluated with Tool Integrated Reasoning. huggingface.co/AI-MO/NuminaM…

AI-MO/NuminaMath-72B-TIR · Hugging Face

huggingface.co

Edward Beeching · Jun 18, 2024 · 12:58 PM UTC

Edward Beeching @edwardbeeching

18 Jun 2024

We've just released Godot RL Agents version 0.8.0, which adds Imitation Learning support and multi-policy training. Thanks to all the contributors! Find out more details in the release notes: github.com/edbeeching/godot_…

Release Release 0.8.0 · edbeeching/godot_rl_agents

What's Changed Imitation learning support by @Ivan-267 in #169 Adds downfall env to tests and hub by @edbeeching in #176 Add SB3 script instructions link to readme by @Ivan-267 in #178 Update...

github.com

Edward Beeching · Dec 16, 2024 · 7:52 PM UTC

Edward Beeching @edwardbeeching

16 Dec 2024

We open source the code here: lnkd.in/erQDWyKU

Edward Beeching · Jul 10, 2024 · 8:15 PM UTC

Edward Beeching @edwardbeeching

10 Jul 2024

Preference Alignment for Multimodal models is now supported in TRL, amazing work by @QGallouedec and the team at @huggingface ! What algorithm should we implement next?

Quentin Gallouédec @QGallouedec

10 Jul 2024

🤔 Can we train a VLM to 𝐩𝐫𝐞𝐟𝐞𝐫? This is now possible, thanks to the new TRL/DPO support for VLMs! 🎉 As an example, we've trained a model to reduce hallucinations. Check out: 📰 Blog post: huggingface.co/blog/dpo_vlm 🐙 TRL: github.com/huggingface/trl Thanks to @mervenoyann, @vwxyzjn and @krasul who helped me with this work!

Edward Beeching · Dec 15, 2022 · 6:58 PM UTC

Edward Beeching @edwardbeeching

15 Dec 2022

Replying to @jtatuskoAI @godotengine @huggingface

The agent's observations are raycasts rather than pixels, so during training I do not render any cameras and run headless. You can also accelerate the rate of the physics which gets a nice speedup.

Edward Beeching · Mar 28, 2022 · 5:43 PM UTC

Edward Beeching @edwardbeeching

28 Mar 2022

Replying to @esthermakes_ @huggingface

A broad range of things: implementing new models in the transformers library, reading papers, working on open-source projects such as my Deep RL interface for the Godot Game Engine, building environments for Embodied AI, and sharing expertise with the rest of the team here at 🤗

Edward Beeching · Mar 20, 2025 · 12:53 PM UTC

Edward Beeching @edwardbeeching

20 Mar 2025

Replying to @danielhanchen @natolambert

The DAPO paper did some nice ablations of this and confirmed our intuition / more limited empirical observations: arxiv.org/pdf/2503.14476

Edward Beeching · Aug 29, 2024 · 2:54 PM UTC

Edward Beeching @edwardbeeching

29 Aug 2024

It is easy to get started with Online DPO, check out the example script: github.com/huggingface/trl/b…

Edward Beeching · Mar 25, 2022 · 10:32 AM UTC

Edward Beeching @edwardbeeching

25 Mar 2022

Replying to @aravindr93 @ClementDelangue @chelseabfinn @SurajNair_1 @Vikashplus

Hi, I had a look at your GitHub and it should be fairly easy to integrate the model in the transformers library and host the model checkpoints on the🤗Hub. The dataset license is indeed restrictive, we are looking into this. I will send you an email about the model integration.

Edward Beeching · Jun 5, 2020 · 12:38 PM UTC

Edward Beeching @edwardbeeching

5 Jun 2020

Delighted to see that our work on augmenting RL agents with egocentric neural memory has been accepted to ECML-PKDD 2020!

Christian Wolf (🦋🦋🦋)@chriswolfvision

5 Jun 2020

How to automatically discover objects and affordances from reward through projective egocentric memory: @edwardbeeching's paper has been accepted to ECML-PKDD 2020 (with Jilles Dibangoye, Olivier Simonin and yours truly). @chroma_inria @LIRISLyon @citi_lab

Edward Beeching · Mar 25, 2024 · 1:07 PM UTC

Edward Beeching @edwardbeeching

25 Mar 2024

Replying to @_lewtun @teknium

Thanks for highlighting this powerful @huggingface Datasets feature, although I don't think you can infer that 99.9% of convs are single turn, just that 99.9% of convs have between 2-8 turns.

Edward Beeching · Dec 20, 2022 · 4:00 PM UTC

Edward Beeching @edwardbeeching

20 Dec 2022

Replying to @DrJimFan

Try it out now on a @huggingface Space! 👉 hf.co/spaces/osanseviero/poi…

Edward Beeching · Apr 14, 2024 · 12:29 PM UTC

Edward Beeching @edwardbeeching

14 Apr 2024

Replying to @signed_adam @_lewtun @huggingface

AGI achieved

Edward Beeching · Feb 2, 2024 · 3:15 PM UTC

Edward Beeching @edwardbeeching

2 Feb 2024

and a 3D Lunar lander example:

243

Edward Beeching · Nov 18, 2020 · 11:51 AM UTC

Edward Beeching @edwardbeeching

18 Nov 2020

Replying to @araffin2

For this run of the algorithm the best one at 10M steps is A? But let me guess, it is the same algorithm?

Edward Beeching · Feb 21, 2024 · 8:07 AM UTC

Edward Beeching @edwardbeeching

21 Feb 2024

Replying to @punchesbears @KayLousberg

Yes, we just added (experimental) imitation learning support in this PR: github.com/edbeeching/godot_…

Imitation learning support by Ivan-267 · Pull Request #169 · edbeeching/godot_rl_agents

Motivation: Supporting pre-training with GAIL could be helpful in some environments where it might be difficult to define a good dense reward function that results in the desired behavior. One use ...

github.com

Edward Beeching · Mar 27, 2024 · 8:49 PM UTC

Edward Beeching @edwardbeeching

27 Mar 2024

Anonymous comment from a colleague earlier: "Ok DeepSpeed has defeated me for another day. Will revisit tomorrow." I have a love-hate relationship with DeepSpeed: when it works it is magical, but it can be quite frustrating to debug when it doesn't work out of the box.

337

Edward Beeching · Dec 16, 2024 · 7:54 PM UTC

Edward Beeching @edwardbeeching

16 Dec 2024

Replying to @konarkmodi @_lewtun @GoogleDeepMind

fixed, let us know if you spot anything else

110

Edward Beeching · Dec 22, 2022 · 4:13 PM UTC

Edward Beeching @edwardbeeching

22 Dec 2022

Physics can be sped up for all environments, enabling accelerated training. Cut training time from hours to minutes.

422

Edward Beeching · Feb 2, 2024 · 3:15 PM UTC

Edward Beeching @edwardbeeching

2 Feb 2024

We also added a bunch of new examples, including learning how to park a car:

164

Edward Beeching · Dec 18, 2023 · 3:54 PM UTC

Edward Beeching @edwardbeeching

18 Dec 2023

Replying to @abidlabs

Do {thing} or we are going to go wash your hair. Works every time, apart from when you need to wash their hair.

468

Edward Beeching · Feb 2, 2024 · 3:15 PM UTC

Edward Beeching @edwardbeeching

2 Feb 2024

Racer hovercrafts!

132

Edward Beeching · May 11, 2023 · 3:15 PM UTC

Edward Beeching @edwardbeeching

11 May 2023

You can submit your own models for evaluation at the bottom of the leaderboard and they will be queued and run automatically on spare nodes on the 🤗 research cluster! You can even submit delta weights for non-commercial models such as llama.

305

Edward Beeching · Jan 18, 2024 · 3:08 PM UTC

Edward Beeching @edwardbeeching

18 Jan 2024

A reminder:
* DPO: casts the RLHF objective as a loss on a prompt and its positive and negative completions
* IPO: replaces DPO's sigmoid, which can potentially cause overfitting, with an identity function
* KTO: rather than a +ve, -ve pair, takes unpaired good and bad data
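
To make the DPO / IPO contrast in the reminder concrete, here is a minimal per-example sketch in plain Python. It assumes summed sequence log-probabilities as inputs and uses the textbook forms of the two losses; the function names are hypothetical, this is not the TRL implementation, and KTO is left out because it works on unpaired examples.

```python
import math


def preference_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
    """Log-ratio of the chosen completion minus log-ratio of the rejected one.

    All four arguments are sequence log-probabilities (assumed to be sums over tokens).
    """
    return (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)


def dpo_loss(margin, beta):
    """DPO: -log(sigmoid(beta * margin)), written as a numerically stable softplus."""
    x = beta * margin
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def ipo_loss(margin, beta):
    """IPO: squared distance between the margin and the target 1 / (2 * beta)."""
    return (margin - 1.0 / (2.0 * beta)) ** 2


if __name__ == "__main__":
    beta = 0.1
    for margin in (0.0, 5.0, 10.0, 20.0):
        print(margin, round(dpo_loss(margin, beta), 4), round(ipo_loss(margin, beta), 4))
```

In this sketch the DPO loss keeps shrinking as the margin grows, so nothing stops the policy from pushing the margin ever higher, whereas the IPO loss is minimised at a finite margin of 1 / (2 * beta) and penalises overshooting it.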

281

Edward Beeching · Feb 2, 2024 · 3:15 PM UTC

Edward Beeching @edwardbeeching

2 Feb 2024

Full release notes here: github.com/edbeeching/godot_…

Release v0.7.0 · edbeeching/godot_rl_agents

What's Changed Fix rllib by @edbeeching in #140 Adds nparallel multi-process support to CleanRL example by @Ivan-267 in #145 Update ADV_STABLE_BASELINES_3.md by @Ivan-267 in #146 Require sb3 v...

github.com

237

Edward Beeching · Feb 2, 2024 · 3:15 PM UTC

Edward Beeching @edwardbeeching

2 Feb 2024

This release adds a long-requested feature: grid sensors. Set sail on the high seas with a pirate example demonstrating the new sensor.

185

Edward Beeching · Feb 2, 2024 · 3:15 PM UTC

Edward Beeching @edwardbeeching

2 Feb 2024

The next version of the library will add a long-requested feature: imitation learning support, which should allow the learning of far more complex behaviors!

226

Edward Beeching · Jan 18, 2024 · 3:08 PM UTC

Edward Beeching @edwardbeeching

18 Jan 2024

While the observations about each algorithm remain the same with OpenHermes, that is, the ranking is DPO > KTO > IPO, the sweet spot for beta varies wildly between algorithms: the best choice of beta for DPO, KTO, and IPO is 0.6, 0.3 and 0.01, respectively.

494

Edward Beeching · Jul 24, 2020 · 6:24 PM UTC

Edward Beeching @edwardbeeching

24 Jul 2020

Replying to @chriswolfvision

I have a UDP joke but maybe you won't get it.

Edward Beeching · Feb 21, 2024 · 8:05 AM UTC

Edward Beeching @edwardbeeching

21 Feb 2024

Replying to @AgentMulcahy @KayLousberg

The observation is a small 12x12 cone of raycasts, a normalized vector pointing to the goal and a 4D 1-hot indicating which of the 4 levels the agent is on. Reward is the improvement in best distance to goal.
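
A reward based on "improvement in best distance to goal" can be sketched as below. This is a hedged Python illustration rather than the environment's actual code (which runs inside Godot); the class name is hypothetical, and returning zero on steps that do not beat the previous best is an assumption.

```python
class BestDistanceReward:
    """Rewards the agent only when it gets closer to the goal than ever before."""

    def __init__(self, start_distance):
        # Best (smallest) distance to the goal seen so far in this episode.
        self.best_distance = start_distance

    def step(self, distance):
        # Positive only if the current distance beats the previous best (assumed: zero otherwise).
        improvement = max(0.0, self.best_distance - distance)
        self.best_distance = min(self.best_distance, distance)
        return improvement


if __name__ == "__main__":
    reward_fn = BestDistanceReward(start_distance=10.0)
    for d in (8.0, 9.0, 7.5):
        print(d, reward_fn.step(d))
```

Under these assumptions, moving away from the goal and then back again earns nothing until the agent passes its previous best, so the total reward over an episode cannot exceed the starting distance.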

Edward Beeching · Mar 5, 2024 · 10:08 AM UTC

Edward Beeching @edwardbeeching

5 Mar 2024

Replying to @_lewtun

That's easy for you to say. I had to review it!

110

Edward Beeching · Dec 1, 2022 · 7:41 PM UTC

Edward Beeching @edwardbeeching

1 Dec 2022

Sample Factory achieves high throughput, training at hundreds of thousands of interactions per second 🔥 It includes a number of advanced features:
🟢 Multi-agent training
🟢 Self-play
🟢 Multi-GPU population-based training
🟢 Support for vectorized and GPU accelerated envs

Edward Beeching · Dec 15, 2022 · 7:02 PM UTC

Edward Beeching @edwardbeeching

15 Dec 2022

Replying to @itsmaksX @huggingface

It runs in the @godotengine

185

Edward Beeching · Aug 27, 2020 · 8:51 AM UTC

Edward Beeching @edwardbeeching

27 Aug 2020

Replying to @chriswolfvision

For those who missed it, you can find our video here: piped.video/watch?v=rW6kLl5Y…

Learning to Plan with Uncertain Topological Maps ECCV 2020 Spotlight

Presentation of our work on Learning to Plan with Uncertain Topolog...

youtube.com

Edward Beeching · Oct 22, 2024 · 1:21 PM UTC

Edward Beeching @edwardbeeching

22 Oct 2024

Awesome analytics, our H4 models have been downloaded over 10M times!

Lewis Tunstall

@_lewtun

22 Oct 2024

The new analytics tab in Hub orgs is very cool and we can see that the H4 models have been downloaded ~10M times, driven mostly by Zephyr / StarChat. Funnily enough, I thought the huge spike was from Zephyr, but it's actually from StarChat ... perhaps someone accidentally put it in their CI pipeline 😅

967

Edward Beeching · Feb 6, 2025 · 7:47 AM UTC

Edward Beeching @edwardbeeching

6 Feb 2025

Replying to @Dongwei__Jiang @huggingface

Great work and thanks for sharing the recipe. I will review the PR now :)

258

Edward Beeching · Jan 19, 2024 · 2:04 PM UTC

Edward Beeching @edwardbeeching

19 Jan 2024

I totally agree: running the experiments for the post left me with more questions than answers. I think we may have a more extensive follow-up where we evaluate on some other benchmarks such as AlpacaEval.