Sr. Director of AI model post-training @NVIDIA

in the cloud
Replying to @kuchaev
Our post-training pipeline is a substantial redesign from Super. The core idea: don't rely on stacked RL stages alone. We do SFT, multi-environment RLVR across a huge mix of agentic/reasoning/code/safety environments, then Multi-teacher On-Policy Distillation (MOPD). 10+ domain-specialized teachers, merged into the student via dense token-level guidance on its own rollouts. See Figures below for overview and tech report for all the details. 2/4
7
38
279
105,996
We are excited to release Llama-Nemotron-Ultra! This is a reasoning ON/OFF, dense 253B model. Open weights and post-training data. huggingface.co/nvidia/Llama-… We started with llama-405B, changed it via NAS pruning, then followed with reasoning-focused post-training: SFT + RL in FP8.
24
123
701
166,499
Today we are happy to release the best open models for synthetic data generation. 340B parameters, includes base, instruct and reward models, as well as a new human preference dataset, HelpSteer2. The 340B-Reward model is #1 on the Reward Bench leaderboard. blogs.nvidia.com/blog/nemotr…
16
77
440
242,091
NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post train next generations of our models. Give it a try github.com/NVIDIA/NeMo-RL
5
64
392
25,041
Llama-Nemotron-v1 technical report is now available on arxiv arxiv.org/pdf/2505.00949v1
3
63
343
28,816
We are excited to release Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with open base model and training data. This model also supports runtime "thinking" budget control. HF collection with base and post trained models: huggingface.co/collections/n…
9
62
298
65,402
In LLM pre training, curating and preparing data is perhaps the most impactful step. NeMo data curator is now open source with lots of features you will need. We used it to curate trillions of tokens for our own models training. github.com/NVIDIA/NeMo-Curat…
3
41
222
19,101
Post-training of LLMs is increasingly important and RLHF remains a necessary step for an overall great model. Today we are releasing 6 new reward models, including GenRMs and multilingual. These models are used to post-train next *-nemotron models. huggingface.co/collections/n…
3
44
203
13,358
If you are a researcher working on LLM post-training, RL and reasoning, you should really give NeMo-RL a try. Works with huggingface and megatron-core (when you need scale). Here is a great blogpost by @AlexanderBukha1 and team on how to get started: nvidia-nemo.github.io/blog/2…
26
200
17,246
We are excited to release new Llama-Nemotron models. These models allow you to set reasoning ON/OFF during runtime. We also release all the post-training data under CC-BY-4! Try it now on build.nvidia.com/nvidia/llam… HF collection: huggingface.co/collections/n…
8
39
192
51,783
Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and drop in replacement for V1. Open-weight, code and data on HF huggingface.co/nvidia/Llama-…
8
43
184
42,622
✈️to COLM2025. And I am looking for exceptional RL and post-training engineers who are excited to push frontiers of open-source post-training and open models such as Nemotron. • At the conference? Message me on Whova. • Not attending? DMs are open. Send your CV & a short note.
2
10
190
62,215
We just updated Llama-Nemotron post training dataset with additional 2.2M math and 500K code reasoning examples used in Llama-Nemotron-Ultra training huggingface.co/datasets/nvid…
What's cool about @nvidia is that in addition to models, they release tons of cool datasets! Why are the other big tech not doing that too? huggingface.co/nvidia
3
31
157
18,158
New paper from our team. An inference-time scaling approach which can boost non-math benchmarks such as Arena-Hard of existing models. We get Arena-Hard of 92.7 for 70B model. As of 5 Mar 2025, surpassing o1-preview-2024-09-12 (90.4) and DS-R1 (92.3). arxiv.org/pdf/2503.04378
1
19
130
23,095
Llama-Nemotron-Super-V1.5 got AA intelligence index of 64. This is more than previous Ultra (61) model and is by far the "smartest" open-weights dense model. The best model for deployment on a single H100. Head to @ArtificialAnlys for detailed analysis artificialanalysis.ai/models…
2
16
134
5,663
NeMo speech recognition models published on @huggingface hub are now at the top of HF Speech Bench for all languages where we published the models so far: English, German, French and Chinese. Often beating other models without external LMs and with fewer parameters. #DeepLearning
2
25
121
New reasoning Nemotron-H models are now publicly available. These models are based on hybrid architecture! 47B and 8B in BF16 and FP8. Blogpost: developer.nvidia.com/blog/ne… Weights: huggingface.co/collections/n…
Transformers are still dominating the LLM scene but we show that higher throughput alternatives exist which are just as strong! Grateful to have a part in Nemotron-H Reasoning effort. 🙏 Technical report will be out soon, stay tuned!
1
25
122
23,844
New VLM training data drop!
We just released 3 million samples of high quality vision language model training dataset for use cases such as: 📄 optical character recognition (OCR) 📊 visual question answering (VQA) 📝 captioning 🤗 Learn more: nvda.ws/4oyfevu 📥 Download: nvda.ws/4fz2gtB
3
11
116
5,481
Replying to @teortaxesTex
We used R1 as teacher for lots of things (see diagram). But to push scientific reasoning (GPQA) beyond R1's number (71.5=>76) it took a big reasoning RL (GRPO) run in FP8.
4
4
118
4,178
Llama-3.1-Nemotron-70B-Instruct model aligned by our team is now live on lmarena.ai leaderboard with overall rank 9. Everything used to create this model is public: code, data and reward model. HF checkpoint: huggingface.co/nvidia/Llama-…
6
19
94
34,657
Our team just released a new dataset for LLM model alignment, called HelpSteer huggingface.co/datasets/nvid… on @huggingface Hub under CC-BY-4 license! This one should be used for reward model training, especially with SteerLM method.
3
27
92
22,358
Our team is happy to share SteerLM, a simpler alternative to RLHF which allows dynamic model controls during inference (humor, verbosity, etc.). To appear in Findings of EMNLP 2023. It is implemented in NeMo (open-source) and example model is on HF arxiv.org/abs/2310.05344
2
21
92
17,435
Llama-3.3-Nemotron-Super-49B-v1 is on LMArena leaderboard. Head to the huggingface.co/collections/n… for entire post-training data and model weights! Or try it now from the browser on build.nvidia.com
New on LMArena: @Nvidia's Llama-3.3-Nemotron-Super-49B-v1 lands at #14! A powerful open reasoning model—top-15 overall, excelling in math, with an openly released 15M post-training dataset. Congrats to the @NvidiaAI Nemo team for this fantastic contribution to the open community!
2
11
71
10,659
Our team put together a unified mathematical framework to analyze popular model alignment algorithms. “Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment” arxiv.org/pdf/2502.00203
2
17
70
6,585
Very happy to share our latest NeMo release. We re-designed NeMo to work with @PyTorchLightnin and @Hydra_Framework projects from @PyTorch ecosystem. Train your own #ASR, #NLP and #TTS models or re-use one of the many pre-trained models we have.
NeMo, @NVIDIA’s open-source toolkit based on #PyTorch, allows you to quickly build, train, and fine-tune conversational AI models. See how speech recognition, natural language processing and speech synthesis can be improved in this tutorial: bit.ly/nvidia-nemo-introduct…
1
20
56
Thank you @GavinNewsom for the veto on SB 1047. @WSJ, no, most researchers did not support that bill, only a small minority of them did. The application layer is the place for AI regulation, not fundamental model development.
9
56
5,253
You can try Nemotron-4-340B-Instruct model here build.nvidia.com/nvidia/nemo…
1
9
52
8,337
Replying to @agihippo
I guess you missed the part where we used 1000x less human data (10K vs 10M) for alignment than llama3. this is about synthetic data generation, literally says so in the blogpost. Also we released reward model and training data for it, all under commercial friendly license.
1
1
47
2,241
Replying to @ClementDelangue
One exception - NVIDIA. All you need to do is get your manager's OK to publish and in all my time here I've never seen that denied or even delayed.
2
2
52
2,399
Replying to @razomforukraine
@POTUS and @WhiteHouse you must act on this
5
43
839
Replying to @srush_nlp
Maybe it is because the de-noising objective is "wasting" tokens compared to autoregressive models. E.g. when you mask 15% of tokens, then after 1 epoch you've backpropagated loss from only 15% of your tokens, compared to 100% in next token prediction loss.
2
3
50
6,238
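The arithmetic behind the reply above can be put in numbers. This is a minimal sketch under simplifying assumptions: a hypothetical corpus of 1,000,000 tokens, exactly 15% of positions masked per pass, and every position counted as a target for next-token prediction. The function name and corpus size are illustrative only.

```python
def supervised_tokens(total_tokens: int, epochs: int, percent_with_loss: int) -> int:
    """Count token positions that contribute a loss term.

    percent_with_loss is an integer percentage, which keeps the arithmetic exact.
    """
    return total_tokens * epochs * percent_with_loss // 100


corpus_tokens = 1_000_000  # hypothetical corpus size, for illustration only

# Masked de-noising objective: only the masked positions (15% here) carry loss.
masked = supervised_tokens(corpus_tokens, epochs=1, percent_with_loss=15)

# Next-token prediction: (approximately) every position carries loss.
autoregressive = supervised_tokens(corpus_tokens, epochs=1, percent_with_loss=100)

print(masked, autoregressive)
print(f"loss-bearing positions ratio: {autoregressive / masked:.2f}x")
```

Under these assumptions the masked objective needs roughly 6 to 7 passes over the same data to backpropagate loss from as many positions as one autoregressive pass.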
Nemotron-4-340B-*Reward* model is now available via API on build.nvidia.com/nvidia/nemo… :) Give it a try.
1
12
37
3,098
HelpSteer3 data is public now! 40K prompts with 2 responses each. It has ratings, justifications for them, feedback on responses, and edited responses. CC-BY-4.0 and collected from professional human raters. Using this data we can get 93.4 on Arena-Hard. huggingface.co/collections/n…
6
33
2,677
I don’t understand why people think AI will make coding, SW jobs or CS obsolete. Instead an order of magnitude more people will do it and will do it much earlier and be 100x more productive.
2
1
33
1,682
New post-training data drop!
Open Code Reasoning is our latest dataset to train SOTA code reasoning capabilities in all model sizes ! With it, even 7B Qwen can reach 51% on LiveCodeBench, 32B hits 61% with just SFT alone ! Model release soon, paper and dataset are out !
1
2
29
3,500
Reward modeling is a key step of AI development. Our team just released new SOTA RM on build.nvidia.com/nvidia/llam…. We also release model’s weights on @huggingface and updated (with new dimensions) version of HelpSteer 2 dataset (ccby4) used to train it. huggingface.co/nvidia/Llama-…
7
28
5,965
NeMo now has Ukrainian speech recognition model on @huggingface hub. This is a CitriNet model tuned by our intern working from Kyiv huggingface.co/nvidia/stt_uk… As of today, I think, this is the best Ukrainian ASR model freely available.
4
5
25
Check out llama2-70B-SteerLM model which gets 7.54 on MT-bench. This model is NOT using outputs of stronger (ChatGPT) models during alignment which allowed us to keep llama2 license. Try now on NGC Catalog catalog.ngc.nvidia.com/orgs/… . Also on @huggingface hub huggingface.co/nvidia/Llama2…
1
5
26
2,597
Happy to help do our part towards open AI 🙂
Fun visualization by @aiworld_eu! @nvidia added 365 public model/dataset/apps on @huggingface in the past 12 months (one a day!) aiworld.eu/story/nvidia-domi…
1
1
25
1,085
New Nemotrons - mamba-transformer hybrid base models are now on @huggingface Hub!
Nemotron-H base models (8B/47B/56B): A family of Hybrid Mamba-Transformer LLMs are now available on HuggingFace: huggingface.co/nvidia/Nemotr… huggingface.co/nvidia/Nemotr… huggingface.co/nvidia/Nemotr… Technical Report: arxiv.org/abs/2504.03624 Blog: research.nvidia.com/labs/adl…
2
6
25
1,418
✈️ to ICML workshops to talk about the first open-weight model that outsmarted original DS-R1 on AA index. Happy to chat all things post-training and AI in general. (The poster is at the EXAIT workshop this Saturday)
2
25
1,732
Replying to @teortaxesTex
Meta is likely making a mistake doubling down on imitation learning (labeled data) in the era of exploration learning.
3
2
21
1,343
AI model post training is rapidly improving. The plot below (starting from the same base model) illustrates about 10 months of progress in the *open* post-training research. I’m not convinced that closed research can move as fast.
1
4
22
1,477
Oh wow! Look what model is #1 on @huggingface speech bench for English speech recognition
The largest NeMo ASR model is finally public on @huggingface ! This is a 600 M params Conformer Transducer X-Large, probably the largest public checkpoint trained with multiple datasets. huggingface.co/nvidia/stt_en…
3
22
Try it now on build.nvidia.com
🎊 Llama Nemotron Ultra 253B is here 🎊 ✅ 4x higher inference throughput over DeepSeek R1 671B 🏆Highest accuracy on reasoning benchmarks: 💎 GPQA-Diamond for advanced scientific reasoning 💎 AIME 2024/25 for complex math 💎 LiveCodeBench for code generation and completion Try as #NVIDIANIM ➡️ build.nvidia.com/nvidia/llam… Technical deep dive ➡️ developer.nvidia.com/blog/bu…
1
1
21
3,179
A paper arxiv.org/pdf/1910.10261.pdf about our latest speech recognition model - QuartzNet has been accepted to #ICASSP 2020. Head over to github.com/NVIDIA/NeMo for implementation and pretrained models. #DeepLearning #asr
5
20
Replying to @paulg
I grew up near Chernobyl and have always believed that its most devastating impact is the subsequent pushback against nuclear power. Interesting fact - the Chernobyl station kept working after the disaster (1986) until 1999 when it was fully shut down under Western pressure.
1
19
565
Do you want to work on LLM and DLM model post-training with us? @JiantaoJ is hiring! nvidia.wd5.myworkdayjobs.com…
6
21
2,681
Another way to put it: we used 1000x less (10K vs 10M) human data for alignment than llama3 by using synthetic data. This release is about synthetic data generation which is why our license explicitly allows it.
20
371
With a license that permits synthetic data generation and commercial use.
1
1
19
1,218
Replying to @fchollet
people claiming that some LLM is at "high schooler" or even "kindergartner" level should spend more time around kids. No AI system today is near the level of even a 3-year-old when it comes to general intelligence. Moreover, the progress towards that level is unclear.
17
2,533
Post training is an exciting area with a lot of gains to be had. This model is built starting from llama-3.3-70B-instruct, AA index of 41. So +23 points from redoing post training stage.
2
18
953
@GavinNewsom , respectfully asking you to veto SB1047. As written it will hurt AI innovation in California.
1
16
2,071
Just finished listening to "Viral" audiobook by @Ayjchan and @mattwridley. This is an excellent account of the most likely origins of COVID-19 pandemic. A must read for anyone (which really should be everyone) interested in the origins of #COVID19 audible.com/pd/Viral-Audiobo…
2
14
Replying to @mattrickard
you missed NeMo models huggingface.co/models?sort=d… (1, 5 and 20B GPTs with commercially friendly license)
1
2
15
2,519
Replying to @natolambert
have you tried recent FSD though? I tried it a year ago and thought only a fool would pay for that. But I find the recent version is so good that I am paying now and don’t want to turn it off. My driving is mostly suburban or longer trips.
1
16
1,039
Two very real steps anyone in the world can help: 1) Consider donating to humanitarian relief efforts, such as @razomforukraine and there are many others. If your company has a match - please make sure you make use of that. #UkraineRussiaWar
1
6
15
865
If you are at #ICLR and looking for applied research roles in model post-training, reasoning and model alignment - message me and I’ll be happy to chat.
1
2
14
3,404
Replying to @geoff_l
generating synthetic data for alignment of smaller models is a key use case we have in mind.
1
13
1,432
Yes, PPO is much more expensive than DPO in terms of infra but in all our experiments so far, on the same data (and no online setting) PPO>DPO on MT-bench.
3
14
635
#ChatGPT is very impressive! In the dialogue below, it comes up with a suboptimal solution, and argues a little without admitting a mistake. Then takes a hint, admits the mistake, and fixes its solution!
1
4
14
A great video by @ctnzr on what Nemotron is and why it is open piped.video/_y9SEtn1lU8?si=65jm… !
3
11
1,449
This is currently the best 8B instruct model on most benchmarks
Excited to introduce MN-Minitron-8B-Instruct 📗! We've developed an even more powerful instruct model than its parent, Mistral-NeMo-12B, with significant improvements over LLaMa3.1-8B-Instruct as well! Weights on HF: huggingface.co/nvidia/Mistra… Demo: build.nvidia.com/nvidia/mist… Our new model outperforms LLaMa3.1-8B-Instruct on key benchmarks, including: 🧮 Math reasoning 🔧 Function calling 🧑‍🏫 Instruction following Additionally, our model improves 7 out of 8 metrics of the parent 12B. This model is a result of combining pruning and distillation, reducing the original Mistral-NeMo-12B-Base model to an efficient 8B, followed by alignment with NeMo Aligner. Thanks to the community for support that encourages us to release more models! 💡Useful links: NeMo-Aligner: github.com/NVIDIA/NeMo-Align… Minitron paper: arxiv.org/abs/2408.11796
3
13
957
Replying to @arthurmensch
Very impressive!
5
407
this is excellent work from NVIDIA to speed up convergence of transformers with algorithmic modifications
Normalized Transformer - tricks to keep the activations constrained, improves training convergence; from NVIDIA Was pointed to this paper by lucidrains arxiv.org/abs/2410.01131
1
11
1,017
If someone approaches you to talk about "agents", always ask them for their definition of what it is. Often a good signal on whether to continue or avoid the conversation.
12
718
New Reward modeling research and models from our team! 1. "RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards" 2. "Think Twice: Branch-and-Rethink Reasoning Reward Model" As usual, models are on @huggingface Hub. Links in the reply.
1
1
12
978
This holiday season, please consider supporting Ukraine which is being attacked by the terrorist federation. Today 76 rockets were fired at critical civilian infrastructure with typical terrorist intents of generating fear and humanitarian disaster.
3
3
10
1,399
if you are at #COLM2024, check out our work - NeMo Aligner. Poster #3 during morning session today.
4
11
987
Replying to @natolambert
Not having a value function is a huge advantage of GRPO and REINFORCE in the context of LLMs. This is because a value function (critic model) is supposed to assign values to *partial* generations, which is fundamentally hard.
10
1,016
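To make the point in the reply above concrete, here is a minimal sketch of a group-relative advantage in the style of GRPO: several completions are sampled for one prompt, each *complete* generation gets a single scalar reward, and the group's own mean and standard deviation serve as the baseline, so no learned critic has to score partial generations. The function name, the `eps` constant and the example rewards are assumptions for illustration, not code from NeMo-RL or any other library.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one prompt's sampled completions against the group.

    Each reward scores a finished completion; the group statistics replace
    the value function that PPO would otherwise have to learn.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifiable rewards for 4 completions of the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

In a full training loop, every token of a completion would typically share that completion's advantage. If all completions in a group receive the same reward, the advantages are all zero and the group contributes no gradient signal.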
looking forward to an interesting conference! DM me if you want to chat.
1
1
11
5,646
This is an excellent paper that we found very useful, especially for those working on alignment / post training. Congrats on NeurIPS acceptance!
3
10
437
Replying to @janleike
disagree. @GavinNewsom should veto this bill as it focuses on regulating model development rather than their applications.
1
8
340
I’ll be in Singapore attending ICLR2025. Looking forward to chatting in person about model post-training, alignment and reasoning! ✈️🇸🇬
1
10
501
Replying to @Teknium @teknium
This is a "runtime" feature. We started with the same approach as Qwen3 but noticed that the model starts "thinking" outside of the thinking trace if forced to answer. Training on truncated thinking traces fixed that. Section 3.4 research.nvidia.com/labs/adl…
1
2
9
1,522
This is how runtime reasoning-budget control works for the 9B model. You can limit the "thinking" token budget, forcing the model to produce an answer. We also trained the model not to think outside of the thinking trace in such cases.
1
2
9
1,218
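The budget mechanism described in the post above can be sketched as a two-phase decoding loop. This is an illustration under stated assumptions, not NVIDIA's implementation: the `</think>` delimiter, the `<eos>` token, the function `generate_with_budget` and the stand-in `toy_model` are all hypothetical.

```python
from typing import Callable, List

END_THINK = "</think>"  # assumed delimiter; the real model's tag may differ
EOS = "<eos>"           # assumed end-of-sequence token


def generate_with_budget(
    next_token: Callable[[List[str]], str],
    budget: int,
    max_answer_tokens: int = 16,
) -> List[str]:
    """Decode with a cap on the number of "thinking" tokens."""
    tokens: List[str] = []
    # Phase 1: let the model think for at most `budget` tokens.
    for _ in range(budget):
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == END_THINK:  # the model closed its own trace within budget
            break
    else:
        # Budget exhausted: force-close the trace so the model must answer.
        tokens.append(END_THINK)
    # Phase 2: decode the answer.
    for _ in range(max_answer_tokens):
        tok = next_token(tokens)
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens


def toy_model(context: List[str]) -> str:
    """Stand-in for an LLM: thinks indefinitely unless the trace is closed."""
    if END_THINK not in context:
        return "step"
    if context[-1] == END_THINK:
        return "42"
    return EOS


out = generate_with_budget(toy_model, budget=3)
print(out)
```

The loop only enforces the cap. Per the posts above, keeping the model from continuing to "think" inside the answer after a forced close required training on truncated thinking traces, which a decoding loop alone does not provide.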
Replying to @karpathy
It seems like the recipe to achieving superhuman performance in domain X is now well known: have perfect reward model in X (e.g. game rules, physics) + Good transformer-based heuristic + graph reasoning (MCTS/A*, etc.). Perfect reward model for LLMs would indeed be a game changer
7
477
very impressive model! congratulations.
4
239
Replying to @garrytan
I wish more people move to AIME2025 in 2025.
1
8
1,251
Replying to @natolambert
I wonder what is your take on this
3
2
1,031
If you are at #ICLR25 stop by our poster #239 04/25 at poster session 4 (3:00pm-5:30pm): "HelpSteer2-Preference: Complementing Ratings with Preferences". Will be happy to chat about data collection and RLHF. P.S. HelpSteer3 is already available: huggingface.co/datasets/nvid…
8
458
We have an exciting new job opportunity for NLP researcher on our team. Please check job description and apply here if interested nvidia.wd5.myworkdayjobs.com… #NLProc #NLP #OpenSource
5
7
OSS 💪
Latest open artifacts (#14): NVIDIA's rise, "Swiss & UAE DeepSeek," and a resurgence of open data While Qwen takes some rest, others continue to fuel the open model space. interconnects.ai/p/latest-op… 43 of the best models/datasets from 27 different organizations. Featuring: NVIDIA (@nvidia) x6 Swiss National Supercomputing Centre (@cscsch) Ant Group (@AntGroup) x2 Hugging Face (@huggingface) x2 ByteDance (@BytedanceTalk) x2 DeepSeek (@deepseek_ai) Meituan (@Meituan_LongCat) Moonshot AI (@Kimi_Moonshot) Baidu (@Baidu_Inc) Cohere (@Cohere_Labs) x2 OpenBMB (@OpenBMB) x2 Tilde (@tilderesearch) Liquid AI (@liquidai) Meta (@Meta) Alibaba AIDC (@AI_AlibabaInt) Baichuan AI (@BaichuanAI) Allen AI (@allen_ai) x2 Tencent (@TencentGlobal) x3 Microsoft (@Microsoft) x2 LLM360 (@llm360) Jan (@jandotai) Google (@Google) x2 IBM (@IBM) x2 JHU CLSP (@jhuclsp) Qwen (@Alibaba_Qwen) Motif Technologies Skywork (@Skywork_ai)
8
1,434
Our team is hiring Sr. Applied Scientists to work on AI model alignment and customization (text and multimodal). If you have strong track record and experience with: LLMs or RL or multi-modal, please apply. Can be in-person or remote. #NLP #hiring nvidia.wd5.myworkdayjobs.com…
4
7
735
Our team has studied the tradeoffs between performance and the number of trainable params in LoRA. This work would be especially useful to those building and scaling AI customization services. Great work by @rendu_a and Tugrul Konuk arxiv.org/abs/2311.09578
2
8
352
Replying to @natolambert
if one has strong enough synthetic data pipeline, do they even need pre-training on Internet tokens… 🤔
2
8
3,075
Replying to @williamfalcon
You missed some of the open-source, commercially friendly (CC-BY-4) models built using lightning :) huggingface.co/models?librar…
1
8
687
Latest release of NeMo-Aligner adds TRT-LLM integration which speeds up RLHF rollouts up to 7x compared to pure Pytorch implementation github.com/NVIDIA/NeMo-Align…
7
407
Replying to @sergeykarayev
We do github.com/NVIDIA/NeMo/tree/… (much more to come in the next months)
1
7
Replying to @alexgraveley
yes, currently very limited on human preference data. You might want to add new dataset we've published this week huggingface.co/datasets/nvid… which can be used with DPO. Btw, in most of our experiments SFT < DPO < SteerLM <= PPO. So while simple, DPO lags behind PPO and SteerLM.
2
7
1,294
I was taught that trees aren't supposed to have any cycles!
7
300