A recipe for frontier model post-training
Apple, Meta, and Nvidia all agree — synthetic data, iterative training, human preference labels, and lots of filtering.
interconnects.ai/p/frontier-…
OpenAI's o1 using "search" was a PSYOP
How to understand OpenAI's o1 models as really just one wacky, wonderful, long chain of thought.
interconnects.ai/p/openais-o…
Synthetic data: Anthropic’s CAI, from fine-tuning to pretraining, OpenAI’s Superalignment, tips, types, and open examples
Synthetic data is the accelerator of the next phase of AI — what it is and what it means.
interconnects.ai/p/llm-synth…
Reverse engineering OpenAI’s o1
What productionizing test-time compute shows us about the future of AI. Exploration has landed in language model training.
interconnects.ai/p/reverse-e…
China's Top 19 Open Model Labs
We ranked all the organizations in China releasing open models, from DeepSeek at the top to small, newer academic labs making waves with tech reports and niche models.
interconnects.ai/p/chinas-to…
OpenAI's o3: Over-optimization is back and weirder than ever
Tools, true rewards, and a new direction for language models.
interconnects.ai/p/openais-o…
OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference
Whether or not scaling works, we should spend more on inference.
interconnects.ai/p/openai-st…
Futures of the data foundry business model
Scale AI’s future versus further scaling of language model performance. How Nvidia may take all the margins from the data market, too.
interconnects.ai/p/ai-data-f…
Model merging lessons in The Waifu Research Department
When what seems like pure LLM black magic is actually supported by the literature.
interconnects.ai/p/model-mer…
DBRX: The new best open model and Databricks’ ML strategy
Databricks’ new model is surpassing the performance of Mixtral and Llama 2 70B while still being in a size category that's reasonably accessible.
interconnects.ai/p/databrick…
RLHF roundup: Getting good at PPO, sketching RLHF’s impact, RewardBench retrospective, and a reward model competition
Things to be aware of if you work on language model fine-tuning.
interconnects.ai/p/rlhf-roun…
As Meta races to build a great new research lab, we wanted to remind everyone that organizational structure is just as much of a challenge as personnel.
Here are our takeaways from earlier in the year. Recommendations first.
Interviewing Eugene Vinitsky (@EugeneVinitsky) on self-play for self-driving and what else people do with RL
#13. Reinforcement learning fundamentals and scaling.
interconnects.ai/p/interview…
RL backlog: OpenAI's many RLs, clarifying distillation, and latent reasoning
Notes I forgot to publish. Closing some loose ends in the reasoning model discussions.
interconnects.ai/p/rl-backlo…
Character training: Understanding and crafting a language model's personality
Post-training in industry is very different than the academic papers and open-source models demonstrate.
interconnects.ai/p/character…
Kimi K2 and when "DeepSeek Moments" become normal
One "DeepSeek Moment" wasn't enough for us to wake up; hopefully we don't need a third.
interconnects.ai/p/kimi-k2-a…
Grok 4: An o3 look-alike in search, high highs and new lows
An o3-class model, the possibility of progress, chatbot beige, and the elusiveness of taste.
interconnects.ai/p/grok-4-an…
OpenAI's o3: The grand finale of AI in 2024
A step change as influential as the release of GPT-4. Reasoning language models are the current and next big thing.
interconnects.ai/p/openais-o…
OLMoE and the hidden simplicity in training better foundation models
Ai2 released OLMoE, which is probably our “best” model yet relative to its peers, but not much has changed in the process.
interconnects.ai/p/olmoe-and…
Where inference-time scaling pushes the market for AI companies
Fundamentals emerging downstream from the RL reasoning models.
interconnects.ai/p/where-inf…
State-space LLMs: Do we need Attention?
Mamba, StripedHyena, Based, research overload, and the exciting future of many LLM architectures all at once.
interconnects.ai/p/llms-beyo…
Latest open artifacts (#12): Chinese models continue to dominate throughout the summer 🦦
A new flagship Qwen model, Qwen3-235B-A22B-Instruct-2507, and a general rise in ecosystem quality in Artifacts Log 12.
interconnects.ai/p/latest-op…
DeepSeek V3 and the actual cost of training frontier AI models
The $5M figure for the last training run should not be your basis for how much frontier AI models cost.
interconnects.ai/p/deepseek-…
RLHF progress: Scaling DPO to 70B, DPO vs PPO update, Tülu 2, Zephyr-β, meaningful evaluation, data contamination
Huge steps forward in confirming that RLHF can really help you on vibes-based evaluation, among many other RLHF analyses.
interconnects.ai/p/rlhf-prog…
The White House's plan for open models & AI research in the U.S.
Thoughts on the new AI Action plan, American DeepSeek, and what comes next.
interconnects.ai/p/the-white…
Artifacts Log 3: Synthetic math and Magpie datasets, another 1T param model, and many Mistral models
Artifacts ~124 and on for the year.
(partial $) interconnects.ai/p/artifacts…
Big Tech's LLM evals are just marketing
A PSA everyone needs. The importance of a wait-and-see attitude when it comes to new models, big and small, open and closed.
interconnects.ai/p/evals-are…
RLHF lit. review #1 and missing pieces in RLHF
Looking at the difference between two sets -- what rumors say industry leaders are doing with RLHF and what the literature is up to. A new series studying RLHF literature.
interconnects.ai/p/rlhf-lit-…
What I'm reading (#2): More on Kimi K2, how to build a bad research center, Pretraining with RL, and sporks of AGI
A quiet summer is all you need.
interconnects.ai/p/what-im-r…
GPT-4.5: "Not a frontier model"?
OpenAI's latest model raises more questions than answers, but no, the AI bubble isn't popping quite yet.
interconnects.ai/p/gpt-45-no…
Sycophancy and the art of the model
GPT-4o-simp, LMArena backlash, and people refusing to understand how messy and crucial RLHF is.
interconnects.ai/p/sycophanc…
Interviewing Tri Dao and Michael Poli of Together AI on the future of LLM architectures
The first Interconnects research interview! We go even further on the promise of state-space models in the emerging LLM market.
interconnects.ai/p/interview…
Managing frontier model training organizations (or teams)
How do the frontier labs consistently train great models? How can they fail?
interconnects.ai/p/how-to-ma…
The latest open artifacts (#8): The return of ~30B models, side effects of OpenAI's proposed DeepSeek ban, and yet another reasoning roundup
Artifacts Log 8. Expect this pace to continue until mid summer.
interconnects.ai/p/the-lates…
Latest open artifacts (Artifacts Log #10): New DeepSeek R1 0528!, more permissive licenses, everything as a reasoner, and from artifacts to agents
interconnects.ai/p/latest-op…
Deep Research, information vs. insight, and the nature of science
What AI will accelerate in the scientific process, what it cannot do, and how we can prepare for new manners of scientific investigation.
interconnects.ai/p/deep-rese…
Undoing RLHF and the brittleness of safe LLMs
Recent papers show most of the arguments about needing "safety" in releases of open LLM weights are nearly dead in the water. Yes, still release the parameters.
Read here: interconnects.ai/p/undoing-r…
Interviewing Tim Dettmers (@Tim_Dettmers) on open-source AI: Agents, scaling, quantization and what's next
Interconnects interview #10. Catching up with one of the leaders of open-source AI.
interconnects.ai/p/tim-dettm…
Artifacts Log 4: Reflection 70B, o1 on LMSYS, fine-tuning fine-tunes, and speech models
The latest open models and datasets.
interconnects.ai/p/artifacts…
Latest open artifacts (#13): The abundance era of open models
Mostly thanks to Qwen, but now we're spoiled for choice and winds are shifting.
interconnects.ai/p/latest-op…
ChatBotArena: The people’s LLM evaluation, the future of evaluation, the incentives of evaluation, and gpt2chatbot
What the details tell us about the most in-vogue LLM evaluation tool — and the rest of the field.
interconnects.ai/p/chatbotar…
OpenAI’s Model (behavior) Spec, RLHF transparency, personalization questions
Now we will have some grounding for when weird ChatGPT behaviors are intended or side-effects — shrinking the Overton window of RLHF bugs.
interconnects.ai/p/openai-rl…
Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions
A sampling of recent happenings in the multimodal space. Be sure to expect more this year.
interconnects.ai/p/multimoda…
Why I build open language models
Reflections after a year at the Allen Institute for AI and on the battlefields of open-source AI.
interconnects.ai/p/why-i-bui…
Elicitation, the simplest way to understand post-training
An F1 analogy to help understand fast improvements in post-training on top of slow improvements in scaling.
buff.ly/XmEa3XN
Grok 3 and an accelerating AI roadmap
Where AI is heading, why 2024 felt slow, and shifting priorities of frontier laboratories.
interconnects.ai/p/grok-3-an…
People use AI more than you think
And businesses too. The most important trend in AI, and one that gets lost between the headlines.
interconnects.ai/p/people-us…
What people get wrong about the leading Chinese open models: Adoption and censorship
Narrative violations on licenses, adoption, and censorship.
interconnects.ai/p/what-peop…
Phi 3 and Arctic: Outlier LMs are hints
Models that seem totally out of scope from recent open LLMs give us a sneak peek of where the industry will be in 6 to 18 months.
interconnects.ai/p/phi-3-and…
Model commoditization and product moats
Where moats are tested now that so many people have trained GPT-4-class models. Claude 3, Gemini 1.5, Inflection 2.5, and Mistral Large are here to party.
interconnects.ai/p/gpt4-comm…
The latest open artifacts (#9): RLHF book draft, where the open reasoning race is going, and unsung heroes of open LM work
Artifacts Log 9.
interconnects.ai/p/the-lates…
Evaluations: Trust, performance, and price (bonus, announcing RewardBench)
Evaluation is not only getting harder with modern LLMs getting more complicated, it’s getting harder because it means something different.
interconnects.ai/p/evaluatio…
The DPO debate: Do we need RL for RLHF?
Direct vs. RL methods for preferences, more RLHF models, and hard truths in open RLHF work. We have more questions than answers.
interconnects.ai/p/the-dpo-d…
We aren’t running out of training data, we are running out of open training data
Data licensing deals, scaling, human inputs, and repeating trends in open vs. closed LLMs.
interconnects.ai/p/the-data-…
AI for the rest of us
Apple Intelligence makes a lot of sense when you get out of the AI bubble. Plus, the cool technical details Apple shared about their language models "thinking different."
interconnects.ai/p/apple-int…
OpenAI's GPT-4.1 and separating the API from ChatGPT
OpenAI's latest models optimizing on intelligence per dollar. We'll continue to see ChatGPT handled differently than the API business.
interconnects.ai/p/openais-g…
The latest open artifacts (#6): Reasoning models, China's lead in open-source, and a growing multimodal space
Artifacts Log 6.
The open LM ecosystem yet again accelerates. interconnects.ai/p/open-arti…
How scaling changes model behavior
Some trends are reasonable to extrapolate, some are not. Even for the trends we are succeeding at extrapolating, it is not clear how that signal translates into different AI behaviors.
interconnects.ai/p/how-scali…
Making the U.S. the home for open-source AI
Open-source AI is here to stay, but it is not a given that it will be American.
interconnects.ai/p/making-th…
Mixtral Round-up: MoE trade-offs, release lessons, Mistral raises $400mil, Google's loss, vibes vs marketing
Emergency blog 🚨 We have an amazing open mixture of experts model for the holidays!
interconnects.ai/p/mixtral
We don’t need to reinvent everything to solve alignment
Integrating some non-computing science into reinforcement learning from human feedback (RLHF) can give us the models we want. Bonus: OLMo 1.7-7B.
interconnects.ai/p/reinventi…
It's 2024 and they just want to learn
The state of the ML communities big and small starting 2024. My general expectations for the year.
interconnects.ai/p/they-want…
Latest open artifacts (#11): Visualizing China's open models market share, Arcee's models, and VLAs for robotics
Artifacts Log 11.
interconnects.ai/p/latest-op…
Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning
Interconnects interview #11. An overview on the past, present, and future of RL.
interconnects.ai/p/finbarr-t…
The AI research job market shit show (and my experience)
There are plenty of jobs, but finding a place where you're happy is as hard as ever.
Read here: interconnects.ai/p/ai-resear…
SB 1047, AI regulation, and unlikely allies for open models
The rallying of the open-source community against CA SB 1047 can represent a turning point for AI regulation.
interconnects.ai/p/sb-1047-a…