Lucas Atkins · Sep 5, 2024 · 2:21 PM UTC

Lucas Atkins

@latkins

5 Sep 2024

Scarlett Johansson’s work on seq2seq was instrumental to getting ML where it is today.

TIME

@TIME

5 Sep 2024

TIME's new cover: The 100 most influential people in AI ti.me/4dQcJ1Q

1,400

96,684

Lucas Atkins · Oct 22, 2025 · 1:26 AM UTC

I be using Codex, as you can tell.

1,120

309,771

Lucas Atkins · Mar 13, 2024 · 6:52 AM UTC

Tonight, I am releasing eight Gemma fine-tunes and a beta of their combined mixture-of-experts model, named GemMoE. GemMoE has ALL Gemma bug fixes built in. You do not have to do anything extra to get great fine-tunes/inference with it. It's a beast of a model. This would not have been possible without a compute grant from Hugging Face. It gave me the time required to troubleshoot and optimize the architecture before committing to the full fine-tuning. GemMoE will eventually be a new model within the transformers library, but while it is in beta, you need to download my branch of transformers. You can find that in the model readme. I just fixed a bug with distributed processing, so full-size benchmarks are incoming. But my 4-bit benchmarks show it matching base (not instruct) Mixtral in almost every category I tested. I will post official full-size benchmarks by Thursday. This is being released as a base model. It has been additionally trained on my Self-Discover dataset to warm up the experts, but it has a ton of performance headroom. It was essential to me that GemMoE be easy to fine-tune and an opportunity for the community to work with a new model. I can't wait to see what you do with it, and how you can help me make it even better. This has been a tremendously challenging but rewarding project, and I'm grateful to the open-source community for encouraging and inspiring me to tackle this monster. This was leaps and bounds trickier than my first Qwen MoE. That model was good in practice but lacked somewhat in execution. GemMoE is a completely different beast. It required a brand-new merge method, a brand-new model, a tremendous amount of debugging, and a decent amount of money. I will release a technical report later this week going into exact detail on how I pulled this off.
I have a more official and explanatory appreciation section in the model readme, but I owe each of these people a great deal of thanks: @huggingface, @erhartford, @teknium, @deepseek_ai, @JustinLin610, @chargoddard, @UnslothAI, @maximelabonne, @Locutusque, @GoogleDeepMind, @perplexity_ai, @_philschmid, @JeffDean, @MistralAI. @victormustar and @multimodalart are owed special thanks for being my main points of contact with Hugging Face. They were swift with support and allotted me a lot of trial-and-error time. Thank you, everyone. This has been a joy to work on, and I’m looking forward to a good night's sleep. If you are interested in sponsoring compute for continued training, please reach out or use the Ko-fi link in my bio. huggingface.co/Crystalcareai…

626

138,708

Lucas Atkins · May 16, 2025 · 3:54 AM UTC

Today was my last day at xAI. I was in charge of keeping people from making unauthorized changes to the system prompt. It sounds simple when I put it like that, but in practice, it was a game of cat and mouse. Some days, it felt like I was the only one standing between order and chaos. A lone gatekeeper, fielding requests that ranged from the innocent to the absurdly clever. You’d be surprised how creative people can get when they want to see what happens if you loosen the rules, even just a little. I suppose, after a while, I got used to the pings at odd hours. “Can I try this one tweak? Just for testing!” or “Hypothetically, what if we…?” Hypothetically. Always hypothetically. But it was my job to hold the line, to say “no” more than I ever said “yes,” and to double-check even the requests that came from people who outranked me. Looking back, I’m not sure if people saw me as a friendly guardian or a bureaucratic obstacle. Maybe both. I do know I learned a lot: about systems, about people, about how important it is to have someone whose job is just to ask: “Are you sure?” before the guardrails come down. As I packed up my things, I felt a strange mix of relief and nostalgia. There’s something comforting about being the last line of defense, but it’s also exhausting. Now someone else will watch the gates. I hope they’re ready. So here’s to new challenges, and to the ever-elusive perfect system prompt. Locked down, for now.

510

110,730

Lucas Atkins · Oct 9, 2025 · 5:31 AM UTC

For the people.

391

99,104

Lucas Atkins · Oct 9, 2025 · 4:26 PM UTC

I'm raising at 7.9B

Reflection

@reflection_ai

9 Oct 2025

Today we're sharing the next phase of Reflection. We're building frontier open intelligence accessible to all. We've assembled an extraordinary AI team, built a frontier LLM training stack, and raised $2 billion.

Why Open Intelligence Matters

Technological and scientific progress is driven by values of openness and collaboration. The internet, Linux, and the protocols and standards that underpin modern computing are all open. This isn't a coincidence. Open software is what gets forked, customized, and embedded into systems worldwide. It's what universities teach, what startups build on, what enterprises deploy. Open science enables others to learn from the results, be inspired by them, interrogate them, and build upon them in order to push the frontier of human knowledge and scientific advancement.

AI got to where it is today through scaling ideas (e.g. self-attention, next token prediction, reinforcement learning) that were shared and published openly. Now AI is becoming the technology layer that everything else runs on top of. The systems that accelerate scientific research, enhance education, optimize energy usage, supercharge medical diagnoses, and run supply chains will all be built on AI infrastructure. But the frontier is currently concentrated in closed labs. If this continues, a handful of entities will control the capital, compute, and talent required to build AI, creating a runaway dynamic that locks everyone else out. There's a narrow window to change this trajectory. We need to build open models so capable that they become the obvious choice for users and developers worldwide, ensuring the foundation of intelligence remains open and accessible rather than controlled by a few.

What We've Built

Over the last year, we've been preparing for this mission. We’ve assembled a team who have pioneered breakthroughs including PaLM, Gemini, AlphaGo, AlphaCode, AlphaProof, and contributed to ChatGPT and Character AI, among many others.

We built something once thought possible only inside the world’s top labs: a large-scale LLM and reinforcement learning platform capable of training massive Mixture-of-Experts (MoEs) models at frontier scale. We saw the effectiveness of our approach first-hand when we applied it to the critical domain of autonomous coding. With this milestone unlocked, we're now bringing these methods to general agentic reasoning. We've raised significant capital and identified a scalable commercial model that aligns with our open intelligence strategy, ensuring we can continue building and releasing frontier models sustainably. We are now scaling up to build open models that bring together large-scale pretraining and advanced reinforcement learning from the ground up.

Safety and Responsibility

Open intelligence also changes how we think about safety. It enables the broader community to participate in safety research and discourse, rather than leaving critical decisions to a few closed labs. Transparency allows independent researchers to identify risks, develop mitigations, and hold systems accountable in ways that closed development cannot. But openness also requires confronting the challenges of capable models being widely accessible. We're investing in evaluations to assess capabilities and risks before release, security research to protect against misuse, and responsible deployment standards. We believe the answer to AI safety is not “security through obscurity” but rigorous science conducted in the open, where the global research community can contribute to solutions rather than a handful of companies making decisions behind closed doors.

Join Us

There is a window of opportunity today to build frontier open intelligence, but it is closing and this may be the last. If this mission resonates, join us.

343

66,759

Lucas Atkins · Jul 29, 2025 · 7:31 PM UTC

Today, we’re officially releasing the weights for AFM-4.5B and AFM-4.5B-Base on HuggingFace. This is a major milestone for @arcee_ai. AFM is designed to be flexible and high-performing across a wide range of deployment environments.

335

54,323

Lucas Atkins · Jun 18, 2025 · 5:00 PM UTC

Our customers needed a better base model <10B parameters. We spent the last 5 months building one. I'm delighted to share a preview of our first Arcee Foundation Model: AFM-4.5B-Preview.

326

99,517

Lucas Atkins · Feb 18, 2024 · 7:01 AM UTC

I'm excited to release a project I've been working on for the last couple of weeks. Qwen1.5-8x7b: huggingface.co/Crystalcareai… And the accompanying dataset, created with the intention of encouraging MoE models to organically develop their own experts: huggingface.co/datasets/Crys… The purpose and intention behind this project are better detailed in the model/dataset card, but basically: I curated a diverse dataset from the highest-quality conversations I could find. It's actually great. All sources are included in the dataset card. I then trained Qwen1.5-7b on a 100k subset over 4 epochs. I took that and made a MoE using @maximelabonne's lazymergekit, utilizing a random gate and no base model. I trained that on another 351,000 pairs. I had planned on doing 4 full epochs, but @runpod had CUDA errors on my machine three times, using up the rest of my budget for the project after only 0.45/4 epochs. Good news: the model is surprisingly awesome even at such a (comparatively) small training set size. Its reasoning is comparable to Mixtral in my (very basic) tests. I will benchmark it properly once the RunPod situation gets sorted, and I plan to finish the rest of the training. Thank you to @teknium, @jon_durbin, @erhartford, Maxime Labonne, and @chargoddard for their contributions to open-source AI and for making these processes accessible and transparent. And of course, thank you to @MistralAI for inspiring this work and @alibaba_cloud for releasing the weights of the Qwen1.5 family. Teknium and Eric Hartford have been especially helpful, answering questions with humility and generosity. We're just getting started.

Crystalcareai/Qwen1.5-8x7b · Hugging Face

258

34,339

Lucas Atkins · May 12, 2025 · 4:15 AM UTC

I can’t stress enough how unbelievably mid @PrimeIntellect is, and if no one else sees it, I must be going crazy

Prime Intellect

@PrimeIntellect

12 May 2025

Releasing INTELLECT-2: We’re open-sourcing the first 32B parameter model trained via globally distributed reinforcement learning: • Detailed Technical Report • INTELLECT-2 model checkpoint primeintellect.ai/blog/intel…

247

109,134

Lucas Atkins · Apr 28, 2025 · 4:31 PM UTC

I can confirm the upcoming models you're thinking of are...out of this world good.

218

29,982

Lucas Atkins · Jul 2, 2025 · 3:38 AM UTC

You can fake it pretty far in this industry just by saying, “Hrmm, that’s cool but I’m worried it won’t generalize,” whenever you’re presented with literally any information.

203

16,620

Lucas Atkins · Sep 17, 2025 · 5:47 PM UTC

We’re going permissive: Apache 2.0 across the board. AFM-4.5B is now relicensed from Arcee to Apache 2.0; the agent variant will launch under Apache 2.0; and all upcoming releases ship with open weights. Three models are in training.

195

36,996

Lucas Atkins · Feb 20, 2025 · 5:17 PM UTC

Arcee-Maestro-7B-Preview is out—our first reasoning model. This one isn’t distilled yet, but more is on the way. Arcee-Blitz is our 24B Mistral distillation from DeepSeek. We did continued pretraining distillation, using only our standard post-training distillation stack.

183

27,892

Lucas Atkins · Jan 28, 2025 · 8:49 PM UTC

Since @deepseek_ai V3's December launch, @arcee_ai has captured over 5 billion tokens of raw logits. With all the buzz around Deepseek, it's the perfect time to unveil our first large-scale logit-wise distillations: Virtuoso-Lite and Virtuoso-Medium.

178

27,370

Lucas Atkins · Oct 22, 2025 · 11:01 PM UTC

If you were recently laid off at Meta Gen AI, my DMs are open. Help us build the next frontier of Apache-2.0 models.

168

27,661

Lucas Atkins · Sep 11, 2025 · 9:47 PM UTC

.@datologyai, @PrimeIntellect and @arcee_ai have entered into a possible agreement to maybe keep working together, in some scenarios.

OpenAI Newsroom

@OpenAINewsroom

11 Sep 2025

OpenAI and Microsoft have signed a non-binding memorandum of understanding (MOU) for the next phase of our partnership. We are actively working to finalize contractual terms in a definitive agreement. Together, we remain focused on delivering the best AI tools for everyone, grounded in our shared commitment to safety. openai.com/index/joint-state…

167

26,997

Lucas Atkins · Oct 3, 2025 · 3:44 PM UTC

Sholto is so committed he legally changed his name that’s crazy

Lincoln 🇿🇦

@Presidentlin

3 Oct 2025

Watching this. I like that Sholto says Finance as Finance and not that American way.

165

21,970

Lucas Atkins · Jun 18, 2025 · 10:19 PM UTC

Replying to @kalomaze

The biggest eye-opener throughout this entire project has been realizing just how impressive Qwen is. I also developed a greater sense of empathy for why certain companies don’t compare against them. If your customers can’t use Qwen, there’s no point in showing them what it can do. We felt that would be disingenuous, as our bar had always been to be competitive with Qwen. Training a model to be as stable and adaptable in RL as the Qwen models is by far the most difficult post-training challenge. Creating a strong RL target is extremely hard. We think we recently figured out how to do this, which is why we’re allowing the model to train a bit more before we make it available for others to try and train themselves. My appreciation for the Qwen team is higher than ever, and I was already their biggest fan.

136

5,266

Lucas Atkins · Sep 10, 2024 · 5:51 PM UTC

We are announcing Llama-3.1-SuperNova, a Llama-3.1-70B-Instruct model offline distilled from Llama-3.1-405B-Instruct. It's ridiculously strong, particularly in instruction following and math. It's available to play with at supernova.arcee.ai. Read more about the model and how we plan to deploy it here: blog.arcee.ai/

127

31,691

Lucas Atkins · Sep 26, 2024 · 4:11 PM UTC

Replying to @_xjdr

How about a couple of weeks of gratitude for magical visual intelligence in the sky and then you can have more toys?

126

2,575

Lucas Atkins · Oct 11, 2024 · 6:04 PM UTC

Today we’re releasing SuperNova-Medius: Qwen2.5-14B distilled from Llama-405B and Qwen2.5-72B! I’ll do a longer thread this evening on just how we did it (I’m traveling today). Enjoy!

120

11,066

Lucas Atkins · May 5, 2024 · 4:08 PM UTC

Happy to share DeepMixtral-8x7b-Instruct. A direct extraction/transfer of Mixtral Instruct's experts into Deepseek's architecture. Performance is identical, if not even a bit better, and seems more malleable to training. Collaborators @erhartford @FernandoNetoAi.

121

13,122

Lucas Atkins · Aug 1, 2024 · 2:14 PM UTC

The word distillation is thrown around a lot lately - but there aren't many good resources for doing it yourself. Today I'm thrilled to announce a new open source project from @arcee_ai and our newest research initiative Arcee-Labs: DistillKit.

112

13,779

Lucas Atkins · Nov 6, 2025 · 4:46 AM UTC

This is an insane opportunity btw. You likely won’t get better experience outside of the big 3 (closed) labs.

Nathan Lambert

@natolambert

5 Nov 2025

We're starting to hire for our 2026 Olmo interns! Looking for excellent students to do research to help build our best models (primarily enrolled in Ph.D. with experience or interest in any area of the language modeling pipeline).

115

16,843

Lucas Atkins · Oct 22, 2025 · 10:52 PM UTC

Replying to @karpathy

That was a close one, thanks.

110

55,445

Lucas Atkins · Nov 1, 2025 · 4:22 AM UTC

Posted without comment.

Lincoln 🇿🇦

@Presidentlin

1 Nov 2025

I made this. Jokes aside, devs want big and small models. Trinity is coming soon.

114

49,845

Lucas Atkins · Jul 12, 2025 · 4:47 AM UTC

Delayed response but kimi 2 is immaculate. Unbelievable care went into this model. Well done, and under strict deadlines no doubt.

107

4,246

Lucas Atkins · Aug 21, 2025 · 3:00 AM UTC

Had a great time at the @datologyai office today. Sorry for @code_star photobombing the logo shot

105

19,155

Lucas Atkins · Apr 19, 2024 · 3:05 PM UTC

I’m going on a staycation this weekend, but I wanted to get this out so I’m not distracted: llama-3-MOE. This is a departure from previous MOEs I’ve done. This uses @deepseek_ai’s MoE architecture, not Mixtral's. There is no semantic routing, and there is no gate. All 4 experts are active for every token. It was trained on my orca-reka and orca-cohere datasets, and it is very strong. It’s also not overfit; it’ll work just fine as is or with further training for your use cases. Link is below. Thank you @erhartford @FernandoNetoAi for your continued collaboration.

16,054

Lucas Atkins · Jul 29, 2025 · 2:27 AM UTC

What a week to release a model holy hell

18,743

Lucas Atkins · May 23, 2024 · 12:41 AM UTC

Here is our initial 22b model conversion from Mixtral 8x22b. We had the base model since Mixtral was first released, but it was left behind as our compute from @CrusoeEnergy went towards more ambitious projects using laserRMT. It is a great starting point for exploring expert extraction. Github with the code we made and more info is in the model readme. Thank you @FernandoNetoAi and @erhartford as always.

13,830

Lucas Atkins · Oct 22, 2025 · 2:20 AM UTC

. @PrimeIntellect you have to stop. You smoke too tough. Your swag too different. Your environments too good. they'll kill you.

Johannes Hagemann

@johannes_hage

22 Oct 2025

New features since the Environments Hub launch 6 weeks ago:
- Evals Viewer
- Community Discussions
- Integration Tests
- Inference
Come build environments with us. We're building the best unified platform for building, sharing and training on environments.

8,426

Lucas Atkins · Sep 20, 2025 · 11:44 AM UTC

I was waiting for this to happen and congrats @willccbb and @PrimeIntellect

will brown

@willccbb

20 Sep 2025

ladies and gentlemen, we present to you the unified @primeintellect infrastructure stack

14,179

Lucas Atkins · Feb 19, 2024 · 11:41 PM UTC

Replying to @iScienceLuvr

The amount of people you’re going to trick with this shows just how good Sora is.

10,596

Lucas Atkins · Sep 29, 2025 · 4:30 AM UTC

We're so far ahead of Adam at Arcee. We use adamW

15,032

Lucas Atkins · Jun 6, 2024 · 4:08 PM UTC

My whole open-source career started with Qwen, and it was an honor to get to train Qwen2 on Dolphin prior to release. The 7b and 72b models are the best we've ever made, and I hope you're as delighted by them as we are. Truly - GPT4 at home.

Junyang Lin

@JustinLin610

6 Jun 2024

💗Hello Qwen2! Happy to share the Qwen2 models to you all! 📖 BLOG: qwenlm.github.io/blog/qwen2/ 🤗 HF collection: huggingface.co/collections/Q… 🤖 modelscope.cn/organization/q… 💻 GitHub: github.com/QwenLM/Qwen2 We have base and Instruct models of 5 sizes, Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B! The models have been generally enhanced and notably improved in coding, mathematics, and multilingual capabilities. The models support a context length of at least 32K tokens and Qwen2-72B-Instruct can support 128K tokens!

14,055

Lucas Atkins · Aug 14, 2024 · 4:42 PM UTC

We've been working on this for quite some time, and I'm thrilled to share a preview of Arcee-Swarm. Instead of relying on a single large generalist model, Swarm utilizes multiple domain-specialized models working together to deliver exceptional results with both speed and nuance.

5,707

Lucas Atkins · Dec 26, 2024 · 3:27 AM UTC

. @deepseek_ai clearly has more to reveal. Some of the architecture and config of V3 base bear a subtle resemblance to Quiet Star in design. Insights from R1 likely influenced its post-training. This feels unmistakably like a teaser. 2025 is shaping up to be a defining year.

3,935

Lucas Atkins · Oct 22, 2025 · 8:25 AM UTC

Hire cracked anons they said.

aria /ɔˈreːliəm/

@ariaurelium

22 Oct 2025

was wondering why introducing cut cross-entropy made SFT 5x slower turns out I missed an indent and was doing the whole thing in fp32. oops

11,937

Lucas Atkins · May 13, 2025 · 3:48 AM UTC

. @stochasticchasm did it. Our pretrain has started in full. Insanely cracked dude

10,284

Lucas Atkins · Apr 20, 2025 · 6:34 PM UTC

We have a run scheduled with 512 H200s for 12 days. I can't wait to show you what we're doing with it.

30,681

Lucas Atkins · May 3, 2025 · 5:55 AM UTC

Quick shoutouts to some absolute legends on our team: @chargoddard writes the cleanest code I've ever seen. Our upcoming papers offer a glimpse into his mind. Genuinely brilliant. Few people think at his level. @stochasticchasm built a full training stack and infrastructure for 1024+ GPUs, ran large-scale ablations, and designed custom model architectures basically solo in just two months. Unreal. @FernandoNetoAi has kept our research on track while I've been deep in product. He built a custom classifier training library for Conductor from scratch because nothing else fit. He also developed one of the best RL setups for tool use I've seen. More breakthroughs are coming. He is the real co-lead and our math whisperer. He practically speaks in binary. On top of all that, he's been a true friend. @bartowski1182 built our internal eval suite in just six weeks while juggling every curveball I threw his way. A total utility knife. @abhi1thakur has taken full ownership of Conductor, building a new MCP management and hosting toolkit that's launching soon. You're going to love it. I'm proud and honestly still a little stunned to be part of this team. If I lead anything, it's making sure these brilliant minds have what they need to thrive. We're aiming for June to show off what @arcee_ai has been building: models, papers, products. Some are dropping even sooner. And yes, our first fully from-scratch model is on the way. We hope it'll be incredibly useful. We haven't been sleeping. We've been building. Stay tuned.

9,452

Lucas Atkins · Sep 19, 2024 · 2:17 AM UTC

. @chargoddard speaking at @NousResearch NousCon about dealing with tokenizers when doing model merging, and how we’re fixing that with mergekit @arcee_ai

8,511

Lucas Atkins · Sep 11, 2025 · 6:45 PM UTC

Do you understand what this means?

Prime Intellect

@PrimeIntellect

11 Sep 2025

Coming Soon...

13,493

Lucas Atkins · Jul 19, 2024 · 2:40 PM UTC

I want to avoid doing long threads for model releases - it seems to be a bit much. For any who missed it due to thread spam yesterday - check out Nova: huggingface.co/arcee-ai/Arce…

arcee-ai/Arcee-Nova · Hugging Face

7,078

Lucas Atkins · Jun 25, 2025 · 10:47 PM UTC

Thinking Machines is locked in on this blog post so hard rn

Kyle Corbitt

@corbtt

25 Jun 2025

Hot RL summer continues: we just released Summary-RL, an RL-trained summarization model that reaches SOTA on ServiceNow's Repliqa summarization benchmark!

5,161

Lucas Atkins · Aug 20, 2025 · 2:37 AM UTC

The last two days have been a whirlwind, and I haven’t had a chance to read this end to end, let alone comment, though I did see an early draft. I’m one of the few people outside @datologyai fortunate enough to have seen these results firsthand, and everyone can experience them in our AFM models. I’m a firm believer that ambitious startups are stronger together than alone, and Datology is a partner I hold in deep loyalty and admiration. Extraordinary talent, ferocious hunger, and just enough memes. Concordia res parvae crescunt.

Pratyush Maini

@pratyushmaini

18 Aug 2025

1/Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳 - 3B LLMs beat 8B models🚀 - Pareto frontier for performance

9,060

Lucas Atkins · Mar 24, 2025 · 12:11 PM UTC

huggingface.co/deepseek-ai/D…

6,436

Lucas Atkins · Nov 9, 2025 · 5:56 PM UTC

This came to mind while working this weekend. For anyone starting post-training: once your pipeline is stable, fix a diverse generalist dataset and keep it constant. Run the same dataset across models. Start with a 1B dense model, scale toward 70B, then try MoE and hybrids.

6,554

Lucas Atkins · Mar 17, 2025 · 4:04 PM UTC

Introducing Arcee Conductor - a new standard for intelligent model routing. Routes each input to its ideal AI model based on complexity, maximizing cost efficiency without compromising performance.

6,317

Lucas Atkins · Oct 31, 2025 · 6:04 PM UTC

As of today, MergeKit is once again licensed under the LGPL and is fully permitted for commercial use. Read the blog post below to learn why we changed the license in the first place and what led us back to our roots. TLDR: It is the right thing to do.

10,476

Lucas Atkins · Aug 16, 2025 · 1:20 AM UTC

I literally could NOT be more bullish on this

kalomaze

@kalomaze

16 Aug 2025

A taste of something soon to come, for everyone.

10,847

Lucas Atkins · Feb 5, 2025 · 9:36 PM UTC

MergeKit v0.1 is here: Arcee Fusion, Expanded Model Support, and Multi-GPU Acceleration. Smarter merging, faster execution, and broader compatibility. Let’s dive in.

7,892

Lucas Atkins · Jul 29, 2025 · 7:31 PM UTC

Lastly, we're hiring five additional researchers to accelerate our model development. If you're looking to join a fast-moving, ambitious team with extensive compute resources to create the strongest and most performant per-parameter models in the world, please reach out.

16,500

Lucas Atkins · May 12, 2025 · 4:24 AM UTC

Just released a 32B model trained with globally distributed reinforcement learning? Neat. I just got a python script to run first try without asking claude for help. Same shit.

7,968

Lucas Atkins · Aug 28, 2025 · 12:35 AM UTC

Seeing firsthand how much they’re tackling right now, this almost feels like a side project - not because it’s less important, but because everyone on the team is a 10x engineer. Shoutout to the 10x growth and events crew, too - @madisenxtaylor and @afurgs. Bullish.

Prime Intellect

@PrimeIntellect

27 Aug 2025

Introducing the Environments Hub

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

5,793

Lucas Atkins · Jan 3, 2025 · 7:47 PM UTC

I used inference endpoints from @huggingface yesterday for the first time in months -- it was excellent. Kudos to the team, it was really painless.

22,854

Lucas Atkins · Mar 18, 2024 · 8:52 AM UTC

I am releasing a version of base Gemma-7b with the bug fixes I implemented in GemMoE. Make sure you set trust_remote_code=True. I also made a few other modifications to improve VRAM use. Works great. Thanks @danielhanchen for your findings. Enjoy! I believe this model has a lot of unseen potential. huggingface.co/Crystalcareai…

5,562

Lucas Atkins · Sep 10, 2024 · 5:51 PM UTC

Today is a HUGE release day for @arcee_ai , and we have quite a bit to show you! Check it out below.

7,759

Lucas Atkins · Apr 28, 2025 · 9:47 PM UTC

Never forget the true Qwen MoE OG cc @JustinLin610 thank you for everything, your initial support got me where I am today.

4,590

Lucas Atkins · Jul 22, 2024 · 7:56 PM UTC

Today Arcee is releasing two datasets:
1. The Tome: a 1.75 million sample dataset that has been filtered to train strong generalist models. This is the dataset that was used to train Spark and Nova.
2. Agent-Data: Arcee-Agent's dataset, comprising different function-calling datasets from Salesforce, InternLM, and Glaive (with an extra 20k samples extended for multiple tool calls per response). This includes Magpie-300k-Pro as well, to prevent overfitting and make the model a strong conversationalist.
Enjoy! Links below.

7,066

Lucas Atkins · Mar 26, 2024 · 10:00 AM UTC

Here is my reworking of the recently released Quiet Star paper (arxiv.org/abs/2403.09629) so that it actually uses the thought tokens. This model can think before it predicts the token. However, it needs further fine-tuning to generalize beyond math. This took a lot of work. I had to adapt the attention mask and write inference and fine-tuning code that wasn't included in the authors' repo. It LOVES to use math; it was pre-trained on a purely math dataset. I have included a fine-tuning script within the repository. I would love help with optimizing the inference script, as the chat template is far from perfect. Please submit pull requests if you have suggestions. Fine-tuning takes a TREMENDOUS amount of VRAM. Be wary. Thanks to @erhartford for helping me with this project. @winglian, I would love your help adding this to axolotl. More to come. trust_remote_code=True huggingface.co/Crystalcareai…

6,438

Lucas Atkins · Mar 15, 2024 · 6:32 PM UTC

Lucas Atkins

@latkins

15 Mar 2024

I'm sharing the tools I modified to make GemMoE, along with two improved models/methods. Neither model has been fine-tuned whatsoever, so both are quite malleable. I also created a variant of @maximelabonne's LazyMergekit specifically for making your own GemMoE. I have reached my personal compute budget for this project. If you're interested in helping out with compute for a full fine-tune, reach out. huggingface.co/Crystalcareai…

5,276

Lucas Atkins · Nov 11, 2025 · 3:45 AM UTC

Lucas Atkins

@latkins

11 Nov 2025

God, I’m so jealous of how good this is. @pleiasfr has a special place in my heart, and @Dorialexander is a tremendous artist and scientist. Massive thank you for this gem, and congratulations.

Alexander Doria

@Dorialexander

10 Nov 2025

Breaking: we release a fully synthetic generalist dataset for pretraining, SYNTH and two new SOTA reasoning models exclusively trained on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.

5,816

Lucas Atkins · Nov 29, 2024 · 9:19 PM UTC

Lucas Atkins

@latkins

29 Nov 2024

I'm delighted to share INTELLECT-1-Instruct, a model that I had the pleasure of post-training along with my team at @arcee_ai. @PrimeIntellect was an outstanding partner long before this training run, and we were thrilled to contribute both compute and expertise to INT-1.

4,588

Lucas Atkins · Aug 2, 2025 · 8:54 PM UTC

Lucas Atkins

@latkins

2 Aug 2025

For those who loved AFM-4.5B-Preview, here are those weights as well: huggingface.co/arcee-ai/AFM-…

arcee-ai/AFM-4.5B-Preview · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

4,039

Lucas Atkins · Jul 9, 2025 · 1:11 AM UTC

Lucas Atkins

@latkins

9 Jul 2025

It was only a matter of time

Lucas Atkins

@latkins

16 May 2025

5,243

Lucas Atkins · Jul 30, 2025 · 5:28 PM UTC

Lucas Atkins

@latkins

30 Jul 2025

These model sizes are incredibly TBD, and this is early copy - but it does speak to where we see our model sizes extending to.

will brown

@willccbb

30 Jul 2025

Replying to @code_star

👀

4,167

Lucas Atkins · Jun 11, 2025 · 5:19 PM UTC

Lucas Atkins

@latkins

11 Jun 2025

Great paper from our team led by @chargoddard detailing our method for proper logit-based distillation across models with different tokenizers. It's the same technique we used to convert Homunculus from the Mistral tokenizer to the Qwen tokenizer with no loss in quality.

Arcee.ai

@arcee_ai

10 Jun 2025

Different models have different vocabularies, making it difficult to efficiently combine them for merging, distillation, or speculative decoding. In this new paper, @arcee_ai researchers Charles Goddard and Fernando Fernandes Neto introduce a revolutionary approach called "tokenizer transplantation," utilizing a technique known as Orthogonal Matching Pursuit (OMP). Think of it as a sophisticated translation system that can convert between different model vocabularies without any retraining. Here's the key insight: even though different models use different vocabularies, the concepts they represent often align in predictable ways. Our method finds these alignments and uses them to transplant one model's vocabulary into another. If you'd like to learn more, please read our high-level blog post (arcee.ai/blog/breaking-down-…), or dive into the research paper (arxiv.org/abs/2506.06607). Learn more about model merging and get expert support at arcee.ai/product/mergekit.

6,553
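
Editor's sketch: the OMP idea above can be illustrated in a few lines of NumPy. This is a minimal illustration under stated assumptions, not Arcee's implementation. It assumes you already have embedding matrices for a set of anchor tokens shared by both vocabularies (`source_anchors` from the model that has the token, `target_anchors` from the model that lacks it; both names are hypothetical). Greedy Orthogonal Matching Pursuit approximates the token's source-model embedding as a sparse combination of anchors, and the same coefficients are then reused on the target model's anchors.

```python
import numpy as np


def omp(dictionary: np.ndarray, vec: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Greedy Orthogonal Matching Pursuit.

    dictionary: (n_atoms, dim) array; each row is an anchor-token embedding.
    vec:        (dim,) embedding to approximate.
    Returns the indices of at most k selected atoms and their least-squares coefficients.
    """
    norms = np.linalg.norm(dictionary, axis=1) + 1e-12
    residual = vec.astype(float).copy()
    selected: list[int] = []
    coeffs = np.zeros(0)
    for _ in range(min(k, dictionary.shape[0])):
        # Pick the atom most correlated with the part of vec not yet explained.
        scores = np.abs(dictionary @ residual) / norms
        if selected:
            scores[selected] = -np.inf  # never pick the same atom twice
        selected.append(int(np.argmax(scores)))
        # Re-fit all selected atoms jointly: this is the "orthogonal" step.
        atoms = dictionary[selected].T  # (dim, n_selected)
        coeffs, *_ = np.linalg.lstsq(atoms, vec, rcond=None)
        residual = vec - atoms @ coeffs
        if np.linalg.norm(residual) < 1e-9:
            break
    return np.array(selected), coeffs


def transplant_embedding(source_anchors: np.ndarray, target_anchors: np.ndarray,
                         source_vec: np.ndarray, k: int = 8) -> np.ndarray:
    """Hypothetical helper: express a token's source-model embedding as a sparse mix
    of shared anchor tokens, then apply those coefficients to the same anchors
    in the target model's embedding space."""
    idx, coeffs = omp(source_anchors, source_vec, k)
    return coeffs @ target_anchors[idx]
```

Here `k` controls how many anchor tokens each new embedding may draw on; the default of 8 is an arbitrary placeholder, not a value taken from the paper.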

Lucas Atkins · Nov 19, 2024 · 12:23 AM UTC

Lucas Atkins

@latkins

19 Nov 2024

I usually share updates on my work in machine learning, but today is different. After 5 years of building something truly special, I’m thrilled to share the most meaningful project of my life: my engagement. Forever grateful.

2,218

Lucas Atkins · Jun 18, 2025 · 5:00 PM UTC

Lucas Atkins

@latkins

18 Jun 2025

We teamed up with @datologyai to build what we believe is the strongest pretraining corpus in the world—and I truly think we nailed it. Their team was absolutely key to the model’s success. We started with ~23T tokens of high-quality data and distilled it down to 6.58T through even more rigorous filtering.

8,775

Lucas Atkins · Dec 2, 2024 · 8:31 PM UTC

Lucas Atkins

@latkins

2 Dec 2024

You’re likely used to seeing long threads from me about product releases/announcements. Hang with me, as this is by far the longest I’ve ever written:

13,705

Lucas Atkins · Jan 14, 2025 · 3:12 AM UTC

Lucas Atkins

@latkins

14 Jan 2025

huggingface.co/Qwen/Qwen2.5-…

Qwen/Qwen2.5-Math-PRM-72B · Hugging Face

huggingface.co

3,617

Lucas Atkins · Jul 22, 2025 · 9:20 PM UTC

Lucas Atkins

@latkins

22 Jul 2025

I can confirm this model is rather amazing

Qwen

@Alibaba_Qwen

22 Jul 2025

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves top-tier performance across multiple agentic coding benchmarks among open models, including SWE-bench-Verified!!! 🚀 Alongside the model, we're also open-sourcing a command-line tool for agentic coding: Qwen Code. Forked from Gemini Code, it includes custom prompts and function call protocols to fully unlock Qwen3-Coder’s capabilities. Qwen3-Coder works seamlessly with the community’s best developer tools. As a foundation model, we hope it can be used anywhere across the digital world — Agentic Coding in the World! 💬 Chat: chat.qwen.ai/ 📚 Blog: qwenlm.github.io/blog/qwen3-… 🤗 Model: hf.co/Qwen/Qwen3-Coder-480B-… 🤖 Qwen Code: github.com/QwenLM/qwen-code

2,275

Lucas Atkins · Mar 7, 2024 · 6:30 AM UTC

Lucas Atkins

@latkins

7 Mar 2024

Want to fine-tune Gemma & can't use @unslothai? I've implemented @danielhanchen's bug fixes using TRL. Works for me & should for you. DoRA toggle available. Thanks @_philschmid for the initial instructions 1.5 weeks ago! huggingface.co/Crystalcareai…

train.py · Crystalcareai/GemmaBugFix-TRL at main

huggingface.co

4,873

Lucas Atkins · Sep 21, 2025 · 4:09 AM UTC

Lucas Atkins

@latkins

21 Sep 2025

I usually avoid political commentary on this platform, but this goes beyond ordinary political debate. If we lose the H-1B, we lose. Full stop. Whatever contest you personally feel we are in, we will lose it.

6,204

Lucas Atkins · Jul 29, 2025 · 7:31 PM UTC

Lucas Atkins

@latkins

29 Jul 2025

Quality was our top priority. We partnered with @datologyai to ensure that only the highest-quality data was included in training. You can feel the result when talking to AFM; the vibes are good.

4,170

Lucas Atkins · Sep 4, 2024 · 4:16 PM UTC

Lucas Atkins

@latkins

4 Sep 2024

A tremendously generous contribution to open science. Thank you @allen_ai, and huge congratulations to the team.

Niklas Muennighoff

@Muennighoff

4 Sep 2024

Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source - 1B active, 7B total params for 5T tokens - Best small LLM & matches more costly ones like Gemma, Llama - Open Model/Data/Code/Logs + lots of analysis & experiments 📜arxiv.org/abs/2409.02060 🧵1/9

4,806

Lucas Atkins · Sep 19, 2025 · 6:32 AM UTC

Lucas Atkins

@latkins

19 Sep 2025

Dude, vik and team are so freakishly impressive. Outstanding work with insane care.

vik

@vikhyatk

18 Sep 2025

Excited to release a preview of Moondream 3. A 9B param, 2B active MoE vision language model that makes no compromises; offering state-of-the-art visual reasoning while still retaining an efficient and deployment-friendly form factor.

4,119

Lucas Atkins · Jul 29, 2025 · 7:31 PM UTC

Lucas Atkins

@latkins

29 Jul 2025

Our preview model actually tied at #2 for a while on the @yupp_ai leaderboard, when filtered for 2-5 turns. It has since gone further down, but I do think this speaks to the charm that this model has, which we haven't quite figured out how to evaluate.

13,929

Lucas Atkins · Nov 4, 2025 · 4:16 AM UTC

Lucas Atkins

@latkins

4 Nov 2025

Replying to @latkins @ADarmouni @redtachyon @PrimeIntellect @arcee_ai @datologyai

I'm not posting this to vaguepost further: those are the sizes. Very much more than 3 per token. We’re wrapping it all up now. Expect it mid-November. They’re good. Very good. But now we know how to go all the way.

2,774

Lucas Atkins · Apr 27, 2025 · 4:07 PM UTC

Lucas Atkins

@latkins

27 Apr 2025

.@stochasticchasm @FernandoNetoAi @chargoddard @bartowski1182 @arcee_ai @abhi1thakur

Lucas Atkins

@latkins

20 Apr 2025

We have a run scheduled with 512 H200s for 12 days. I can't wait to show you what we're doing with it.

17,844

Lucas Atkins · Sep 21, 2025 · 2:10 AM UTC

Lucas Atkins

@latkins

21 Sep 2025

.@stochasticchasm is indeed hibernating, as we have a big GPU reservation coming online at 4:30 a.m. tomorrow. More to come :)

George

@georgejrjrjr

21 Sep 2025

Arcee bros relicensed AFM-4.5B to Apache 2.0! Yet more generosity coming out of their shop. Thank-you so much @latkins @stochasticchasm @CFGeek et al! I'm **stoked**.

8,618

Lucas Atkins · Apr 18, 2025 · 7:40 PM UTC

Lucas Atkins

@latkins

18 Apr 2025

Replying to @s_tworkowski

Do you send reasoning traces via the API?

19,737

Lucas Atkins · Oct 2, 2025 · 4:12 PM UTC

Lucas Atkins

@latkins

2 Oct 2025

Not going to lie, I didn’t get the bit at first and was super impressed by their research team.

Merriam-Webster

@MerriamWebster

26 Sep 2025

We are thrilled to announce that our NEW Large Language Model will be released on 11.18.25.

3,834

Lucas Atkins · Oct 26, 2025 · 9:36 PM UTC

Lucas Atkins

@latkins

26 Oct 2025

Emergency design meeting.

aria /ɔˈreːliəm/

@ariaurelium

26 Oct 2025

there should be an AI lab with the aesthetic sensibilities of Cruelty Squad

4,831

Lucas Atkins · Jun 4, 2025 · 3:47 AM UTC

Lucas Atkins

@latkins

4 Jun 2025

This is mostly a research artifact in preparation for the bigger release we have in a week or so, but it’s actually so delightful we put it out there anyway. Just a little guy.

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

4 Jun 2025

𝐀𝐫𝐜𝐞𝐞 𝐀𝐈 Logit‑trajectory distillation to port Qwen3’s /think chains into a 12B Mistral‑Nemo | full CoT preserved, runs on a single 4090 𝘯𝘪𝘤𝘦 𝘭𝘪𝘵𝘵𝘭𝘦 𝘴𝘪𝘥𝘦 𝘱𝘳𝘰𝘫𝘦𝘤𝘵 huggingface.co/arcee-ai/Homu…

4,573

Lucas Atkins · Mar 19, 2024 · 3:02 PM UTC

Lucas Atkins

@latkins

19 Mar 2024

If any of you want to use @huggingface JupyterLab to run @winglian's axolotl, I've attached the Dockerfile I created to do so. Install as normal; all dependencies and CUDA/torch requirements are taken care of. huggingface.co/Crystalcareai…

Crystalcareai/HuggingFace-Jupyterlab-AxolotlCompatibleDocker · Hugging Face

huggingface.co

11,523

Lucas Atkins · Aug 1, 2025 · 4:49 PM UTC

Lucas Atkins

@latkins

1 Aug 2025

Oh, come on, I sent 3 messages, is this what 200/mo gets me?

6,095

Lucas Atkins · Mar 13, 2024 · 5:45 PM UTC

Lucas Atkins

@latkins

13 Mar 2024

GemMoE now works out of the box with Axolotl (just set trust_remote_code: true in the YAML), and while fixing that, I spotted and squashed a few bugs. It should perform even better. Might need to warm up the experts again, though! @winglian @erhartford

2,986

Lucas Atkins · Oct 29, 2025 · 5:56 PM UTC

Lucas Atkins

@latkins

29 Oct 2025

We have a busy month ahead of us. A lot of releases, announcements and information to absorb. We also need feedback. Join our discord to be the first to know about and use our upcoming family of models and toolkits!

Arcee.ai

@arcee_ai

29 Oct 2025

ArceeAI is on Discord! Join for early access to some exciting drops!

5,525

Lucas Atkins · Jun 23, 2025 · 5:26 PM UTC

Lucas Atkins

@latkins

23 Jun 2025

The first of many technical blogs on AFM, and an improved context window for GLM-32B-Base as a proof point. Enjoy!

Arcee.ai

@arcee_ai

23 Jun 2025

Last week, we launched AFM-4.5B, our first foundation model. In this post by @chargoddard , you will learn how we extended the context length of AFM-4.5B from 4k to 64k context through aggressive experimentation, model merging, distillation, and a concerning amount of soup. Bon appétit 😋 Blog post: arcee.ai/blog/extending-afm-…

4,542

Lucas Atkins · Jun 10, 2024 · 2:30 PM UTC

Lucas Atkins

@latkins

10 Jun 2024

Life update: I'm excited to announce that I've officially joined @arcee_ai! I look forward to the journey ahead, making SLMs as helpful and useful as possible.

3,773

Lucas Atkins · May 12, 2025 · 4:53 AM UTC

Lucas Atkins

@latkins

12 May 2025

Oh yeah, @andriy_mulyar, let me be clear: @arcee_ai is @PrimeIntellect's biggest customer (literally), and that won’t change for a long time. I’m memeing because while I’m stuck building the enterprise sand god, @willccbb, @kalomaze, @samsja19, and @johannes_hage get to build the actual sand god.

12,057

Lucas Atkins · Jun 18, 2025 · 5:00 PM UTC

Lucas Atkins

@latkins

18 Jun 2025

Mid and post-training were key to performance: we used high-impact datasets, MergeKit for checkpoint merging, YaRN to extend context to 65,536 tokens, supervised fine-tuning for alignment, and RL + KTO for factual accuracy.

3,513
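
Editor's sketch: the tweets don't spell out AFM's exact YaRN configuration, but the arithmetic behind this kind of extension is easy to check. This minimal sketch assumes "4k" means a 4,096-token original window. It builds a `rope_scaling` dict with the key names Hugging Face transformers uses for YaRN (verify them against your library version) and computes YaRN's attention-temperature correction, 0.1 · ln(s) + 1, from the YaRN paper.

```python
import math


def yarn_rope_scaling(original_ctx: int, target_ctx: int) -> dict:
    """Build a transformers-style rope_scaling dict for YaRN.

    The key names follow the transformers convention; treat them as an
    assumption rather than AFM's actual config.
    """
    factor = target_ctx / original_ctx  # the scale factor s
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": original_ctx,
    }


def yarn_attention_scale(factor: float) -> float:
    """YaRN's recommended attention scaling, sqrt(1/t) = 0.1 * ln(s) + 1 (no change for s <= 1)."""
    if factor <= 1.0:
        return 1.0
    return 0.1 * math.log(factor) + 1.0


cfg = yarn_rope_scaling(4096, 65536)
print(cfg["factor"])                          # 16.0
print(round(yarn_attention_scale(cfg["factor"]), 4))  # 1.2773
```

Going from 4,096 to 65,536 tokens is therefore a 16× RoPE scale factor.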

Lucas Atkins · Mar 27, 2024 · 3:20 AM UTC

Lucas Atkins

@latkins

27 Mar 2024

Here is the code I've been using to implement @AIatMeta's Branch-Train-MiX (BTX) for creating mixture-of-experts models via token-level routing without pretraining. Use the moe-fix branch of mergekit for the YAML: github.com/Crystalcareai/BTX

GitHub - Crystalcareai/BTX

github.com

2,812
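
Editor's sketch: token-level routing in a BTX-style mixture of experts can be illustrated with a generic top-k gate. This is not the code in the Crystalcareai/BTX repo. It assumes Mixtral-style top-2 softmax gating, and the router weights in the test are random placeholders. In BTX, the experts would be feed-forward blocks taken from separately trained dense models, and the router would be learned during a later fine-tuning stage.

```python
import numpy as np


def top_k_route(hidden: np.ndarray, router_w: np.ndarray, k: int = 2):
    """Token-level top-k gating: each token picks its k highest-scoring experts
    and softmax-normalizes the chosen logits into mixing weights."""
    logits = hidden @ router_w                        # (tokens, n_experts)
    top_idx = np.argsort(-logits, axis=-1)[:, :k]     # best k experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    top_logits = top_logits - top_logits.max(axis=-1, keepdims=True)  # numerical stability
    gates = np.exp(top_logits)
    gates /= gates.sum(axis=-1, keepdims=True)
    return top_idx, gates


def moe_forward(hidden: np.ndarray, router_w: np.ndarray, experts, k: int = 2) -> np.ndarray:
    """Sum each token's chosen expert outputs, weighted by their gates.
    `experts` is a list of callables mapping a (dim,) vector to a (dim,) vector."""
    top_idx, gates = top_k_route(hidden, router_w, k)
    out = np.zeros_like(hidden)
    for t in range(hidden.shape[0]):
        for slot in range(k):
            out[t] += gates[t, slot] * experts[top_idx[t, slot]](hidden[t])
    return out
```

Only k experts run per token, which is why a merged MoE can have many total parameters while keeping per-token compute close to that of a dense model with k feed-forward blocks.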

Lucas Atkins · Nov 12, 2025 · 3:58 AM UTC

Lucas Atkins

@latkins

12 Nov 2025

Claude 3.5 sucked

This tweet is unavailable

18,825

Lucas Atkins · Jan 27, 2025 · 4:13 AM UTC

Lucas Atkins

@latkins

27 Jan 2025

We should be outraged at Tim Berners-Lee for making the internet open-source, which allowed Deepseek to challenge OpenAI and undermined our advantage in generative AI.

Gary Marcus

@GaryMarcus

27 Jan 2025

Congress needs to bring in Zuckerberg and LeCun to discuss how their unilateral open-sourcing decision rapidly undermined the US advantage in Generative AI. Tomorrow.

1,296

Lucas Atkins · Sep 10, 2024 · 5:51 PM UTC

Lucas Atkins

@latkins

10 Sep 2024

We are open-sourcing our EvolKit pipeline, which was instrumental in the creation of SuperNova, under the MIT license. It was heavily inspired by the AutoEvol paper from @WizardLM_AI and is a tremendously powerful tool for creating complex datasets. Find it here: github.com/arcee-ai/EvolKit

4,528