Goodfire · May 7, 2026 · 4:09 PM UTC

Goodfire

Pinned Tweet

Goodfire

@GoodfireAI

May 7

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

311

1,683

11,257

3,263,839

Lucius Bushnaq ⏹️ · Jun 26, 2026 · 11:20 AM UTC

Goodfire retweeted

Lucius Bushnaq ⏹️@BushnaqLucius

Jun 26

Replying to @bygregorr @GoodfireAI

Checked. Same story as French and Spanish. The LoRAs wreck Dutch and Swedish performance, the single component edit suppression fine-tune leaves them alone.

1,826

Goodfire · Jun 25, 2026 · 5:12 PM UTC

Goodfire

@GoodfireAI

Jun 25

Correction: a plotting error caused the bars in the plot of off-target effects to display at 0.01 nats above the true means. The corrected plot is below:

5,828

Goodfire · Jun 25, 2026 · 4:23 PM UTC

Goodfire

@GoodfireAI

Jun 25

We removed an LM's ability to speak German by fine-tuning on only 4 German tokens. As part of a 1-day hackathon with our product Silico, we removed a 67M-parameter language model's ability to predict German text, by tuning only a scalar factor on one subcomponent of the weights. (1/6)

131

1,427

300,250
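A toy sketch of the idea in the post above, not Goodfire's code or Silico's API. It assumes a weight matrix has already been decomposed into rank-one components, each with its own scalar prefactor, and that exactly one component handles the "German-like" input. The names `compose`, `tune_scale`, and the objective (shrink the output norm on the German-like inputs) are hypothetical stand-ins chosen for illustration; the actual decomposition method and training loss are described in the linked LessWrong post.

```python
from typing import List, Tuple

Vector = List[float]
Component = Tuple[Vector, Vector]  # (u, v): contributes scale * outer(u, v) to W


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def compose(components: List[Component], scales: List[float]) -> List[Vector]:
    """Rebuild W = sum_c scales[c] * outer(u_c, v_c)."""
    rows, cols = len(components[0][0]), len(components[0][1])
    weight = [[0.0] * cols for _ in range(rows)]
    for (u, v), scale in zip(components, scales):
        for i in range(rows):
            for j in range(cols):
                weight[i][j] += scale * u[i] * v[j]
    return weight


def matvec(weight: List[Vector], x: Vector) -> Vector:
    return [dot(row, x) for row in weight]


def tune_scale(components: List[Component], scales: List[float], index: int,
               inputs: List[Vector], lr: float = 0.25, steps: int = 20) -> List[float]:
    """Gradient descent on ONE scalar prefactor; every other parameter is frozen.

    Toy objective (an assumption): minimise sum over inputs of ||W x||^2.
    """
    scales = list(scales)
    u, v = components[index]
    for _ in range(steps):
        weight = compose(components, scales)
        grad = 0.0
        for x in inputs:
            y = matvec(weight, x)
            # d/ds_k ||y||^2 = 2 * (y . u_k) * (v_k . x)
            grad += 2.0 * dot(y, u) * dot(v, x)
        scales[index] -= lr * grad
    return scales


# Component 0 reads the "German-like" input direction; component 1 reads another one.
components = [([1.0, 0.0], [1.0, 0.0, 0.0]), ([0.0, 1.0], [0.0, 1.0, 0.0])]
tuned = tune_scale(components, [1.0, 1.0], index=0, inputs=[[1.0, 0.0, 0.0]])
edited = compose(components, tuned)
print(matvec(edited, [1.0, 0.0, 0.0]))  # output on the targeted input shrinks toward zero
print(matvec(edited, [0.0, 1.0, 0.0]))  # output on the other input is unchanged
```

Because only one prefactor moves, anything routed through the other components is untouched by construction in this toy. A real model has no such guarantee, which is why the thread reports measured off-target effects.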


Goodfire · Jun 25, 2026 · 4:23 PM UTC

Goodfire

@GoodfireAI

Jun 25

Plus, that interpretability lets us notice and fix problems. E.g.: initially we tuned the top 16 German-related components, but their labels showed most were about foreign languages in general. So we narrowed to the single component for German alone, improving precision. (5/6)

7,604

Goodfire · Jun 25, 2026 · 4:23 PM UTC

Goodfire

@GoodfireAI

Jun 25

This is an early demo of how parameter decomposition could enable targeted, predictable model editing. Details on this experiment: lesswrong.com/posts/ieoWstub… If you want to run experiments on your model too, learn more and request access to Silico: goodfire.ai/silico

Exploration: fine-tuning with parameter decomposition — LessWrong

TL;DR: We can destroy a 67M-parameter language model's ability to predict German text by fine-tuning a single number: the scalar prefactor on one Ger…

lesswrong.com

7,226

Eric Ho · Jun 24, 2026 · 7:47 PM UTC

Goodfire retweeted

Eric Ho

@ericho_goodfire

Jun 24

we're hiring for a bunch of technical GTM roles at @GoodfireAI across forward deployed engineering, sales, and growth. come help us understand every model across biology, materials, robotics, language, and more. apply here or DM me: goodfire.ai/careers

245

22,973

Goodfire · Jun 23, 2026 · 4:30 PM UTC

Goodfire

@GoodfireAI

Jun 23

Stories have shapes: a comedy rises toward joy; a tragedy falls into loss. Inside an LLM, that’s visible more literally: as an LLM reads a story, its internal activations trace a wandering path that reflects the model’s sense of what kind of story it is reading. (1/5)

Quoting Goodfire (@GoodfireAI), May 7

121

859

101,327


Goodfire · Jun 23, 2026 · 4:30 PM UTC

Goodfire

@GoodfireAI

Jun 23

Emotions in stories are a simple case study, but the lesson is general: a model's activations, viewed over time, trace trajectories along manifolds. Fully understanding models, and debugging and designing them, means studying how representations change over time! (4/5)

2,502
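A minimal sketch of what "viewing activations over time" can mean in practice, under strong simplifying assumptions. It treats each token position as one activation vector and projects it onto a single direction, giving one number per token. The direction, the toy vectors, and the helper names are all hypothetical; the post itself studies trajectories on manifolds, which a single linear projection only approximates.

```python
import math
from typing import List

Vector = List[float]


def unit(v: Vector) -> Vector:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def trajectory(activations: List[Vector], direction: Vector) -> List[float]:
    """Project each per-token activation onto a unit-normalised direction."""
    d = unit(direction)
    return [sum(a * b for a, b in zip(act, d)) for act in activations]


def net_change(path: List[float]) -> float:
    """Positive if the path ends higher than it starts, negative if lower."""
    return path[-1] - path[0]


# Made-up 2-d "activations" for three token positions, and a made-up direction.
toy_activations = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
toy_direction = [2.0, 0.0]

path = trajectory(toy_activations, toy_direction)
print(path)              # [0.0, 1.0, 3.0]
print(net_change(path))  # 3.0
```

With real models the activations would come from a chosen layer, and the direction from a probe or similar method. Neither choice is specified in this thread.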

Goodfire · Jun 23, 2026 · 4:30 PM UTC

Goodfire

@GoodfireAI

Jun 23

Read the full post: goodfire.ai/research/stories…

Meandering on Manifolds: The Neural Geometry of Stories Over Time

To fully understand LLM representations, we must understand how they change dynamically, over the course of a prompt or conversation. We investigate these temporal dynamics with a simple case study:...

goodfire.ai

4,477

Vmax · Jun 18, 2026 · 9:00 PM UTC

Goodfire retweeted

Vmax

@VmaxAI

Jun 18

Following the blog post from our collaboration with @GoodfireAI, the arxiv paper for PROPEL is now available.

Augustine Mavor-Parker

@MavorParker

Jun 18

Replying to @MavorParker

The arxiv is now live! arxiv.org/abs/2606.18284

3,770

Goodfire · Jun 17, 2026 · 3:52 PM UTC

Goodfire

@GoodfireAI

Jun 17

We're hosting a happy hour at ICML, Wednesday July 8! Come connect with members of the Goodfire team. Learn about our work in neural geometry and other recent publications. Note that space is limited, and we’re prioritizing attendees who are actively engaged in relevant AI research areas. Link to register in the thread!

133

14,482

Goodfire · Jun 17, 2026 · 3:52 PM UTC

Goodfire

@GoodfireAI

Jun 17

Goodfire ICML Happy Hour · Luma

The Event: Come connect with members of the Goodfire team, including researchers and other members of technical staff. We'll have food, drinks, and good…

luma.com

2,045

Santiago Aranguri · Jun 12, 2026 · 11:51 PM UTC

Goodfire retweeted

Santiago Aranguri

@santiaranguri

Jun 12

Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness.

2,347

Goodfire · Jun 11, 2026 · 5:05 PM UTC

Goodfire

@GoodfireAI

Jun 11

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

109

902

179,900


Goodfire · Jun 11, 2026 · 5:05 PM UTC

Goodfire

@GoodfireAI

Jun 11

If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design. Request access to Silico here: goodfire.ai/silico (9/9)

Build AI models the way you write software

Understand and debug your AI model with Silico.

goodfire.ai

4,649

Goodfire · Jun 11, 2026 · 5:05 PM UTC

Goodfire

@GoodfireAI

Jun 11

Read the full blog post on predictive data debugging: goodfire.ai/research/predict…

Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train

Given a preference dataset, we can accurately predict which behaviors RL will amplify or suppress before you train, trace them back to the responsible data, and reshape the dataset and/or training...

goodfire.ai

4,318