The next wave of AI will not be won by better prompts. It will be won by systems that learn from experience. Today, Prime Intellect Lab is out of beta, open for you to start training your own models. The era of self-improving agents is here.
Prime Intellect retweeted
something has definitely shifted in the past few weeks. seeing a huge uptick in large enterprises wanting to secure compute and post-train their own models in house, frequently on top of GLM-5.2. everyone is starting to understand how open source wins.
Prime Intellect retweeted
1/n For browser agents, a major bottleneck in evaluation is truthful scoring on the live web. A task is only as good as your ability to confirm the agent actually did it, on a real site whose state keeps moving and that the agent can potentially misreport. So we took matters into our own hands. Today, we're releasing Ecom Bench on @PrimeIntellect: 40 shopping tasks on real Shopify storefronts, each run in a live @browserbase browser and graded by a deterministic verifier. vibrantlabs.com/research/eco…
Prime Intellect retweeted
this is a good one
Today we're releasing prime-rl v0.6.0 — enabling RL at trillion-parameter MoE scale on agentic workloads at the highest efficiency. We've relentlessly optimized our RL infra. The result: GLM-5 on agentic SWE tasks at 131k context and sub-5-minute step time.
Huge thanks to the @vllm_project team, and @robertshaw21 in particular, for all the help along the way. Also to the llm-d and Dynamo teams for the collaboration on routing and inference.
Over a long run the trainer and inference policies slowly drift apart, and that mismatch can kill your training. R3 (router replay) captures the routing decisions from the inference engine, replays them on the trainer - KL mismatch drops ~10x.
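To make the router-replay idea concrete, here is a minimal sketch in plain Python. It is not prime-rl's implementation: the `moe_forward` function, the toy experts, and the choice to recompute gate weights from the trainer's own logits are all assumptions made for illustration. The point it shows is only that the trainer reuses the expert indices recorded at inference time instead of re-deriving them from its own (slightly different) router logits.

```python
import math

def top_k(logits, k):
    """Indices of the k largest router logits."""
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:k]

def moe_forward(x, router_logits, experts, k, replay_indices=None):
    """Toy MoE layer (hypothetical). If replay_indices is given, those experts
    are used instead of the layer's own top-k choice."""
    chosen = replay_indices if replay_indices is not None else top_k(router_logits, k)
    # Gate weights: softmax over the chosen experts' logits (an assumption).
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    z = sum(exps)
    out = sum((e / z) * experts[i](x) for e, i in zip(exps, chosen))
    return out, list(chosen)

# Three toy experts that scale the input differently.
experts = [lambda x: 1.0 * x, lambda x: 2.0 * x, lambda x: 3.0 * x]

# Inference engine and trainer disagree slightly on the router logits.
inference_logits = [1.0, 0.9, 0.0]
trainer_logits = [0.9, 1.0, 0.0]

_, recorded = moe_forward(1.0, inference_logits, experts, k=1)
drifted, _ = moe_forward(1.0, trainer_logits, experts, k=1)
replayed, _ = moe_forward(1.0, trainer_logits, experts, k=1, replay_indices=recorded)
```

Without replay the trainer routes the token to a different expert than the one that actually generated it; with replay both sides agree on the routing path.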
The trainer is 3D-parallel (FSDP2 + CP + EP), built on TorchTitan. FSDP2 shards params, grads & optimizer state. EP keeps experts sharded and routes tokens with all2all instead of all-gathering ~80GB per layer. CP handles the 131k context and GLM-5's DSA attention.
One Mooncake store pools KV cache across all nodes, so any worker can reuse any prefix. The router picks workers by a score over load, queue depth, KV usage and prefix overlap. You get cross-replica cache hits with balanced routing across the whole deployment.
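A rough sketch of what a score over load, queue depth, KV usage and prefix overlap could look like. Everything here is hypothetical: the `Worker` fields, the linear form of the score, and the weights are assumptions for illustration, not the actual router.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    load: float           # fraction of capacity in use, 0..1 (assumed metric)
    queue_depth: int      # requests waiting on this worker
    kv_usage: float       # fraction of KV cache in use, 0..1
    cached_prefix: tuple  # token ids of a prefix this worker can reuse

def prefix_overlap(prompt, cached):
    """Number of leading tokens shared by the prompt and the cached prefix."""
    n = 0
    for a, b in zip(prompt, cached):
        if a != b:
            break
        n += 1
    return n

def score(worker, prompt, w_prefix=2.0, w_load=1.0, w_queue=0.1, w_kv=1.0):
    """Higher is better: reward prefix reuse, penalize busy workers.
    The weights are made-up values, not tuned ones."""
    overlap = prefix_overlap(prompt, worker.cached_prefix) / max(len(prompt), 1)
    return (w_prefix * overlap
            - w_load * worker.load
            - w_queue * worker.queue_depth
            - w_kv * worker.kv_usage)

def pick_worker(workers, prompt):
    return max(workers, key=lambda w: score(w, prompt))

prompt = (1, 2, 3, 4)
busy_full_hit = Worker("a", load=0.9, queue_depth=5, kv_usage=0.8, cached_prefix=(1, 2, 3, 4))
idle_half_hit = Worker("b", load=0.1, queue_depth=0, kv_usage=0.2, cached_prefix=(1, 2))
chosen = pick_worker([busy_full_hit, idle_half_hit], prompt)
```

In this toy setup the idle worker with a partial prefix hit wins over the overloaded worker with a full hit, which is the kind of balance between cache reuse and load that the post describes.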
We disaggregate prefill and decode onto separate workers. A long prefill used to stall decode for everyone. Now it doesn't.
In RL, inference is the bottleneck — we optimize for throughput, not latency. High concurrency, FP8 precision, and wide expert parallelism over 32+ GPUs. Every GPU holds its own slice of experts and acts as its own endpoint.
Prime Intellect retweeted
awesome post by @kimbochen covering RL systems end-to-end, including a SWE training run on GLM-5 using our prime-rl framework.
RL Systems Mind the Gap: Matching Trainer and Generator Throughput RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker newsletter.semianalysis.com/…
Prime Intellect retweeted
nice blog by @kimbochen about the current RL ecosystem, goes into detail about the different settings and tradeoffs to consider when RLing open models
Prime Intellect retweeted
Excited to support this epic Inference-time compute hackathon with Prime Intellect credits for post-training + compute
24hr hack on
> Agents: Multi-step systems that take a goal and execute. Tool use, planning, long horizons.
> Real-Time and Interactive: Sub-second loops, live multimodal.
> RL + Applied AI: Systems that judge capability of a person or a model. Autograders, preference ranking, rubrics, verified-skill signals, human-in-the-loop.
luma.com/hncudfxb
Prime Intellect retweeted
We are so back! Future looking bright to post-train, serve, and continuously improve your own model on top of models like GLM-5.2 using primeintellect.ai/ 🫡
Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: z.ai/blog/glm-5.2
Weights: huggingface.co/zai-org/GLM-5…
API: docs.z.ai/guides/llm/glm-5.2
Coding Plan: z.ai/subscribe
Chat: chat.z.ai
Prime Intellect retweeted
Great RL systems deep dive by @SemiAnalysis_ Scaling RL is as much of an infra problem as an algorithm one SemiAnalysis ran experiments on our stack: Prime RL + Sandboxes. System efficiency is ultimately queue health to match generator and trainer throughput
Prime Intellect retweeted
been beating this drum since early 2025, seems like people are starting to see why it's so important :) RL works -> "train or get trained on" -> open models + post-training infra are the path to institutional flywheels + democratization of AI progress
The next big trade is infrastructure / RL environments that enable companies to turn their institutional knowledge / processes into continuously improving learning loops that they can own.
Prime Intellect retweeted
Satya is perfectly describing the why and what behind @primeintellect since 2023 🫡
> AI needs to be open & sovereign
> Let every company create its own self-improving agents: and own their loop to make them better
> A rich open ai ecosystem creates far more abundance than a future locked down by a few closed labs
> Every company is becoming an ai company: so every company needs to own its own product <> model improvement loop
@primeintellect enables this today:
> Your own evals + rl envs for the outcomes you care about
> models self-improving in production from your real traces
> don't cede your moat to a handful of labs. This self-improvement loop is the IP and it compounds
Open self improving agents for everyone 🫡
By performing SFT on tool outputs and RL on the assistant tokens, we can efficiently teach the model the environment dynamics. This happens on-policy: the LLM models the environment not in a vacuum but in response to its own actions.
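One way to picture the split between SFT on tool outputs and RL on assistant tokens is a per-token loss that switches on the token's role. The sketch below is an assumption-heavy illustration, not the method from the thread: the `mixed_loss` name, the REINFORCE-style surrogate for the RL term, the per-role averaging, and the `sft_weight` knob are all made up here.

```python
def mixed_loss(logprobs, roles, advantage, sft_weight=1.0):
    """Combine two objectives over one on-policy trajectory.

    logprobs:  log-probability the model assigns to each token
    roles:     "assistant" or "tool" for each token
    advantage: scalar advantage for the whole trajectory (assumed given)
    """
    # RL term on the model's own actions: -advantage * log p(token).
    rl_terms = [-advantage * lp for lp, r in zip(logprobs, roles) if r == "assistant"]
    # SFT term on environment responses: plain negative log-likelihood.
    sft_terms = [-lp for lp, r in zip(logprobs, roles) if r == "tool"]
    rl = sum(rl_terms) / len(rl_terms) if rl_terms else 0.0
    sft = sum(sft_terms) / len(sft_terms) if sft_terms else 0.0
    return rl + sft_weight * sft

# Two assistant tokens followed by two tool-output tokens.
loss = mixed_loss(
    logprobs=[-1.0, -2.0, -0.5, -0.5],
    roles=["assistant", "assistant", "tool", "tool"],
    advantage=2.0,
)
```

Because the tool tokens come from rollouts the model itself produced, the SFT term is trained on environment responses to the model's own actions, which is the on-policy property the post points to.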
We show strong results in the under-resourced programming language Forth and evaluate generalization to unrelated environments. We also characterize what aspects of an environment lead to overfitting when using ECHO, how model behavior is impacted, and much more.