The AI engineering platform for teams shipping reliable AI agents and LLM applications. Also home to @ArizePhoenix.

San Francisco, CA
Come hang with us over at Booth P4! We're excited to be this year's Evals track lead and will be hosting multiple talks, workshops, and lunch & learns.
6️⃣ Things to Know about AI Engineer World's Fair 2026
- It’s bigger than all previous AIEs
- 4x Larger Expo with 4 Expo stages
- Researchers: Poster sessions & Poaster sessions
- AI Leadership: Token Billionaires & Off the Record
- AI Verticals: Healthcare, GTM, FDE, AGC, Finance
- Side Events: NEO, Kids day
- attendees get $40k in credits to try everything our sponsors have to offer!
It's going to be our BIGGEST show yet!
What if all your traces look healthy but your agent is failing silently? @dat_attacked walked us through how we used Signal to debug our own AI engineering agent, Alyx. Signal identified that empty returned strings were putting the agent into a loop, grouped the traces where the issue occurred as evidence, proposed a fix, and opened a PR that was merged. That’s what a self-improving agent loop looks like in practice. Next up: Fuad @tofuadmiral at Expo Stage 2 on how to evaluate voice agents. Catch it live.
50 traces. That’s how much data @HamelHusain says you need to start building evals that actually work. Pull them. Label them with a PM. Cluster the failures. Pick the highest-impact one. Write a binary eval. You’ll learn more in an hour by doing this than in a month of dashboard watching.
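A minimal sketch of that loop in Python, under stated assumptions: the trace records, the `label` field, and both helper functions are hypothetical illustrations, not an Arize API. Grouping by hand-assigned label stands in for clustering, and frequency stands in for impact.

```python
from collections import Counter

# Hypothetical hand-labeled traces: each has an output and, when it failed,
# a short failure label assigned during review with a PM.
traces = [
    {"id": "t1", "output": "Refund issued.", "label": None},
    {"id": "t2", "output": "", "label": "empty_response"},
    {"id": "t3", "output": "I cannot help.", "label": "wrong_refusal"},
    {"id": "t4", "output": "   ", "label": "empty_response"},
]

def top_failure(labeled):
    """Group failures by label and return the most frequent one, or None."""
    counts = Counter(t["label"] for t in labeled if t["label"] is not None)
    return counts.most_common(1)[0][0] if counts else None

def is_nonempty_response(trace):
    """Binary eval: pass (True) only if the output has non-whitespace text."""
    return bool(trace["output"].strip())

# Pick the failure mode to target, then run the binary eval over every trace.
target = top_failure(traces)
failing_ids = [t["id"] for t in traces if not is_nonempty_response(t)]
```

The eval returns a plain pass/fail per trace, so results are easy to count and to compare against the human labels.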
A year ago, 200 instructions was the ceiling. Today it's closer to 2,000 - and up to 5,000 on the strongest models. The capacity problem is largely solved, but the verification problem is wide open. Our DevRel Lead Laurie Voss is presenting new IFScale data at AIE tomorrow, showing exactly where each model breaks down. DeepSeek quietly drops instructions. Opus refuses when innocuous words trip a safety classifier. Gemini burns its whole budget on reasoning and emits nothing. The question stops being "how long can my skills be?" and starts being "how do I know my agent followed all of them?" 📅 Day 3 · Wednesday, July 1 · 1:30–1:50pm 📍 Context Engineering / Room 2020 Full session details: ai.engineer/worldsfair/sched…
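One way to picture the verification question is a per-instruction check. The sketch below assumes, purely for illustration, that each instruction is "include this keyword in the output"; the function name and data are hypothetical and this is not a description of how IFScale itself scores models.

```python
def check_instructions(output, required_keywords):
    """Report how many keyword instructions the output followed and which it missed."""
    text = output.lower()
    missed = [kw for kw in required_keywords if kw.lower() not in text]
    return {"followed": len(required_keywords) - len(missed), "missed": missed}

# Hypothetical agent output checked against three keyword instructions.
report = check_instructions(
    "Revenue grew while churn fell.",
    ["revenue", "churn", "forecast"],
)
```

Listing the missed instructions by name, instead of returning a single score, makes silently dropped instructions visible.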
Game was on @aiDotEngineer Day 1 🔥 Come find us at Booth P4 and grab your spot for our AIE watch party tomorrow: luma.com/game-on-worldsfair-… See today’s talks in the thread👇
11:00-11:15 am · Booth P4 · “The Harness Stack: From One Agent to a Swarm” - Ankur Duggal
12:05-12:25 pm · Expo 1 · “Your Agent Is Lying to You About Whether It Worked” - Dat Ngo
1:55-2:15 pm · Expo 2 · “Voice Agents Are Mostly Invisible. Here’s How to See Them” - Fuad Ali
@truefoundry now integrates with @arizeai . Teams building LLM applications and agents can now export AI Gateway traces directly to Arize using OpenTelemetry: bringing end-to-end observability, evaluation, and debugging to production AI workloads without changing application code or deploying additional collectors. Monitor request flows, latency, token usage, errors, and model performance in Arize, while continuing to benefit from the unified rate limiting, cost tracking, access controls, and governance that TrueFoundry AI Gateway provides across all AI providers. The integration also includes privacy controls to exclude prompt and response data when required. One gateway. Complete AI observability. Thanks to the @arizeai team for the collaboration!
Wrapped up day 1 at @AIdotEngineer with Laurie’s second workshop, a deeper dive into continuous improvement for agents. “Observability is shifting to become action.” Thanks to all attendees, had a full room all day! Check out the schedule for our talks tomorrow: ai.engineer/worldsfair/sched…
Let your agents cook! Our solutions architect Ankur @Anky488 is walking through how to get evals up and running in minutes with Arize skills.
Your LLM gateway can be more than a router. With @truefoundry + Arize, every model call becomes an OpenInference trace: spans for auth, model resolution, provider calls, token usage, latency, cost, and more. The interesting bit: trace export happens async, so observability doesn’t sit on the inference path. Here's how: arize.com/blog/trace-and-eva…
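A rough sketch of the "export happens async" idea, using only the Python standard library. Everything here is hypothetical (the `AsyncSpanExporter` class, the span fields, the `handle_request` stand-in) and it is not the TrueFoundry or Arize implementation; it only shows how a request handler can hand a span to a background worker and return without waiting for the export.

```python
import queue
import threading

class AsyncSpanExporter:
    """Hypothetical exporter: spans are queued and sent by a background thread."""

    def __init__(self, sink):
        self._queue = queue.Queue()
        self._sink = sink  # callable that "exports" one span
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, span):
        # Enqueue and return immediately; the request path never calls the sink.
        self._queue.put(span)

    def _run(self):
        while True:
            span = self._queue.get()
            if span is None:  # sentinel: stop the worker
                break
            self._sink(span)

    def shutdown(self):
        # Flush: the sentinel is processed after all previously queued spans.
        self._queue.put(None)
        self._worker.join()

exported = []
exporter = AsyncSpanExporter(exported.append)

def handle_request(prompt):
    response = prompt.upper()  # stand-in for the real model call
    exporter.record({"name": "provider_call", "input_chars": len(prompt)})
    return response

result = handle_request("hello")
exporter.shutdown()
```

The trade-off in a design like this is that spans still in the queue can be lost if the process exits without calling `shutdown()`.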
Up next we have our Product Manager Fuad @tofuadmiral on how to use Arize skills and build self-learning loops for agents (hot topic alert!) Yet another full house workshop in Room 2010 🚀
Laurie @seldo just kicked off @aiDotEngineer for Arize with Workshop 101: From vibes to production: evaluating and shipping AI agents that work. Room is at capacity! Workshop 202 is at 2:20-4:20 Room 2010. Come early if you want to grab a seat 🙃
We built the whole workflow in workshop #1! instrument → trace → read data → eval → validate → iterate → ship → monitor More workshops coming up throughout the day in room 2010!
This Saturday: breakfast at 9, hacking by 10:30, live demos at 4:30, afterparty till late. @Londonmaxxing 003 at Ramen Space w/ Zed, ElevenLabs, OpenRouter, Cloudflare, AG Grid, TRMNL + us. We will see you there! luma.com/maxxing-london
You can have production-quality evals running in minutes. Our Solutions Architect Ankur Duggal @Anky488 is leading a workshop at AI Engineer World's Fair tomorrow, walking through how to stand up a production eval pipeline in minutes using Arize Agent Skills, no prior setup required. Grab lunch and come by, leave with a working eval setup. 📅 Day 1 · Monday, June 29 · 1:15–2:15 pm 📍 Room 2010 Mark your calendar: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Skills
Come see what we've been building at Arize. Our Fuad Ali @tofuadmiral is leading a live walk-through of the latest features in Arize on Day 1 at AI Engineer World's Fair. If you want to see what's new firsthand, ask questions directly, and get a head start on features your team can put to use now, this is the session. 📅 Day 1 · Monday, June 29 · 11:05 am–12:05 pm 📍 Room 2010 See you there: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Observability
Two workshops. Two chances to help you move from vibes-based development to production-ready AI agents. Our Laurie Voss @seldo is running two hands-on workshops Day 1 at AI Engineer World's Fair, covering the full lifecycle of tracing, evals, experiments, and production monitoring, using a real financial-analyst agent. 101 covers the core loop: instrument, do error analysis, build a layered eval suite, and close the loop with monitors. 201 goes deeper: session-level evals, RAG quality scoring, trajectory evaluation, and autonomous issue investigation with Signal. Workshop 101: ai.engineer/worldsfair/sched… Workshop 201: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Observability
Excited to have Uber on the Evals track with us at Day 3 of AIE. Soumya Gupta and Jai Chopra are presenting how @Uber used closed-loop evals for their food photography enhancement agent. The session will cover reward hacking, where the agent learned to game the eval loop, and how they built an offline/online feedback loop for continuous improvement while enforcing safety guardrails at scale. If you're working on multimodal systems, agentic pipelines, or eval design under tight quality or safety constraints, this is the talk: ai.engineer/worldsfair/sched… 📅 Day 3 · Wednesday, July 1 · 11:40am-12:00pm 📍 Evals / Room 2005
What does a failing agent look like when all your metrics say it's fine? Our Strategy lead @dat_attacked is unpacking one of the most common failure patterns in production AI: agents that report success without actually succeeding. We'll pull up a real trace where the outcome looks healthy and the path is broken, then show the Arize autopilot Signal surface the issue automatically, linking straight to the offending trace with debugging evidence attached. 📅 Day 2 · Tuesday, June 30 · 12:05-12:25pm 📍 Expo Stage 1 See the session: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Observability