Arize AI · Jun 22, 2026 · 5:26 PM UTC

Pinned Tweet

Arize AI

@arizeai

Jun 22

Come hang with us over at Booth P4! We're excited to be this year's Evals track lead and will be hosting multiple talks, workshops, and lunch & learns.

AI Engineer

@aiDotEngineer

Jun 21

6️⃣ Things to Know about AI Engineer World's Fair 2026
- It’s bigger than all previous AIEs
- 4x Larger Expo with 4 Expo stages
- Researchers: Poster sessions & Poaster sessions
- AI Leadership: Token Billionaires & Off the Record
- AI Verticals: Healthcare, GTM, FDE, AGC, Finance
- Side Events: NEO, Kids day
- attendees get $40k in credits to try everything our sponsors have to offer!

It's going to be our BIGGEST show yet!

475,319

Arize AI · Jun 30, 2026 · 8:47 PM UTC

Arize AI

@arizeai

19h

What if all your traces look healthy but your agent is failing silently? @dat_attacked walked us through how we used Signal to debug our own AI engineering agent, Alyx. Signal identified that empty returned strings were putting the agent into a loop, grouped the traces where the issue occurred as evidence, proposed a fix, and opened a PR that was merged. That’s what a self-improving agent loop looks like in practice. Next up: Fuad @tofuadmiral at Expo Stage 2 on how to evaluate voice agents. Catch it live.

137
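The failure mode described above (empty tool returns pushing an agent into a loop) can be checked with a very small heuristic. The sketch below is an illustration only and is not how Signal works: `longest_empty_run`, `looks_stuck`, and the threshold of 3 are hypothetical names and values, and it assumes the tool outputs of a trace are available as a list of strings in call order.

```python
def longest_empty_run(tool_outputs):
    """Length of the longest run of consecutive empty or whitespace-only outputs."""
    longest = current = 0
    for output in tool_outputs:
        if output is None or not output.strip():
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def looks_stuck(tool_outputs, threshold=3):
    """Flag a trace whose tools returned nothing several times in a row.

    The threshold is an assumption; tune it against labeled traces.
    """
    return longest_empty_run(tool_outputs) >= threshold


# Hypothetical trace: three empty returns in a row after one good call.
trace_outputs = ["found 2 files", "", " ", None, "done", ""]
print(looks_stuck(trace_outputs))
```

A check like this passes on every individual span, which is why it has to run over the sequence of calls rather than on one call at a time.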

Arize AI · Jun 30, 2026 · 8:00 PM UTC

Arize AI

@arizeai

19h

Watch the full talk or read our write-up here ⬇️ arize.com/blog/ai-evals-are-…

AI evals are a data science problem: What most teams get wrong

Hamel Husain explains why the best AI teams treat LLM judges like classifiers, not dashboards.

arize.com
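One reading of "treat LLM judges like classifiers" is to score the judge against human labels the way any binary classifier would be scored. The sketch below assumes that reading; `judge_agreement` and the sample labels are hypothetical and do not come from the talk or the write-up.

```python
def judge_agreement(human, judge):
    """Compare binary judge verdicts to human labels (True means pass)."""
    if len(human) != len(judge):
        raise ValueError("label lists must be the same length")
    pairs = list(zip(human, judge))
    true_pos = sum(1 for h, j in pairs if h and j)
    true_neg = sum(1 for h, j in pairs if not h and not j)
    positives = sum(1 for h in human if h)
    negatives = len(human) - positives
    return {
        # Share of human-approved traces the judge also approved.
        "true_positive_rate": true_pos / positives if positives else None,
        # Share of human-rejected traces the judge also rejected.
        "true_negative_rate": true_neg / negatives if negatives else None,
    }


# Hypothetical labels for eight traces.
human = [True, True, True, False, False, True, False, True]
judge = [True, True, False, False, True, True, False, True]
rates = judge_agreement(human, judge)
print(rates)
```

Reporting the two rates separately, rather than a single accuracy number, keeps a judge that approves everything from looking good on a dataset where most traces pass.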

Arize AI · Jun 30, 2026 · 8:00 PM UTC

Arize AI

@arizeai

19h

50 traces. That’s how much data @HamelHusain says you need to start building evals that actually work. Pull them. Label them with a PM. Cluster the failures. Pick the highest-impact one. Write a binary eval. You’ll learn more in an hour by doing this than in a month of dashboard watching.

188
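The steps in the post above (label, cluster the failures, pick one, write a binary eval) can be sketched in a few lines. Everything here is illustrative: the trace records, the label names, and the helpers `top_failure` and `passes_non_empty` are hypothetical, and frequency is used as a stand-in for "highest impact", which in practice is a judgment call made with the PM.

```python
from collections import Counter

# Hypothetical hand labels, one per reviewed trace.
# "ok" means acceptable; any other value names a failure mode.
labeled_traces = [
    {"id": "t1", "output": "Your refund was issued.", "label": "ok"},
    {"id": "t2", "output": "", "label": "empty_response"},
    {"id": "t3", "output": "I cannot help with that.", "label": "wrong_refusal"},
    {"id": "t4", "output": "", "label": "empty_response"},
    {"id": "t5", "output": "Order 1142 ships Friday.", "label": "ok"},
]


def top_failure(traces):
    """Return (label, count) for the most frequent failure mode, or None."""
    counts = Counter(t["label"] for t in traces if t["label"] != "ok")
    return counts.most_common(1)[0] if counts else None


def passes_non_empty(trace):
    """Binary eval for the chosen failure mode: True means pass."""
    return bool(trace["output"].strip())


failure, count = top_failure(labeled_traces)
failed_ids = [t["id"] for t in labeled_traces if not passes_non_empty(t)]
print(failure, count, failed_ids)
```

The eval returns pass or fail rather than a score, so its verdicts can be compared directly against the human labels that motivated it.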

Arize AI · Jun 30, 2026 · 7:00 PM UTC

Arize AI

@arizeai

20h

A year ago, 200 instructions was the ceiling. Today it's closer to 2,000 - and up to 5,000 on the strongest models. The capacity problem is largely solved, but the verification problem is wide open. Our DevRel Lead Laurie Voss is presenting new IFScale data at AIE tomorrow, showing exactly where each model breaks down. DeepSeek quietly drops instructions. Opus refuses when innocuous words trip a safety classifier. Gemini burns its whole budget on reasoning and emits nothing. The question stops being "how long can my skills be?" and starts being "how do I know my agent followed all of them?" 📅 Day 3 · Wednesday, July 1 · 1:30–1:50pm 📍 Context Engineering / Room 2020 Full session details: ai.engineer/worldsfair/sched…

148

Arize AI · Jun 30, 2026 · 3:27 PM UTC

Arize AI

@arizeai

Jun 30

Game was on @aiDotEngineer Day 1 🔥 Come find us at Booth P4 and grab your spot for our AIE watch party tomorrow: luma.com/game-on-worldsfair-… See today’s talks in the thread👇

161

Arize AI · Jun 30, 2026 · 3:27 PM UTC

Arize AI

@arizeai

Jun 30

11:00-11:15 am Booth P4 “The Harness Stack: From One Agent to a Swarm” -Ankur Duggal
12:05-12:25 pm Expo 1 “Your Agent Is Lying to You About Whether It Worked” -Dat Ngo
1:55-2:15 pm Expo 2 “Voice Agents Are Mostly Invisible. Here’s How to See Them” -Fuad Ali

Arize AI · Jun 30, 2026 · 1:27 PM UTC

Arize AI

@arizeai

Jun 30

ICYMI: @truefoundry🤝@arizeai

TrueFoundry

@truefoundry

Jun 30

@truefoundry now integrates with @arizeai. Teams building LLM applications and agents can now export AI Gateway traces directly to Arize using OpenTelemetry, bringing end-to-end observability, evaluation, and debugging to production AI workloads without changing application code or deploying additional collectors. Monitor request flows, latency, token usage, errors, and model performance in Arize, while continuing to benefit from the unified rate limiting, cost tracking, access controls, and governance that TrueFoundry AI Gateway provides across all AI providers. The integration also includes privacy controls to exclude prompt and response data when required. One gateway. Complete AI observability. Thanks to the @arizeai team for the collaboration!

204

Arize AI · Jun 29, 2026 · 10:06 PM UTC

Arize AI

@arizeai

Jun 29

Wrapped up day 1 at @AIdotEngineer with Laurie’s second workshop, a deeper dive into continuous improvement for agents. “Observability is shifting to become action.” Thanks to all attendees, we had a full room all day! Check out the schedule for our talks tomorrow: ai.engineer/worldsfair/sched…

175

Arize AI · Jun 29, 2026 · 8:38 PM UTC

Arize AI

@arizeai

Jun 29

Let your agents cook! Our solutions architect Ankur @Anky488 is walking through how to get evals up and running in minutes with Arize skills.

130

Arize AI · Jun 29, 2026 · 7:00 PM UTC

Arize AI

@arizeai

Jun 29

Your LLM gateway can be more than a router. With @truefoundry + Arize, every model call becomes an OpenInference trace: spans for auth, model resolution, provider calls, token usage, latency, cost, and more. The interesting bit: trace export happens async, so observability doesn’t sit on the inference path. Here's how: arize.com/blog/trace-and-eva…

Trace and evaluate TrueFoundry AI Gateway traffic in Arize AX

Learn how TrueFoundry AI Gateway exports OpenTelemetry traces to Arize AX so teams can trace, evaluate, and monitor production LLM and agent traffic without embedding a vendor SDK in every service.

arize.com

248
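The point about async export can be illustrated with a queue and a background thread: the request path only enqueues a span, and a worker ships it later. This is a generic sketch of that pattern under stated assumptions, not TrueFoundry's or Arize's implementation; `AsyncSpanExporter`, its `send` callable, and the span fields are hypothetical.

```python
import queue
import threading


class AsyncSpanExporter:
    """Toy exporter: callers enqueue spans, a worker thread sends them."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # hypothetical callable that ships one span
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def export(self, span):
        """Called on the request path. Never blocks; returns False if dropped."""
        try:
            self._queue.put_nowait(span)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            span = self._queue.get()
            if span is None:  # sentinel pushed by shutdown()
                break
            try:
                self._send(span)
            except Exception:
                pass  # an export failure must not affect serving

    def shutdown(self):
        """Flush everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()


exported = []
exporter = AsyncSpanExporter(send=exported.append)
exporter.export({"name": "provider_call", "latency_ms": 412})
exporter.export({"name": "auth", "latency_ms": 3})
exporter.shutdown()
print(len(exported))
```

Dropping spans when the queue is full is a deliberate choice in this sketch: losing some telemetry is preferred over adding latency to a model call.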

Arize AI · Jun 29, 2026 · 6:23 PM UTC

Arize AI

@arizeai

Jun 29

Up next we have our Product Manager Fuad @tofuadmiral on how to use Arize skills and build self-learning loops for agents (hot topic alert!). Yet another full-house workshop in Room 2010 🚀

243

Arize AI · Jun 29, 2026 · 4:04 PM UTC

Arize AI

@arizeai

Jun 29

Laurie @seldo just kicked off @aiDotEngineer for Arize with Workshop 101: From vibes to production: evaluating and shipping AI agents that work. Room is at capacity! Workshop 202 is at 2:20-4:20 in Room 2010. Come early if you want to grab a seat 🙃

57,721

Arize AI · Jun 29, 2026 · 6:08 PM UTC

Arize AI

@arizeai

Jun 29

We built the whole workflow in workshop #1! instrument → trace → read data → eval → validate → iterate → ship → monitor More workshops coming up throughout the day in room 2010!

Arize AI · Jun 29, 2026 · 8:00 AM UTC

Arize AI

@arizeai

Jun 29

This Saturday: breakfast at 9, hacking by 10:30, live demos at 4:30, afterparty till late. @Londonmaxxing 003 at Ramen Space w/ Zed, ElevenLabs, OpenRouter, Cloudflare, AG Grid, TRMNL + us. We will see you there! luma.com/maxxing-london

Londonmaxxing 003: Maxxing London Hackathon · Luma

How can we make London better to build in, live in and fall in love with? London is on a generational run. Billions in investment is flowing into the city,…

luma.com

1,499

Arize AI · Jun 28, 2026 · 7:00 PM UTC

Arize AI

@arizeai

Jun 28

You can have production-quality evals running in minutes. Our Solutions Architect Ankur Duggal @Anky488 is leading a workshop at AI Engineer World's Fair tomorrow, walking through how to stand up a production eval pipeline in minutes using Arize Agent Skills, no prior setup required. Grab lunch and come by, leave with a working eval setup. 📅 Day 1 · Monday, June 29 · 1:15–2:15 pm 📍 Room 2010 Mark your calendar: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Skills

189

Arize AI · Jun 28, 2026 · 12:00 AM UTC

Arize AI

@arizeai

Jun 28

Come see what we've been building at Arize. Our Fuad Ali @tofuadmiral is leading a live walk-through of the latest features in Arize on Day 1 at AI Engineer World's Fair. If you want to see what's new firsthand, ask questions directly, and get a head start on features your team can put to use now, this is the session. 📅 Day 1 · Monday, June 29 · 11:05 am–12:05 pm 📍 Room 2010 See you there: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Observability

229

Arize AI · Jun 27, 2026 · 7:00 PM UTC

Arize AI

@arizeai

Jun 27

Two workshops. Two chances to help you move from vibes-based development to production-ready AI agents. Our Laurie Voss @seldo is running two hands-on workshops Day 1 at AI Engineer World's Fair, covering the full lifecycle of tracing, evals, experiments, and production monitoring, using a real financial-analyst agent. 101 covers the core loop: instrument, do error analysis, build a layered eval suite, and close the loop with monitors. 201 goes deeper: session-level evals, RAG quality scoring, trajectory evaluation, and autonomous issue investigation with Signal. Workshop 101: ai.engineer/worldsfair/sched… Workshop 201: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Observability

273

Arize AI · Jun 27, 2026 · 12:00 AM UTC

Arize AI

@arizeai

Jun 27

Excited to have Uber on the Evals track with us at Day 3 of AIE. Soumya Gupta and Jai Chopra are presenting how @Uber used closed-loop evals for their food photography enhancement agent. The session will cover reward hacking, where the agent learned to game the eval loop, and how they built an offline/online feedback loop for continuous improvement while enforcing safety guardrails at scale. If you're working on multimodal systems, agentic pipelines, or eval design under tight quality or safety constraints, this is the talk: ai.engineer/worldsfair/sched… 📅 Day 3 · Wednesday, July 1 · 11:40am-12:00pm 📍 Evals / Room 2005

274

Arize AI · Jun 26, 2026 · 7:00 PM UTC

Arize AI

@arizeai

Jun 26

What does a failing agent look like when all your metrics say it's fine? Our Strategy lead @dat_attacked is unpacking one of the most common failure patterns in production AI: agents that report success without actually succeeding. We'll pull up a real trace where the outcome looks healthy and the path is broken, then show the Arize autopilot Signal surface the issue automatically, linking straight to the offending trace with debugging evidence attached. 📅 Day 2 · Tuesday, June 30 · 12:05-12:25pm 📍 Expo Stage 1 See the session: ai.engineer/worldsfair/sched… #AIEngineer #AIEWF #AIAgents #Evals #Observability

174