Your agents shouldn’t be loose scripts with credit cards and tool access. They should run through a control plane.
Coming soon.
Genuinely just overheard someone at a bistro I walked past try to pitch a @naval tweet as their own thought to their date. Date called them out on it. I am done with SF.
I love these posts, but what frustrates me is that nobody actually shares how they did it. So here’s the “how we actually did it”: you guessed it, using @DSPyOSS!
PRD → evals → PR to generate evals from PRDs. It’s evals all the way down. 🐢🌀
Replying to @ChristineCarril
I'm a bit confused (as a CEO doing all of these things presently) -- what are you doing, then?
Replying to @ZackKorman
What on earth? @figma this is unacceptable.
Agents don’t need frameworks. I open‑sourced a micro‑agent to prove it: @DSPyOSS modules + a ~100‑line loop.
- Planner → tool → finalize with policy checks
- CLI + FastAPI + JSON‑robust parsing
- Traces + tiny eval harness
Works with OpenAI or Ollama.
Check it out: github.com/evalops/dspy-micr…
dspy is not specifically an AI agent framework and yet:
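The post lists the loop's shape but not its code. Below is a minimal stdlib-only sketch of a planner → tool → finalize loop with a policy check and forgiving JSON parsing, assuming the planner returns one JSON "step" per call. `TOOLS`, `BLOCKED_TOOLS`, `extract_json`, `run_agent`, and `stub_planner` are hypothetical names, not the repo's API; a real version would call a DSPy module and an LLM where the stub sits.

```python
import json
import re
from typing import Callable, Dict, List

# Hypothetical tool registry and policy; the repo's real tools aren't shown in the post.
TOOLS: Dict[str, Callable[[str], str]] = {
    # Toy tool: adds integers written like "2+3".
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}
BLOCKED_TOOLS = {"shell"}  # policy check: never allowed, even if registered


def extract_json(text: str) -> dict:
    """Parse model output as JSON, falling back to the span from the first '{'
    to the last '}' when the model wraps the object in prose."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model output")
    return json.loads(match.group(0))


def run_agent(question: str, planner: Callable[[str, List[str]], str], max_steps: int = 5) -> str:
    """Planner -> tool -> finalize loop with a policy check before every tool call."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = extract_json(planner(question, observations))
        if step.get("action") == "finalize":
            return str(step["answer"])
        tool = step.get("tool", "")
        if tool in BLOCKED_TOOLS or tool not in TOOLS:
            # Feed the refusal back to the planner instead of running the tool.
            observations.append(f"policy: tool {tool!r} not allowed")
            continue
        observations.append(TOOLS[tool](step.get("input", "")))
    return "gave up after max_steps"


# Stand-in for the planner module: first asks for a tool, then finalizes.
def stub_planner(question: str, observations: List[str]) -> str:
    if not observations:
        return 'I will use a tool. {"action": "tool", "tool": "calculator", "input": "2+3"}'
    return json.dumps({"action": "finalize", "answer": observations[-1]})


print(run_agent("What is 2+3?", stub_planner))  # prints 5
```

Returning policy refusals as observations (rather than raising) lets the planner recover on its next step, while `max_steps` bounds the loop.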
Tired of LLMs arguing with no ground truth? Cognitive Dissonance DSPy links DSPy agents with Coq proofs – translating claims to formal specs, detecting conflicts, and resolving them with math. Move from “probably right” to provably right. 🔥 github.com/evalops/cognitive…
Replying to @zoink
The fact this was possible in the first place is pretty concerning. You say there was monitoring in place - why did it take a customer reporting it to bring it to attention?
Them: “You can’t just throw @DSPyOSS at everything.”
Me: throws DSPy at inbox
DSPy: batch_with_similar updates, route Google → Engineering, mark Mercury as networking pitch, surface competitive intel.
Me: I absolutely can and will.
Stop treating evals like artisanal cheese. Most teams still handcraft test cases from scratch — tedious, brittle, slow. With @EvalOpsDev + DSPy, you describe a real scenario (“refund request from angry customer”) → we generate a complete test suite for it. Fast. Repeatable. On demand. No more spreadsheet hell. No more “did we test this edge case?” panic. Just evals that scale with your ambition.
Stop hand-crafting evals. With EvalOps + DSPy, just describe your scenario and we auto-generate test cases for you. From “refund requests with angry customers” → a full test suite in seconds.
Cold email isn’t dead. Bad cold email is. We open-sourced a CLI that learns from your 3 best “winner” emails using @DSPyOSS, then generates new outreach in the same founder-style tone.
Input: CSV of wins + leads
Output: scored, optimized emails you can A/B test next week
github.com/evalops/founder-e…
Replying to @threepointone
Mineflare. So hot right now.
I’ve said it a few times, but @Suhail and team are building magic. I sincerely forgot I was in Mighty today, I thought Chrome was running flawlessly. That’s how seamless the product is. That’s what happens when you ruthlessly innovate. It’s just that good.
Replying to @mattturck
This is the type of thought leadership that keeps me coming back to this app
just be glad they didn't post hog
I need this but with gangster glasses at the end on the cat in black and white during the “guess who’s back”
Replying to @jasteinerman
100s of devices can DDoS your servers?
A comment on HackerNews.
Replying to @nickabouzeid
Apps taking items from your clipboard without permission.
Replying to @claudeai
Surprised by this response. Feels like an engineering diary, not a postmortem. Impact is buried (up to 16% degraded), causes are hedged, and “what’s changing” lacks hard commitments. I would have expected prod eval coverage %, alert latency targets, and rollback thresholds. Complexity ≠ accountability.
It is absurd how much better @AmpCode is at creating UI than everyone else. Not sure what all is cooking under the hood, but it's something incredible.
Replying to @PottsJustin
I dunno, I have a pretty good notion of who it could be
Tried @AmpCode. Feels like showing up to a go-kart race in an F1 car. Pricey, yeah—but that's the point. There's a full buffet of power under the hood. Very neat!
We researched a fundamental problem in AI safety:
Multi-agent systems have no way to verify conflicting beliefs. Debate? Voting? That's not ground truth.
We built a framework that translates agent claims to formal proofs and verifies them mathematically.
Results: 80% success rate, sub-200ms verification ⬇️
Replying to @chan_k
Please print / frame this
Working at @cartainc was genuinely one of the best things I've ever done as a founder. There is an MBA's worth of knowledge in working there.
Having way too much fun building with @opencode 🤖💻 The magic of OSS is you don’t just hack for today: you get to layer these experiments back in so the whole ecosystem levels up. @EvalOpsDev x OpenCode = 🔥 Can't wait to bring evals into everyone's workflows :)
The day has come. Removed Chrome because it was crashing _more_ than @MightyApp. Did not think I would see this day this soon, but it is a wonderful one :)
Replying to @nickabouzeid
Passwords, CC#, just about anything you’d store in a password manager / similar app.
A few weeks ago, I was debugging a prompt that worked perfectly in testing but kept failing in production. The difference? One extra space in the JSON format. Six hours of my life. Gone. Because LLMs are finicky about whitespace.
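The post doesn't say how the whitespace bug was fixed. One common safeguard, assumed here rather than taken from the post, is to serialize any JSON that goes into a prompt through a single canonical formatter, so test and prod prompts are byte-identical for the same data. `canonical_json` and the sample strings are hypothetical.

```python
import json


def canonical_json(obj) -> str:
    """Serialize with sorted keys and fixed separators so the same data
    always produces the same prompt text, whitespace included."""
    return json.dumps(obj, sort_keys=True, separators=(", ", ": "), ensure_ascii=False)


test_version = '{"name": "Ada", "role": "admin"}'
prod_version = '{"role": "admin",  "name": "Ada"}'  # extra space, different key order

print(test_version == prod_version)  # prints False
print(canonical_json(json.loads(test_version)) == canonical_json(json.loads(prod_version)))  # prints True
```

This only removes formatting drift you control; it does not make a model robust to whitespace it sees elsewhere in the prompt.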
Whatever @simpsoka and team have been putting in the water on the Jules team is clearly working. When I first tried it, the product was good, but it wasn't really more useful than anything else in my workflow; Codex could work pretty similarly. I just set it loose on a difficult problem, and it spent the last 40 minutes working on it without any help, making way more progress than I did manually (and with Claude!). Super impressed.
My beloved @Superhuman raised a $33M series B from @a16z nyti.ms/2X1vOGo. Apparently I’ve been using Superhuman since the first 100 users (how neat!). Amazed at all the team has accomplished.
Replying to @caseyjohnellis
I mean Perl is rough, but malware is a bit of a stretch /s
Replying to @andreasklinger
Loooove the name. Congrats!
Hands down, I have never heard someone whose music I can listen to more repeatedly than @Apashe_Music. Skill on truly another level.
If there ever was an application that was going to truly change how people do things, it’s @MightyApp. @Suhail and co have not only built technology that is genuinely mind blowing, but their attention to detail and customer obsession is something every company should strive for
Long overdue. @maticrobots turned “vision-only” autonomy from a sci-fi demo into a shipping product (literally on my floor) and did it with years of heads-down R&D. Huge respect to @mehul and the crew for truly raising the bar on home robotics 👏
Whoa!! @maticrobots got the first 10/10 review in 17 years! 🤯🤯🤯 Now we must live up to it!! Bar just got higher...
Just got to try @Cloudflare pages and... wow. It's just that good. If I were any JAMstack platform, I would be paying attention.
Man, @FactoryAI has been cooking. This team ships fast, takes bold swings, and it shows. Watching the product come together - and actually getting to use it - has been fascinating. A lot of teams are going to want to pay close attention to what Factory is building.
On the real though — the coolest thing you can do with @DSPyOSS is just build something with it :) Endless ideas out there. Just make something!
Guys this bait formula is becoming way too common and needs to stop... 😡😡
You should go work for @dakshgup. Not an employee, investor, or advisor myself (yet!) - I just love this product as a customer and want to see it continue to succeed. Greptile and BugBot have been my cofounders with Claude and Gemini. Absolutely pivotal parts of the workflow.
Context is everything. Spent a few hrs w/ @FactoryAI: agent hopped across repos, Jira, Sentry & more, nailing tasks beyond code (incident response, tech writing, product ideas that floored me).
Remote + local workflows ✅
First-party integrations ✅
Sold.
me: writing my investor update
also me: what if the LLM just graded me like a pissed off VC... @DSPyOSS time?
🤖: “traction = mid, story = cope, pls do better”
Surprisingly effective. Thanks @dosco for Ax 🙏
Replying to @TaylorLorenz
Definitely @doctorow. His books inspired me throughout my childhood (I guess I'm aging myself here) -- and he's ultimately the reason I work in cyber-security today. He made the field sound so fascinating. I wrote him an email upon graduating, and he was so kind. Truly great.
Replying to @danielzarick
I'll take useless over harmful any day of the week
We write PRDs as Markdown in a repo.
Product drops a PRD (problem, scope, user stories, success criteria). Nothing fancy: just a .md file checked into Git. This gives us a single source of truth that code can read.
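"Code can read" the PRD because it's plain Markdown. A minimal sketch of that first read, assuming (hypothetically) that each PRD section starts with a level-2 `## ` heading; `split_prd_sections`, `REQUIRED_SECTIONS`, and the sample PRD are illustrations, not the team's actual format.

```python
import re
from typing import Dict, List, Optional

# Hypothetical convention: each PRD section starts with a level-2 "## " heading.
REQUIRED_SECTIONS = ["problem", "scope", "user stories", "success criteria"]


def split_prd_sections(markdown: str) -> Dict[str, str]:
    """Map each lowercased '## ' heading to the text under it."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    buffer: List[str] = []
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            if current is not None:
                sections[current] = "\n".join(buffer).strip()
            current, buffer = heading.group(1).lower(), []
        elif current is not None:
            buffer.append(line)  # text before the first '## ' heading is ignored
    if current is not None:
        sections[current] = "\n".join(buffer).strip()
    return sections


prd = """# Refund flow PRD
## Problem
Angry customers wait too long for refunds.
## Scope
Card refunds only.
## User Stories
- As a customer, I can request a refund in-app.
## Success Criteria
- Refund requests resolve without a support ticket.
"""

sections = split_prd_sections(prd)
missing = [name for name in REQUIRED_SECTIONS if name not in sections]
print(missing)  # prints []
```

Checking for missing sections up front means a half-written PRD fails before any model call is made.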
Replying to @sawyerhood
But why? Like what do they actually get from doing this?
Not a shill. @FactoryAI just works. And the one time it didn’t, the team jumped on a call and fixed it in 20 min. That’s rare—and it matters. Being vocal about that is important. I think it's vital to celebrate teams doing it right.
Fun fact: La Croix “Pure” is on sale for $1.09 per twelve pack on Prime Now. Less fun fact: I have no idea where I’m going to put 120 bottles of La Croix
Most teams don’t have 100s of labeled replies. That’s fine. DSPy lets you bootstrap from 3 winners—then generalize
Now shipping: founder-style cold email that learns from your winners. Open-source, CLI, @DSPyOSS inside (really need that on a sticker...)
Replying to @JoshConstine
Was great to see this on the TV this morning!
Claude Code + @useblacksmith + @greptile = cracked. Greptile nails PR reviews with real context. Blacksmith turns approvals into secure, automated deploys. Claude Code ties it all together with memory, tools, and flow. First time the dev stack has actually felt AI-native.
Replying to @isabelacmor
This was me except reinstalling the entire OS after I broke brew
YC partners are blunt for a reason: most startups die by default. Orbit Agent is our first shot at bottling that advice into something you can text (or use via the CLI) when you need it most. Part of the @EvalOpsDev push to build agents that reflect real workflows (and failures). 👉 github.com/evalops/orbit-age…
Quite the opposite, I think. I believe @paulg is saying the average person (those who voted for boaty) will continue to do as they will (crazy price pumping) longer than is sensical (even if fundamentally there’s no good reason for a price rise)
POV: your AI coding assistant has better git hygiene than most senior engineers and you’re reviewing its autonomous PR workflow from your phone before dinner in the back of a Waymo. Thanks @FactoryAI 🙏
And it goes without saying, I am majorly appreciative to the incredible team of folks working on DSPy. I have so much fun building with this framework. It allows me to move so quickly and iterate on so many ideas.
Replying to @dexhorthy
“This is a massive undertaking!” “I know. That’s why I’m using you 😄”
Replying to @DSPyOSS
DSPy is a hell of a lot more production-ready than your crappy prompt strings chained together, that's for sure :)
Absolutely floored using @FactoryAI. Feels like hiring a dev army on demand.
Replying to @filipe_almeida
Seems likely, if only for the hype of it all, no?
Replying to @cyantist
Antifragile fragile club
The most important KPI: are developers having fun building? Everything else is downstream.
programmer happiness driven development
Because clearly what Clubhouse needed was more Kool Aid
Replying to @abrilzucchi
SF is back!
Replying to @latkins
We'll catch every damn exception that exists
Replying to @ghosttyped
Truly the most beautiful of bills
7 deadly sins are the most successful product drivers, for the most part.
Making sure to thank my Poke as it slowly takes over trip planning. @interaction is absolutely cooking.
We should chat :) I’ve been chatting with @LucasNelson about what I believe to be the exact thing you’re going through right now as well :)
Replying to @AliAbdaal
Lady Whistledown is definitely using @RoamResearch
Step 1 inside the DSPy program: parse → structure.
We parse the PRD into a structured spec: goals, non-goals, user stories, guardrails, and measurable “done” criteria. That’s just Python + a Signature like PRDToSpec(prd_text -> spec_json). The module enforces that we get JSON with the fields we expect (if it’s malformed, we fail the run).
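The model call behind `PRDToSpec(prd_text -> spec_json)` isn't shown, so this sketch covers only the enforcement half: checking that `spec_json` parses and carries the expected fields, and raising so the run fails otherwise. It is stdlib-only; the snake_case field names, list types, `REQUIRED_FIELDS`, `SpecValidationError`, and `validate_spec` are assumptions drawn from the field list above, not the actual DSPy module.

```python
import json
from typing import Any, Dict

# Assumed field names, based on the list above:
# goals, non-goals, user stories, guardrails, measurable "done" criteria.
REQUIRED_FIELDS = {
    "goals": list,
    "non_goals": list,
    "user_stories": list,
    "guardrails": list,
    "done_criteria": list,
}


class SpecValidationError(ValueError):
    """Raised when spec_json is unusable; the caller fails the run."""


def validate_spec(spec_json: str) -> Dict[str, Any]:
    try:
        spec = json.loads(spec_json)
    except json.JSONDecodeError as exc:
        raise SpecValidationError(f"spec_json is not valid JSON: {exc}") from exc
    if not isinstance(spec, dict):
        raise SpecValidationError("spec_json must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in spec:
            raise SpecValidationError(f"missing field: {field}")
        if not isinstance(spec[field], expected_type):
            raise SpecValidationError(f"field {field} should be a {expected_type.__name__}")
    return spec


good = json.dumps({
    "goals": ["Generate evals from PRDs"],
    "non_goals": [],
    "user_stories": ["As a PM, I get an eval suite for each PRD"],
    "guardrails": [],
    "done_criteria": ["Every user story has at least one eval"],
})
print(validate_spec(good)["goals"])  # prints ['Generate evals from PRDs']

try:
    validate_spec('{"goals": []}')
except SpecValidationError as err:
    print(err)  # prints missing field: non_goals
```

Failing loudly here keeps a malformed spec from silently producing a thin or empty eval suite downstream.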
Replying to @naval
It’s a hard crowd!
Just got access to @Superhuman AI and 🤯 Really putting the name to good use — it’s incredible how fast and accurate it is.
Replying to @raffichill
Without question. The Family “values” piece is one of the best design reads I’ve ever had.
Replying to @DSPyOSS
Foundation models change. Foundational abstractions don’t. DSPy is forever :)
Replying to @saranormous
Congrats!!! What a great name, too.
Replying to @loganbartlett
"TAM is great, margins are fantastic, but is it CHEUGy?" Checks out.
I have never felt so seen. Sent from Superhuman
I migrated the EvalOps codebase to @useblacksmith out of curiosity. Greater than 50% reduction in my build times, from a single-line config change (which they made for me). I literally could not hand them my credit card faster. Blown away. Just incredible.
Replying to @TaylorLorenz
Mine would just create some JIRA tickets
Revisiting my take here: AI code-gen is still pretty bad at UI — but not all of the platforms! @AmpCode actually does surprisingly well. The key is going in with a killer prompt and leaning on areas where the model already has strong design priors. Be specific about the vibe you want. Use the words you’d use, point to reference points (open-source codebases, design styles, etc.). Done right, you can push the model toward the design direction you want without juggling multiple tools — just keep working in the same harness you’ve been using.
Let’s just say it up front: coding models are really fucking bad at UI. They can write clean TypeScript. They understand React’s component model. They even know Tailwind classes by heart. But put them in charge of a product surface and you get layouts that confuse, frustrate, or outright mislead users. It’s not a lack of horsepower—it’s a mismatch of context. And it tells us something important about where developer experience tools should go next. I wrote more about it here: haasonsaas.com/blog/coding-m…
Replying to @NWischoff
Unironically yes. Super underrated.
Replying to @Suhail
The ding dong noise on new entry
Replying to @signulll
If it's for coding, migrate to @FactoryAI and cut it to one :)
Whatever you think of @AmpCode Free is really what you think of ads. The product is good. They have to pay COGS. Not everything in life is free - but even when they found a trade to make it so, folks are still upset? I really don’t get it.
If you have not seen @lolitataub's "Asking for a founder" series of tweets, you are missing out. They are an absolute goldmine of information (and can give you an incredible sense of the market)
Someone opens a GitHub issue to “turn this PRD into evals.” That’s the screen you’re looking at: an issue titled “AI-Powered Generation of Evaluation Suites from PRDs.” In the comments we literally type “@gremlin implement this.” That mention is just a trigger phrase for our bot. (The “Claude Code is working…” line is the Claude bot acknowledging it picked up the job; Gremlin works the same way.)
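The thread doesn't show how the bot spots its trigger. A minimal sketch, assuming the bot receives the raw comment text (on GitHub, for example, the `comment.body` field of an `issue_comment` webhook payload) and matches a mention followed by a command. The regex, `parse_trigger`, and the handler message are hypothetical.

```python
import re
from typing import Optional

# Hypothetical trigger format: "@gremlin <command>" somewhere in the comment body.
# The lookbehind skips matches glued to other text, such as "x@gremlin" in an email address.
TRIGGER = re.compile(r"(?<![\w@])@gremlin\s+(?P<command>[^\n]+)", re.IGNORECASE)


def parse_trigger(comment_body: str) -> Optional[str]:
    """Return the normalized command after '@gremlin', or None if the bot wasn't mentioned."""
    match = TRIGGER.search(comment_body)
    if match is None:
        return None
    return match.group("command").strip().rstrip(".").lower()


command = parse_trigger("@gremlin implement this")
if command == "implement this":
    print("queue job: generate evals from the PRD in this issue")  # hypothetical handler
```

Normalizing case and trailing punctuation means "@Gremlin Implement this." and "@gremlin implement this" dispatch to the same job.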