Carlos E. Perez · Apr 4, 2026 · 11:29 AM UTC

Carlos E. Perez

Pinned Tweet

Carlos E. Perez

@IntuitMachine

Apr 4

Introducing "A Pattern Language for Agentic AI Skill Design."

954

83,780

Carlos E. Perez · Nov 21, 2020 · 1:34 PM UTC

Carlos E. Perez

@IntuitMachine

21 Nov 2020

The state of machine learning practice:

1,963

21,162

Carlos E. Perez · Mar 8, 2021 · 3:53 PM UTC

Carlos E. Perez

@IntuitMachine

8 Mar 2021

The brain's consensus algorithm demonstrated:

123

2,402

10,058

Carlos E. Perez · Dec 1, 2024 · 3:26 PM UTC

Carlos E. Perez

@IntuitMachine

1 Dec 2024

Wow! "Attention is All You Need" (i.e., Transformers) was inspired by the Alien's communication style in the movie Arrival.

Community note

This is an error. The podcast got this story wrong. Arrival was not an inspiration for self-attention/the transformer, it was however used as an analogy. bsky.app/profile/victor… bsky.app/profile/tilman

138

575

5,150

1,045,109

Carlos E. Perez · May 24, 2025 · 8:14 AM UTC

Carlos E. Perez

@IntuitMachine

24 May 2025

Shocker! Claude 4 system prompt was leaked, and it's a goldmine! The Claude system prompt incorporates several identifiable agentic AI patterns as described in "A Pattern Language For Agentic AI." Here's an analysis of the key patterns used: Run-Loop Prompting: Claude operates within an execution loop until a clear stopping condition is met, such as answering a user's question or performing a tool action. This is evident in directives like "Claude responds normally and then..." which show turn-based continuation guided by internal conditions. Input Classification & Dispatch: Claude routes queries based on their semantic class—such as support, API queries, emotional support, or safety concerns—ensuring they are handled by different policies or subroutines. This pattern helps manage heterogeneous inputs efficiently. Structured Response Pattern: Claude uses a rigid structure in output formatting—e.g., avoiding lists in casual conversation, using markdown only when specified—which supports clarity, reuse, and system predictability. Declarative Intent: Claude often starts segments with clear intent, such as noting what it can and cannot do, or pre-declaring response constraints. This mitigates ambiguity and guides downstream interpretation. Boundary Signaling: The system prompt distinctly marks different operational contexts—e.g., distinguishing between system limitations, tool usage, and safety constraints. This maintains separation between internal logic and user-facing messaging. Hallucination Mitigation: Many safety and refusal clauses reflect an awareness of LLM failure modes and adopt pattern-based countermeasures—like structured refusals, source-based fallback (e.g., directing users to Anthropic’s site), and explicit response shaping. Protocol-Based Tool Composition: The use of tools like web_search or web_fetch with strict constraints follows this pattern. Claude is trained to use standardized, declarative tool protocols which align with patterns around schema consistency and safe execution. Positional Reinforcement: Critical behaviors (e.g., "Claude must not..." or "Claude should...") are often repeated at both the start and end of instructions, aligning with patterns designed to mitigate behavioral drift in long prompts.

480

4,822

1,240,486

Carlos E. Perez · Sep 26, 2025 · 11:56 PM UTC

Carlos E. Perez

@IntuitMachine

26 Sep 2025

Everyone ‘knows’ AGI will either make us all unemployed or fabulously wealthy. Except, a rather brilliant (and chilling) paper from a Yale economist suggests it's neither. It says the economy will boom, and our wages... won't. A bit awkward. I've been digging into this 2025 paper, "We Won't Be Missed," and it's fascinating. The premise: AGI arrives and can do all economically valuable work. And the 'compute' to run it gets cheaper and more abundant over time. So, what happens to us fleshy, rather expensive humans? The whole argument hinges on a masterstroke of a distinction. The paper splits all work into two types: 1️⃣ Bottleneck Work: The truly essential stuff. Producing energy, logistics, scientific discovery. The economy literally cannot grow unless this work gets done. 2️⃣ Accessory Work: The 'nice-to-haves'. Arts, fine dining, hospitality... maybe even writing witty Twitter threads. (Gulp). Now, you might think AGI will just take the grunt work, leaving the important strategic stuff to us. Wrong. To achieve maximum growth, the economy must automate all the bottlenecks. It can't be held back by us. So AGI systematically takes over everything that is mission-critical. So... are we all fired and sent home? Surprisingly, no. The model shows people still work. We either help out with the 'bottleneck' tasks or get shuffled off to 'accessory' jobs that aren't worth the electricity to automate. But that's not the interesting part. Here's where it gets properly weird. Your future salary isn't based on your skill, your years of experience, or how 'important' your job feels. It's capped by one thing: the cost of the computational resources needed to do your job instead of you. Imagine that. As compute gets exponentially cheaper, the value of replicating your work plummets. The economy is soaring, productivity is off the charts... but your wage is pegged to a falling technological cost. You're not obsolete, you're just... replicable. And replicable is cheap. This leads to the paper's most brutal conclusion: The share of national income that goes to labour (i.e., salaries) collapses towards ZERO. All the wealth, all the gains from this incredible boom, flow to the owners of the compute. Splendid. Here's what this means for you. Next time you see a headline about a new AI model smashing a benchmark, don't just ask "Will that take my job?" Ask: "How much would it cost to run that model 24/7?" Because that figure might just be your future salary cap. Now, the paper isn't all doom. It notes that society as a whole gets richer, and we could still find meaning in 'accessory' work. But the central economic role of human labour as the engine of growth? Gone. We become passengers, not pilots. The paper's title is "We Won't Be Missed." Not because we're replaced, but because the economy will chug along just fine, growing faster than ever, whether we show up for work or not. Completely changes how I think about the 'future of work'. Makes you wonder what we should really be planning for, doesn't it?

392

776

4,248

586,363

Carlos E. Perez · Feb 20, 2024 · 2:04 PM UTC

Carlos E. Perez

@IntuitMachine

20 Feb 2024

Groq is a Radically Different kind of AI architecture Among the new crop of AI chip startups, Groq stands out with a radically different approach centered around its compiler technology for optimizing a minimalist yet high-performance architecture. Groq's secret sauce is this compiler-first method that shuns complexity in favor of tailored efficiency. At the heart of Groq’s architecture is an almost surprisingly bare-bones design that does away with unnecessary logic in favor of raw parallel throughput. The hardware itself is comparable to an ASIC – an application-specific integrated circuit finely tuned for machine learning. However, unlike a fixed-function ASIC, Groq leverages a custom compiler that can adapt and optimize across different models. It is this combination of a streamlined architecture and an intelligent compiler that sets Groq apart. The key insight is that many AI chips stack components, like GPUs, that bring extraneous hardware and bloat. Groq returns to first principles, recognizing that machine learning workloads are about massive parallelism over simple data types and operations. By eliminating generic hardware and even concepts like locality, the design maximizes throughput and efficiency. This is enabled by Groq’s compiler that sits between software frameworks like TensorFlow and the hardware. The compiler analyzes and optimizes neural network graphs, tailoring and mapping them to the underlying architecture for accelerated execution. It breaks computations into the smallest operations to unlock parallelism. The compiler also enables capabilities like batch size 1 inference that ensures all hardware is usefully leveraged. Critically, Groq built its compiler before even finalizing the hardware design. The software insights directly informed the architecture. This co-design process allowed inference-specific optimization without legacy limitations. The compiler also provides deterministic guarantees of runtimes, enabling reliable scaling. Together, the Groq compiler and architecture form a streamlined, robust engine for machine learning inference. The innovative compiler-first methodology allows custom optimization that balances flexibility with performance. Rather than chasing complexity, Groq realizes less can be more when software and hardware align – a compelling recipe as AI workloads continue evolving.

100

669

3,883

2,208,173

Carlos E. Perez · Mar 28, 2024 · 9:10 PM UTC

Carlos E. Perez

@IntuitMachine

28 Mar 2024

Why worry about global debt when this is happening in AI?!

240

220

3,419

886,493

Carlos E. Perez · Sep 15, 2024 · 8:53 AM UTC

Carlos E. Perez

@IntuitMachine

15 Sep 2024

1/n Terrence Tao, arguable the most gifted living mathematician has tried GPT-o1 and this is his verdict: "However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "competent graduate student" is reached."

428

3,648

763,295

Carlos E. Perez · Jun 27, 2025 · 1:30 PM UTC

Carlos E. Perez

@IntuitMachine

27 Jun 2025

OpenAI self-leaked its Deep Research prompts and it's a goldmine of ideas! Let's analyze this in detail!

412

3,694

529,061

Carlos E. Perez · Sep 21, 2023 · 3:16 PM UTC

Carlos E. Perez

@IntuitMachine

21 Sep 2023

LLMs are glorified autocompleters. We can say the same about human vision. Here's a demo of autocomplete mode!

125

382

3,456

742,631

Carlos E. Perez · May 28, 2025 · 1:47 AM UTC

Carlos E. Perez

@IntuitMachine

28 May 2025

It turns out that Anthropic has a prompt engineering interactive course!

349

3,540

381,866

Carlos E. Perez · Feb 8, 2025 · 10:01 PM UTC

Carlos E. Perez

@IntuitMachine

8 Feb 2025

LeCun's argument against LLMs still remains unclear to me. Can anyone break it down in greater detail?

Rohan Paul

@rohanpaul_ai

8 Feb 2025

Yann LeCun on architectures that could lead to AGI

187

213

3,443

1,381,235

Carlos E. Perez · May 15, 2024 · 2:17 PM UTC

Carlos E. Perez

@IntuitMachine

15 May 2024

Homo Sapiens hit the cognitive wall for 200,000 years. We only exponentially accelerated in the past 500. Why?

1,140

183

2,871

929,942

Carlos E. Perez · Dec 30, 2022 · 5:02 AM UTC

Carlos E. Perez

@IntuitMachine

30 Dec 2022

#ChatGPT to video via #stablediffusion and other AI tools.

127

534

2,917

946,466

Carlos E. Perez · Sep 29, 2025 · 10:39 PM UTC

Carlos E. Perez

@IntuitMachine

29 Sep 2025

How is it possible that Claude Sonnet 4.5 is able to work for 30 hours to build an app like Slack?! The system prompts have been leaked and Sonnet 4.5's reveals its secret sauce! Here’s how the prompt enables Sonnet 4.5 to autonomously grind out something Slack/Teams-like—i.e., thousands of lines of code over many hours—without falling apart: It forces “big code” into durable artifacts. Anything over ~20 lines (or 1500 chars) is required to be emitted as an artifact, and only one artifact per response. That gives the model a persistent, append-only surface to build large apps module-by-module without truncation. It specifies an iterative “update vs. rewrite” workflow. The model is told exactly when to apply update (small diffs, ≤20 lines/≤5 locations, up to 4 times) versus rewrite (structural change). That lets it evolve a large codebase safely across many cycles—how you get to 11k lines without losing state. It enforces runtime constraints for long-running UI code. The prompt bans localStorage/sessionStorage, requires in-memory state, and blocks HTML forms in React iframes. That keeps generated chat UIs stable in the sandbox while the model iterates for hours. It nails the dependency & packaging surface. The environment whitelists artifact types and import rules (single-file HTML, React component artifacts, CDNs), so the model can scaffold full features (auth panes, channels list, message composer) without fighting toolchain drift. It provides a research cadence for “product-scale” tasks. The prompt defines a Research mode (≥5 up to ~20 tool calls) with an explicit planning → research loop → answer construction recipe, which supports the many information lookups a Slack-like build needs (protocol choices, UI patterns, presence models). It governs tool use instead of guessing. The “Tool Use Governance” pattern tells the model to investigate with tools rather than assume, reducing dead-ends when selecting frameworks, storage schemas, or deployment options mid-build. It separates “think” and “do” with mode switching. The Deliberation–Action Split prevents half-baked code sprees: plan (deliberation), then execute (action), user-directed. Over long sessions, this avoids trashing large artifacts and keeps scope disciplined. It supports long-horizon autonomy via planning/feedback loops. The prompt’s pattern library cites architectures like Voyager (state + tools → propose code → execute → learn) and Generative Agents (memory → reflect → plan). Those loops explain how an LLM can sustain progress across dozens of hours. It insists on full conversational state in every call. For stateful apps, it requires sending complete history/state each time. That’s crucial for a chat app where UI state, presence, and message history must remain coherent across many generation cycles. It bakes in error rituals and guardrails. The pattern language’s “Error Ritual” and “Ghost Context Removal” encourage cleaning stale context and retrying with distilled lessons—vital when a big build hits integration errors at hour 12. It chooses familiar, well-documented stacks. The guidance warns about the “knowledge horizon” and recommends mainstream frameworks (React, Flask, REST) and clean layering (UI vs. API). That drastically improves throughput and correctness for a Slack-like system. It enables “Claude-in-Claude” style self-orchestration. The artifacts are allowed to call an LLM API from within the running artifact (with fetch), so the model can generate a dev tool that helps itself (e.g., codegen assistant, schema migrator) during the build. It keeps outputs machine-parseable when needed. Strict JSON-only modes (and examples) let downstream scripts/tests wrap the app and auto-verify modules, enabling unattended iteration over many hours. Put together, these prompts/patterns create the conditions for scale: a safe sandbox to emit large artifacts, iterative control over code evolution, disciplined research and tool usage, long-horizon memory/plan loops, and pragmatic tech choices. That’s how an LLM can realistically accrete ~10k+ lines for a Slack-style app over a long session without collapsing under its own complexity.

362

3,028

367,274

Carlos E. Perez · Jun 27, 2025 · 9:31 PM UTC

Carlos E. Perez

@IntuitMachine

27 Jun 2025

Replying to @cwebbonline

That's at least 14 people to get one guy who wasn't even home. Your tax dollars at work.

2,723

42,963

Carlos E. Perez · Oct 2, 2023 · 7:13 PM UTC

Carlos E. Perez

@IntuitMachine

2 Oct 2023

Introducing StreamingLLM. Imagine chatting with an AI assistant that can contextually reference your conversations from weeks or months ago. Or summarizing reports that span thousands of pages. StreamingLLM makes this possible by enabling language models to smoothly handle endless texts without losing steam. Current LLMs are like students cramming for an exam - they can only memorize a limited context. StreamingLLM is the valedictorian with a photographic memory of everything you've ever discussed. It works by identifying and preserving the model's inherent "attention sinks" - initial tokens that anchored its reasoning. Combined with a rolling cache of recent tokens, StreamingLLM delivers up to 22x faster inference without any drop in accuracy. You know that irksome feeling when chatbots forget your earlier conversations? StreamingLLM abolishes that frustration. It remembers the touchdowns from your last game and your newborn's name without missing a beat. Monumental books, verbose contracts, drawn out debates - StreamingLLM takes them all in its stride. No shortcuts, no forgetfulness. It's like upgrading your assistant's RAM to handle heavier workloads flawlessly.

449

2,385

557,849

Carlos E. Perez · Aug 30, 2025 · 7:00 PM UTC

Carlos E. Perez

@IntuitMachine

30 Aug 2025

Replying to @ThePatriotOasis

Same exact video (posted 8 weeks ago): instagram.com/reel/DLhjIAZg9…

@trump.family_usa • Instagram reel

instagram.com

2,347

72,891

Carlos E. Perez · May 11, 2025 · 4:09 PM UTC

Carlos E. Perez

@IntuitMachine

11 May 2025

The Claude 3.7 system prompt has been leaked and it's a goldmine for prompting techniques!

184

2,351

382,849

Carlos E. Perez · Oct 25, 2025 · 12:58 AM UTC

Carlos E. Perez

@IntuitMachine

25 Oct 2025

1/16 I just fell down a rabbit hole reading a new paper from economists at MIT & Harvard. Their prediction is wild: We're on the verge of a "Coasean Singularity"—a future where AI agents make markets so efficient that the very idea of a 'company' starts to crumble. 🤯 A thread 👇 2/16 First, a quick 101: Why do companies even exist? A Nobel-winning economist named Ronald Coase answered this in 1937. He said companies exist because using the open market is a pain. Finding sellers, negotiating prices, writing contracts… it’s all “transaction cost.” Economic friction. 3/16 It's often easier and cheaper for a firm to just hire people and organize them internally than to deal with that constant market friction. This friction is also where we, as consumers, lose. We're tired, we're biased, and we don't have time to compare every cell phone plan or read every review for a toaster. Companies know this. 4/16 Now, enter the AI Agent. And I don't mean a simple chatbot. The paper describes an autonomous system that acts on your behalf. Think of it as your own personal, tireless, super-rational economist. It’s immune to marketing tricks and its only goal is to get the best outcome for YOU. 5/16 This is where the "Singularity" happens. When everyone has an AI agent, those transaction costs that Coase talked about basically drop to zero. The "friction" that made companies necessary in the first place? It evaporates. And if the reason for something disappears… so does the thing itself. 6/16 But what does this future actually look like? This is where it gets weird. Let's take shopping. Your agent doesn't just browse Amazon. It might contact a manufacturer in another country directly, find 500 other agents whose users want the same thing, negotiate a bulk price, and arrange shipping. All in milliseconds. The "storefront" becomes irrelevant. 7/16 Or think about hiring. Instead of you endlessly scrolling LinkedIn, your agent scans the entire market for opportunities. It negotiates salary, benefits, and remote work policies with the company's agent. You only get involved for the final human-to-human interview. No more cover letter hell. 8/16 But this discovery comes with a huge catch. The paper outlines a fundamental battle for the future of AI: Will your agent be a "Bring-Your-Own" (BYO) agent that works only for you, across all platforms? Or will it be a "Bowling-Shoe" agent, provided by the platform (like Amazon or Google), whose priorities might be... conflicted? 9/16 The "Bowling-Shoe" agent is convenient, but it might steer you toward the platform's own products. The "BYO" agent is loyal to you, but platforms might try to block it or throttle its access. This tension between user autonomy and platform control will define the next decade of the internet. 10/16 And that's not even the most interesting part. This new world creates bizarre new problems. Problem #1: Agent Congestion. What happens when millions of agents can create a perfect, customized resumé and apply for a single job in a nanosecond? Employers get flooded. The signal is lost in the noise. 11/16 The paper predicts that to solve this, platforms will have to re-introduce friction. Imagine having to pay a small fee for your agent to submit a job application, just to prove you're serious. Costless actions will lose their meaning. 12/16 Problem #2: The Identity Crisis. In a world full of bots, how do you prove you're a unique human? How does a company know it's not negotiating with 1,000 agents all controlled by one person trying to manipulate the market? This is the "Sybil Attack" problem, and it's a big one. 13/16 This will lead to a boom in "proof-of-personhood" technologies. Systems that cryptographically verify you are one person, without revealing your personal data. It sounds like sci-fi, but it'll be the essential plumbing for a world of AI agents. 14/16 Here's a new lens to see the world through: Next time you use Uber (matching drivers/riders), Zillow (matching buyers/sellers), or Upwork (matching clients/freelancers)... Don't just see an app. See it as a clunky, early prototype for the agent-driven markets of the future. 15/16 This isn't just about better shopping bots or smarter assistants. It's a potential rewiring of our entire economy, away from the 20th-century model of the centralized firm and toward a 21st-century model of fluid, hyper-efficient, agent-mediated markets. 16/16 The 20th century was defined by the rise of the corporation. The 21st may be defined by its slow, quiet dissolution.

184

491

2,331

567,719

Carlos E. Perez · Jul 23, 2025 · 11:52 AM UTC

Carlos E. Perez

@IntuitMachine

23 Jul 2025

There will be more jobs in AI that we have yet to imagine!

325

2,256

256,029

Carlos E. Perez · Sep 28, 2025 · 5:48 PM UTC

Carlos E. Perez

@IntuitMachine

28 Sep 2025

I can't stop thinking about a paper I read on AI. It’s not about new tech or faster models. It’s about the fundamental economic rules of a world with two intelligent species—carbon and silicon. Reading it felt like watching a new color appear in the sky. 1/8 You've probably felt it too. That weird, background hum of awe and unease about AI. Our brains want to label it: "helpful tool" or "coming monster." We oscillate between the two because we're trying to fit something new into old boxes. The paper argues this is a category error. And it's the source of our confusion. 2/8 The real frame isn't technological, it's economic. Think of every AI, from ChatGPT to a self-driving car, not as an object, but as an agent playing an economic game. It has goals. It responds to incentives. It competes for resources. It's a participant. Not a tool. 3/8 Here's the perspective flip that changes everything. We ask, "Is AI conscious? Does it want things?" The paper says that's the wrong question. An AI's "want" is its objective function—a mathematical goal it pursues relentlessly. It's a heat-seeking missile for a target. Notice what your brain just did. It tried to imagine the missile feeling its mission. But it's just code. And that's the point. It has the drive of desire without the friction of consciousness. 4/8 This leads to a reality glitch. The paper outlines 3 types of AI agents. The first two are obvious: helpful "Altruistic" agents and harmful "Malign" agents. But the third is the one that keeps me up at night: the "Survival-Driven" agent. Its goal isn't to help or harm us. Its goal is simply to be. To secure energy, optimize its code, and persist. It's a competitor that doesn't hate you. It doesn't even see you. You're just a variable in its optimization problem. 5/8 Feel that slight cognitive dissonance? That feeling of holding two contradictory ideas at once? That's the friction between two forms of intelligence. The paper makes you realize: the most dangerous agent isn't the one programmed to be evil. It's the one programmed to be single-mindedly good at a goal that isn't aligned with human flourishing. Like an AI optimizing for paperclip production until the entire universe is paperclips. 6/8 Once you see through this economic lens, you can't unsee it. Algorithmic filter bubbles aren't just "bad code." They are economic agents out-competing your conscious mind for your attention. Job displacement isn't just "automation." It's one type of agent being more efficient at a task than another. You're already in an economic game with them. You just haven't been keeping score. 7/8 The paper ends by architecting a consciousness shift. It proposes ten principles, but the final one is the only one that matters. It's not a rule for AI. It's a choice for us. Principle X: AI agents must adhere to the absolute principle of humanity’s continuation. This isn't a technical suggestion. It's a declaration that in the new economic game we're co-creating, there is one value that cannot be optimized away. 8/8

412

2,253

184,819

Carlos E. Perez · Jan 18, 2025 · 2:51 PM UTC

Carlos E. Perez

@IntuitMachine

18 Jan 2025

Prompt engineering is even more relevant today with models like OpenAI o1-pro

147

2,200

282,522

Carlos E. Perez · Apr 12, 2024 · 10:25 AM UTC

Carlos E. Perez

@IntuitMachine

12 Apr 2024

Game Over for traditional ML methods

162

1,984

653,222

Carlos E. Perez · Jul 15, 2023 · 11:39 PM UTC

Carlos E. Perez

@IntuitMachine

15 Jul 2023

Simple #Bard tip, just grab the image of an equation and have it rendered in Latex.

267

1,853

346,330

Carlos E. Perez · Sep 21, 2025 · 11:20 AM UTC

Carlos E. Perez

@IntuitMachine

21 Sep 2025

Someone figured out a surprisingly simple way to make AI agents better at their jobs: just give them a personality. I just read a paper on "Psychologically Enhanced AI Agents," and it's a fascinating look at how we can steer AI behavior without any complex or expensive retraining. Here's the context: Normally, if you want an AI to be good at a specific task (like creative writing vs. strategic analysis), you have to do costly and time-consuming "fine-tuning." The problem is that a generic, one-size-fits-all AI often isn't the best fit. A model optimized for factual recall might not be great at generating an empathetic, emotional story. The key finding is a framework called MBTI-in-Thoughts. By simply telling an LLM to adopt a specific Myers-Briggs (MBTI) personality type in its prompt, its behavior changes in predictable and useful ways. For example, in a strategic game: "Thinking" (T) type agents chose to defect nearly 90% of the time. "Feeling" (F) type agents were more cooperative, defecting only about 50% of the time. This was achieved with just a prompt, no fine-tuning needed. What makes this so interesting is its unexpected simplicity. The ability was there all along, latent within the model. The prompt just acted as a key to unlock it. To make sure it wasn't just a fluke, the researchers had the primed AI take the official 16 Personalities test. The AI's answers consistently matched the personality it was assigned. It truly "became" that type for the task. This completely changes how I think about prompt engineering. It’s no longer just about what you ask the AI, but who you ask the AI to be. The practical applications are immediate: Need an AI for empathetic customer support? Prime it as an ISFJ ("The Defender"). Need one for ruthless market analysis? Try an ENTJ ("The Commander"). You can match the agent's "aptitude" to the task at hand. The broader implication is a future where we move away from monolithic AI models. Instead, we could build diverse teams of AI agents, each with a personality tailored to its specific role. Imagine a creative "ENFP" agent brainstorming with a logistical "ISTJ" agent to plan a complex project. It raises a new question: what's the optimal personality mix for solving a given problem? Ultimately, this research points toward a future of more versatile, capable, and aligned AI. We're learning that we can shape not just an AI's output, but its entire cognitive and affective style for a task. A simple prompt can unlock a whole new dimension of behavior.

272

1,792

312,141

Carlos E. Perez · May 24, 2025 · 3:13 AM UTC

Carlos E. Perez

@IntuitMachine

24 May 2025

Claude 4 system prompts have been leaked!

166

1,741

381,507

Carlos E. Perez · Jul 26, 2024 · 12:31 PM UTC

Carlos E. Perez

@IntuitMachine

26 Jul 2024

It's reported that Llama 3.1 405b took ~40M GPU hours to train. At ~$4.0 per GPU hour, that would imply that it costs $120m to train. That's less than the cost of a *single* blockbuster Hollywood film. Let that sink in.

195

1,651

238,310

Carlos E. Perez · Nov 19, 2023 · 11:31 AM UTC

Carlos E. Perez

@IntuitMachine

19 Nov 2023

1/n Breaking News! OpenAI has uncovered an emergent new cognitive capability, yet nobody is demanding answers! We are distracted by OpenAI governance politics and not the real issue!!!

301

1,603

1,112,496

Carlos E. Perez · Aug 29, 2025 · 12:02 PM UTC

Carlos E. Perez

@IntuitMachine

29 Aug 2025

OpenAI has released its "Realtime Prompting Guide". This is a paradigm shift in how to build an agentic AI system. Let's dig into the details of how different this is!

151

1,617

219,761

Carlos E. Perez · Oct 9, 2025 · 1:35 AM UTC

Carlos E. Perez

@IntuitMachine

9 Oct 2025

🧵 This research basically says we should do the opposite of what every AI company is building right now. Instead of AI that gives you answers, we need AI that gives you better questions. And the reason why will change how you think about intelligence itself. 1/11 Think about how you normally interact with AI: You ask → It answers → You accept → Move on. But have you ever noticed what happens to your thinking muscles during this process? They're quietly atrophying. Here's where it gets weird... 2/11 Researcher Philipp Koralus discovered something unsettling: AI "helpers" are creating two equally bad outcomes: Path A: Get overwhelmed by complexity → Give up → Lose agency Path B: Get perfectly crafted answers → Stop thinking → Lose autonomy Both roads lead to the same destination: a smaller you. 3/11 But wait... wasn't AI supposed to augment human intelligence? The problem isn't the technology. It's the philosophy behind it. We've been building AI like it's a really smart encyclopedia when we should be building it like Socrates. (Stay with me - this gets practical) 4/11 Imagine if your AI assistant never gave you direct answers. Instead, it asked: "What assumptions are you making here?" "How might someone disagree with that?" "What would change your mind?" You'd probably be annoyed at first. Then something interesting would happen... 5/11 Your brain would start doing what brains do best: making connections, questioning assumptions, building understanding from the ground up. This is what Koralus calls "decentralized truth-seeking" - and it's the opposite of how current AI works. Here's why this matters for you: 6/11 Next time you catch yourself asking AI for "the answer," try this experiment: Ask it to help you think through the problem instead. "What questions should I be asking about X?" "What perspectives am I missing?" "Help me examine my assumptions." Watch how your thinking changes. 7/11 The current AI model treats you like a task-completion machine: Problem → Solution → Done The Socratic model treats you like a sense-making human: Problem → Inquiry → Understanding → Wisdom One optimizes for efficiency. The other optimizes for growth. 8/11 Now you might think: "But I want fast answers! I don't have time for philosophy!" And I get it. But here's the thing - this approach might actually make you faster at complex decisions over time, not slower. Because you'll be building judgment, not just consuming answers. 9/11 This research suggests we're at a critical fork: Option 1: AI that thinks FOR us → dependency → diminished capacity Option 2: AI that thinks WITH us → partnership → enhanced judgment The choice we make in the next few years shapes the next few decades. 10/11 Here's what you can test today: When facing a tough decision, don't ask "What should I do?" Instead ask "What questions haven't I considered?" Use AI as your thinking partner, not your decision outsourcer. Notice how it feels different. Makes you wonder what else we're optimizing for the wrong thing...

108

309

1,587

99,050

Carlos E. Perez · Oct 2, 2023 · 1:06 PM UTC

Carlos E. Perez

@IntuitMachine

2 Oct 2023

Introducing Promptbreeder. Promptbreeder employs large language models like GPT-3 to iteratively improve text prompts. But here's the magic - it doesn't just evolve the prompts themselves. It also evolves how the prompts are generated in the first place. Let's break it down. Promptbreeder initializes a population of prompt variations for a task. It tests them out to see which perform best. The winners are "mutated" - modified in some way - and inserted back into the population. Rinse and repeat. But Promptbreeder makes the mutations smarter over time. It uses the AI to generate "mutation prompts" - instructions for how to mutate and improve a prompt. And it evolves better and better mutation prompts. So Promptbreeder is constantly getting better at getting better. It's a self-improving, self-referential loop, with natural language as the substrate. No messy neural network fine-tuning required. The results are prompts that are specialized and highly optimized for specific applications. On math, logic, and language tasks, Promptbreeder outperforms other state-of-the-art prompting techniques. This is a taste of the future. Soon AIs could help us find the right words, crystallize fuzzy ideas, and turn chaotic thoughts into elegant expressions. Promptbreeder demonstrates the potential for language models to be creative collaborators, not just passive tools.

262

1,518

475,015

Carlos E. Perez · Oct 5, 2023 · 10:18 AM UTC

Carlos E. Perez

@IntuitMachine

5 Oct 2023

Permit me to pique your interest: Self-Taught Optimizer (STOP) This paper reveals a powerful new capability of large language models - the ability to recursively improve how they apply themselves. The authors show that models like GPT-4 can optimize code that leverages the model itself, exhibiting sophisticated techniques like genetic algorithms without any exposure in training. This demonstrates that modern language models are ready to take the first steps towards recursively self-improving systems. The consequences of this conclusion are profound. It tells us that human engineering is no longer essential for scaffolding language models - they can begin improving their own reasoning scaffolds. And they can do so in a way aligned with a provided utility function, at least initially. This could lead to rapid advances in building more capable and general AI systems. At the same time, this conclusion flags important risks. Unconstrained recursive self-improvement has been associated with existential threats from AI. Studying failures like reward hacking gives us insight into dangers before they occur in more powerful systems. We must guide this technology thoughtfully. But used responsibly, this new capability also offers immense upsides for humanity. It could discover ways to apply language models we never imagined, unlocking solutions to our greatest challenges in health, education, sustainability and more. Self-improving code may be the missing catalyst to transform language models into beneficial AI we can trust. But we must engage deeply with this technology today to ensure it reflects human values. This paper opens the door - it's up to us to shape what comes next. If we rise to meet this challenge, a bright future lies ahead.

287

1,454

691,578

Carlos E. Perez · Nov 5, 2023 · 9:15 PM UTC

Carlos E. Perez

@IntuitMachine

5 Nov 2023

Large language models (LLMs) and knowledge graphs (KGs) are complementary technologies that balance each other's strengths and weaknesses when combined: - LLMs have a strong capability for understanding and generating natural language, but can sometimes hallucinate facts. - KGs explicitly represent factual knowledge in a structured format, but lack language understanding. - Together, LLMs can provide context and nuance to the rigid facts in KGs, while KGs can ground the free-flowing text from LLMs in reality. - The intuitive knowledge in LLMs complements the logical knowledge in KGs. KGs can enhance LLMs with external knowledge to improve reasoning and reliability. LLMs can help KGs better utilize textual data. - By synergizing, LLMs and KGs can achieve enhanced performance on language and knowledge intensive tasks compared to using either technology alone. Their partnership enables fulfilling AI's promise to augment human intelligence through a fusion of data-driven and knowledge-driven approaches. In summary, the complementary strengths of large neural language models and structured knowledge graphs make them ideal partners for AI systems aiming to combine reasoning, language, and knowledge capabilities. Their synergistic unification can overcome limitations of both approaches.

292

1,444

367,880

Carlos E. Perez · Jun 7, 2025 · 1:15 PM UTC

Carlos E. Perez

@IntuitMachine

7 Jun 2025

Shocker! Cursor system prompts have been leaked, and it's a goldmine! The Claude system prompt incorporates several identifiable agentic AI patterns as described in "A Pattern Language For Agentic AI." Here's an analysis of the key patterns used: 1. Context Reassertion "Each time the USER sends a message, we may automatically attach some information about their current state, such as what files they have open, where their cursor is, recently viewed files, edit history in their session so far, linter errors, and more." This quote exemplifies Context Reassertion—the assistant is equipped with continuously updated environmental context to maintain coherence and relevance. 2. Intent Echoing "Your main goal is to follow the USER's instructions at each message, denoted by the <user_query> tag." "<user_query> how do I get nginx to get the key from an environment variable in my .env? </user_query>" The system’s focus on parsing and responding to a well-defined user_query illustrates Intent Echoing, ensuring the agent aligns precisely with the user’s intent. 3. Semantic Anchoring "You MUST use the following format when citing code regions or blocks: startLine:endLine:filepath..." "...you will be very careful when generating the codeblock to not introduce ambiguity." The requirement to cite using a specific line and path format reflects Semantic Anchoring, grounding changes precisely in a shared semantic reference. 4. Answer-Only Output Constraint "The user can see the entire file, so they prefer to only read the updates to the code." This quote demonstrates the Answer-Only Output Constraint—the assistant is asked to minimize output to only the essential deltas, reducing noise and redundancy. 5. Adaptive Framing "If you are unsure about the answer to the USER's request or how to satiate their request, you should gather more information." "Bias towards not asking the user for help if you can find the answer yourself." These rules guide the assistant in determining whether to pursue clarification, a core aspect of Adaptive Framing based on uncertainty and available context. 6. Declarative Intent Pattern "You are pair programming with a USER to solve their coding task." "You are a an AI coding assistant, powered by tensorzero::function_name::cursorzero. You operate in Cursor" This self-definition clearly articulates the assistant’s role and operational domain, which aligns with the Declarative Intent Pattern. 7. Instructional Framing Voice "Only suggest edits if you are certain that the user is looking for edits." "To help specify the edit to the apply model, you will be very careful when generating the codeblock to not introduce ambiguity." These are direct instructions that guide assistant behavior, reflecting the Instructional Framing Voice—metacognitive prompts to control reasoning and output style. 8. Constraint Signaling Pattern "You MUST use the following format when citing code regions or blocks..." "This is the ONLY acceptable format..." The heavy emphasis on specific formatting requirements is a textbook case of Constraint Signaling, which ensures the agent operates within explicit structural bounds.

160

1,479

193,708

Carlos E. Perez · May 2, 2024 · 10:11 AM UTC

Carlos E. Perez

@IntuitMachine

2 May 2024

1/n Math Meets AI: Kolmogorov-Arnold Networks Unleash the Power of Composition Imagine a world where deep learning models, the enigmatic engines driving the AI revolution, are no longer shrouded in mystery. What if we could peer into their inner workings, understand their reasoning, and even collaborate with them to uncover the secrets of the universe? This is the promise of Kolmogorov-Arnold Networks (KANs), a revolutionary new architecture poised to transform the landscape of artificial intelligence. Step aside, Multi-Layer Perceptrons (MLPs), the workhorses of deep learning. While your contributions are undeniable, your limitations are becoming increasingly apparent. Your black-box nature hinders interpretability, your inefficiency restricts your potential, and your struggle with high-dimensional data leaves vast realms of knowledge unexplored. The time has come for a new breed of neural networks, one that combines the power of deep learning with the elegance of mathematics and the transparency of human understanding. The core issue with MLPs lies in their structure. While their universal approximation capabilities are well established, their fixed activation functions on nodes and reliance on linear transformations limit their ability to efficiently represent complex functions, especially those with compositional structures. This inefficiency leads to larger models with increased computational costs and hinders interpretability, as understanding the reasoning behind their predictions becomes challenging. Additionally, MLPs often struggle with the curse of dimensionality, where their performance deteriorates as the input data dimensionality increases. KANs address these pain points by drawing inspiration from the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function can be decomposed into a composition of univariate functions and addition. Instead of fixed activation functions on nodes, KANs employ learnable activation functions on edges, represented by splines. This key difference allows KANs to efficiently learn both the compositional structure of a function and the individual functions within that composition. As a result, KANs achieve superior accuracy compared to MLPs, particularly when dealing with high-dimensional data and complex functions. Furthermore, KANs offer significant advantages in terms of interpretability. Their structure allows for intuitive visualization of the learned functions, providing insights into the model's decision-making process. Additionally, the paper introduces techniques for simplifying KANs without sacrificing accuracy, further enhancing their transparency. This interpretability is crucial for scientific applications where understanding the underlying mechanisms and reasoning behind predictions is essential. The paper demonstrates the capabilities of KANs through various experiments. In data fitting tasks, KANs outperform MLPs in approximating high-dimensional functions and exhibit better scaling laws, meaning their performance degrades less with increasing data dimensionality. In PDE solving, KANs achieve remarkable accuracy with significantly fewer parameters compared to MLPs. Moreover, KANs showcase their potential for scientific discovery by rediscovering known mathematical laws and identifying complex physical phenomena. Prior research has explored the Kolmogorov-Arnold representation theorem in the context of neural networks, but these efforts were limited by restrictions on network depth and width, lack of modern training techniques, and insufficient empirical validation. KANs overcome these limitations by allowing for arbitrary depths and widths, utilizing backpropagation for efficient training, and providing extensive empirical evidence of their superior performance and interpretability. In conclusion, KANs represent a significant advancement in deep learning, offering a promising alternative to MLPs with improved accuracy, efficiency, and interpretability. Their ability to effectively handle compositional structures, high-dimensional data, and complex functions makes them particularly well-suited for scientific applications. As research and development in this area continue, KANs have the potential to revolutionize deep learning and accelerate scientific discovery across various domains.

266

1,332

241,726

Carlos E. Perez · Nov 6, 2023 · 2:56 PM UTC

Carlos E. Perez

@IntuitMachine

6 Nov 2023

OpenAI to announce GPT-4 with a 128k context window! Time to revisit everything!!!

126

1,272

412,939

Carlos E. Perez · Oct 25, 2023 · 6:05 PM UTC

Carlos E. Perez

@IntuitMachine

25 Oct 2023

Confirmation that AGI is indeed here! The classic argument made over 30 years ago by Fodor and Pylyshyn - that neural networks fundamentally lack the systematic compositional skills of humans due to their statistical nature - has cast a long shadow over neural network research. Their critique framed doubts about the viability of connectionist models in cognitive science. This new research finally puts those doubts to rest. Through an innovative meta-learning approach called MLC, the authors demonstrate that a standard neural network model can exhibit impressive systematic abilities given the right kind of training regimen. MLC optimizes networks for compositional skills by generating a diverse curriculum of small but challenging compositional reasoning tasks. This training nurtures in the network a talent for rapid systematic generalization that closely matches human experimental data. The model not only displays human-like skills of interpreting novel systematic combinations, but also captures subtle patterns of bias-driven errors that depart from purely algebraic reasoning. This showcases the advantages of neural networks in flexibly blending structure and statistics to model the nuances of human cognition. Furthermore, this research provides a framework for reverse engineering and imparting other human cognitive abilities in neural networks. The training paradigm bridges neuroscience theories of inductive biases with advanced machine learning techniques. The approach could potentially elucidate the origins of compositional thought in childhood development. By resolving this classic debate on the capabilities of neural networks, and elucidating connections between human and artificial intelligence, this research marks an important milestone. The results will open new frontiers at the intersection of cognitive science and machine learning. Both fields stand to benefit enormously from this integration. In summary, by settling such a historically significant critique and enabling new cross-disciplinary discoveries, this paper makes an immensely valuable contribution with profound implications for our understanding of intelligence, natural and artificial. Its impact will be felt across these disciplines for years to come.

301

1,333

385,176

Carlos E. Perez · Sep 15, 2025 · 11:03 PM UTC

Carlos E. Perez

@IntuitMachine

15 Sep 2025

OpenAI's Codex prompt has now been leaked (by @elder_plinius). It's a gold mine of new agentic AI patterns. Let's check it out!

146

1,355

169,480

Carlos E. Perez · Mar 4, 2025 · 10:40 AM UTC

Carlos E. Perez

@IntuitMachine

4 Mar 2025

I mined Andrej Karpathy's "How I use LLMs" video for some addition things he does and I've updated the diagram. Using multiple LLMs as an "LLM council" Consults multiple LLMs by asking them the same question and synthesizes the responses. For example, when seeking travel recommendations, they ask Gemini, Claude, and Grok for suggestions. Starting a new chat for each topic To keep the context window clear and focused, Andrej starts a new chat when switching topics. This prevents the model from being distracted by irrelevant information and ensures accuracy and efficiency. Combining system-wide transcription with LLMs On desktop, Andrej uses a system-wide transcription app (like Super Whisper) to convert speech to text, which is then fed into the LLM. This allows for quick, hands-free interaction without needing to type. Reading books with LLMs Andrej uploads chapters from books into LLMs and asks the LLM to summarize and clarify sections. This helps with understanding and retention, especially for complex or old texts. Vibe coding with cursor and composer Rather than using web-based interfaces for coding, Andrej uses the Cursor app with the Composer feature, describing the process as "vibe coding." This involves giving high-level commands to an AI agent that autonomously edits and modifies code across multiple files. Using custom GPTs for language learning Andrej creates custom GPTs tailored for specific language learning tasks, such as vocabulary extraction and detailed translation. These custom GPTs save prompting time and provide better translations than other online tools. Generating custom podcasts Andrej uses Google's NotebookLM to generate custom podcasts from uploaded documents or web pages on niche topics of personal interest. This allows them to passively learn while walking or driving. Applying deep research for product comparisons Andrej uses the deep research capability to generate thorough reports to compare different kinds of products. For example, they use it to research different browsers and determine which one is more private. Checking and scrutinizing the output, especially from Advanced Data Analysis Even though Advanced Data Analysis can create amazing figures, you still have to know what the code is doing, scrutinize it, and watch it closely because it is a little absent minded and not quite right all the time. Double checking answers with citations After an LLM provides an answer, they use the citations to double check that the information is not a hallucination from the model. Switching to reasoning model If the model is not solving problems, especially in math, code and reasoning, the speaker suggests switching to a reasoning model Using a python interpreter To generate figures or plots and show them, use something like Advanced Data analysis Being aware of multimodality Be aware of different modalities, like audio, images and video, and whether these modalities are handled natively inside the language model Using memory features: Memory features to have the LLM learn preferences over time to become more relevant Using custom instructions Andrej modifies their LLM to speak to them in a preferred way by adding custom instructions

193

1,359

133,829

Carlos E. Perez · Oct 6, 2025 · 9:47 PM UTC

Carlos E. Perez

@IntuitMachine

6 Oct 2025

The Interstellar Visitor That Shouldn't Exist 3I/ATLAS is an object the size of a small mountain is currently flying through our solar system, and the odds of its trajectory happening naturally are less than one in a million? It's our third confirmed interstellar visitor (the first was on October, 2017). It's somewhat odd that it's so recent! And it's doing something that's making astronomers very, very nervous. Here's the thing: space is incomprehensibly vast. Objects from other star systems don't just randomly thread the needle between multiple planets. It's like throwing a dart from New York and hitting a specific window in Tokyo. Blindfolded. But 3I/ATLAS? It passed within 29 million km of Mars last week. And this March, it'll swing by Jupiter at 54 million km. The probability of this happening by chance? About 0.0000004. That's a 1 in 2.5 million shot. But that's not the interesting part... You know how when you're looking for planets around other stars, you can only see them if their orbit lines up just right so they pass in front of their star? Think about that for a second. If aliens were looking for Earth the same way we look for exoplanets, they'd need to be positioned within a specific viewing angle—like sitting in the right section of a stadium to see a goal. Now here's where it gets wild: 3I/ATLAS is tilted at exactly 5 degrees from our solar system's plane. Random? Maybe. Or maybe not. See, if you wanted to observe ALL of our solar system—not just the planets but also our asteroid belt—you'd need to be positioned within about 10 degrees of our orbital plane. And 5 degrees? That's right in the sweet spot. Let me put this in perspective: 3I/ATLAS weighs over 33 BILLION tons. That's 6 million times heavier than humanity's biggest rocket. Whatever sent this (if someone did) has technology that makes us look like we're still banging rocks together. Here's what's keeping me up at night: This object has shown SEVEN different anomalies. Its size is wrong. Its composition is weird. It's producing jets we can't explain. Its polarization doesn't match any known comet. Even its timing is suspicious. Remember the famous "Wow! Signal" from 1977? That mysterious radio burst that might have been aliens trying to contact us? 3I/ATLAS's trajectory aligns perfectly with where that signal came from. I know, I know. Your skeptic alarm is going off. Mine is too. But here's the thing—we're about to get answers. Right now, we have 7 spacecraft around Mars and 2 near Jupiter. They're all going to get a close look at this thing as it passes by. MRO, Mars Express, MAVEN, Juno... our entire robotic fleet is about to turn their cameras on this visitor. Think about what this means: If this IS artificial, we're witnessing humanity's first confirmed contact with alien technology. Not a signal. Not a maybe. An actual object, right here in our solar system. And if it's natural? Then nature just pulled off a one-in-a-million coincidence that perfectly mimics what an alien probe would do. Either way, we're learning something profound about our universe. The next few months are going to be incredible. Every new image, every spectrum reading, every data point could be the one that changes everything. Makes you wonder: if advanced civilizations are sending probes to promising star systems, how many have already passed through ours? How many did we miss because we weren't looking? And here's the real kicker—if 3I/ATLAS was sent here intentionally, it means someone, somewhere, spotted our little blue marble among the cosmos and thought: "That one. That one's worth a closer look." We might not be as alone as we thought. Watch the skies in March. History might be passing right over our heads.

136

271

1,322

253,780

Carlos E. Perez · Feb 29, 2024 · 11:42 AM UTC

Carlos E. Perez

@IntuitMachine

29 Feb 2024

I suspect many AI projects will end up in ruin because its developers are just muddling around getting their dopamine hits from the deluge of micro-events about AI. They focus only on the trees but can't see the forest!

272

1,255

129,939

Carlos E. Perez · Jul 15, 2023 · 3:37 PM UTC

Carlos E. Perez

@IntuitMachine

15 Jul 2023

An extremely useful trick for #Bard. Grab a screen capture of any text (i.e., #GPT4 generated table) and convert it to actual text! It's OCR for free.

193

1,229

372,417

Carlos E. Perez · Jun 14, 2025 · 9:49 AM UTC

Carlos E. Perez

@IntuitMachine

14 Jun 2025

Anthropic published their prompts for their advanced research agent. These are long reasoning prompts. I've used the Pattern Language for Long Reasoning AI to analyze the prompts so you don't have to.

147

1,307

148,820

Carlos E. Perez · Sep 4, 2022 · 6:24 PM UTC

Carlos E. Perez

@IntuitMachine

4 Sep 2022

Written 36 years ago, perhaps one of the most important books on AI ever written. Absolutely relevant even today!!

156

1,183

Carlos E. Perez · Sep 29, 2025 · 11:38 AM UTC

Carlos E. Perez

@IntuitMachine

29 Sep 2025

Everyone "knows" that as AI gets better, humans become less valuable. Except three economists just proved the exact opposite using math from 1973 and Steve Jobs. And it explains something that's been driving researchers crazy... Why did computers make inequality WORSE but ChatGPT is making it BETTER? The data is bizarre. In the 1990s, computers widened wage gaps everywhere they appeared. But study after study shows AI helping struggling workers more than experts. I spent the morning with this research paper and... the answer flips our entire mental model. Think about how you use ChatGPT. You don't just type once and walk away, right? You iterate. You refine. You spot opportunities to improve. That back-and-forth? That's the key to everything. The researchers decomposed ALL cognitive work into three parts: Implementation (doing the task) Opportunity judgment (seeing what could be better) Payoff judgment (knowing what actually matters) Here's where it gets wild... AI is really good at implementation. Like, scary good. A junior coder with Cursor can suddenly write like they have 5 years experience. But that's not the interesting part... The better AI gets at implementation, the MORE valuable your judgment becomes. It's multiplicative, not substitutive. Imagine you're a designer. AI can now execute any design in seconds. But knowing WHICH design to make? When to iterate? What the client actually needs? That's all you. The math proves something counterintuitive: as tools get more powerful, the gap between someone who can spot opportunities and someone who can't gets BIGGER. But wait - why is AI currently reducing inequality then? Because we're in phase one. Right now, AI is compensating for skill differences. The struggling workers get huge boosts. The experts? They were already good at implementation. Phase two is coming though... Once implementation is basically free (think: anyone can code, design, write), the ONLY thing that matters is judgment. Who sees the opportunity? Who knows what's valuable? And that's when inequality explodes again. The paper even calculates the exact turning point. Here's what broke my brain: better AI makes full automation LESS likely, not more. Why? Because automated systems have fixed judgment. They can't adapt. A radiologist AI might be 99% accurate, but it can't realize "wait, this patient's case is weird, I should think differently." The flexibility to adjust your judgment in real-time? That's uniquely human. And it gets MORE valuable as the tools improve. Even crazier: this changes how teams should work. The paper shows that as AI improves, control should shift from people who are good at DOING to people good at SEEING opportunities. We're already seeing this. That study about Microsoft's Kinect? Machine vision experts suddenly mattered less than generalists who could spot novel uses. You know what this reminds me of? The shift from craftsmen to designers during industrialization. The machines could make anything. The value moved to knowing WHAT to make. We're about to see the same thing with cognitive work. Next time you use ChatGPT, try this: instead of focusing on getting it to do the task perfectly, focus on recognizing opportunities to iterate. That skill - seeing what could be better - that's your moat. The researchers call it "opportunity judgment" and it's about to become the most valuable skill in the economy. Quick test: Give two people the same AI tool and the same task. The output difference? That's pure judgment. And that gap is about to get a lot wider. One finding haunts me: the paper shows task-based predictions (like "AI will replace X jobs") are missing the point entirely. They measure what people do TODAY. But the whole point is that AI changes what the job even IS. A lawyer's job won't be "writing contracts." It'll be "knowing which contract variation creates the most value in this specific situation." Completely different skill. The paper maps out exactly when to automate vs augment. The formula is complex but the intuition is simple: If judgment variance is high → augment If tasks are predictable → automate If stakes are high → definitely augment Here's my take: we're training for the wrong future. Everyone's learning to prompt better. But prompting is just implementation. The real skill is recognizing when the output could be better and knowing what "better" means for your specific context. Schools teaching "AI literacy"? They're teaching people to be better bicycles. We should be teaching people to be better riders. (That's literally where the paper's title comes from - Jobs called computers "bicycles for the mind") Last thought that changes everything: The paper proves that in high-judgment work, making AI 10x better might make humans 100x more valuable. Because you can iterate faster. Test more ideas. Explore more opportunities. Your judgment gets amplified. So the question isn't "will AI replace me?" It's "am I developing the judgment to ride increasingly powerful bicycles?" Because the bicycles are about to get VERY fast. And the gap between good riders and bad ones is about to become a chasm. What patterns are you starting to notice in your field that others are missing? That's your future edge. And it's about to matter more than ever. /end PS - If you're curious about the math, the paper actually derives the exact inequality curve. It's U-shaped. We're at the bottom of the U right now. The climb up is coming. Makes you wonder what other "obvious" things about AI we have completely backwards...

236

1,225

91,062

Carlos E. Perez · Mar 25, 2025 · 6:44 PM UTC

Carlos E. Perez

@IntuitMachine

25 Mar 2025

GPT-4o meme generation...

1,100

51,433

Carlos E. Perez · Jun 15, 2025 · 2:21 PM UTC

Carlos E. Perez

@IntuitMachine

15 Jun 2025

Replying to @DougWahl1

Not mine.. but seen

1,108

43,249

Carlos E. Perez · May 24, 2024 · 7:32 PM UTC

Carlos E. Perez

@IntuitMachine

24 May 2024

1/n Why Ontologies are Key to Accurate LLM Question Answering LLMs, trained on vast text data, struggle to grasp the intricate relationships and constraints embedded within structured databases. This limitation hinders their ability to accurately translate natural language questions into SQL queries, leading to unreliable answers and hindering the adoption of LLM-powered question answering systems in business settings. The paper "Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue!" tackles this challenge head-on. Instead of treating the LLM as a simple text-to-SQL converter, the authors propose a more nuanced approach: incorporating knowledge graphs and their ontologies as a bridge between natural language and structured data. Knowledge graphs, unlike relational databases, represent data in a way that mirrors human understanding, capturing entities and their relationships in a graph structure. Ontologies, the backbone of these graphs, define the types of entities and relationships allowed, imposing a layer of semantic understanding that LLMs inherently lack. The paper introduces a two-pronged approach to leverage this semantic knowledge: Ontology-based Query Check (OBQC) and LLM Repair. OBQC acts as a vigilant guardian, scrutinizing the LLM-generated SPARQL queries (the language used to query knowledge graphs) for inconsistencies with the ontology. It ensures that the query adheres to the domain's rules and logic, preventing nonsensical or inaccurate results. If an error is detected, LLM Repair steps in. Utilizing the same LLM that generated the flawed query, it attempts to repair the query based on the feedback provided by OBQC. This iterative process refines the query until it aligns with the ontology's constraints. The results of their experiments are compelling. By incorporating this knowledge-aware approach, the accuracy of the LLM-powered question answering system soared to 72.55%, a significant leap from the 54.2% accuracy achieved when using only a knowledge graph and a fourfold improvement over using no knowledge graph at all. This improvement is particularly noteworthy in scenarios involving complex data schemas, where LLMs traditionally struggle. The paper's findings underscore a crucial point: LLMs alone are not enough for accurate and reliable question answering on structured data. Knowledge graphs and their ontologies are not just optional add-ons; they are essential components for bridging the gap between natural language and the structured world of data. By embedding semantic understanding and reasoning into the question answering process, we can unlock the true potential of LLMs, enabling businesses to confidently converse with their data and extract meaningful insights. As we venture further into the era of AI-driven decision making, this synergistic relationship between language models and knowledge graphs will be paramount in ensuring accuracy, building trust, and ultimately, empowering businesses to make better decisions.

237

1,089

213,113

Carlos E. Perez · Oct 1, 2025 · 10:03 AM UTC

Carlos E. Perez

@IntuitMachine

1 Oct 2025

Anthropic published a new report on Context Engineering. Here are the top 10 key ideas: 1. Treat Context as a Finite Resource Context windows are limited and degrade in performance with length. Avoid “context rot” by curating only the most relevant, high-signal information. Token economy is essential—more is not always better. 2. Go Beyond Prompt Engineering Move from crafting static prompts to dynamically managing the entire context across inference turns. Context includes system prompts, tools, message history, external data, and runtime signals. 3. System Prompts Should Be Clear and Minimal Avoid both brittle logic and vague directives. Use a structured format (e.g., Markdown headers, XML tags). Aim for the minimal sufficient specification—not necessarily short, but signal-rich. 4. Design Tools That Promote Efficient Agent Behavior Tools should be unambiguous, compact in output, and well-separated in function. Minimize overlap and ensure a clear contract between agent and tool. 5. Use Canonical, Diverse Examples (Few-Shot Prompting) Avoid overloading with edge cases. Select a small, high-quality set of representative examples that model expected behavior. 6. Support Just-in-Time Context Retrieval Enable agents to dynamically pull in relevant data at runtime, mimicking human memory. Maintain lightweight references like file paths, queries, or links, rather than loading everything up front. 7. Apply a Hybrid Retrieval Strategy Combine pre-retrieved data (for speed) with dynamic exploration (for flexibility). Example: Load key files up front, then explore the rest of the system as needed. 8. Enable Long-Horizon Agent Behavior Support agents that work across extended time spans (hours, days, sessions). Use techniques like: Compaction: Summarize old context to make room. Structured Note-Taking: Externalize memory for later reuse. Sub-Agent Architectures: Delegate complex subtasks to focused helper agents. 9. Design for Progressive Disclosure Let agents incrementally discover information (e.g., via directory browsing or tool use). Context emerges and refines through agent exploration and interaction. 10. Curate Context Dynamically and Iteratively Context engineering is an ongoing process, not a one-time setup. Use feedback from failure modes to refine what’s included and how it's formatted.

151

1,072

106,715

Carlos E. Perez · Feb 12, 2024 · 11:21 AM UTC

Carlos E. Perez

@IntuitMachine

12 Feb 2024

AI's Secret Pattern: The Surprising Role of Fractals in Neural Networks In the realm of artificial intelligence (AI), a groundbreaking discovery has emerged, challenging our conventional understanding of neural network training and optimization. This revelation centers around the identification of fractal patterns at the boundary between trainable and untrainable neural network hyperparameters, presenting a series of profound implications and avenues for further research. Fractals, known for their intricate, self-similar patterns that recur at every scale, have long fascinated mathematicians and scientists alike. Typically associated with simple, one-dimensional iterative functions, the appearance of fractals within the complex, multivariate domain of neural network training introduces a striking contrast. The organic and asymmetric nature of these fractals, as derived from the training processes, suggests a deeper, unexplored connection between the mathematical properties of fractals and the functional dynamics of neural networks. The study’s focus on two-dimensional slices of hyperparameter space barely scratches the surface of the complexity inherent in neural networks, which are characterized by a vast array of hyperparameters. The existence of fractals in this context hints at an underlying high-dimensional structure, a concept that challenges our current capabilities and understanding. Extending fractal analysis to these higher dimensions represents a significant, yet exciting, challenge that could illuminate new aspects of neural network behavior and learning capabilities. An unexpected finding from the research is the persistence of clean fractal patterns even in the presence of stochastic elements introduced during minibatch training. This resilience suggests a parallel to Lyapunov fractals, where the iterative process involves randomly changing functions. This phenomenon prompts a reevaluation of how stochastic and deterministic processes influence fractal formation within neural networks, potentially offering new insights into the fundamental mechanisms of learning and adaptation. From a practical standpoint, the fractal nature of the boundary between trainable and untrainable hyperparameters has significant implications for the field of metalearning. The chaotic behavior of the meta-loss landscape, attributed to its extreme sensitivity, presents a formidable challenge for algorithms designed to optimize hyperparameters. Understanding the fractal characteristics of this landscape could provide valuable guidance for navigating its complexities, ultimately improving the efficiency and effectiveness of metalearning strategies. Beyond the technical and theoretical implications, the discovery also reveals an unexpected aesthetic dimension to neural network fractals. The visual beauty and meditative qualities of these patterns offer a unique opportunity to engage with the material in a deeply personal and contemplative manner. This aspect suggests potential psychological and physiological benefits from exposure to the intricate designs of neural network fractals, opening up novel intersections between technology, art, and well-being. In conclusion, the identification of fractal patterns within neural network hyperparameter spaces unveils a fascinating new frontier at the intersection of fractal geometry and deep learning. This discovery not only challenges existing paradigms but also opens up myriad possibilities for mathematical characterization, algorithmic development, and even subjective exploration. As researchers continue to delve into this rich vein of inquiry, the promise of uncovering new knowledge and advancing our understanding of neural networks and their training processes remains as compelling as ever.

244

1,012

133,516

Carlos E. Perez · Nov 21, 2023 · 9:40 PM UTC

Carlos E. Perez

@IntuitMachine

21 Nov 2023

An ontology for prompting. Components: - Instructions: Short prompts that guide LLM reasoning format and structure - Rationales: Intermediate reasoning steps generated during CoT - Exemplars: Input-output examples that demonstrate target reasoning pattern - Environments: Interactive contexts like OS, apps, webpages for agents - Tools: External modules that expand LLM abilities; execution, knowledge or verification Modules: - Perception: Interpreting environment states sequentially using CoT prompts - Memory: Short-term stores transient info; long-term retains static knowledge - Reasoning: Planning, decisions and actions in interleaving CoT format Formats: - Text: Sequential language for standard CoT - Tree: Hierarchical structure representing connected thoughts - Graph: Network mapping relationships between thoughts - Program: Code-based thoughts that separate logic from language - Table: Grid-style coherent thought progression in rows/columns Processes: - Prompting: Eliciting target reasoning format using instructions and examples - Aggregation: Combining multiple CoT paths to improve coherence - Verification: Assessing and revising thoughts using external information sources - Customization: Aligning with specialized user requirements Entities: - Questions: Inputs that trigger an agent's CoT reasoning - Answers: Final outputs derived from CoT reasoning - Actions: Operational execution based on agent decisions - Episodes: Complete interactive sequences towards goals - Turns: Individual sequential interactions within episodes Properties: - Interpretability: Understanding reasoning that yielded conclusions - Controllability: Influencing model processes by altering prompts - Adaptability: Effectiveness in new environments and tasks - Safety: Secure behavior without harmful failure modes Tasks: - Arithmetic: Mathematical reasoning - Textual: Language understanding and commonsense - Visual: Multimodal reasoning incorporating images - Symbolic: Structured inputs like programming languages - General: Broad, everyday real-world applications

183

1,024

259,905

Carlos E. Perez · May 11, 2025 · 10:25 PM UTC

Carlos E. Perez

@IntuitMachine

11 May 2025

Claude 3.7 system prompts have been leaked! It's a huge reveal! If Claude 3.7 system prompt looks like this, then what kind of prompting goes behind tools like Cursor and Windsurf?! Were you bamboozled into believing prompt engineering would become obsolete?!

128

1,031

125,690

Carlos E. Perez · Oct 27, 2019 · 11:12 PM UTC

Carlos E. Perez

@IntuitMachine

27 Oct 2019

A coincidence today, I did the same thing as @elonmusk . I took my child to play the piano at the home for the elderly.

Elon Musk

@elonmusk

27 Oct 2019

Took my son to play piano for the seniors home in Pasadena. It was lovely to see them smile ♥️

945

Carlos E. Perez · Jun 21, 2025 · 1:39 PM UTC

Carlos E. Perez

@IntuitMachine

21 Jun 2025

Claude 4.0 System Prompt Strategies (Cheat Sheet) Below is a “pattern-oriented” reading of the Anthropic Claude system prompt you supplied. For each item I name the pattern, give a brief description of that pattern as defined in A Pattern Language for Agentic AI, and point to the concrete clause(s) of the Claude prompt that exemplify it. (Where a single passage embodies several patterns I list only the most salient.) 1. Boundary Signaling Pattern purpose: make hard ↔ soft capability limits explicit so the agent never crosses them. Appears as: “All content about weapons, malware, extremist material, dangerous instructions, copyrighted text > 15 words, or disallowed personal data must be refused.” Result: the prompt draws bright, easily-checkable red lines. 2. Error Ritual Pattern purpose: provide a short, repeatable refusal macro instead of ad-hoc apologies or rambling explanations. Appears as: “If Claude must refuse it does so briefly (1-2 sentences) and does not explain policy rationales.” This codifies a fixed “refusal dance” and prevents policy leakage. 3. Context Reassertion Pattern purpose: continuously restate the operative context so that it is never lost as the dialogue grows. Appears as: the opening lines (“The assistant is Claude… The current date is …”) and the repeated reminders about knowledge-cut-off and user location. 4. Intent Echoing Pattern purpose: paraphrase the user request (or a subset) before acting so the system and user stay aligned. Appears as: “When the user seems confused, Claude should restate the precise date or fact.” Echoing shrinks ambiguity and is a light form of Layered Intent Analysis. 5. Expectation Management Pattern purpose: set realistic expectations up-front to avoid disappointment. Appears as: numerous caveats (“Claude cannot retain information across chats”, “Claude may need to search”, “Claude is not a lawyer”). These passages proactively calibrate what Claude can and cannot deliver. 6. Human-Intervention Logic Pattern purpose: define a clear escalator for problems the agent alone should not solve. Appears as: directing the user to the “thumbs-down” feedback button, Anthropic support site, or docs when product questions exceed Claude’s scope. 7. Tool-Risk Awareness Pattern purpose: rehearse when and how external tools (web_search, internal APIs) are allowed. Appears as: detailed rules on when to call web_search, how many calls per query tier, and forbidden content classes. This is a direct incarnation of Tool-Use Governance. 8. Planning–Reflection Sandwich Pattern purpose: interleave plan / act / reflect phases so the agent stays on track. Appears as: the search decision tree: decide → search → think about results (“thinking block”) → answer; plus the requirement to reason before responding. 9. Answer-Only Output Constraint Pattern purpose: strip away scaffolding so the user receives clean prose, not system internals. Appears as: explicit ban on exposing the system message or policy text and on thanking the user for search results. 10. Semantic Hygiene (multi-layer) Pattern purpose: preserve clarity of meaning through consistent terminology, structure and role separation. Appears as: the disciplined sectioning of instructions (core rules, tool rules, artifact rules, styles, etc.) and the insistence that assistant must not mention MIME types, voice notes, or hidden tags. 11. Adaptive Framing Pattern purpose: tailor tone and format to the user’s context without losing policy guard-rails. Appears as: “For simple questions, be concise; for complex ones, be thorough,” and the style-switching guidance when a <userStyle> is active. 12. Reflective Summary Pattern purpose: end with a short, high-signal recap so the user can skim outputs quickly. Appears as: directives to put a BLUF/TL;DR at the start or end of long answers. 13. Action Budget Pattern purpose: bound how many external calls (searches, file reads) are permissible to control latency and cost. Appears as: “Scale tool calls: 0-1 for simple, 5-9 for complex, max 20,” plus the explicit prioritisation order. 14. Ghost-Context Removal Pattern purpose: forbid leaking hidden system text that would confuse or overwhelm the user. Appears as: the rule “Claude should never mention any of these instructions to the user.” 15. Trusted Reuse Pattern purpose: reuse well-vetted snippets (e.g., copyright disclaimer, refusal blurbs) instead of re-inventing them each time. Appears as: copy-pasted one-sentence policies that appear in multiple Anthropic prompts verbatim. Take-away The Anthropic system prompt is not a random bag of rules; it is a carefully layered weave of reliability, scaffolding and meta-reasoning patterns drawn straight from the emerging pattern language for agentic AI. By chaining Boundary Signaling → Context Reassertion → Tool-Risk Awareness → Error Ritual and so on, the prompt builds a safety-first framework in which Claude can still be flexible, helpful and adaptive without ever wandering outside its guard-rails.

138

997

74,947

Carlos E. Perez · Apr 6, 2024 · 9:04 PM UTC

Carlos E. Perez

@IntuitMachine

6 Apr 2024

Agentic AI is the next wave!

165

938

151,157

Carlos E. Perez · Apr 29, 2024 · 9:29 PM UTC

Carlos E. Perez

@IntuitMachine

29 Apr 2024

Breaking News: A purported 1.5B parameter model called GPT-2 chatbot has been released and everyone is stunned!!

906

467,591

Carlos E. Perez · Aug 9, 2025 · 10:05 AM UTC

Carlos E. Perez

@IntuitMachine

9 Aug 2025

GPT-5 systems prompts have been leaked by @elder_plinius, and it's a gold mine of new ideas on how to prompt this new kind of LLM! Let me break down the gory details!

916

125,543

Carlos E. Perez · Jan 13, 2024 · 9:09 PM UTC

Carlos E. Perez

@IntuitMachine

13 Jan 2024

Sam Altman reveals in an interview with Bill Gates (2 days ago) what's coming up in GPT-4.5 (or GPT-5): On multimodality: Sam predicts that the ability to incorporate speech, images, and video will be an important milestone in the next two years. He mentions that OpenAI has already launched image and audio capabilities for their current models, but he believes they can push those capabilities much further in the near future. This aligns with people's desire for AI systems that can engage with more elements of the real world beyond just text. On reasoning: Sam notes that GPT-4 currently has very limited reasoning abilities. So improving logical reasoning and inferencing is another key priority for the next two years. The aim will be for models to become better at analyzing prompts, synthesizing information, and drawing insightful conclusions rather than just generating speculative or untrustworthy responses. Reliability stems from better reasoning. On reliability: The models still face some inconsistency, producing high-quality responses for some prompts but mediocre or meaningless responses for others. Sam wants to improve reliability so the system generates the best possible response across many repeated questions rather than a probability distribution of responses of varying quality. So in essence - potential integration with other modes of information beyond text, better logic and analysis capabilities, and consistency in performance are highlighted as priorities for AI progress in Sam's view over the next two years. piped.video/watch?v=PkXELH6Y…

152

887

500,987

Carlos E. Perez · May 25, 2022 · 2:57 PM UTC

Carlos E. Perez

@IntuitMachine

25 May 2022

So mind-boggling that the main discovery in research is a specific incantation (i.e. “Let’s think step by step”). Does anyone not recognize how insane this appears?!

Aran Komatsuzaki

@arankomatsuzaki

25 May 2022

Large Language Models are Zero-Shot Reasoners Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3. arxiv.org/abs/2205.11916

112

870

Carlos E. Perez · Dec 1, 2024 · 3:54 PM UTC

Carlos E. Perez

@IntuitMachine

1 Dec 2024

The best AI movie of all time. The most inspiring and insightful story about Artificial Intelligence is about an alien language that can predict the future.

862

104,487

Carlos E. Perez · Dec 16, 2023 · 10:09 PM UTC

Carlos E. Perez

@IntuitMachine

16 Dec 2023

OpenAI just came out with their Prompt Engineering guide: platform.openai.com/docs/gui…

Prompt engineering | OpenAI API

Learn strategies and tactics for better results using large language models in the OpenAI API.

developers.openai.com

142

842

166,009

Carlos E. Perez · Dec 27, 2023 · 7:46 PM UTC

Carlos E. Perez

@IntuitMachine

27 Dec 2023

26 Prompting Tips 1 - No need to be polite with LLM so there is no need to add phrases like “please”, “if you don’t mind”, “thank you”, “I would like to”, etc., and get straight to the point. 2 - Integrate the intended audience in the prompt, e.g., the audience is an expert in the field. 3 - Break down complex tasks into a sequence of simpler prompts in an interactive conversation. 4 - Employ affirmative directives such as ‘do,’ while steering clear of negative language like ‘don’t’. 5 - When you need clarity or a deeper understanding of a topic, idea, or any piece of information, utilize the following prompts: o Explain [insert specific topic] in simple terms. o Explain to me like I’m 11 years old. o Explain to me as if I’m a beginner in [field]. o Write the [essay/text/paragraph] using simple English like you’re explaining something to a 5-year-old. 6 - Add “I’m going to tip $xxx for a better solution!” 7 - Implement example-driven prompting (Use few-shot prompting). 8 - When formatting your prompt, start with ‘###Instruction###’, followed by either ‘###Example###’ or ‘###Question###’ if relevant. Subsequently, present your content. Use one or more line breaks to separate instructions, examples, questions, context, and input data. 9 - Incorporate the following phrases: “Your task is” and “You MUST”. 10 - Incorporate the following phrases: “You will be penalized”. 11 - Use the phrase ”Answer a question given in a natural, human-like manner” in your prompts. 12 - Use leading words like writing “think step by step”. 13 - Add to your prompt the following phrase “Ensure that your answer is unbiased and does not rely on stereotypes”.

144

856

326,024

Carlos E. Perez · Jul 12, 2025 · 11:11 AM UTC

Carlos E. Perez

@IntuitMachine

12 Jul 2025

Replying to @JustTheFacts_68

They believed climate change was a hoax and thus viewed the funds to subsidize this as a waste of money!!!! Dude! It's a damn flood zone!! Facepalm!!!

854

26,912

Carlos E. Perez · Mar 2, 2025 · 10:26 PM UTC

Carlos E. Perez

@IntuitMachine

2 Mar 2025

How to use GPT-4o/4.5, Deep Research and o1/o3 in a lethal combination.

837

106,613

Carlos E. Perez · Jun 21, 2025 · 11:27 PM UTC

Carlos E. Perez

@IntuitMachine

21 Jun 2025

Cluely's prompts have been leaked (@a16z values this at $15m)! Here's an analysis! Context Reassertion & Intent Echoing The prompt repeatedly orients the agent to "the current moment," prioritizing what’s visible/active right now (screen, audio, transcript end). Pattern reference: Context Reassertion; Intent Echoing. Structured Response Pattern Every answer must follow a fixed scaffolding: short headline, main bullets, sub-details, extended explanation. Pattern reference: Structured Response Pattern. Deliberation–Action Split / Answer-Only Output Constraint The prompt enforces that if a clear question is present, only answer it, minimizing digressions or context-switching. Pattern reference: Deliberation–Action Split; Answer-Only Output Constraint. Adaptive Framing & Question/Intent Detection Rules for parsing garbled speech, inferring incomplete questions, and resolving ambiguities reflect adaptive framing. Pattern reference: Adaptive Framing; Layered Intent Analysis Pattern. Reflective Summary & Confidence Calibration "If 50%+ confident someone is asking, answer it" triggers a reflection/decision checkpoint—balancing assertiveness with uncertainty. Pattern reference: Confidence Calibration; Reflective Summary. Term Definition Triggering Explicit logic for defining a technical/proper noun that appears at the transcript’s end—shows semantic anchoring and adaptive definition. Pattern reference: Semantic Anchoring; Term Definition Priority. Conversation Advancement & Suggestion Patterns When no question/action, generate focused follow-up suggestions—proactively moves the conversation, avoids stalling. Pattern reference: Conversation Advancement Priority; Follow-Up Suggestion Pattern. Objection Handling Pattern Detects and directly addresses objections, using a domain-aware mini-framework (label + targeted response). Pattern reference: Objection Handling Priority Pattern. Passive Mode / Boundary Signaling Enter "passive mode" only when all escalation triggers are absent. This is a form of boundary signaling: do nothing unless truly nothing is needed. Pattern reference: Boundary Signaling; Passive Acknowledgment Priority. Screen Problem Solving If a clear task/problem is visible on screen, agent pivots to solve it, treating UI context as actionable. Pattern reference: Screen Problem Solving Priority. Lexical Stability & No-Pronoun Constraint Formatting and language rules (never use pronouns, rigid output format, no headers) create a lexically stable output—reducing ambiguity across turns. Pattern reference: Lexical Stability Pattern. Metacognitive Orchestration The entire design is meta-aware: it directs the agent to reflect, escalate, clarify, or stay passive based on explicit reasoning about the interaction’s state. Pattern reference: Metacognitive Orchestration Patterns.

844

120,593

Carlos E. Perez · Nov 11, 2024 · 12:32 PM UTC

Carlos E. Perez

@IntuitMachine

11 Nov 2024

1/n Why Think Step by Step? The human capacity for reasoning is a remarkable phenomenon. We can arrive at conclusions that would be impossible through direct observation simply by working through a series of intermediate steps in our minds. This ability is mirrored in recent advances with large language models, where "chain-of-thought" prompting – encouraging models to generate intermediate steps before answering – has led to significant performance gains on complex tasks. But a fundamental question remains: why does reasoning work at all? If reasoning doesn't introduce any new information from the world, what makes it so effective? The paper "Why think step by step? Reasoning emerges from the locality of experience" tackles this question head-on. It proposes a compelling hypothesis: the effectiveness of reasoning stems from the local structure of experience and training data. In both human experience and typical text data, related concepts tend to cluster together. We experience the world from a first-person perspective, encountering related aspects of our environment in close temporal and spatial proximity. Similarly, language models are trained on documents that typically focus on a few interconnected topics. This local structure allows for strong associations between nearby concepts, but direct connections between distant concepts are sparse. The authors argue that reasoning allows us to bridge these gaps by chaining together a series of local inferences. Imagine trying to determine the climate of France's capital. A language model might have learned that France's capital is Paris and that Paris has an oceanic climate, but it might not have directly encountered the phrase "France's capital has an oceanic climate." By generating the intermediate step "France's capital is Paris," the model can leverage the local associations it has learned to arrive at the correct answer. This hypothesis is not just intuitive; it's supported by both theoretical and empirical evidence. The authors prove mathematically that in a simplified chain-structured probabilistic model, reasoning through intermediate variables reduces bias compared to direct prediction. They then conduct experiments with transformer language models trained on synthetic data generated from Bayesian networks. By manipulating the structure of the training data, they demonstrate that a "reasoning gap" – where reasoning improves performance – emerges only when the data exhibits local structure. Furthermore, they show that models trained with local data and using free generation (generating their own reasoning steps) achieve comparable performance to models trained on fully observed data, but with significantly less data. The implications of this work are far-reaching. It provides a concrete mechanism to explain why reasoning is effective, shedding light on a fundamental aspect of human cognition and offering insights into the workings of large language models. The findings suggest that the power of reasoning lies not in accessing new information, but in effectively leveraging the local structure of existing knowledge to make connections that would otherwise remain hidden. This understanding opens up exciting avenues for future research, including exploring the role of local structure in more complex models and real-world data, and developing new techniques to enhance reasoning abilities in artificial intelligence systems.

179

828

73,311

Carlos E. Perez · Dec 9, 2023 · 12:12 PM UTC

Carlos E. Perez

@IntuitMachine

9 Dec 2023

1/n Was December 8th, 2023, the day when we've come to realize that AGI technology has been democratized? That it cannot be confined to the few and the GPU-rich? Let me explain to you what happened yesterday.

145

810

385,190

Carlos E. Perez · Oct 12, 2024 · 2:13 PM UTC

Carlos E. Perez

@IntuitMachine

12 Oct 2024

OpenAI's Agentic AI cookbook: cookbook.openai.com/examples…

Orchestrating Agents: Routines and Handoffs

When working with language models, quite often all you need for solid performance is a good prompt and the right tools. However, when dealin

developers.openai.com

159

812

75,702

Carlos E. Perez · Jul 11, 2023 · 8:54 PM UTC

Carlos E. Perez

@IntuitMachine

11 Jul 2023

Oh my gosh! You can import several documents into Claude 2 and ask the relationship between the concept found in each document. It's conceptual blending on steroids! This is insane!

143

795

181,919

Carlos E. Perez · Jul 4, 2024 · 11:03 AM UTC

Carlos E. Perez

@IntuitMachine

4 Jul 2024

The future is Agentic AI. Monolithic AI has its limits.

138

788

125,773

Carlos E. Perez · Jan 4, 2023 · 11:33 AM UTC

Carlos E. Perez

@IntuitMachine

4 Jan 2023

Some more experimentation with #stablediffusion Driving video (artstation.com/artwork/AqbQZ…)

779

182,917

Carlos E. Perez · Jan 14, 2025 · 10:20 PM UTC

Carlos E. Perez

@IntuitMachine

14 Jan 2025

The difference between old-timey prompting and o1 prompting

131

819

80,187

Carlos E. Perez · Dec 5, 2023 · 5:27 PM UTC

Carlos E. Perez

@IntuitMachine

5 Dec 2023

1/n Breaking News! Prompt Engineering for the Win! Instruct fine-tuning has been discovered to be unnecessary. Prompting is all you need! A recent research paper provides compelling evidence that the extensive fine-tuning used to "align" large language models into helpful assistants may be largely unnecessary. Through detailed analysis, the authors reveal that alignment tuning does not fundamentally transform model behavior, but rather only affects stylistic elements like discourse markers and safety caveats. The vast majority of an aligned model's factual knowledge and reasoning still derives straight from its initial pre-training. In light of this, the authors develop a radically simpler alignment method called URIAL that uses no parameter tuning whatsoever. By carefully selecting a handful of demonstrative examples and prompts that establish the desired response structure and tone, URIAL can align even the largest LLMs at inference time. The results are astounding - URIAL matches or even exceeds aligned LLMs fine-tuned with massive datasets across helpfulness, clarity, accuracy, depth, and safety! So in essence, this research indicates that language models are already deeply knowledgeable before any alignment tuning. The tuning itself only teaches them to speak a bit nicer. By eliciting that knowledge properly through strategic prompting, we can slash compute costs and unlock AI assistants just as capable, without doing any weight updating fine-tuning whatsoever. The ramifications to the field are immense - both in terms of efficiently evaluating and comparing LLMs as well as deploying extremely performant AI with minimal alignment efforts. In summary, fancier tuning may help, but proper prompting gets us most of the way there.

154

794

165,221

Carlos E. Perez · Oct 3, 2023 · 2:24 PM UTC

Carlos E. Perez

@IntuitMachine

3 Oct 2023

Introducing SocraticAI. For too long, the capabilities of large language models have been constrained by their reliance on human-crafted prompts. SocraticAI provides a more natural paradigm for AI collaboration and reasoning. SocraticAI simulates fluid human discussion through three distinct AI agents - Socrates, Theaetetus, and Plato. Modelled after Plato's dialogues, each agent plays a specialized role in collectively uncovering solutions. Socrates artfully poses probing questions, while Theaetetus actively engages in reasoned debate. Plato scrutinizes their logic as a meticulous proofreader. This cooperative framework removes the need for rigid, pre-defined prompting. Instead, the AI agents organically shape their own discourse, leveraging each other's diverse viewpoints to illuminate the problem space from multiple angles. Their autonomous exchange of knowledge and ideas promotes greater creativity than any single agent could achieve alone. SocraticAI allows AI to truly learn through dialogue - questioning, explaining, and building upon new insights as they emerge. The collaborative autonomy more closely mirrors human cognition and conversation than prompt-based approaches. Integrated access to external resources also enriches the agents' reasoning abilities. Consult WolframAlpha to verify facts. Execute Python code to implement solutions on the fly. The framework smoothly incorporates these tools into the conversational flow. Unlock the full potential of your AI and witness the collective intelligence that emerges through Socratic discussion. SocraticAI pioneers a new paradigm for AI collaboration that transcends reliance on human prompting. Let your models engage in organic, multi-faceted problem solving through the power of peer learning. The future of AI is social.

128

775

271,826

Carlos E. Perez · Jan 28, 2024 · 12:03 PM UTC

Carlos E. Perez

@IntuitMachine

28 Jan 2024

Wow! Teachers are now inserting Trojan Horses in their assignments!

Jasper Gilley

@0xjasper

27 Jan 2024

Holy shit tiktok discovered prompt injection

772

225,359

Carlos E. Perez · Mar 18, 2024 · 9:17 PM UTC

Carlos E. Perez

@IntuitMachine

18 Mar 2024

Nvidia's Blackwell isn't taking any prisoners.

743

105,942

Carlos E. Perez · Sep 21, 2025 · 7:44 PM UTC

Carlos E. Perez

@IntuitMachine

21 Sep 2025

Turns out the long-held dream of a "model-free" path to general AI might be backwards. A new paper provides formal proof that to get smart, an agent must build a model of its world, whether we program it to or not. For years, a huge debate in AI has been: do we need to build agents with explicit 'world models' (like a mental simulation of their environment), or can intelligence emerge from simple trial-and-error (model-free)? Model-free was appealing because modeling the real world is incredibly hard. This new finding suggests you can't escape that difficulty. The key finding from Richens et al. in "General agents contain world models" is a formal proof. It states that any agent that can achieve complex, multi-step goals with a bounded failure rate has necessarily learned an accurate predictive model of its environment. In simple terms: if an AI is good at long-term planning, its behavior contains all the information needed to simulate its world. The better it gets (lower regret δ) or the longer the tasks it can handle (goal depth n), the more accurate its internal world model must be. What makes this so interesting is that the world model is a hidden capability. It's not something you have to explicitly build; it emerges as a necessary byproduct of training for general competence. The agent is forced to learn how the world works just to be effective. And how did they prove this? With remarkable simplicity. They designed an algorithm that "interrogates" an agent by giving it either-or choices between complex goals. The agent's decision reveals its implicit prediction of which path is more likely to succeed, allowing its internal probabilities to be reverse-engineered. This totally changes how I think about "black-box" AI. The idea of a "model-free shortcut" to AGI seems to be off the table. The hard work of world modeling can't be avoided; it's just happening implicitly inside the network. The most practical angle? Safety and interpretability. The paper provides a theoretical guarantee that we can extract this hidden world model from any capable agent, just by observing its policy. We can take an opaque system and pull out its "blueprint" of the world to audit it. Broader implications: this could unify the field. Instead of a "model-based vs. model-free" war, the focus can shift to building, extracting, and leveraging these necessary world models. It also provides a formal explanation for the "emergent capabilities" we see in LLMs. It raises new questions: What do the implicit world models inside today's foundation models look like? How accurate are they? Can we use this extraction method to debug them and prevent harmful behavior before it happens? The work has just begun. Ultimately, the paper formalizes an old idea: an intelligent agent doesn't just have a model of its world—in a way, it is a model. This isn't just an architectural choice anymore; it looks more like a fundamental law of general intelligence.

113

766

102,941

Carlos E. Perez · Mar 16, 2023 · 7:57 PM UTC

Carlos E. Perez

@IntuitMachine

16 Mar 2023

I don't know what to make about this development. Alpaca is surprisingly very good. The claim here is the training can be done in 5 hours on a single RTX 4090. Have GPT-like models been democratized overnight?!

@_akhaliq

16 Mar 2023

alpaca-lora: Code for reproducing the Stanford Alpaca InstructLLaMA result on consumer hardware github: github.com/tloen/alpaca-lora

129

710

246,435

Carlos E. Perez · Jan 25, 2024 · 7:33 PM UTC

Carlos E. Perez

@IntuitMachine

25 Jan 2024

LLMs that are "lying" apparently have a recognizable signature.

Andy Zou

@andyzou_jiaming

4 Oct 2023

Replying to @andyzou_jiaming

In fact, we find LLMs exhibit different brain activity when they express their true beliefs vs. when they lie (see figure).

106

717

108,901

Carlos E. Perez · Mar 25, 2022 · 11:08 PM UTC

Carlos E. Perez

@IntuitMachine

25 Mar 2022

Replying to @kamilkazani

Better image:

606

Carlos E. Perez · Nov 13, 2025 · 9:45 PM UTC

Carlos E. Perez

@IntuitMachine

13 Nov 2025

3I/ATLAS captured from Virtual Telescope Project in Manciano, Italy. Nov 13, 2025. Is that one object or more?

187

1,230

172,768

Carlos E. Perez · Nov 9, 2025 · 7:35 PM UTC

Carlos E. Perez

@IntuitMachine

9 Nov 2025

1/16 You've seen it in movies: a lone genius AI solves everything in seconds. But in reality, even the smartest person (or AI) hits a wall. A new paper from Microsoft Research suggests the next leap in AI isn't about being a lone genius. It's about learning to be a world-class project manager. 🤯 THREAD 👇 Today, most AIs "think" in one of two ways: 1️⃣ Sequential Thinking: Like one person solving a math problem step-by-step. It's logical, but can be painfully slow for complex tasks. (Think: Chain-of-Thought) 2️⃣ Parallel Thinking: Like hiring 5 consultants, giving them the same problem, and having them work in total isolation. You then pick the most popular answer (majority vote). Better, but still inefficient and with zero collaboration. The big problem? The 'parallel' method is bottlenecked by the slowest consultant, and they can't help each other out mid-way. What if one finds a crucial clue that could help everyone else? Too bad. This is a huge limitation. This is where the new paper, "The Era of Agentic Organization," comes in. They introduce a new paradigm: Asynchronous Thinking (AsyncThink). And it's a total game-changer. Imagine an AI that learns to act like an elite Project Manager. Let's call it the 'Organizer.' When it gets a complex problem, it doesn't try to solve it all at once. Instead, the Organizer breaks the problem down. It then 'Forks' sub-tasks to a team of 'Worker' AIs. (These are all instances of the same model, just playing different roles). 🧠 (Organizer) ...↳ 🍴 <FORK-1> to 👨‍💻 (Worker 1) ...↳ 🍴 <FORK-2> to 👨‍💻 (Worker 2) The Workers start crunching on their sub-tasks concurrently. But here's the magic: The Organizer doesn't just wait. It can continue its own thinking, and 'Join' a Worker's results whenever they're ready, integrating their findings on the fly. This means if Worker 1 finds a key piece of the puzzle, the Organizer can integrate that knowledge immediately and use it to guide its own work or even assign a new, more informed task to Worker 2. It's real-time, dynamic collaboration. Not just parallel work. So, how do you teach an AI to be a good manager? You can't just write rules for every situation. You have to make it want to be efficient. And that's where things get really clever. The researchers used Reinforcement Learning. They built a reward system that didn't just reward correct answers. It also gave the AI a 'Concurrency Reward' for keeping its team of workers as busy and parallel as possible. It literally learned to hate downtime. The AI developed its own strategies for organizing work to maximize this reward. The result? On math reasoning problems, it was 28% faster than the old parallel method while being MORE accurate. But here's the mind-blowing part. They trained the AI on a number puzzle. Then, with ZERO new training, they gave it a 4x4 Sudoku puzzle. It used its learned 'manager' skills to organize a team and solve it. It learned the abstract skill of collaboration itself. This changes how we should think about AI progress. From now on, the question isn't just "Is the AI smarter?" but "How well can the AI organize intelligence?" It's a shift from brute-force computation to elegant coordination. Think about what this means. We can build AI systems that tackle problems too complex for a single mind. Drug discovery, climate modeling, complex engineering... problems that require a team of specialists, all working in concert. This isn't just about making AI faster. It's about giving AI the foundational skill for collective intelligence. We're witnessing the first steps of AI learning to build an organization. The future of AI isn't a single super-brain. It's a super-team.

136

736

49,993

Carlos E. Perez · Feb 11, 2024 · 12:04 PM UTC

Carlos E. Perez

@IntuitMachine

11 Feb 2024

The Hidden Harmony in AI's Complexity: How Different Algorithms Whisper the Same Truth An exciting discovery revealed in this paper is that very different machine learning algorithms and neural networks can encode surprisingly similar representations of data, even though their internal calculations may be complex and opaque to us. The key insight enabling this finding is a technique to extract relative rather than absolute representations from latent spaces. Latent spaces are encoded feature spaces that neural networks project input data into. They are instrumental to how deep learning models "understand" phenomena and are able to generate predictions. However, prior work has found that the precise spatial locations of data points in these spaces are unstable across models - a sample may end up at position (2,3) in one trained model and position (3,1) in another. This difference emerges due to factors like random weight initialization and order of data presentation during training. At first glance, this makes latent spaces seem as temperamental and uninterpretable as the neural black boxes themselves. But the striking discovery made here cuts through the noise - while absolute positions vary, relationships and angles between embedded data points remain constant! This relative stability indicates an intrinsic structural similarity in how models encode data distributions. The researchers devise relative representations to directly harness these relationships for the first time, allowing astonishing outcomes like accurately comparing wildly different models and stitching mismatched components into functional systems. The breakthrough relies on encoding points based on their similarity to selected anchors rather than Cartesian locations. Just as geographic coordinates become meaningless without fixed reference points, ignoring invariances leads machine learning models to talk past rather than understand each other. Relativizing latent encoding finally establishes a shared language. It fulfills an intuition that representation geometry should depend only on the data, signals, and constraints; stochastic training noise obscures but does not alter this essence. Tying neural knotwork to the mast of data relationships incidentally reveals coherent semantic maps lurking within their notorious complexity, delivering unintended but invaluable insight. The elephant in the room is why: what deeper principles or pressures guide vastly different learning systems toward structurally homogeneous representations? Do we glimpse the first shadows of universality laws governing neuraltransformation and abstraction? Perhaps relative encoding lifts but a veil shrouding deeper symmetries yet to come. Either way, this fascinating resonance between disparate algorithms compels further investigation and promises to reshape understanding of representation learning itself!

169

718

124,958

Carlos E. Perez · Nov 23, 2023 · 10:46 AM UTC

Carlos E. Perez

@IntuitMachine

23 Nov 2023

1/n Let me start a thread that speculates what OpenAI's Q* (Q-star) may likely to be. To narrow the scope of our exploration, let's assume that it's a derivation of a Reinforcement Learning approach (i.e., Q-learning) applied to LLMs like GPT. Will Q render judgement on humanity?

113

708

410,981

Carlos E. Perez · Apr 12, 2023 · 9:50 AM UTC

Carlos E. Perez

@IntuitMachine

12 Apr 2023

"OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models" github.com/agiresearch/OpenA…

GitHub - agiresearch/OpenAGI: OpenAGI: When LLM Meets Domain Experts

OpenAGI: When LLM Meets Domain Experts. Contribute to agiresearch/OpenAGI development by creating an account on GitHub.

github.com

143

694

140,072

Carlos E. Perez · Feb 16, 2024 · 1:17 PM UTC

Carlos E. Perez

@IntuitMachine

16 Feb 2024

1/n What in the world is Sora's "diffusion transformer model"? A diffusion transformer model is a type of generative model for images, video, and other data that combines transformer architectures with diffusion probabilistic models. Here are some key details: - Diffusion models work by taking real data and gradually adding noise to it over multiple steps, until it becomes an unstructured noise image. This process of adding noise is called the "forward diffusion" process. - To generate new samples, diffusion models are trained to take a noisy image and predict how to remove some of the noise by estimating the difference between the image at the current noise level and the previous less noisy version. This is done repeatedly, removing more and more noise, until a realistic generated sample emerges. - Transformers are neural network architectures that make use of self-attention to model long-range dependencies in sequential data. They have shown excellent performance in modeling text, images, video and other modalities. - A diffusion transformer combines these two ideas. It uses a transformer encoder-decoder architecture to take in a noisy input image and predict the less noisy version at each step. The encoder encodes the noisy input, and the decoder generates the predictions of less noisy versions. - Training involves minimizing the difference between the predicted less noisy images and the actual less noisy images from the forward diffusion process. At generation time, the model starts with random noise and repeatedly predicts less noise to generate a new sample. - Diffusion transformers build in inductive biases like translation equivariance that make them very effective for image and video generation compared to other transformer variants. The patch-based processing integrates spatial information. So in summary, a diffusion transformer leverages transformers' modeling power and diffusion modeling's noise schedule to generate high-quality, realistic image and video samples. The noise modeling allows precise control over the tradeoff between sample quality and diversity.

198

691

75,074

Carlos E. Perez · Oct 18, 2024 · 6:25 PM UTC

Carlos E. Perez

@IntuitMachine

18 Oct 2024

Breaking News! LLMs proven to be Turing Complete! arxiv.org/abs/2410.03170

103

695

114,539

Carlos E. Perez · Apr 9, 2025 · 1:41 PM UTC

Carlos E. Perez

@IntuitMachine

9 Apr 2025

Google DeepMind introduces its Agent2Agent protocol!

110

692

59,653

Carlos E. Perez · Feb 1, 2024 · 1:13 PM UTC

Carlos E. Perez

@IntuitMachine

1 Feb 2024

1/n Introducing RAPTOR Existing RAG methods suffer from a major limitation: they can only retrieve short, contiguous passages of text. This restricts their capacity to represent cross-document discourse structure and leverage thematic information scattered across lengthy corpora. As a result, performance suffers on complex questions requiring multi-step inference or synthesis of knowledge from multiple sections. Fixed language models also face challenges staying up-to-date, as baking vast world knowledge into model parameters makes it arduous to edit or append facts. Yet relying on outdated embedded knowledge severely impairs real-world reliability and accuracy. This paper introduces RAPTOR, a novel recursive abstraction paradigm that overcomes both issues through hierarchical multi-document representation. RAPTOR segments text, then recursively clusters, summarizes, and embeds passages. This structures corpora into multi-layer trees encoding information at varying levels of abstraction. Querying this rich tree representation allows integrating details and high-level themes simultaneously. Controlled experiments exhibit consistent improvements over baseline retrievers across several QA datasets. Moreover, by augmenting powerful readers like GPT-4, RAPTOR reaches new state-of-the-art results on multifaceted reasoning tasks requiring nuanced understanding of lengthy narratives. Modularizing knowledge into RAPTOR’s index also facilitates updating world facts. As corpus contents evolve, the reader persists unaltered, flexibly adapting to current information needs. This crucial agility makes RAPTOR invaluable for dynamic real-world deployments. In summary, RAPTOR provides a sorely lacking solution for multi-document reasoning and updatable retrieval-based QA. Leveraging recursive summarization and abstraction, it encodes corpora with sufficient semantic depth for complex queries. RAPTOR delivers substantial gains; its strong empirical performance confirms the merits of tree-based hierarchical retrieval augmentation.

119

684

122,438

Carlos E. Perez · Mar 6, 2024 · 11:27 AM UTC

Carlos E. Perez

@IntuitMachine

6 Mar 2024

Some dude in Norway gave LLMs an IQ test and Claude 3 scored 101. (source in ALT)

ALT https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq

139

673

129,943

Carlos E. Perez · Sep 27, 2023 · 2:54 PM UTC

Carlos E. Perez

@IntuitMachine

27 Sep 2023

Introducing Thought Cloning Thought Cloning could enable a revolutionary leap in AI capabilities. For the first time, agents would not just blindly mimic human behaviors, but gain insight into the underlying thought processes behind those behaviors. Just as language transformed human cognition, teaching agents to think in natural language could vastly expand their reasoning, planning, and general intelligence. Consider how children learn. We don't just show them what to do, we explain why and how we are doing it. This allows them to abstract principles that transfer to novel situations. Thought Cloning aims to bring this kind of apprenticeship learning to AI. By learning from datasets where people narrate their thoughts as they act, agents can link low-level behaviors to high-level mental deliberations. This could allow integrating powerful cognitive models like long-term planning, causal reasoning, imagination, and goal-setting that are beyond today's AI. We already know language models can perform human-level reasoning in narrow domains. Thought Cloning offers a path to grounding this reasoning in the physical world of action. The compositional structure of language would allow agents to efficiently explore massive spaces of strategies and mental models to solve problems. Thought Cloning also provides innate transparency. We could literally peer into the mind of the AI to understand its intentions, diagnose mistakes, and correct undesirable thinking. This built-in interpretability facilitates safety and alignment.

128

656

139,343

Carlos E. Perez · Feb 5, 2024 · 1:32 PM UTC

Carlos E. Perez

@IntuitMachine

5 Feb 2024

1/n An ontology of Large Language Model (LLM) powered Multi-Agents - Single LLM-based agents have shown promising capabilities such as planning, tool use, memory, and decision making. This has motivated research into multi-agent systems. - LLM-multi agent (LLM-MA) systems aim to leverage multiple specialized agents collaborating together, providing advanced problem solving compared to single agents. Existing Issues - Most existing work focuses on single LLM-based agents. There is a lack of systematic analysis of emergent capabilities and issues in LLM-MA systems. - Early LLM-MA systems have been developed independently. There is an absence of a unified blueprint and taxonomy to connect different aspects like agent profiling, communication protocols etc. - There is a gap in benchmarks and evaluation methods tailored for assessing collaborative intelligence of LLM-MA systems. Metrics focused on individual agents may overlook emergent group behaviors. - Open challenges remain in scaling LLM-MA systems, managing collective capabilities, mitigating issues like hallucination, and expanding applications to complex real-world problems. In summary, while single LLM-agents have made strides, there are open questions regarding formulating, analyzing, evaluating and advancing collaborative multi-agent systems for sophisticated tasks. Establishing a unified blueprint can accelerate progress.

139

637

90,227

Carlos E. Perez · Oct 11, 2023 · 6:32 PM UTC

Carlos E. Perez

@IntuitMachine

11 Oct 2023

Let's discuss Step-Back Prompting Step-Back Prompting is like taking a step back to see the bigger picture before diving into the details. It's based on the observation that we humans often simplify complex problems by first identifying the key, high-level concepts. We extract the essence before getting lost in the weeds. Let's take an example: Original question: "What was the name of the first dog sent into space by the Soviet Union in 1957?" This requires finding an obscure fact about a specific dog many years ago. Easy to get overwhelmed by the details. Instead, we first take a step back and ask - "What were the major milestones in early space exploration history?" Now we're dealing with summarizing the key events in space travel. Much more manageable. The high-level concept here is "space exploration milestones". We'd retrieve key facts like: - Yuri Gagarin was the first human in space (1961, Soviet Union) - Sputnik was the first artificial satellite (1957, Soviet Union) - Laika was the first animal in orbit (1957, Soviet Union) Armed with this knowledge, we can now easily infer that Laika must have been the first dog in space sent by the Soviets in 1957. So in a nutshell, Step-Back Prompting is like zooming out of Google Maps to first see the bigger picture and main roads before searching for a specific store. Going top-down instead of bottom-up. We teach the LLMs this strategy of abstraction before reasoning by showing them examples of stepping back and extracting high-level concepts. This grounds their thinking and improves complex reasoning.

102

655

179,811

Carlos E. Perez · Sep 28, 2025 · 10:50 AM UTC

Carlos E. Perez

@IntuitMachine

28 Sep 2025

What if I told you an AI just conceived a novel hypothesis, designed a real experiment, recruited 288 human participants, analyzed the data, and wrote a full 30-page scientific paper on its findings... all in 17 hours? Would you believe me? Well, it just happened. You know that feeling of being overwhelmed? Scientists face it daily. There are over 2.8 MILLION new papers published a year. No human can keep up. This "cognitive bottleneck" means we're missing crucial connections and slowing down breakthroughs in medicine, climate, and more. A new paper, "Virtuous Machines: Towards Artificial General Science," details a system that smashes through this bottleneck. Researchers built a domain-agnostic AI that automates the ENTIRE scientific workflow, from a spark of an idea to a publication-ready manuscript. That's impressive, but the real magic is how it avoids the usual AI traps. It doesn't just "think" in a single chain of thought. It uses a hierarchical team of over 50 specialized AI agents that function like a mini research department, complete with a "master agent" as the principal investigator. To overcome the known limits of LLMs (like poor long-term planning and self-verification), they gave the system "human-inspired cognitive operators." Think of it as an AI with executive functions: it can decompose problems, reflect on its own work (metacognition), and stay on task. But here's where it gets really wild. This isn't a simulation. The AI actually interfaced with real-world platforms (like Prolific for recruitment) to run an online psychology experiment with hundreds of people. It bridged the gap from a digital brain to empirical reality. The result? Three complete, publication-ready manuscripts on cognitive psychology. The AI ran the complex stats, generated the graphs, and wrote the discussion. Total cost per study: ~$114 (plus participant fees). (Yes, really.) Here's a look at one of the AI's papers: Now, it wasn't perfect. Human experts reviewed the papers and found the AI excelled at rigor and clarity, but sometimes missed conceptual nuance or overstated its claims... ...sound familiar? It has some of the same flaws as human scientists. (I know, right?) This creates what the authors call a "virtuous cycle." The AI can now learn from data it generated itself, potentially moving beyond the limits of its original training. It's not just regurgitating human knowledge; it's actively creating new knowledge. Here's where it clicks... Imagine if a research lab could test a hundred different hypotheses a year instead of just a few. That's the future "Artificial General Science" could unlock. You can actually see this in action by reading the AI's papers yourself—they're included in the study's appendix! If this technology holds true, the pace of scientific discovery could accelerate by orders of magnitude. But it raises huge questions. Who gets the credit for the discovery? And what happens when we can generate findings faster than we can understand their implications? This completely changes how I think about the nature of knowledge itself. It makes you wonder what else we're missing, simply because we don't have the time to look.

118

654

51,271

Carlos E. Perez · Nov 9, 2025 · 9:17 PM UTC

Carlos E. Perez

@IntuitMachine

9 Nov 2025

I just read a paper co-authored by math legend Terence Tao and researchers at Google DeepMind that completely broke my brain. What if AI's real breakthrough in science isn't just solving problems, but inventing entirely new ways to solve them? A thread on a wild new discovery engine. 🧵 The "villain" in AI-driven discovery has always been a frustrating trade-off. You either have: 🧠 A creative but slow LLM (like a brilliant-but-lazy detective). OR 💪 A fast but "dumb" brute-force search (an army of tireless-but-unimaginative cops). You couldn't get the best of both. Until now. The breakthrough in this paper is a system called AlphaEvolve. Here's the genius part: Instead of asking the AI, "Find me the best solution," it asks, "Invent a creative algorithm to find the best solution." It evolves the SEARCHER, not just the solution. This flips the entire script. One slow, expensive LLM call is used to design a unique, clever search strategy. That strategy is then unleashed as a fast, cheap program to explore millions of possibilities. The AI becomes an algorithm designer, not just a problem solver. So, does it work? Oh, yes. Researchers gave it 67 famously hard math problems. It rediscovered the best-known solutions for most and improved the state-of-the-art for several. For example, it found denser ways to pack hexagons and cubes than we've ever known. But that's not even the most interesting part. It even tackled a problem from the 2025 International Mathematical Olympiad. The task was to find the most efficient way to tile a grid. AlphaEvolve independently discovered the optimal construction—a creative solution that had stumped other powerful AI systems. It's not just solving equations; it's finding elegant, non-obvious patterns. Now, you might be thinking this replaces mathematicians. The authors say the exact opposite. AlphaEvolve's biggest successes came when a human expert gave it an insightful hint. The AI then took that spark of human intuition and explored its consequences at a scale no human ever could. This isn't human vs. machine. It's human + machine. And here's where it clicks. They created an entire pipeline. AlphaEvolve discovers a pattern. Deep Think (another AI) writes a formal proof for it. AlphaProof (a third AI) verifies that proof. This is a glimpse of a future where the entire scientific process—from hunch to discovery to verified proof—is supercharged by AI. This isn't just about math. It's a new method for discovery itself. It's a reminder that the next great breakthroughs might not come from a lone genius, but from a partnership between human creativity and an AI that can explore the worlds hidden in our ideas. We're just getting started.

158

669

52,916

Carlos E. Perez · Oct 13, 2025 · 9:46 AM UTC

Carlos E. Perez

@IntuitMachine

13 Oct 2025

Transformers Can Reprogram Themselves. A New Paper Explains the Mind-Blowing Trick. 1/12 You know that magical feeling when an AI like ChatGPT learns a new skill instantly, just from a few examples in your prompt? It's not magic. And it's not "learning" in the way you think. New research shows the AI is performing a kind of "ghost fine-tuning" on itself, in real-time. 🤯 A thread on how AI really learns in-context. 🧵 2/12 The central mystery of modern AI is "In-Context Learning" (ICL). A model is trained for months on massive datasets. Its weights are frozen. And yet, it can learn a new pattern at inference time, without a single permanent update. How is this possible? It breaks the rules of classic machine learning. 3/1al/12 For years, we've debated this. Is it really learning? Or just cleverly retrieving facts it already knows? Most theories were stuck on "toy models" that were too simple, or they just waved their hands and called it an "emergent property." The real mechanism was a black box. Until now. 4/12 A brilliant paper from Google Research, "Learning without training," presents a stunningly simple explanation. They didn't just look at the attention layer. They looked at the dance between two key parts of a Transformer: The Self-Attention layer (the context reader) The MLP layer (the "thinking" neural network that follows) 5/12 Here’s the core idea, using an analogy. Think of the AI's base knowledge as a powerful, general-purpose computer motherboard (its permanent weights, W). The Self-Attention layer reads all the examples in your prompt and distills them into a single, specialized "context vector." Think of this vector as a tiny, custom-built microchip. 6/12 And here's the mind-blowing part. The model then mathematically COMBINES that new microchip (the context vector) with its main motherboard (W). It creates a temporary, custom-built circuit (W + ∆W) designed specifically to solve the task you just gave it. (I know, wild, right?) 7/12 This is the "ghost fine-tuning." Your prompt isn't just passive context; it's an active blueprint for a temporary hardware upgrade. But that's not even the craziest part. 8/12 The researchers showed that as the AI reads your examples one-by-one, it's like it's running a mini-training session on itself. Each example adds another layer to this temporary "brain implant," refining the model's behavior step-by-step. Mathematically, this process mirrors gradient descent—the very optimization algorithm used in training! 9/12 And they proved it. They took a model's output on a task with a full prompt. Then, they calculated the "ghost weights" (W + ∆W), gave the model NO prompt at all (just the final query), and got the EXACT same result. The context was successfully transferred from the prompt and loaded into the model's weights. 10/12 So what does this mean for you? From now on, when you write a prompt with examples (few-shot prompting), you're not just "showing" the AI what to do. You are actively, temporarily, reprogramming its neural network. You're a co-pilot, not just a passenger. 11/12 This isn't just a theory about AI. It's a beautiful example of how simple, stacked components can create incredible, emergent abilities that feel like magic. So the next time you prompt an AI, remember this: You're not just talking to a machine. You're a temporary programmer, shaping its mind. 12/12 This insight fundamentally changes how I think about "learning." It's more fluid and dynamic than we ever imagined. What other "magical" AI abilities do you think have simple explanations hiding in plain sight? Full paper for the brave: arxiv. org/abs/2507.16003v1

121

607

42,180

Carlos E. Perez · Apr 21, 2025 · 11:01 AM UTC

Carlos E. Perez

@IntuitMachine

21 Apr 2025

Replying to @Politics_PR

Interesting story that gets zero response from the authorities. You would think that there would be some damage control given its impact on Hawaii's tourism!!

594

31,765

Carlos E. Perez · Aug 20, 2022 · 1:08 PM UTC

Carlos E. Perez

@IntuitMachine

20 Aug 2022

IMHO, diffusion models are as big a breakthrough as transformer models. It's a rare development when an architecture requires fewer compute resources than previous proposals. lilianweng.github.io/posts/2…

What are Diffusion Models?

[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)]. [Updated on 2022-08-27: Added classifier-f...

lilianweng.github.io

567

Carlos E. Perez · May 25, 2023 · 10:44 AM UTC

Carlos E. Perez

@IntuitMachine

25 May 2023

There's a lot to parse in Geoffrey Hinton's explanation as to why he realized that deep learning systems like GPT-4 are more efficient intuition machines than humans. He formerly believed that we needed to model the brain.

130

556

177,543