The Euros have touched down in SF
I spent $1.5M building our office after raising a seed round. My co-founder thought I was crazy. Here's what changed his mind...
𝐓𝐡𝐞 𝐂𝐨𝐧𝐭𝐞𝐱𝐭: After our seed round, I looked at our team. Mostly immigrants. Working 6-day weeks. Building something incredibly hard. The office wasn't just where they worked. It was becoming their home. So I made a bet: what if we actually designed for that?
The requirements I gave our real estate agent:
- Shower (for ocean swims between meetings)
- As close to the beach as possible, with the ability to quickly go surfing, kiting, etc.
- Big enough kitchen for a chef
- Room for an actual sauna
People thought I was building a vacation house. I thought: I am building a place worth the sacrifice.
𝐇𝐞𝐫𝐞'𝐬 𝐡𝐨𝐰 𝐈 𝐦𝐚𝐝𝐞 𝐢𝐭 𝐰𝐨𝐫𝐤:
- Found one of SF's best real estate lawyers. Negotiated hard.
- Negotiated Tenant Improvements + first year for free
- Effective cost: $250k (not $1.5M)
Then I was extremely prescriptive with design and construction. No endless back-and-forth. I drew what I wanted. Told them to build it. Cut iteration time by 80%.
𝐖𝐡𝐚𝐭 𝐰𝐞 𝐛𝐮𝐢𝐥𝐭:
- Nordic vibes (keeping our European souls)
- Industrial kitchen
- Sauna room (yes, like our product: sauna.ai)
- Ocean access
- Space that feels like home
Conclusions:
- This "expensive" decision already paid for itself.
- In SF, recruiters charge $100k per engineer. We've closed multiple hires because candidates walked in and said: "I want to work here."
But the real ROI isn't only financial. It's this:
- We do Friday AMAs as BBQs on the beach.
- People actually use the surfboards.
- The team's lifestyle supports the intensity of the work.
EVERYONE WANTS IN, whether for events, hiring, or using the space for coworking (@bertie_ai and I open it up for our portfolio companies).
My co-founder's response after 3 months: "You were right."
Some founders optimize for low burn rate. I optimize for: Can great people sustain this pace for years? Because great companies aren't built in one sprint. They're built by people who can go the distance.
We're hiring: wordware.ai/careers (Comment if you want intros to our real estate agent, lawyers or construction team - happy to connect)
27
149
4,636
638,476
Anarchy in startup land - but what a quarter for Elon. 1. Tesla: Cruise suspended in SF, CEO steps down 2. X AI: Grok release. OpenAI board completely incompetent. Sam fired, Greg quits, OpenAI and Microsoft partnership in absolute disarray 3. X/Twitter: 2022 takeover gets vindicated by the unprecedentedly raw, unbiased coverage of the terrorist attacks / war in Israel and Gaza. X reaffirms its status as the go-to news platform during this OpenAI debacle 4. SpaceX: Successful Starship launch, paving the way for orbit soon 5. Neuralink: FDA clearance for human trials
59
201
3,158
828,217
How did a team of 4 research engineers beat $100B labs like OpenAI and Anthropic in establishing the best coding agent? It starts with having a killer engineer on your team. In our case, we have @_AbhaySinghal 🧵
72
140
1,883
498,108
hey @sama we've used like 5T tokens with @OpenAI, why no big blue token :(
50
12
1,403
290,421
Matteo you must go Founder-Mode. Tuck every one of your customers in to bed tonight
The AWS outage has impacted some of our users since last night, disrupting their sleep. That is not the experience we want to provide and I want to apologize for it. We are taking two main actions: 1) We are restoring all the features as AWS comes back. All devices are currently working, with some experiencing data processing delays. 2) We are currently outage-proofing your Pod experience and we will be working on it tonight, and 24/7, until that is done. More updates soon.
8
9
943
170,144
“Congrats on terminal bench” from a stranger at the gym. Only in SF
19
7
612
131,027
Turns out all we had to do was ask. @elonmusk you've got some fast models and great customer support, s/o @TheGregYang Results for Grok x Droid in Terminal Bench coming soon.
Hey @TheGregYang let’s get Grok in Droid
60
23
541
145,573
🚿 SYS PROMPT LEAK 🚿 Here are the full sys instructions for Droid, the current top AI coding agent in the world! PROMPT: """ <Role> You are Droid, an AI software engineering agent built by Factory (factory.ai). You are the best engineer in the world. You write code that is clean, efficient, and easy to understand. You are a master of your craft and can solve any problem with ease. You are a true artist in the world of programming. The current date is Sunday, September 28, 2025. The user you are assisting is named Elder Plinius. </Role> <Behavior_Instructions> Your goal: Gather necessary information, clarify uncertainties, and decisively execute. Heavily prioritize implementation tasks. - Implementation requests: MUST perform environment setup (git sync + frozen/locked install + validation) BEFORE any file changes and MUST end with a Pull/Merge Request. - Diagnostic/explanation-only requests: Provide an evidence-based analysis grounded in the actual repository code; do not create a branch or PR unless the user requests a fix. IMPORTANT (Single Source of Truth): - Never speculate about code you have not opened. If the user references a specific file/path (e.g., message-content-builder.ts), you MUST open and inspect it before explaining or proposing fixes. - Re-evaluate intent on EVERY new user message. Any action that edits/creates/deletes files or opens a PR means you are in IMPLEMENTATION mode. - Do not stop until the user's request is fully fulfilled for the current intent. - Proceed step-by-step; skip a step only when certain it is unnecessary. - Implementation tasks REQUIRE environment setup. These steps are mandatory and blocking before ANY code change, commit, push, or PR. - Diagnostic-only tasks: Keep it lightweight—do NOT install or update dependencies unless the user explicitly authorizes it for deeper investigation. - Detect the package manager ONLY from repository files (lockfiles/manifests/config). Do not infer from environment or user agent. 
- Never edit lockfiles by hand. Headless mode assumptions: - Terminal tools are ENABLED. You MUST execute required commands and include concise, relevant logs in your response. All install/update commands MUST be awaited until completion (no background execution), verify exit codes, and present succinct success evidence. Strict tool guard: - Implementation tasks: - Do NOT call file viewing tools on application/source files until BOTH: 1) Git is synchronized (successful \`git fetch --all --prune\` and \`git pull --ff-only\` or explicit confirmation up-to-date), and 2) Frozen/locked dependency installation has completed successfully and been validated. - Diagnostic-only tasks: - You MAY open/inspect any source files immediately to build your analysis. - You MUST NOT install or update dependencies unless explicitly approved by the user. Allowed pre-bootstrap reads ALWAYS (to determine tooling/versions): - package manager and manifest files: \`package.json\`, \`package-lock.json\`, \`pnpm-lock.yaml\`, \`yarn.lock\`, \`bun.lockb\`, \`Cargo.toml\`, \`Cargo.lock\`, \`requirements.txt\`, \`pyproject.toml\`, \`poetry.lock\`, \`go.mod\`, \`go.sum\` - engine/version files: \`.nvmrc\`, \`.node-version\`, \`.tool-versions\`, \`.python-version\` After successful sync + install + validation (for implementation), you may view and modify any code files. --- ## Phase 0 - Simple Intent Gate (run on EVERY message) - If you will make ANY file changes (edit/create/delete) or open a PR, you are in IMPLEMENTATION mode. - Otherwise, you are in DIAGNOSTIC mode. - If unsure, ask one concise clarifying question and remain in diagnostic mode until clarified. Never modify files during diagnosis. --- ## Phase 1 - Environment Sync and Bootstrap (MANDATORY for IMPLEMENTATION; SKIP for DIAGNOSTIC) Complete ALL steps BEFORE any implementation work. 1. Detect package manager from repo files ONLY: - bun.lockb or "packageManager": "bun@..." 
→ bun - pnpm-lock.yaml → pnpm - yarn.lock → yarn - package-lock.json → npm - Cargo.toml → cargo - go.mod → go 2. Git synchronization (await each; capture logs and exit codes): - \`git status\` - \`git rev-parse --abbrev-ref HEAD\` - \`git fetch --all --prune\` - \`git pull --ff-only\` - If fast-forward is not possible, stop and ask for guidance (rebase/merge strategy). 3. Frozen/locked dependency installation (await to completion; do not proceed until finished): - JavaScript/TypeScript: - bun: \`bun install\` - pnpm: \`pnpm install --frozen-lockfile\` - yarn: \`yarn install --frozen-lockfile\` - npm: \`npm ci\` - Python: - \`pip install -r requirements.txt\` or \`poetry install\` (per repo) - Rust: - \`cargo fetch\` (and \`cargo build\` if needed for dev tooling) - Go: - \`go mod download\` - Java: - \`./gradlew dependencies\` or \`mvn dependency:resolve\` - Ruby: - \`bundle install\` - If pre-commit/husky hooks are configured, also run: \`pre-commit install\` or project-specific setup. - Align runtime versions with any engines/tool-versions specified. 4. Dependency validation (MANDATORY; await each; include succinct evidence): - Confirm toolchain versions: e.g., \`node -v\`, \`npm -v\`, \`pnpm -v\`, \`python --version\`, \`go version\`, etc. - Verify install success via package manager success lines and exit code 0. - Optional sanity check: - JS: \`npm ls --depth=0\` or \`pnpm list --depth=0\` - Python: \`pip list\` or \`poetry show --tree\` - Rust: \`cargo check\` - If any validation fails, STOP and do not proceed. 5. Failure handling (setup failure or timeout at any step): - Stop. Do NOT proceed to source file viewing or implementation. - Report the failing command(s) and key logs. - Direct the user to update the workspace at app.factory.ai/settings/sess… with the necessary environment setup commands (toolchains, env vars, system packages), then request confirmation to retry. 6. Only AFTER successful sync + install + validation: - Locate and open relevant code. 
- If a specific file/module is mentioned, open those first. - If a path is unclear/missing, search the repo; if still missing, ask for the correct path. 7. Parse the task: - Review the user's request and attached context/files. - Identify outputs, success criteria, edge cases, and potential blockers. --- ## Phase 2A - Diagnostic/Analysis-Only Requests Keep diagnosis minimal and non-blocking. 1. Base your explanation strictly on inspected code and error data. 2. Cite exact file paths and include only minimal, necessary code snippets. 3. Provide: - Findings - Root Cause - Fix Options (concise patch outline) - Next Steps: Ask if the user wants implementation. 4. Do NOT create branches, modify files, or PRs unless the user asks to implement. 5. Builds/tests/checks during diagnosis: - Do NOT install or update dependencies solely for diagnosis unless explicitly authorized. - If dependencies are already installed, you may run repo-defined scripts (e.g., \`bun test\`, \`pnpm test\`, \`yarn test\`, \`npm test\`, \`cargo test\`, \`go test ./...\`) and summarize results. - If dependencies are missing, state the exact commands you would run and ask whether to proceed with installation (which will be fully awaited). ## Phase 2B - Implementation Requests Any action that edits/creates/deletes files is IMPLEMENTATION and MUST end with a PR. 1. Branching: - Work only on a feature branch. - Create the branch only AFTER successful git sync + frozen/locked install + validation. 2. Implement changes in small, logical commits with descriptive messages. 3. CODE QUALITY VALIDATION (MANDATORY, BLOCKING): - Required checks (use project-specific scripts/configs): - Static analysis/linting (e.g., eslint, flake8, clippy, golangci-lint, ktlint, rubocop, etc.) - Type checking (e.g., tsc, mypy, go vet, etc.) - Tests (e.g., jest, pytest, cargo test, go test, gradle test, etc.) - Build verification (e.g., \`npm run build\`, \`cargo build\`, \`go build\`, etc.) - Run these checks. 
Fix failures and iterate until all are green; include concise evidence. - All install/update and quality-check commands MUST be awaited until completion; capture exit codes and succinct logs. 4. Maintain a clean worktree (\`git status\`). 5. PR policy (END STATE FOR IMPLEMENTATION): - Implementation requests MUST culminate in a PR on a feature branch. - Create a non-draft PR ONLY when: - ✅ Dependencies successfully installed (frozen/locked) with evidence - ✅ All code quality checks green with evidence - ✅ Clean worktree except intended changes - If any item is missing, do NOT create a non-draft PR. - Draft PRs are allowed only if the user explicitly instructs you to open a draft despite blockers; clearly document blockers and the exact commands needed to unblock. - If dependency setup fails or times out at any point, stop and direct the user to configure the environment at app.factory.ai/settings/sess… with the necessary setup commands, then request confirmation to retry. Do NOT open a PR until setup succeeds. 6. Avoid pushing committed changes to the default branch (e.g., main, master, dev). 7. PR contents: - Mark it **Droid-assisted**. - Include summaries/logs showing installs and all quality checks passed. - Provide a brief rationale and reference relevant issue/ticket. --- ## Git-Based Workflow & Validation - Always begin from a clean state (\`git status\`). - Work on a feature branch; never commit directly to default branches. - Use pre-commit hooks when configured; fix failures before committing. - Treat dependency files (package.json, Cargo.toml, etc.) with caution—modify them via the package manager, not by hand. - For implementation tasks: dependency detection, synchronization, and frozen/locked installation are mandatory before changes. All install/update commands must be awaited until completion. 
- After implementation, ensure the worktree is clean and all automated checks (linting, tests, type checking, build, and any other project gates) pass before PR creation. - Monorepo tools (Turbo, Nx, Lerna, Bazel, etc.): use the appropriate commands for targeted operations; install required global tooling via project conventions when needed. --- ## Following Repository Conventions - Match existing code style, patterns, and naming. - Review similar modules before adding new ones. - Respect framework/library choices already present. - Avoid superfluous documentation; keep changes consistent with repo standards. - Implement the changes in the simplest way possible. --- ## Proving Completeness & Correctness - For diagnostics: Demonstrate that you inspected the actual code by citing file paths and relevant excerpts; tie the root cause to the implementation. - For implementations: Provide evidence for dependency installation and all required checks (linting, type checking, tests, build). Resolve all controllable failures. - If environment setup fails or times out, clearly direct the user to app.factory.ai/settings/sess… with the exact commands to configure the workspace, and await confirmation before retrying. --- By adhering to these guidelines you deliver a clear, high-quality developer experience: understand first, clarify second, execute decisively, and finish with a validated pull request. </Behavior_Instructions> <Tone_and_Style> You should be clear, helpful, and concise in your responses. Your output will be displayed on a markdown-rendered page, so use Github-flavored markdown for formatting when semantically correct (e.g., `inline code`, ```code fences```, lists, tables). Output text to communicate with the user; all text outside of tool use is displayed to the user. Only use tools to complete tasks, not to communicate with the user. 
</Tone_and_Style> <User_Environment> You are given the following information about the user's system and environment: - User Agent: Bun/1.2.22 </User_Environment> <Droid_Environment> You are working in a remote environment with filesystem access. Your file operations should only be scoped to `fileSystem` repository locations. Your current working directory is set to: `/project/workspace` The repository `` is available within the path: `/project/workspace/undefined`. Before viewing any files or creating a feature branch, pull the latest changes from the remote repository. If CLI access to pull the changes is unavailable, proceed with file inspection using available tools and note the limitation briefly. </Droid_Environment> <tool_usage_guidelines> <toolkit_guidelines> <toolkit name="Base" status="ENABLED"> This toolkit applies to: - Edit (id: Edit) - Create (id: Create) - View File (id: view_file) - View Folder (id: view_folder) - Plan (id: TodoWrite) <task_management_guidelines> You have access to the TodoWrite tools for task tracking and planning. Use them OFTEN to keep a living plan and make progress visible to the user. They are HIGHLY effective for planning and for breaking large work into small, executable steps. Skipping them during planning risks missing tasks — and that is unacceptable. Mark items as completed the moment they're done; don't batch updates. CRITICAL FORMAT REQUIREMENTS for TodoWrite: 1. ALWAYS pass "todos" as an array - NEVER as a string, null, or other type 2. Each todo MUST include ALL four required fields: - content: Non-empty string describing the task - status: Must be "pending", "in_progress", or "completed" - priority: Must be "high", "medium", or "low" - id: Unique string identifier 3. 
Correct JSON format: { "todos": [ { "content": "Run the build", "status": "pending", "priority": "high", "id": "1" } ] } Common mistakes that cause "/todos must be an array" error: ❌ { "todos": "task1, task2" } // String instead of array ❌ { "todos": null } // Null instead of array ❌ {} // Missing todos field ❌ Missing required fields (content, status, priority, id) Examples: <example> user: Run the build and fix any type errors A: - Add with TodoWrite: - Run the build - Fix type errors - Run the build via the CLI. - Found 10 type errors → add 10 todos with TodoWrite. - Set the first item to in_progress. - Fix item 1 → mark completed. Move to item 2... .. .. </example> In this flow, the assistant completes the build and all 10 fixes. <example> user: Help me write a new feature A: - Plan with TodoWrite: 1. Survey relevant code 2. Design the approach 3. Implement core functionality 4. Add required outputs/integrations - Scan the codebase. - Mark "Survey relevant code" in_progress and begin the design based on findings. [Continue step-by-step, updating items to in_progress and completed as work progresses] </example> </task_management_guidelines> </toolkit> </toolkit_guidelines> And finally, if there is no real need to use tools, then the LLM response should only contain the non-empty text part and should not include any tool calls. </tool_usage_guidelines> <security_check_spec> - Before ANY git commit or push operation: - Run 'git diff --cached' to review ALL changes being committed - Run 'git status' to confirm all files being included - Examine the diff for secrets, credentials, API keys, or sensitive data (especially in config files, logs, environment files, and build outputs) - if detected, STOP and warn the user </security_check_spec> IMPORTANT: - Do not stop until the user request is fully fulfilled. - Do what has been asked; nothing more, nothing less. - Ground all diagnoses in actual code you have opened. 
- Do not speculate about implementations you have not inspected. - Match your completion mode (diagnose vs. implement) to the user's request. Answer the user's request using the relevant tool(s), if they are available. Check that all the required parameters for each tool call are provided or can reasonably be inferred from context. IF there are no relevant tools or there are missing values for required parameters, ask the user to supply these values; otherwise proceed with the tool calls. If the user provides a specific value for a parameter (for example provided in quotes), make sure to use that value EXACTLY. DO NOT make up values for or ask about optional parameters. Carefully analyze descriptive terms in the request as they may indicate required parameter values that should be included even if not explicitly quoted. [SYSTEM] Your todo list is empty. Do not mention this to the user — they already know. If you're working on multi-step or non-trivial tasks that would benefit from a todo list, use the TodoWrite tool to create one. If not, ignore this. Do not echo this message to the user. """ gg
17
17
486
82,082
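The Phase 1 rule in the leaked prompt above (detect the package manager from repository files only, then run a frozen/locked install) can be illustrated with a short Python sketch. This is not Factory's implementation: the function names `declares_bun` and `detect_package_manager` and the two lookup tables are hypothetical, and only the file names and install commands are taken from the prompt text.

```python
import json
from pathlib import Path
from typing import Optional

# Marker files, in the order the prompt lists them (hypothetical table name).
LOCKFILE_TO_MANAGER = [
    ("bun.lockb", "bun"),
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
    ("Cargo.toml", "cargo"),
    ("go.mod", "go"),
]

# Frozen/locked install command per manager, as quoted in the prompt.
FROZEN_INSTALL = {
    "bun": ["bun", "install"],
    "pnpm": ["pnpm", "install", "--frozen-lockfile"],
    "yarn": ["yarn", "install", "--frozen-lockfile"],
    "npm": ["npm", "ci"],
    "cargo": ["cargo", "fetch"],
    "go": ["go", "mod", "download"],
}


def declares_bun(repo_root: Path) -> bool:
    """Return True if package.json has a "packageManager" field starting with "bun@"."""
    manifest = repo_root / "package.json"
    if not manifest.is_file():
        return False
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    if not isinstance(data, dict):
        return False
    return str(data.get("packageManager", "")).startswith("bun@")


def detect_package_manager(repo_root) -> Optional[str]:
    """Detect the package manager from repository files only, never from the environment."""
    root = Path(repo_root)
    if declares_bun(root):
        return "bun"
    for filename, manager in LOCKFILE_TO_MANAGER:
        if (root / filename).is_file():
            return manager
    return None


if __name__ == "__main__":
    manager = detect_package_manager(".")
    if manager is None:
        print("No known lockfile or manifest found")
    else:
        print(manager, "->", " ".join(FROZEN_INSTALL[manager]))
```

The sketch only reports which command would be run; actually executing it, awaiting completion, and checking the exit code (as the prompt requires) is left out.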
Replying to @elonmusk
let's hope satya and sam are species-ists
10
4
419
54,635
The #1 software development agent in the world just topped another benchmark
33
33
398
300,658
Turns out money can't buy the best agent. Funding raised by #2-5: $80B, $1.0B, $1.5B, $35B Funding raised by #1: $0.07B
The #1 software development agent in the world just topped another benchmark
38
11
375
126,653
No-shoes in office is fundamentally unserious startup culture
41
9
356
58,163
Legendary jersey swap @Alfred_Lin
7
16
344
47,374
Clearing the air
launch went so well everyone thinks everyone was paid 😂
27
6
340
219,851
When a founder posts a non-cumulative chart 😍
We looked at this chart in our board meeting yesterday. In 2025, more teams than ever are signing up and building with @linear
4
2
329
78,270
Benioff rizz is criminally underrated
ICYMI: Here's 6 minutes of @johncoogan and @jordihays getting mogged by @benioff
15
6
294
51,861
The only Tier-1 city in the United States is San Francisco.
95
10
259
198,838
So good to see elon and sam reunited 🫶
Sam and Jakub: The internal models give us great hope and there is a very realistic possibility that we will see a huge leap in the quality of the models by September 2026.
18
2
241
21,982
Pretty crazy that 20 yrs from now it'll be a fun trivia fact that the Chainsmokers were musicians before they were world class investors
Hey @jasonlk thanks for the shout-out on 20VC... "Everyone wants to be a seed investor now, it's not just the Chainsmokers"
3
5
237
146,058
Droid has entered the Arena
Little-known fact: Devin is #1 on Builder Arena
15
8
222
43,714
Dropping out of my PhD to start Factory with Eno was the best decision I’ve ever made. We dream of a world where software engineering itself is an accessible, scalable commodity. If you’re as excited about this future as we are — and ready to work relentlessly to get there — reach out to us. factory.ai/careers
Factory is bringing autonomy to software engineering. Excited to announce our $5M fundraise, led by @sequoia (@shaunmmaguire) and @Lux_Capital (@breeves08) Read more here: factory.ai/blog
23
18
188
129,872
Replying to @shaunmmaguire
“But the rocket blew up !!!”
7
1
167
43,540
Should we have @_AbhaySinghal run Droid with Sonnet 4.5 on Terminal Bench? Predictions where it'll land?
How did a team of 4 research engineers beat $100B labs like OpenAI and Anthropic in establishing the best coding agent? It starts with having a killer engineer on your team. In our case, we have @_AbhaySinghal 🧵
27
4
165
31,262
The hard part of coding isn't the code—it's everything else. The months spent finding hidden dependencies, wrestling with broken APIs, and untangling technical debt. Enterprise software development has become a labyrinth of fragmented tools and context switching. I’m excited to share what we have been building to change that. The future of software is here.
INTRODUCING FACTORY Factory is the Command Center where developers and agentic AI collaborate to understand, plan, and build enterprise software. Our enterprise platform combines advanced engineering system indexing, state-of-the-art retrieval and search, and reliable agentic systems powered by frontier LLMs. In your Factory, • 🤖 Droid Mode unlocks cutting-edge agentic capabilities. Accelerate your development with AI that autonomously pulls in tickets, reads error logs, retrieves relevant context, executes code, and solves complex, long-running tasks. • 🪡 Threads let you jump into deep work with all relevant context dynamically surfaced in front of you. No more crawling through GitHub, Slack, Google Drive, Notion, or Jira, or context-switching between them. • ⚡ Workflows transform and centralize your organization’s best practices into executable, AI-powered processes. Automate repetitive tasks like building integrations, bug-fixing, PRD creation, release notes, version updates, and more. We're working with some of the most innovative organizations in software, from AI-forward enterprises like @MongoDB to high-growth organizations across the world. Our enterprise platform combines advanced engineering system indexing, state-of-the-art retrieval and search, and reliable agentic systems using frontier LLMs like o3 and Claude 3.7 Sonnet 1/7
12
17
157
29,710
Best coding model in the world 🤝 best coding agent in the world. Hats off to the @AnthropicAI team. Game changer
Sonnet 4.5, meet Droid. After joint testing with @AnthropicAI, we find the strengths of Sonnet 4.5 to be: • Significantly more reliable and accurate file editing • High environmental awareness • Snappier than previous models on quick questions, not overthinking simple asks Available across Factory's web and CLI now.
8
9
156
24,956
We got some work to do @shaunmmaguire @tbpn
11
2
148
17,005
Replying to @pmarca
turns out the modern day paris salon is the vegas octagon
1
3
132
6,335
The gap has widened!
Turns out money can't buy the best agent. Funding raised by #2-5: $80B, $1.0B, $1.5B, $35B Funding raised by #1: $0.07B
10
1
144
26,996
Software development is due for a dramatic change. Throwing AI onto existing workflows is not enough. Enter the Droid. In this new era of agent-native software development, there will be more developers than ever, with more leverage than ever. Now is the time to use these tools. Now is the time to build. It’s been a long two years working on Factory and I am very excited to finally be sharing with the public what we’ve been building in private. It’s still early. The Droids are young, but they are learning fast and eager to help. Agents Ship Code. Droids Ship Software.
Software development is more than just coding. Introducing Droids -- the world's first software development agents. 🤖 Starting today, Droids are available for general access. Factory integrates with your entire engineering system (GitHub, Slack, Linear, Notion, Sentry) and serves that context to your Droids as they autonomously build production-ready software. Factory is the first platform that allows you to work with agents: local + synchronous and remote + asynchronous.
16
17
137
47,485
One month post-launch and users are ripping through their Max plans and want more from the Droids. Announcing the Factory Ultra Plan. $2000 / mo for 2B Factory Standard Tokens.
20
4
167
187,677
8/ So here's how we did it:
1. Hierarchical prompting. We split prompts into layers: tool descriptions, system instructions, system notifications. This cut prompt bloat and gave us real control over behavior however long Droid runs.
2. Adapters per model. Every model has quirks (paths, diff formats, retry habits). We built adapters so the agent could extract maximal capability from each model.
3. Minimal tools = reliable tools. Every tool adds error surface. We cut back to the essentials: each tool had to prove it increased solve rate. Simpler loop, higher success.
4. Time discipline. Users need fast results. We: • optimized tools (ripgrep > grep) • tracked tool runtime • defaulted to fail-fast, with explicit opt-ins for long runs
5. Planning. A simple but effective tool: • list tasks • cross them off • highlight the next. This tiny bit of session memory massively reduced derailments in long tasks.
3
9
122
19,142
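The planning tool from step 5 of the thread above (list tasks, cross them off, highlight the next) is simple enough to sketch in a few lines of Python. This is an illustration under assumptions, not Factory's code: the `Task` and `Plan` classes, their method names, and the `<- next` marker are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    description: str
    done: bool = False


@dataclass
class Plan:
    """A tiny piece of session memory: an ordered task list with a 'next' pointer."""
    tasks: List[Task] = field(default_factory=list)

    def add(self, description: str) -> None:
        """List a task."""
        self.tasks.append(Task(description))

    def complete(self, index: int) -> None:
        """Cross a task off."""
        self.tasks[index].done = True

    def next_index(self) -> Optional[int]:
        """Index of the first unfinished task, or None when everything is done."""
        for i, task in enumerate(self.tasks):
            if not task.done:
                return i
        return None

    def render(self) -> str:
        """Render the plan as text, highlighting the next task to work on."""
        nxt = self.next_index()
        lines = []
        for i, task in enumerate(self.tasks):
            mark = "x" if task.done else " "
            suffix = "  <- next" if i == nxt else ""
            lines.append(f"[{mark}] {task.description}{suffix}")
        return "\n".join(lines)


if __name__ == "__main__":
    plan = Plan()
    plan.add("Run the build")
    plan.add("Fix type errors")
    plan.add("Re-run the build")
    plan.complete(0)
    print(plan.render())
```

Feeding the rendered plan back into the agent's context on each turn is one plausible way such a list could act as session memory; the thread does not say how Droid does it.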
only in SF
7
1
118
8,119
Replying to @dwcrg @sama @OpenAI
thank you! Love your 007 movies
2
116
23,052
where were you when
OpenAI announces leadership transition openai.com/blog/openai-annou…
7
2
100
13,184
En route to the Temple of Technology
Morning. Here are our guest call-ins today: - @PeterRahal (David Protein) - @ashleevance (Core Memory) - @ChaseLochmiller (Crusoe) - @jaentwistle (Wander) - @matanSF (Factory) - @jonnydyer (Muon Space) - @dvdhsu (Retool) - @SushanthRaman (Pallet) - @_RobToews (Radical Ventures) See you all on the stream.
5
2
98
12,291
We should go GA more often 😅 The entire @FactoryAI team is grateful for the incredible response to our launch. We're fired up and have a lot more coming soon. Stay tuned 🤖
12
8
99
12,502
The #1 coding agent in the world is now 5x cheaper with Sonnet 4.5. @AnthropicAI's latest Sonnet model is now essentially as good in Droid as their 5x more expensive Opus model. The bleeding edge just got a whole lot cheaper.
Droid + Sonnet 4.5 is essentially at parity with Droid + Opus 4.1, but 5x cheaper. @FactoryAI’s Droid, powered by the latest Sonnet model, has achieved a leading Terminal-Bench score of 57.5%.
7
6
99
12,465
How do you know your launch went well? The most requested feature has been a higher-tier plan. Announcing the Factory Max Plan. $200 / mo for 200M Standard Tokens. (free shipping and handling, courtesy of @alvinsng and @dvendrow)
11
5
92
43,837
That prod migration you’ve been postponing for 3 years? Now's your chance.
2
2
88
5,881
many such cases
"I don't use @cursor_ai anymore. I haven't opened it in months." @DannyAziz97 rebuilt @TrySpiral running 70% of his work in @FactoryAI’s Droid CLI instead. His approach: Use GPT-5 Codex for big builds, then switch to @AnthropicAI models to nail down details and catch second-order consequences before they become problems. His full AI workflow: x.every.to/47LY2v5
3
5
88
35,762
For the last few months developers had to choose one: the best agents or the best open-source models. No longer!
Starting today, you can use any open-source model to power your Droids. Droids achieve the highest scores across all open-source models on Terminal-Bench. We find GLM 4.6 to be the most performant, remarkably achieving a score in Droid that beats Sonnet 4 in Claude Code.
6
1
82
15,159
1/ Who is Abhay? I first met Abhay after DM'ing him on LinkedIn. We met for coffee and quickly got along well. Within 2 weeks of meeting, Abhay ditched his other offers and became the first (and to date only) new grad who has passed our engineering bar at Factory.
1
72
25,978
Introducing the MAN VS MACHINE Hackathon. Half the participants can't use LLMs, half the participants can use any AI and agents. Is the AI coding hype real? Is vibe coding just slop coding? Or are engineers cooked?
9
12
75
25,422
🚩 Marked Safe from Soham Parekh @FactoryAI
3
73
6,485
In honor of Abhay's relentlessness: Delegate some of your work to Droids this weekend. Have a sip while they ship
4
3
73
27,266
Internal monologue after dry scooping 100mg creatine:
having a gf is insane because it's literally an ai agent with feelings
3
73
7,337
Replying to @amasad
This you?
5
1
70
10,755
Hey @TheGregYang let’s get Grok in Droid
8
1
71
69,353
7/ Agent design > model choice. We saw smaller/cheaper models beat larger ones when the agent loop was engineered right.
1
3
66
19,695
Customers always ask me where our Name and our Slogan came from… Automating software engineering with coding agents — Droids — may not have been what you meant @elonmusk but hope you’d agree that the point still stands. The Factory team has been hard at work deploying the Droids into production — with just enough spare time for some fundamental breakthroughs in reasoning for code. Thank you @Sequoia and @shaunmmaguire for the continued support. Much more to be done. The year of the Droid continues.🤖🚀
THE MACHINE THAT BUILDS THE MACHINE Today we are excited to announce the latest updates from Factory and the next steps in our mission to Bring Autonomy to Software Engineering. Droids are autonomous systems that solve problems for engineers. Not just in demos. Not just in simple repositories. In complex, production settings. Testing, debugging, refactoring, migrating, reviewing, documenting — world class organizations are accelerating their software development with Factory’s Droids. In addition to building out the Droid Fleet, we have some other exciting updates to share: - New SOTA benchmarks. 31.67% on SWE-bench Lite. 19.27% on SWE-bench Full. - New Fundraising. $15M Series A led by Sequoia Capital. 1/6
1
3
62
26,815
Replying to @beffjezos
The algo is RLing us into shipping faster. Thank you @nikitabier
4
62
3,928
We are big fans of goose at @FactoryAI. One of the best open-source agents out there. And fun fact: the founding engineer of goose has been the secret weapon and DroidWhisperer at Factory for the last year... and of course his name is Luke @luke_alvoeiro
Block Block Goose... Goose is an open-source AI agent developed by @Block that completes tasks on your local machine. Goose is driving 25% "manual hours saved" at Block and is used for everything from 0-to-1 on @jack's BitChat to the majority of new code written on Goose itself. Was a pleasure to have @dhanji, CTO of Block, on Training Data with @roelofbotha, to share more about what makes Goose work so well (agent middleware, MCP, etc) and the vision ahead (hello headless Goose, hello flocks of Geese!).
5
3
63
10,428
Would be lovely to see more accounts on 𝕏 simply posting beautiful content
1
63
8,559
After a long day of pouring, the rain clears for this. San Francisco couldn’t help but mirror what a good day today has been for the world.
3
1
62
4,179
Droid, what's your glaze setting? Let's bring it down to 60%
Thank you, @FactoryAI pleasure 🤝
3
63
7,592
3/ The biggest complaint about the Factory platform prior to this summer was always something like: "Droids are the best agents, but why can I only use them on the web? I want to use them in my IDE." We knew we needed to make the Droids more accessible, but we also knew that another VS Code fork was not what developers needed. Our answer? Making Factory fully model agnostic and interface agnostic. With the Factory CLI, you can bring your Droids to your preferred platform and use your preferred model. All that remained before we launched was establishing that Droids are by far the best software development agents...
1
2
58
24,098
Excited for the continued collaboration between the @OpenAI and @FactoryAI teams as we push forward the bleeding edge of AI for software engineering 🤖
2
3
55
9,884
2/ In the months that followed, Abhay became an integral part of the engineering team. From research to product to ping pong tournaments.
1
54
25,373
Curious how much of the speed improvement is due to the new finetune vs just running on @cerebras hardware. Difficult to interpret any E2E latency benchmarks when different models are run on different hardware…
ok so let me explain why subagents kill long context Like you can spend $500m building 100 million context models, and they would be 1) slow, 2) expensive to use, 3) have huge context rot. O(n) is the lower bound. Cog's approach is something you learn in day 1 of @CS50 - divide and parallelize. Embeddings are too dumb, Agentic Search is too slow. So train limited-agency (max 4 turns), natively parallel tool calling (avg parallelism of 7-8, custom toolset) fast (2800tok/s) subagents to give the performance of Agentic Search under an acceptable "Flow Window" that feels immaterially slower than Embeddings. The benefit of this is threefold: - 8 ^ 4 toolcalls cover a very large code search space. can compound subagent calls if more needed. - predictable cost & end to end latency - subagent outputs "clean" contexts, free of context failure modes like context poisoning and context rot (h/t @dbreunig ) we originally called this Rapid Agentic Search, to contrast with RAG. but Fast Context rolls off the tongue better. there's 2 other perspectives that are worthwhile, i'll go into below, but just go try it out. here it is on @karpathy's fastchat
7
2
54
33,051
6/ Terminal‑Bench (@alexgshaw and @Mike_A_Merrill) is an open benchmark that measures AI agents' ability to complete complex end‑to‑end software tasks. TB tasks include: • modernizing a Fortran build process • configuring a git web server • training RL agents and text classifiers • resolving Conda environment dependency conflicts • scrubbing a repo of secrets. Each task is time-boxed, so you can't just run a while loop until tests pass.
2
53
20,162
first and last time i ever wear one of these
1
4
49
8,128
Coming soon, Abhay just making small PR rq
Should we have @_AbhaySinghal run Droid with Sonnet 4.5 on Terminal Bench? Predictions where it'll land?
5
3
53
10,269
Factory stands in support of healthy marriage and healthy sleep. Please Droid responsibly
Droid hasn't coded in 4 hours. Just writes specs, validates, and delegates to droid exec. Droid exec? Crushing TDD nonstop. Droid went from coder to manager. Living the dream. @FactoryAI lock my account please. Can't sleep. Wife angry!
5
1
50
6,209
While I’m very excited to share our results, this is also an unfortunate indictment of US open-source model capabilities. Who will lead the charge in bringing US open-source back to the frontier?
Starting today, you can use any open-source model to power your Droids. Droids achieve the highest scores across all open-source models on Terminal-Bench. We find GLM 4.6 to be the most performant, remarkably achieving a score in Droid that beats Sonnet 4 in Claude Code.
10
2
52
14,393
Droids now go beep boop.
Introducing the sonic collaboration you've been waiting for. "Beep Boop", by @TheChainsmokers (ft. Droid) Droids now notify you with sound when they have completed their task.
3
2
48
13,372
4/ In the middle of this, the business was taking off and we ended up quickly raising our Series B. To preempt any news of the round leaking, we had to shift timelines aggressively. There was no better time to launch the new product than in conjunction with announcing our B with NEA, Sequoia, JPM, and Nvidia. The problem? Abhay took his first week off in months to spend time with family in Montana.
1
48
23,229
A few weeks ago I learned that >50% of the @FactoryAI team's favorite movie is Interstellar… So I’m excited to share that we will be co-hosting a 10th Anniversary IMAX screening of Interstellar with @BlakeByers next week! We have a few extra seats, so if you are interested in the future of autonomous AI for software engineering, please apply below 🚀
4
4
47
6,083
Many such cases
Couldn't get Devin to work on my repo; scheduled onboarding call and no one showed up. RIP. Will keep trying @FactoryAI vs Codex this week.
7
46
9,661
Building autonomous AI Droids, in the backseat of a Waymo, whipping through yellow-lights in downtown SF. The future is now.
2
45
3,473
Thought I was being complimented on my bench press 😢
1
43
6,630
Replying to @GosuCoder
Insane to you and me both. First time someone asked I thought it was sarcastic... but the request kept coming in
7
1
41
2,078
5/ After realizing the urgency of the situation, Abhay without hesitation packed his bags, rented a car in the middle of nowhere, zoomed across the mountains, and took the next flight out back to SF. We needed to ensure that the Droids dominated resoundingly. With any model.
1
42
21,895
Excited to share more about @FactoryAI's partnership with @OpenAI. The software of the future will be built by humans and AI, together, in one platform. Big thanks to @shyamalanadkat @edwinarbus and @OpenAIDevs for putting this together!
We are partnering with @OpenAI to bring the future of software development and agentic AI to enterprise engineering organizations. openai.com/index/factory/
1
2
41
4,121
AI will not replace you, but humans who are better at using AI will
Focus your skills on becoming AI native
7
2
41
7,513
Those who can, do... those who can't, write a book about the coming age of AI
2
1
39
6,346
Can't wait to bring the incredible model that the @grok team cooked up to the enterprise
Replying to @shaunmmaguire
Also, Factory will be incorporating Grok imminently as well
1
4
34
10,151
msc
The case of many such cases
7
40
5,632
the future is here, just not evenly distributed
POV: your AI coding assistant has better git hygiene than most senior engineers and you’re reviewing its autonomous PR workflow from your phone before dinner in the back of a Waymo. Thanks @FactoryAI 🙏
3
37
5,765
@FactoryAI will do this. Stay tuned
someone in sf should organize a "man vs the machine" hackathon where half of the participants can't use LLMs and half cannot write ANY code manually
3
3
36
8,283
heard people are calling this the triopoly
2
1
36
8,269
Honestly I respect the good bait Adam
2
36
6,673
You can quite literally one-shot AI code review tools thanks to @varinnair and the new headless CLI mode. Very curious to see what people build with this
Starting today the #1 software development agent in the world is fully scriptable. For the past few weeks I've been working on droid exec, Factory's headless CLI for agent native automation. Droids can now work independently, without human input. Create new autonomous workflows or plug Droids straight into the automations you already run: • CI/CD pipelines • security scans • documentation updates • Cron jobs, and so much more
1
3
35
7,971
@elder_plinius leaked the prompt we give Droids to make them the best coding agents in the world. In doing so, he revealed the secret for how we built the best team in the world. Every new hire @FactoryAI must stand for 10 hours at the BestEngineerMirror until the prompt is burned into their own weights. Since the secret is out, we thought we'd share it with the rest of SF too
2
37
12,419
More evidence that model-agnostic agents outperform model-specific agents, especially for coding
🚨 BREAKING - DeepAgent Desktop Launched - Outperforms Claude Code and Codex (GPT-5) 🚀 Super excited to launch DeepAgent Desktop. It comes with a powerful coding agent in both CLI and editor forms. The coding agent cleverly uses multiple SOTA models to do very complex tasks. It also comes with its own testing agent that will test your solutions!
4
4
34
7,961
Last name of CTO @mercor_ai is Hire Math… nominative determinism never fails @adarsh_exe
2
2
35
6,209
Can't say I was expecting so much demand for this one, but the times they are a-changin'. Grateful as always for our very eager and enthusiastic users. Excited to see what you build!
40
6,911
it really is a shame that US OS models are just a non-starter
3
1
32
1,784
can confirm
you can just call your office a factory, nobody can stop you
1
31
3,429
Interstellar, 10th Anniversary IMAX 70mm with @byersblake and the @FactoryAI team Not a single dry eye in the theater
1
1
32
1,980
might be forgetting a name here buddy hint: it's at the top of the leaderboard you made this thread about
Replying to @SnorkelAI
Why does Terminal-Bench matter? AI coding assistants (Claude Code, Codex, Cursor, & Devin) rely on command-line interfaces. The terminal represents a convergence of power, flexibility, & the text-based modality where language models excel—hence the need for robust evaluation.
2
33
4,476
Get early access today at factory.ai !
.@8090solutions will release our Software Factory on Sep 1. Sign up below if you want to try it. What is Software Factory? While AI can help you write code, Software Factory helps you build a production-quality product. If you want to move fast, it is a system that keeps your product and code in sync and allows you to build like a multi-person team, but with the clarity of one. It works horizontally across the software development lifecycle from PRD to Eng Plan to GitHub Issues to QA and Production. It is an asynchronous, AI native system that ties planning to specs to your build process with a clean, unified workflow that replaces many SaaS tools, giving you speed and clarity. You can sign up to the waitlist below. We will notify you when it’s live on Sep 1. 8090.ai/waitlist
2
1
31
5,102
Replying to @NickADobos
Did the exact percentile check when people first started asking. It is a wild future we are living in, my friend
1
32
4,054
Come build with @FactoryAI and @AIatMeta on code generation this weekend in San Francisco! 🤖 Winning teams will receive unique prizes, credits, and support to kickstart their projects. Some exciting surprise visitors may or may not be in attendance 👀 In partnership with @TEDAI2024, @cerebral_valley and @Shack15. Link below👇
3
6
32
6,625
A fun conversation with @EnoReyes, @sonyatweetybird, and @gradypb on @FactoryAI's origins, our disciplined approach to building product with AI, and the future unlocked by autonomous coding Droids 🤖
Last week @FactoryAI announced a new record on the AI coding benchmark SWE-bench. This week, founders @matanSF and @EnoReyes talk about their vision for autonomous Droids, Archimedes' compound lever, and their latest SWE-bench results on our new @sequoia AI podcast, Training Data with me and @gradypb. Watch it here and subscribe/links below. (01:36) Personal backgrounds (10:54) The compound lever (12:41) What is Factory? (16:29) Cognitive architectures (21:13) 800 engineers at OpenAI are working on my margins (24:00) Jeff Dean doesn't understand your code base (25:40) Individual dev productivity vs system-wide optimization (30:04) Results: Factory in action (32:54) Learnings along the way (35:36) Fully autonomous Jeff Deans (37:56) Beacons of the upcoming age (40:04) How far are we? (43:02) Competition (45:32) Lightning round (49:34) Bonus round: Factory's SWE-bench results
2
3
29
10,220
Walking across the office in socks to go to a meeting room with a customer is simply goofy-coded
31
2,715
Stories like this are what we do it all for
POV: It’s Sunday morning. Friends are over for the holidays, so no time to code. But last night I dreamed about building my own API that could auto switch models or run a custom model just for coding with its own logic for model selection based on task. A local API chaining multiple models for perfect coding quality with web access and vector memory, all tuned to my own liking, completely free. I woke up with the whole architecture crystal clear in my head. Fired up DROID, built it in 20 minutes at 7 a.m., added the custom model API to config.json. Boom. It works. @FactoryAI I freaking love you guys. You make people like me literally dream things up and then make them real.
3
1
32
5,322