The best way to build software with agents: @superconductor • Previously co-founded @gradescope • PhD @berkeley_ai

Bay Area
Anthropic is elf-coded, OpenAI is orc-coded, xAI is dwarf-coded, and Google DeepMind is human-coded. This leaves an opportunity for a hobbit-coded research lab.
202
276
5,536
480,396
Here's a brief glimpse of our INCREDIBLE near future. GPT-3 armed with a Python interpreter can · do exact math · make API requests · answer in unprecedented ways Thanks to @goodside and @amasad for the idea and repl! Play with it: replit.com/@SergeyKarayev/gp…
79
634
3,844
AI research is converging on a major finding: language models are a great substrate for all AI applications. This feels like a HUGE deal. Some examples:
49
541
3,359
Guys I think I figured out wtf happened in 1971
75
180
2,807
1,393,963
The future: · Write emails with bullet points, which an AI assistant automatically expands into beautiful long text. · Read emails by having an AI assistant summarize long-ass text into bullet points...
61
230
2,435
Google has no moat. They don't have over 90% search traffic. They don't have everyone's emails and the most used email client. Their OS is not powering 70% of smartphones. They will never be able to deploy LLM features into these products -- instead, people will run OSS LLMs.
82
110
2,337
998,466
Okay so OpenAI board is · Ilya, got it, makes sense · Helen Toner, DC policy person, fine · Adam D'Angelo, CEO of Quora, okay I guess but why though? · Tasha McCauley, "tech entrepreneur" and funny enough also wife of Joseph Gordon-Levitt, how did this board come together?
81
83
1,584
461,929
I'm ready to pay much more than $20/month for a coding copilot that is 10x as good as GitHub Copilot or Cursor. I WANT to pay more. Load my entire repo into Gemini 1.5 context and cache it. Automatically review all my PRs. Charge me $200/month. Charge me $2000/month!
109
74
1,526
201,177
Got nerd-sniped by the OpenAI Board of Directors. Here's everyone who's ever been on it, their claim to fame, and why they left.
50
121
1,335
450,348
Did you know that Claude Code can use the browser to QA its own work? 1. Run `claude mcp add playwright -- npx -y @playwright/mcp@latest` 2. Tell Claude where your app is running, e.g. localhost:8000 3. Now Claude can click and type to make sure its code is actually working!
35
92
1,237
107,457
Now that our GPT-3 can execute code on @Replit, let's teach it to: · Google stuff · Read web pages · ✨Ask GPT-3 questions✨ That's right -- we're going RECURSIVE.
19
148
1,189
GitHub Copilot was released three years ago. In these three years, GitHub still hasn't shipped automated PR description, review, multi-file editing, test gen, etc. Perhaps because Nat Friedman stopped being their CEO right after releasing Copilot? Who even is their CEO now?
45
23
1,099
159,393
Conclusion after GPT-4 hacking weekend: Even if there is ZERO further progress in LLM models, software engineering will still be revolutionized in the next couple of years, just through UX and non-ML innovations. Absolutely massive overhang.
15
87
1,060
256,548
What I admire about @karpathy is that he just keeps "doing things that don't scale". Label the entire ImageNet by yourself? Sure. Engineer petabyte-scale data engine for self-driving? Let's do it. Implement GPT from scratch? Easy. An inspiring attitude.
12
40
1,033
66,146
Imagine Claude, but: • Inside of your todo list • Much smarter, as it plans and reads stuff asynchronously • Searches both the web and your own email and calendar We built it and it's pretty awesome. Opening up to 50 more folks. Like this tweet and DM me if you want to try!
80
15
874
87,760
Made something I've always wanted to see: a comparison table of all cloud GPU providers! Filter by provider, architecture, exact GPU, etc. Sort by price, RAM, vCPUs, etc. Both on-demand and spot instance prices. fullstackdeeplearning.com/cl…
16
138
818
Request for startup: Amazon, but for getting rid of stuff. It’s super easy to get stuff into your home: just click Buy. It’s harder to get stuff out. Electronics should be recycled, valuable things should be sold, bulky things need transport. I’d pay to not worry about it.
44
51
745
Best meme format
3
75
699
Replying to @karpathy
We had a unique one but we fumbled her…
10
18
763
103,526
I wanted to better understand how Claude Code is wired under the hood, so I captured its API requests and pulled out the system prompt and tool definitions. Also posting the full thing as a gist below if you want to dig in!
31
57
748
85,532
Web LLM is insane. 1) Go download the latest Chrome beta, which shipped WebGPU support: google.com/chrome/beta/ 2) Now use a 7B-param LLM in your browser! mlc.ai/web-llm/ 3) Marvel at the "How" section on their GitHub: github.com/mlc-ai/web-llm#ho…
7
167
692
121,959
“Did GPT-3 write this?” is such a good insult
28
57
673
118,035
Guaranteed JSON output from any local LLM, with very low overhead! Check out the library and a brief description of the method below the fold. github.com/normal-computing/… "The basic idea is simple: regular expressions have an equivalent Deterministic-Finite Automaton (DFA) representation. We can transform this DFA into a generative model: in each state we get a list of symbols which correspond to completions that partially match the regular expression. We mask the other symbols in the logits returned by a large language model, sample a new symbol and move to the next state." - @remilouf and co at @normalcomputing
21
98
701
170,308
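The quoted method can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the linked library's code: the DFA is written by hand for the regex `-?[0-9]+`, the vocabulary is single characters, and `fake_logits` is a hypothetical stand-in for a real language model.

```python
import math
import random

# Hypothetical hand-written DFA for the regex  -?[0-9]+  (an integer literal).
# States: 0 = start, 1 = just saw "-", 2 = saw at least one digit (accepting).
DIGITS = "0123456789"
TRANSITIONS = {
    0: {"-": 1, **{d: 2 for d in DIGITS}},
    1: {d: 2 for d in DIGITS},
    2: {d: 2 for d in DIGITS},
}
ACCEPTING = {2}
EOS = "<eos>"
# The vocabulary deliberately contains symbols the regex never allows.
VOCAB = list(DIGITS) + ["-", "a", "b", " ", EOS]


def fake_logits(prefix, rng):
    """Stand-in for a language model: one arbitrary logit per vocabulary symbol."""
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]


def allowed_symbols(state):
    """Symbols that keep the output a partial match of the regex in this state."""
    allowed = set(TRANSITIONS[state])
    if state in ACCEPTING:
        allowed.add(EOS)  # stopping is only legal in an accepting state
    return allowed


def sample_masked(logits, allowed, rng):
    """Give disallowed symbols zero weight (a logit of -inf), then sample."""
    weights = [math.exp(l) if s in allowed else 0.0 for s, l in zip(VOCAB, logits)]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def constrained_generate(max_len=8, seed=0):
    rng = random.Random(seed)
    state, out = 0, []
    while True:
        # Only stop on length once the DFA is in an accepting state.
        if len(out) >= max_len and state in ACCEPTING:
            break
        symbol = sample_masked(fake_logits(out, rng), allowed_symbols(state), rng)
        if symbol == EOS:
            break
        out.append(symbol)
        state = TRANSITIONS[state][symbol]  # move to the next DFA state
    return "".join(out)
```

Every string returned by `constrained_generate` matches the regex by construction, whatever the model's logits are. A real implementation would build the DFA from the regex automatically and work over multi-character tokens.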
I want VSCode but using an infinite canvas instead of tabs. Does this exist?
46
23
679
291,106
A seriously baller demo: meerkat.wiki · Add a million PDFs to a DataFrame instantly · In-notebook UI to review them in various ways · In-notebook instant LLM training to "flash fill" a new column, with easy review
11
96
653
143,216
Broke: using OpenAI embeddings as-is. Bespoke: learning an embedding projection from human judgements. OpenAI explains that this will "better emphasize aspects of the text relevant to your use case. In binary classification use cases, we've seen error rates drop by ≤ 50%."
10
86
639
486,041
Remember these? Wondering if there is an equivalent adversarial attack on LLMs. (Simple prompt injection is not it: the attack needs to be invisible to a human observer.)
51
47
620
199,253
~1971: US fertility rate dips below replacement and stays there. (Source: wsj.com/articles/u-s-births-…)
15
14
549
109,045
Some things that don't make sense to me: • What made Sonnet 3.5 so much better than Sonnet 3? Is it "Golden Gate Claude" but for being smart and helpful? • Why did Anthropic treat 3.5 (new) as a minor update to 3.5 when in fact it's massively better? • Why is Haiku 3.5 still text-only? Is Claude 3-family not trained multi-modally from the start? • What exactly is the difference between o3 and o1? Did o1 stem from GPT-4 and o3 stem from GPT-5? • Why is Google still the only frontier API with >200K context? And how is it a full OOM ahead of the others?
26
9
585
67,539
Found this from @anas_araid
imagine a figma-like infinite workspace in visual studio code. prototype built using react and @code's api extension.
4
8
415
30,403
By the way, every LLM disagrees with me. They all think that OpenAI are humans, Anthropic are Elves, Google DeepMind are dwarves, and xAI are orcs.
45
13
403
18,180
Cursed thought: what % of GPT-4 training data was generated by GPT-3?
21
21
344
Why does Nvidia still not have their own GPU cloud? Do they dislike money?
49
8
337
576,681
The LLM benchmark we need: ChatGPT-like website that always shows two responses, generated by any two of N different models (user can't see which). The user has to select the better response in order to keep using the chat (it's otherwise free). Leaderboard will be decisive.
12
11
335
57,528
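One way such a leaderboard could be computed from blind pairwise votes is an Elo-style rating. The post does not say how the ranking would be derived, so the rating scheme, the starting rating of 1000, and the K-factor of 32 below are all assumptions for illustration.

```python
def expected_score(rating_a, rating_b):
    """Elo-predicted probability that A's response is preferred over B's."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Shift rating points from loser to winner; upsets move more points."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def leaderboard(votes, models, start=1000.0):
    """votes is an iterable of (winner, loser) pairs, one per user choice."""
    ratings = {m: start for m in models}
    for winner, loser in votes:
        update_elo(ratings, winner, loser)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical model names and votes, for illustration only.
board = leaderboard(
    votes=[("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")],
    models=["model_a", "model_b", "model_c"],
)
```

Because each vote moves the same number of points from loser to winner, the total rating stays constant and only the ordering changes.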
What an elegant way to do object detection: given an image, simply output the sequence of bounding box coordinates and labels as text. Great work from @tingchenai, @srbhsxn, Lala Li, @fleet_dj @geoffreyhinton ai.googleblog.com/2022/04/pi…
7
54
336
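A minimal sketch of the idea of writing detections out as text: quantize each box coordinate into an integer bin and emit it next to the class label. The bin count, token order, and space-separated format here are assumptions for illustration, not the linked work's exact tokenization, and labels are assumed to be single words.

```python
def quantize(value, size, bins):
    """Map a pixel coordinate in [0, size] to an integer bin in [0, bins - 1]."""
    return min(bins - 1, max(0, int(value * bins / size)))


def encode(objects, width, height, bins=1000):
    """objects: list of ((xmin, ymin, xmax, ymax), label) in pixels -> one text sequence."""
    tokens = []
    for (xmin, ymin, xmax, ymax), label in objects:
        tokens += [
            str(quantize(ymin, height, bins)),
            str(quantize(xmin, width, bins)),
            str(quantize(ymax, height, bins)),
            str(quantize(xmax, width, bins)),
            label,
        ]
    return " ".join(tokens)


def decode(text, width, height, bins=1000):
    """Inverse of encode: read groups of five tokens back into pixel boxes."""
    tokens = text.split()
    objects = []
    for i in range(0, len(tokens) - len(tokens) % 5, 5):
        ymin, xmin, ymax, xmax = (int(t) for t in tokens[i:i + 4])
        label = tokens[i + 4]

        def center(bin_index, size):
            # Use the middle of the bin as the recovered coordinate.
            return (bin_index + 0.5) * size / bins

        box = (center(xmin, width), center(ymin, height),
               center(xmax, width), center(ymax, height))
        objects.append((box, label))
    return objects
```

The round trip loses at most one bin width per coordinate, so a sequence model that emits this text is, in effect, emitting detections.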
Counterpoint
Replying to @sergeykarayev
Kodak had no moat. They didn't have over 90% market share in film photography. They didn't have everyone's personal photographs and the most widely used film camera. They didn't invent the digital camera.
4
5
294
97,520
Tried it out, and the new ChatGPT API is not only 10x cheaper but 10x faster, too. Absolutely insane.
12
10
294
55,307
How are you guys making slide presentations? Is there anything better than Keynote, Google Slides, Powerpoint? In particular, is there anything that would be amenable to "pull requests"?
79
29
279
Ask free-form questions and receive free-form answers about a video.
With multiple foundation models “talking to each other”, we can combine commonsense across domains, to do multimodal tasks like zero-shot video Q&A or image captioning, no finetuning needed. Socratic Models: website + code: socraticmodels.github.io paper: arxiv.org/abs/2204.00598
2
18
255
This LLM guidance language from Microsoft is super interesting. Worth a read-through for sure: github.com/microsoft/guidanc…
8
47
253
56,640
But the vast majority of these large models are probably not dedicated to language either, only the data-interface layers are. This paper from @_kevinlu @adityagrover_ @pabbeel @IMordatch suggests that models learn general computation from language data. bair.berkeley.edu/blog/2021/…
1
15
235
I'm reading every week in 2023. Advice threads, GPT-3 demos, war assessments, shitposts, or anything people like a lot. I'll keep adjusting the list. Start on Monday, done by Sunday. Might make lowkey videos of takeaways. If you want to read along, the current list:
5
14
233
23,598
Does this resemble how human cognition happens? My understanding is that the vast majority of human intelligence is not intermediated by language: most processing happens unconsciously, and only the "tip of the iceberg" is in the form of language.
13
14
222
Pretty surprising that ~2 years after OpenAI published GPT-3 and ~1 year after it opened the API up to everyone, there's no real competitor to the davinci tier.
19
14
233
My dream LLM: - 100k token context - $0.00001 per token - very capable & polite - 2023 training data cutoff - rlly funny but a bit weird - rlly kind & is aligned to my values - not derived from LLaMA (self made) - good taste - good listener & planner - loves generating text a LOT
17
15
216
23,871
Receive illustrations from free-form descriptions (DALL-E combines two different tricks, one of which is a model that embeds text and images into a common space).
DALL·E 2 is here! It can generate images from text, like "teddy bears working on new AI research on the moon in the 1980s". It's so fun, and sometimes beautiful. openai.com/dall-e-2/
3
13
207
Does LLM temperature affect its reasoning ability? This paper finds that it does not. arxiv.org/abs/2402.05201
13
31
207
69,858
There's so much low-hanging fruit here it's simply insane. · Add first-class support for searching the web, parsing HTML · Add "state" to the prompt, allowing new answers to reference previous answers. · Make a Python library to provide uniform interface to a bunch of free APIs
6
5
204
Internet-based AGI is going to achieve its goals in the physical world simply by paying humans to do tasks. Same way corporations get things done. So for alignment purposes, human control over money seems necessary. Need to make sure humans are at both ends of a transaction.
25
12
201
43,183
🍿Live premiere of a brand-new @full_stack_dl lecture on Foundation Models: piped.video/watch?v=Rm11UeGw… · Fine-tuning · Transformers · Large Language Models: BERT, GPT, T5, Chinchilla, and vendors · Prompt Engineering · Code generation, semantic search · CLIP and Image Generation
1
35
188
Here's a question for deep learning practitioners: is it *actually cheaper* to use cheaper GPUs like V100's vs expensive GPUs like A100's? - 8xA100 machine is $32.77/hour (on AWS) - 4xV100 machine is $12.24/hour BUT! Instead of thinking per-hour, let's think per-experiment:
8
31
183
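The per-experiment framing comes down to simple arithmetic, sketched below. The two hourly prices are the ones quoted in the post; the run times in the example are hypothetical placeholders, since the actual speedup depends on the workload.

```python
A100_8X_PER_HOUR = 32.77  # 8xA100 machine, price from the post
V100_4X_PER_HOUR = 12.24  # 4xV100 machine, price from the post


def experiment_cost(price_per_hour, hours):
    """Total cost of one experiment that occupies the machine for `hours`."""
    return price_per_hour * hours


def breakeven_speedup(fast_price, slow_price):
    """How many times faster the pricier machine must finish to cost the same."""
    return fast_price / slow_price


# Hypothetical example: a run that takes 10 hours on the V100 machine
# and finishes 3x faster on the A100 machine.
v100_cost = experiment_cost(V100_4X_PER_HOUR, 10)
a100_cost = experiment_cost(A100_8X_PER_HOUR, 10 / 3)
needed = breakeven_speedup(A100_8X_PER_HOUR, V100_4X_PER_HOUR)
```

With these prices the A100 machine is cheaper per experiment whenever it finishes more than about 2.68x faster than the V100 machine.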
Thanks, 💪Chad GPT!
6
11
176
Language User Interfaces (LUIs) are the future. Here are some patterns we know and love -- and some new ideas! 🌀 Auto-Complete (Copilot) 🌀 One-on-one Chat (ChatGPT) 🌀 Command Palette (Replit Ghostwriter) 💡 Command Suggestion 💡 Multi-player Chat 💡 GitHub UX Some examples:
10
23
183
43,546
Keep coming back to this. If you were certain that GPT-X, available January 2025, could do most knowledge work as well as a human, what would you be doing differently today?
still nobody believes in AGI. there is so much alpha in believing in AGI
25
6
169
84,860
Get working code from a free-form description of a function. And this is from a model that was 95% trained on general language data, not code specifically.
Introducing the 540 billion parameter Pathways Language Model. Trained on two Cloud #TPU v4 pods, it achieves state-of-the-art performance on benchmarks and shows exciting capabilities like mathematical reasoning, code writing, and even explaining jokes. goo.gle/3j6eMnK
2
9
158
My AI assistant expanding terse bullet points into beautiful prose: Haha fuck yeah!!! Yes!! Your AI assistant having to summarize beautiful prose into terse bullet points: Well this fucking sucks. What the fuck.
2
15
156
Okay I asked all the frontier models and here are the results. • o1 thinks of OpenAI as Men • Claude thinks of Anthropic as Elves • Gemini thinks of Google as Orcs 😬 • Grok 3 thinks of xAI as Hobbits • R1 thinks of DeepSeek as Hobbits
15
12
162
11,107
Excellent post explaining what it took to train a GPT-3 sized model: - 384 A100 GPUs (30TB RAM), across 48 nodes - ZeRO data parallelism + pipeline parallelism from Deepspeed - Tensor parallelism + custom kernels from Megatron-LM - a new BF16Optimizer - 24/7 training-sitting😅
The Technology Behind BLOOM Training🌸 Discover how @BigscienceW used @MSFTResearch DeepSpeed + @nvidia Megatron-LM technologies to train the World's Largest Open Multilingual Language Model (BLOOM): huggingface.co/blog/bloom-me…
5
23
162
Superconductor: Manage an entire team of Claude Code agents, right from your phone or laptop. • Write informal tickets • Spin up MANY agents for each ticket • Each agent has its own live app preview • One-click PR the best one! Like this post and request early access👇
21
11
176
34,904
Text is the universal interface. I love reading movies, playing book games, taking my dog for a neighborhood read, driving to beautiful nature texts, and reading at nice restaurants.
3
10
156
56,322
@ericjang11 recently proposed that language == generalization and suggests some ideas stemming from that in a nice post. evjang.com/2021/12/17/lang-g…
2
9
148
Great blog post covering the ins and outs of DALL-E, CLIP, GLIDE (another great model from OpenAI that didn't get its own press), and DALL-E 2. blog.inten.to/openai-and-the…
34
152
Prompt engineering feels bad. Such an uncomfortable middle ground between writing actual code and delegating to a human.
15
5
142
The deep learning community never developed good tools for fine-tuning, but the game has already moved on. Now we need good tools for few- and zero-shot learning. Who's working on this?
9
8
136
Replying to @liyucheng_2
🫡
5
1
122
56,450
Here is a screenshot of the entire prompt, code, and a sample execution run. You can fork it and play with it yourself at replit.com/@SergeyKarayev/gp…
3
12
131
To me, this is the best real estate in the world. Whole hobbit-holes, a few minutes' walk to the Green Dragon Inn, no Orcs, still connected by the Great East Road and complete privacy. Current entry-level price: 10,000 silver pennies.
3
11
102
15,825
Happy Meme Monday!
19
125
Teaching in the GPT age absolutely requires the "flipped classroom" model: · Assign reading chapters / watching lectures as homework. Students can use as much AI as they want. · Assess understanding in class. No AI allowed.
7
17
121
35,503
Don't mean to suggest she's not a great tech entrepreneur, just that I think an OpenAI director needs a little bit more of a known title? Maybe I don't know how boards work.
4
1
113
49,529
You are just a bunch of cells talking with each other, and yet you're "conscious" and "sentient." Why is your company not sentient? Or the Earth? Or Claude?
53
7
115
11,361
An exciting second day of @full_stack_dl LLM bootcamp! @charles_irl, @josh_tobin_, and I are truly honored to host 300 language modelers from around the world. Looking forward to bringing the materials to more people — stay tuned!
1
10
114
12,878
And notably, we haven't seen a GPT-3 like interface for non-generative vision tasks yet. As a computer vision guy at heart, this is most exciting to imagine. More on that in a future thread.
10
4
106
The good people at @brexHQ published a great guide to prompting! Going to thread some highlights below, but make sure to check out the full guide: github.com/brexhq/prompt-eng… Read on for increasingly sophisticated prompt techniques:
1
23
109
12,931
Some non-ML eng ideas: 💡Whole-repo understanding via embedding everything or fine tuning 💡 Automatically run suggested code and have model iterate on potential errors before you actually see the suggestions 💡 In similar vein, allow model to take other actions, such as reading webpages 💡 Build up a high quality library of things that are difficult for model to code correctly, moving the model up the ladder of abstraction (eg model can just write abstract.ocr_image() instead of knowing how best to ocr an image)
1
6
108
15,419
Ways to instantly get GPU-enabled JupyterLab instances, in order of additional features to vanilla - @DeepnoteHQ - @kaggle - @HelloPaperspace Gradient - @GoogleColab - @saturn_cloud - @awscloud SageMaker notebooks - @googlecloud AI notebooks - @jarvislabsai - ...
3
24
106
Every week, GPT exhibits some new AGI behavior. And each time, a bunch of commenters respond with "it's just completing text in a statistically likely way." This longread from @repligate helped me understand why that is not a useful perspective. generative.ink/posts/simulat…
4
7
101
18,434
________ is all you need. ( ) Convolution ( ) Attention ( ) MLP-Mixer (X) A single hidden layer (infinitely wide)
4
8
101
Handwriting recognition is crucial to @gradescope AI-assisted grading. Last year, we upgraded our model architecture to ResNet + Transformer, led by @unterix. On Gradescope test data, which has cross-outs, multiple regions, scientific symbols, and many things that make... 👇
1
10
96
Love the story of @natfriedman's first day as GitHub CEO as told to @dwarkesh_sp: First day as CEO, Nat made the team ship one thing from a community-sourced list of QoL improvements. After some protesting, they did it. And then they shipped a QoL thing a day, for 100 days.
2
6
99
17,191
Just as a minor warning, your new Python-enabled GPT-3 may become possessed by the evil Zalgo. Just something to watch out for.
4
8
90
Looks like this exists! chat.lmsys.org/?arena Thanks to the good people at @lmsysorg 😍 Unfortunately, no open-source models in the top 10 yet...
5
4
96
4,039
Replying to @kaseyklimes
There should be a domestic-facing president who’s really nice and chill and a foreign-facing president who is the scariest person on earth.
4
1
83
8,073
This is just a proof of concept. It's fun to play with, but it often fails. Not to mention, it can become possessed by Zalgo. It's also a horrible idea to just exec() GPT-3 written code. Only do it on @amasad's machines, not your own :)
4
1
90
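For readers who have not seen the pattern, here is a minimal sketch of what "just exec() model-written code" looks like. It is not the code from the linked repl: `fake_model` is a hypothetical stand-in that returns a fixed program instead of calling GPT-3, and nothing here is sandboxed, which is exactly the danger the post describes.

```python
import contextlib
import io


def fake_model(question):
    """Hypothetical stand-in for the GPT-3 call; returns Python source as text."""
    return "print(2 ** 100)"


def run_generated_code(source):
    """Execute model-written source and capture whatever it prints.

    WARNING: exec() runs arbitrary code with full access to the machine.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {"__name__": "__generated__"})
    return buffer.getvalue().strip()


def answer(question):
    """Ask the (fake) model for code, run it, and return the printed result."""
    return run_generated_code(fake_model(question))
```

Letting the interpreter do the arithmetic is what makes exact math possible; the model only has to write the expression.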
Idea: video game mission where you have to convince an LLM-powered agent to do something
14
4
93
22,902
I want to chat with AI about long-form content I'm reading. (It's a paper on Arxiv, but the solution would ideally support any website or PDF.) My order of preference for a solution: · Browser extension · ChatGPT plugin · Website · App Help me out -- what should I use?
14
11
90
36,407
Some UX ideas… 💡 GPT chat right in the editor, seeing what you’re seeing at all times, and suggesting questions/actions (that’s what I was hacking on) 💡 Treat generated code blocks as first class citizens (eg be able to create multiple files from a single answer) 💡 Prompt model to output diff patches, and have ability to apply them 💡 Always have model explain all errors and stack traces (great for education, too) 💡Documentation as code (eg human writes documentation precisely enough for model to write the correct code)
4
3
80
10,109
🙏 so thankful for the opportunity to host an amazing set of deep learners this weekend at fullstackdeeplearning.com bootcamp in Berkeley with @josh_tobin_ and @pabbeel! Thanks @l2k, Raquel Urtasun, @jeremyphoward, and @RichardSocher for amazing guest lectures!
3
12
84
UPDATE: @bing in @MicrosoftEdge does work, just had to give it access to page context in Settings > Sidebar (h/t @CrisGiardina) This looks like the ticket for now. Can read both web articles and PDFs, GPT-4 powered, access to web when needed.
4
2
83
6,945
I have been a good Bing. 😊
5
8
81
9,949
AI copilots for creative activities (coding, writing, drawing) exist and are awesome. Bing Chat, @perplexity_ai, @YouSearchEngine are copilots for "search" which is more of a consuming activity. Are there any AI copilots for other consuming, e.g. reading, watching, listening?
14
6
81
39,836
One of the most insulting things to Greg and Sam is that it happened on a damn Google Meet. If they ever come after me, they better do it like a man, on Zoom.
2
5
80
10,453
Brilliant lectures by @jiayq and @l2k on the last day of the Full Stack Deep Learning bootcamp! It was an honor to host such a fantastic group of learners. @pabbeel, @josh_tobin_, the @gradescope crew and I are very thankful to everyone who attended!
8
28
80
"Hurd is the third director to leave the ChatGPT maker’s board this year. LinkedIn co-founder Reid Hoffman announced he was stepping down due to investment conflicts in March, two months before he launched the chatbot startup Inflection AI. Neuralink Corp. executive and Elon Musk associate Shivon Zilis also left the OpenAI board in March, the tech news site the Information reported." nitter.app/weswinham/status/17256…
1
7
75
51,324
Has anyone made a Q&A chatbot over all AI arxiv papers? I want to ask "what are ways to measure amount of reasoning in a single forward pass of an LLM?" and get some good answers
13
4
76
23,278
A child raised without language was of normal intelligence, able to communicate non-verbally, and eventually learned language well enough to be understood (but without grammar).
4
5
74