delved, you say?
Washingtonians delved into the world of artificial intelligence (AI) at the Washington AI Network’s inaugural weekend TGAIFriday Lunch for White House correspondents. trib.al/FwHF9Um
15
371
5,489
210,015
Yet another AGI lab that will need 100k H100s
Superintelligence is within reach. Building safe superintelligence (SSI) is the most important technical problem of our time. We've started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligence. It’s called Safe Superintelligence Inc. SSI is our mission, our name, and our entire product roadmap, because it is our sole focus. Our team, investors, and business model are all aligned to achieve SSI. We approach safety and capabilities in tandem, as technical problems to be solved through revolutionary engineering and scientific breakthroughs. We plan to advance capabilities as fast as possible while making sure our safety always remains ahead. This way, we can scale in peace. Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures. We are an American company with offices in Palo Alto and Tel Aviv, where we have deep roots and the ability to recruit top technical talent. We are assembling a lean, cracked team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else. If that’s you, we offer an opportunity to do your life’s work and help solve the most important technical challenge of our age. Now is the time. Join us. Ilya Sutskever, Daniel Gross, Daniel Levy June 19, 2024
51
223
5,319
407,077
The most underrated superpower of Generative AI is converting unstructured data to structured data
96
425
5,103
928,915
Google realizing they now lead the AI race

ALT Homelander Based GIF

102
191
3,495
378,485
ngl, going from game dev to Nobel prize in Chemistry is a goated career trajectory
Demis Hassabis, awarded the 2024 #NobelPrize in Chemistry, was born in 1976 in London, UK. He earned his PhD in 2009 from @UCL, UK. Hassabis is currently the CEO of @GoogleDeepMind, London, UK. deepmind.google/about/
23
267
2,615
270,518
Replying to @Nexuist
Same response in Morocco. When I was back there not that long ago, barber noticed from my weakened accent that I had been living outside for some time. Unprompted he said, "look, whatever you do, don't ever consider moving back"
1
18
2,104
72,716
Simple 4-part explainer on how to query a PDF (or a set of PDFs) using GPT-3
53
276
2,174
297,045
New paper by Google provides evidence that transformers (GPT, etc) cannot generalize beyond their training data
17
178
2,024
256,436
Why Google Deepmind's Mixture-of-Depths paper, and more generally dynamic compute methods, matter: Most of the compute is WASTED because not all tokens are equally hard to predict
27
223
1,741
354,784
Founding fathers of AI having dorm room after 2 bong hits level debates
42
98
1,483
205,660
Me after Claude just one-shot 100 lines of Rust code
11
38
1,419
57,143
Adobe realizing their $20B acquisition of Figma is now worth zero because of AI doodling
I think I need to go lie down.
28
86
1,339
325,489
When you realize Apple cooked harder with a calculator than Humane and Rabbit combined
iPad calculator is actually pretty nuts
10
72
1,326
63,164
The train.py file in Karpathy's nanoGPT is a marvel. It does everything right, exactly the opposite of what you were taught in school. Global variables at the top, tons of top-level code (no main), a big while True github.com/karpathy/nanoGPT/…
29
101
1,280
391,782
It's funny how almost a decade later, the frontier labs are back at the same spot, building RL gyms
Our reinforcement learning toolkit, OpenAI Gym, is now in public beta: gym.openai.com/.
15
55
1,295
206,588
🤯 I still cannot get over this Bing AI example. The riddle was posed in base64, which it figured out alone. It decoded the base64 step-by-step and discovered the instructions with math to be solved. It solved the math unlocking the true riddle. Finally it solved the riddle.
Replying to @liron @goodside
It does get it in one shot if you say it's a riddle.
17
97
1,192
449,278
This is who Rabbit and Humane are competing against
Apple’s attention to detail is INSANE. You can’t watch this and not smile.
8
40
1,045
84,500
By far the simplest implementation of mixtral I've found was by @realGeorgeHotz on his latest stream Code: github.com/tinygrad/tinygrad… Stream: piped.video/H40QRJFzThQ?si=IZIE…
6
63
760
113,929
llama 400B rn

ALT Vegeta GIF

Guys the 400B... Still in training!
7
43
620
27,643
The magic of LLMs is in converting unstructured data into structured data
29
29
557
132,928
At Stanford
> lecturer: so sam, if you were 19, what would you do
> sama: AI research
> lecturer gleefully flexing: and where would be the best place to do it? huh? wink wink
> sama: OpenAI or other big company
> lecturer: ...
> sama: need gpus go brrr
8
19
542
298,129
I'm still stunned by this. How did it improve so much? I mean, look at 8B vs the old 70B
42
29
522
152,237
This chart is the one. This is the thesis
🆕 The New Kings of Open Source AI latent.space/p/oct-2023 Recapping how @MistralAI took over Open Source AI, how the definition of Open is evolving, and why engineers freaked out about Copilot losses at @aidotengineer Memes of the month: @mingjie @nearcyan ft @TheNoahHein
13
51
487
163,217
Wow, Meta has just let go of some serious talent
Several of my team members + myself are impacted by this layoff today. Welcome to connect :)
19
23
492
244,140
Yes, this is the full Apple 10K (80 pages) converted to Markdown from PDF using GPT-3.
19
44
476
145,659
Replying to @finbarrtimbers
they should build that high speed rail between seattle and vancouver
4
5
453
30,791
This is by far the best walkthrough on gpu kernel implementations I have ever seen. It's about optimizing matmul in cuda, but same ideas apply generally. The visualizations are 🤌
I wrote the most naive CUDA matrix multiply and iteratively optimised it to ~80% of cuBLAS performance: siboehm.com/articles/22/CUDA…
37
451
120,915
As @karpathy said, English is the hottest new programming language
Image 1: Code in English
Image 2: The "compiler", a simple call to GPT-4
Image 3: The Python code generated by the compiler
Image 4: It works, first try
12
73
396
60,971
Google realizing Microsoft is now the new OpenAI

ALT Michael Scott No GIF

5
34
395
62,888
Fine-tuning a 7B model in 2024
6
20
382
17,309
> Ask gpt4-o for code improvements
> starts yapping
> press stop, demand side-by-side code, original vs improved
> hallucinates original code, cites actual original code as "improved", claims it has made all the improvements
> point out lies
> apologizes, repeats original code
38
14
362
25,493
Went to a Jack White concert recently. One of the coolest parts was seeing adults of all ages and all backgrounds vibing. Music doesn't need gatekeepers
2
5
327
Here's a pdf of a paper, claude make slides to explain it
7
29
373
89,466
lol
Announcing Coffee: build and iterate on your UI 10x faster with AI ☕️👇 github.com/Coframe/coffee
9
16
345
77,017
Ok, you have my attention. I didn't think a 7B model was large enough to use 1M context effectively
World Model on Million-Length Video And Language With RingAttention Open-sources 7B models capable of processing long text documents and videos of over 1M tokens proj: largeworldmodel.github.io/ abs: arxiv.org/abs/2402.08268
6
48
350
78,793
I hate the bitter lesson so much
9
12
339
38,760
Nobody wants what young people want? Nobody? What about... young people?
4
1
297
14,907
GPT3: PDF -> Markdown
11
19
329
99,359
Python dependency management is the greatest obstacle to AGI
28
33
317
38,009
We are clearly headed towards a future with GPT4 level models, running at 1000 tokens / s (batch size = 1), locally on consumer laptops. This is not ASI, but it completely changes how we interact with machines
17
16
311
35,493
Not that this should matter in any way, but France being so low in the chart adds to your point
2
9
283
Aspiring AI devs and software engineers wishing to get into AI: Learn to train your own models. MNIST, Fashion MNIST, write your own CNN, implement a simple transformer and make it do something simple. Learn the value of good, clean data.
7
26
285
60,764
It's interesting how easily you can define agents when you embrace the xml
13
12
293
41,367
I sense a great disturbance in the force
Google presents Mixture-of-Depths Dynamically allocating compute in transformer-based language models Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate
2
14
245
58,008
mixtral 8x22b config
7
28
246
24,574
Q* Bookclub: ReST from Deepmind arxiv.org/abs/2308.08998
1
28
241
53,578
This single file of code by @tannerlinsley is one of the most beautiful pieces of code I have ever come across and I feel like I need to explain what is so impressive about it github.com/TanStack/table/bl…
6
19
232
Replying to @JosephPolitano
Is this why Europeans include tax in store prices?
7
204
40,083
gpt4-o is too eager to give out code and a full explanation after the code. too much slop wasting my time to realize it made NO CHANGES WHATSOEVER
25
3
197
12,645
Replying to @AlecStapp
Are you trying to make me regret moving to Texas from Madrid? Yes, the metro is a marvel. Yes, groceries were a block away. Sometimes if I realized I was missing an ingredient while cooking I would just turn off the heat, physically run (no car) to the store and back in 5 minutes
5
173
This man has to be stopped. He is simultaneously building the replacement of transformers while making attention so fast you won't need to replace them anymore. Building both the unstoppable force and the immovable object. Legend 🔥
Announcing FlashAttention-2! We released FlashAttention a year ago, making attn 2-4x faster, and it is now widely used in most LLM libraries. Recently I’ve been working on the next version: 2x faster than v1, 5-9x vs standard attn, reaching 225 TFLOPs/s training speed on A100. 1/
6
11
197
39,412
Replying to @jarredsumner
Side effect of functional programming I suppose
2
4
184
Replying to @Noahpinion
5. Mike Pence

ALT Stalin Photoshop GIF

9
187
5,607
Killing training runs is not news
NEW: OpenAI dropped work earlier this year on a major new AI model called Arrakis after it failed to perform as expected during training theinformation.com/articles/…
10
8
186
40,801
Replying to @YIMBYLAND
Would be scary if climate change means California gets Florida weather
2
172
44,224
The orange is currently all free margin for NVidia
6
11
187
16,934
Replying to @stewfortier
Nadella replacing Ballmer imo is #1 greatest decision in business history, up there with the Jobs comeback
2
6
173
20,537
For those who don't know, this is what it feels like to have an internal monologue all day
14
7
168
7,696
Cool reading up on CoDA, an alternative to LoRA that can boost inference speeds by up to 18x without too much damage. For a mere 5x speed boost, you also get a small perf increase
Introducing Conditional Adapters (CoDA) from Google Research! Adaptation methods (e.g. Adapter and LoRA) can finetune LMs with minimal parameter updates, but their inference remains expensive. CoDA makes LMs faster to use, and works for three modalities! arxiv.org/abs/2304.04947
1
30
178
57,931
Google realizing Microsoft is now the new OpenAI

ALT Michael Scott No GIF

2
3
166
14,202
wtf is this level of consistency. How are things not jittering? How is everything still perfectly in place in 3D?
Replying to @OpenAI
Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”
14
7
159
17,398
The mistral model file is barely over 200LOC. Obviously there are some imports, but point is, super minimal and readable github.com/mistralai/mistral…
5
21
172
27,571
we need less Linear-like dark mode and more winamp/videogame-style UIs
The spice must flow.
5
12
164
15,746
They would probably argue that there is a front camera (just under the "M"). My reply would be that you shouldn't need a front camera to be able to see, that's what the windshield is for.
2
1
137
Replying to @tszzl
The Civ 6 trailer is still the most inspirational video about human progress
4
8
163
8,469
Me trying to catch up with everything happening in AI rn

ALT Speeding Evan Peters GIF by 20th Century Studios

9
17
152
13,301
I am floored by the current state of model inferencing. Nothing works, nothing installs, nothing builds. Libraries claim to be one click away from 10x speedups, when they only support 3 models, all Llama, and library is incompatible with your deps.
28
9
158
80,672
Fascinating reading the socratic pre-training paper. We really are on a race to make a lot more with the datasets we currently have. So many ways to enrich data and get better models arxiv.org/abs/2212.10449
4
34
162
23,933
Replying to @jowenpetty
they have color indoors
2
143
7,707
The aliens in Arrival use a diffusion-based non-autoregressive language model. Maybe @ylecun is onto something by saying that we won't be using autoregressive models in 5 years 🤔

ALT Arrival Arrival Alien GIF

18
14
151
37,751
At $0.002 / 1k tokens, high quality synthetic data generation is now effectively free, at scale. Generating the King James Bible ~1M tokens would cost $2
2
16
149
13,834
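The arithmetic in the post above can be checked in a few lines. This is a minimal sketch that assumes the quoted flat price of $0.002 per 1k tokens and the post's rough ~1M token estimate; the function name is hypothetical.

```python
def generation_cost_usd(num_tokens: int, price_per_1k_usd: float = 0.002) -> float:
    """Cost of generating num_tokens at a flat per-1k-token price."""
    return num_tokens / 1000 * price_per_1k_usd


# ~1M tokens, the post's rough size estimate for the King James Bible
cost = generation_cost_usd(1_000_000)
print(f"${cost:.2f}")
```

At this rate the cost scales linearly, so 1B tokens under the same assumed price would come to roughly $2,000.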
A chart I'm still trying to process. The curves are still going down. What happens at 10T tokens?
31
8
144
84,536
I know nothing about superconductors
4
15
139
11,221
Reminder: it's all compute
6
10
148
41,653
I'm guessing the reason is that if you kept algebra, folks would know that $1.7m is too much for just one toilet
2
2
142
Really cool Attention visualization from one of the WWDC talks
1
20
151
21,077
When Groq realizes inference will replace much of training
Google presents Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Test-time compute can be used to outperform a 14× larger model arxiv.org/abs/2408.03314
3
10
143
10,287
Replying to @d_feldman
Works for "site:yelp.com" as well, 5 star reviews
17
134
14,501
These savings further compound when paired with Mixture of Experts. We are entering an era of scalable compute of LLMs. Tokens will not have fixed costs, the machine will take the time it needs to think. Massive improvements for both gpu rich and poor
2
8
145
10,353
Several interesting things about the paper:
1. They only needed to manually label about 500 examples (gold data)
2. The behavior policy is just a prompt
We're starting to see how synthetic data can get you tons of leverage
We’ve developed Rule-Based Rewards (RBRs) to align AI behavior safely without needing extensive human data collection, making our systems safer and more reliable for everyday use. openai.com/index/improving-m…
1
21
143
21,193
Ah, the future of AI. It was good while it lasted. Remember when folks shared logbooks and wandb dashboards. Pepperidge farm remembers
9
12
136
296,973
Replying to @0interestrates
2013: Increase Lifetime 2023: Increase Lifetime Value
2
3
137
7,327
A priori looks like diminishing returns to go from Llama 2 70B to Falcon 180B, especially given the extra resource requirements.
11
13
133
47,056
Replying to @abacaj
Agreed that there are a lot of embeddings that are better at semantic search than the OpenAI ones. But if you must use them for Q&A, don't embed the question when searching. Ask GPT-3 to generate a fake answer, embed this answer, and use this to search arxiv.org/abs/2212.10496
4
10
144
7,107
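The trick described in the reply above (search with an embedded fake answer instead of the embedded question) might look roughly like this. It is a toy sketch, not the linked paper's implementation: `embed` is a bag-of-words stand-in for a real embedding model, and `generate_fake_answer` is a hypothetical stub where an LLM call would go.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def generate_fake_answer(question: str) -> str:
    """Hypothetical stub: in practice, prompt an LLM to write a plausible answer."""
    return "the capital of france is paris"


def search(question: str, docs: list[str]) -> str:
    """Embed the fake answer (not the question) and return the closest document."""
    query_vec = embed(generate_fake_answer(question))
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))


docs = [
    "Paris is the capital and largest city of France",
    "Which city should I visit in Europe this summer",
]
best = search("What city is France's capital?", docs)
print(best)
```

The intuition is that a made-up answer tends to look more like the documents you want to retrieve than the question does, even when the made-up answer is factually wrong.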
Replying to @aj_kourabi
Can confirm, worked in finance, physics majors pick up stuff very quickly
130
24,278
Analogous to Mixture of Experts, Mixture of Depths has the model learn TO SKIP layers if necessary. The orange in the chart shows all the compute it DID NOT use. The orange area = compute savings
3
13
140
11,581
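The skipping idea in the post above can be illustrated with a toy layer. This is a rough sketch only: the linear router, the fixed `capacity`, the score-weighted residual update, and all names are assumptions for illustration, not the paper's code.

```python
def mod_layer(tokens, router_weights, block, capacity):
    """Run `block` only on the `capacity` tokens with the highest router score.

    All other tokens skip the block and pass through unchanged (residual path).
    """
    # One scalar router score per token (assumed: a simple dot product)
    scores = [sum(w * x for w, x in zip(router_weights, tok)) for tok in tokens]
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    out = []
    for i, tok in enumerate(tokens):
        if i in selected:
            update = block(tok)
            # Residual plus score-weighted block output
            out.append([x + scores[i] * u for x, u in zip(tok, update)])
        else:
            out.append(list(tok))  # skipped: no block compute spent here
    return out, selected


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.5]]
out, selected = mod_layer(
    tokens,
    router_weights=[1.0, 0.0],
    block=lambda tok: [1.0 for _ in tok],  # stand-in for attention + MLP
    capacity=2,
)
skipped_fraction = 1 - len(selected) / len(tokens)
print(selected, skipped_fraction)
```

In this toy run half of the tokens skip the block, which corresponds to the kind of unused compute the orange area in the chart is said to represent.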
Replying to @Suhail
Replying to @pmarca
Reality: Skynet appears because a dev forgot to comment out a while loop
2
127
31,078
SAM 3 with prompt "scotland player"
Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: go.meta.me/591040 2️⃣ SAM 3D brings the model collection into the 3rd dimension to enable precise reconstruction of 3D objects and people from a single 2D image. 🔗 Learn more about SAM 3D: go.meta.me/305985 These models offer innovative capabilities and unique tools for developers and researchers to create, experiment and uplevel media workflows.
6
15
433
39,238
You may not like it but this is what peak reusability looks like
7
2
129
19,980
The result everyone was worried about with RLHF
Replying to @AnthropicAI
When presented with responses to misconceptions, we found humans prefer untruthful sycophantic responses to truthful ones a non-negligible fraction of the time. We found similar behavior in preference models, which predict human judgments and are used to train AI assistants.
11
12
126
28,110
You may not like it but this is what peak performance looks like
3
2
121
5,529
TIL you can just crank up the batch size and stop worrying about the learning rate arxiv.org/abs/1711.00489
2
14
125
23,755
Remember when gpt2 was too dangerous to be released? Ah, those were the days
7
6
121
10,403
Training on synthetic data
3
8
114
7,190
Spoiler alert: This is the kind of model everyone will be running locally in 6 months (encoder optional)
7
10
125
18,934