Hassan Hayat 🔥 · Apr 27, 2024 · 1:03 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

27 Apr 2024

delved, you say?

The Hill

@thehill

26 Apr 2024

Washingtonians delved into the world of artificial intelligence (AI) at the Washington AI Network’s inaugural weekend TGAIFriday Lunch for White House correspondents. trib.al/FwHF9Um

371

5,489

210,015

Hassan Hayat 🔥 · Jun 19, 2024 · 7:13 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

19 Jun 2024

Yet another AGI lab that will need 100k H100s

SSI Inc.

@ssi

19 Jun 2024

Superintelligence is within reach. Building safe superintelligence (SSI) is the most important technical problem of our time. We've started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligence. It’s called Safe Superintelligence Inc. SSI is our mission, our name, and our entire product roadmap, because it is our sole focus. Our team, investors, and business model are all aligned to achieve SSI. We approach safety and capabilities in tandem, as technical problems to be solved through revolutionary engineering and scientific breakthroughs. We plan to advance capabilities as fast as possible while making sure our safety always remains ahead. This way, we can scale in peace. Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures. We are an American company with offices in Palo Alto and Tel Aviv, where we have deep roots and the ability to recruit top technical talent. We are assembling a lean, cracked team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else. If that’s you, we offer an opportunity to do your life’s work and help solve the most important technical challenge of our age. Now is the time. Join us. Ilya Sutskever, Daniel Gross, Daniel Levy June 19, 2024

223

5,319

407,077

Hassan Hayat 🔥 · Jan 28, 2023 · 7:06 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

28 Jan 2023

The most underrated superpower of Generative AI is converting unstructured data to structured data

425

5,103

928,915

Hassan Hayat 🔥 · Nov 20, 2023 · 5:15 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

20 Nov 2023

Google realizing they now lead the AI race

ALT Homelander Based GIF

102

191

3,495

378,485

Hassan Hayat 🔥 · Oct 9, 2024 · 8:28 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

9 Oct 2024

ngl, going from game dev to Nobel prize in Chemistry is a goated career trajectory

The Nobel Prize

@NobelPrize

9 Oct 2024

Demis Hassabis, awarded the 2024 #NobelPrize in Chemistry, was born in 1976 in London, UK. He earned his PhD in 2009 from @UCL, UK. Hassabis is currently the CEO of @GoogleDeepMind, London, UK. deepmind.google/about/

267

2,615

270,518

Hassan Hayat 🔥 · Jun 25, 2023 · 7:28 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

25 Jun 2023

Replying to @Nexuist

Same response in Morocco. When I was back there not that long ago, barber noticed from my weakened accent that I had been living outside for some time. Unprompted he said, "look, whatever you do, don't ever consider moving back"

2,104

72,716

Hassan Hayat 🔥 · Jan 24, 2023 · 7:50 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

24 Jan 2023

Simple 4-part explainer on how to query a PDF (or a set of PDFs) using GPT-3

276

2,174

297,045

Hassan Hayat 🔥 · Nov 6, 2023 · 4:00 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

6 Nov 2023

anton

@abacaj

5 Nov 2023

New paper by Google provides evidence that transformers (GPT, etc) cannot generalize beyond their training data

178

2,024

256,436

Hassan Hayat 🔥 · Apr 4, 2024 · 7:09 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

4 Apr 2024

Why Google Deepmind's Mixture-of-Depths paper, and more generally dynamic compute methods, matter: Most of the compute is WASTED because not all tokens are equally hard to predict

223

1,741

354,784

Hassan Hayat 🔥 · Oct 31, 2023 · 9:19 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

31 Oct 2023

Founding fathers of AI having dorm room after 2 bong hits level debates

1,483

205,660

Hassan Hayat 🔥 · Jun 24, 2024 · 11:26 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

24 Jun 2024

Me after Claude just one-shot 100 lines of Rust code

1,419

57,143

Hassan Hayat 🔥 · Nov 15, 2023 · 9:57 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

15 Nov 2023

Adobe realizing their $20B acquisition of Figma is now worth zero because of AI doodling

Kevin Cannon @multikev

15 Nov 2023

I think I need to go lie down.

1,339

325,489

Hassan Hayat 🔥 · Jun 10, 2024 · 5:53 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

10 Jun 2024

When you realize Apple cooked harder with a calculator than Humane and Rabbit combined

Aaron Levie

@levie

10 Jun 2024

iPad calculator is actually pretty nuts

1,326

63,164

Hassan Hayat 🔥 · May 1, 2023 · 8:40 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

1 May 2023

The train.py file in Karpathy's nanoGPT is a marvel. It does everything right, exactly the opposite of what you were taught in school. Global variables at the top, tons of top-level code (no main), a big while True github.com/karpathy/nanoGPT/…

101

1,280

391,782

Hassan Hayat 🔥 · Oct 17, 2025 · 5:33 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

17 Oct 2025

It's funny how almost a decade later, the frontier labs are back at the same spot, building RL gyms

OpenAI

@OpenAI

27 Apr 2016

Our reinforcement learning toolkit, OpenAI Gym, is now in public beta: gym.openai.com/.

1,295

206,588

Hassan Hayat 🔥 · Feb 20, 2023 · 3:35 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

20 Feb 2023

🤯 I still cannot get over this Bing AI example. The riddle was posed in base64, which it figured out alone. It decoded the base64 step-by-step and discovered the instructions with math to be solved. It solved the math unlocking the true riddle. Finally it solved the riddle.

Thomas Rice

@thomasrice_au

19 Feb 2023

Replying to @liron @goodside

It does get it in one shot if you say it's a riddle.

1,192

449,278

Hassan Hayat 🔥 · May 16, 2024 · 11:41 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

16 May 2024

This is who Rabbit and Humane are competing against

Quinn Nelson

@SnazzyLabs

16 May 2024

Apple’s attention to detail is INSANE. You can’t watch this and not smile.

1,045

84,500

Hassan Hayat 🔥 · Dec 10, 2023 · 9:50 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

10 Dec 2023

By far the simplest implementation of mixtral I've found was by @realGeorgeHotz on his latest stream Code: github.com/tinygrad/tinygrad… Stream: piped.video/H40QRJFzThQ?si=IZIE…

model = Transformer(n_layers=32, dim=4096, hidden_dim=14336, n_heads=32, n_kv_heads=8, norm_eps=1e-5, vocab_size=32000, feed_forward=functools.partial(MixtureFeedForward, 8), jit=False)

class MixtureFeedForward:
  def __init__(self, num_experts:int, dim:int, hidden_dim:int, linear=nn.Linear):
    self.gate = nn.Linear(dim, num_experts, bias=False)
    self.experts = [FeedForward(dim, hidden_dim, linear) for _ in range(num_experts)]
  def __call__(self, x:Tensor) -> Tensor:
    assert x.shape[0] == 1, "only BS=1"
    g = self.gate(x).exp()
    choice = g.data().tolist()[0][0]
    top = sorted(enumerate(choice), key=lambda x: -x[1])
    norm = top[0][1] + top[1][1]
    e1 = self.experts[top[0][0]]
    e2 = self.experts[top[1][0]]
    e1_dev = e1.w1.weight.device
    e2_dev = e2.w1.weight.device
    #print(top[0][1]/norm, top[1][1]/norm)
    ret = e1(x.to(e1_dev)).to(x.device) * (top[0][1]/norm) + e2(x.to(e2_dev)).to(x.device) * (top[1][1]/norm)
    return ret

760

113,929
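The gating step in the MixtureFeedForward snippet (exponentiate the gate outputs, keep the two largest, divide by their sum) amounts to a softmax restricted to the two selected logits. Here is a minimal standard-library sketch of just that step. The gate logits are made-up values for illustration, and `top2_weights` is a hypothetical helper name, not part of tinygrad.

```python
import math

def top2_weights(logits):
    """Return [(expert_index, weight), (expert_index, weight)] for the two largest logits."""
    # Same steps as the snippet: exp, sort descending, renormalize over the top two.
    scores = [math.exp(v) for v in logits]
    top = sorted(enumerate(scores), key=lambda p: -p[1])[:2]
    norm = top[0][1] + top[1][1]
    return [(i, s / norm) for i, s in top]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical gate logits for 8 experts (illustrative values only).
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.7]
picked = top2_weights(gate_logits)
reference = softmax([2.0, 1.5])  # softmax over only the two winning logits

for (expert, weight), ref in zip(picked, reference):
    print(expert, round(weight, 4), round(ref, 4))
```

Only two of the eight experts run per token, which is where the compute saving comes from; the two weights always sum to 1.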

Hassan Hayat 🔥 · Apr 18, 2024 · 4:57 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

18 Apr 2024

llama 400B rn

ALT Vegeta GIF

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

18 Apr 2024

Guys the 400B... Still in training!

620

27,643

Hassan Hayat 🔥 · Sep 22, 2023 · 4:37 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

22 Sep 2023

The magic of LLMs is in converting unstructured data into structured data

557

132,928

Hassan Hayat 🔥 · May 2, 2024 · 5:14 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

2 May 2024

At Stanford
> lecturer: so sam, if you were 19, what would you do
> sama: AI research
> lecturer gleefully flexing: and where would be the best place to do it? huh? wink wink
> sama: OpenAI or other big company
> lecturer: ...
> sama: need gpus go brrr

542

298,129

Hassan Hayat 🔥 · Apr 19, 2024 · 1:36 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

19 Apr 2024

I'm still stunned by this. How did it improve so much? I mean, look at 8B vs the old 70B

522

152,237

Hassan Hayat 🔥 · Nov 13, 2023 · 4:39 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

13 Nov 2023

This chart is the one. This is the thesis

Latent.Space

@latentspacepod

12 Nov 2023

🆕 The New Kings of Open Source AI latent.space/p/oct-2023 Recapping how @MistralAI took over Open Source AI, how the definition of Open is evolving, and why engineers freaked out about Copilot losses at @aidotengineer Memes of the month: @mingjie @nearcyan ft @TheNoahHein

487

163,217

Hassan Hayat 🔥 · Oct 23, 2025 · 11:18 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

23 Oct 2025

Wow, Meta has just let go of some serious talent

Yuandong Tian

@tydsh

23 Oct 2025

Several of my team members + myself are impacted by this layoff today. Welcome to connect :)

492

244,140

Hassan Hayat 🔥 · Jan 28, 2023 · 4:23 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

28 Jan 2023

Yes, this is the full Apple 10K (80 pages) converted to Markdown from PDF using GPT-3.

476

145,659

Hassan Hayat 🔥 · Sep 19, 2025 · 8:09 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

19 Sep 2025

Replying to @finbarrtimbers

they should build that high speed rail between seattle and vancouver

453

30,791

Hassan Hayat 🔥 · Nov 26, 2023 · 1:40 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

26 Nov 2023

This is by far the best walkthrough on gpu kernel implementations I have ever seen. It's about optimizing matmul in cuda, but same ideas apply generally. The visualizations are 🤌

Simon Boehm @Si_Boehm

3 Jan 2023

I wrote the most naive CUDA matrix multiply and iteratively optimised it to ~80% of cuBLAS performance: siboehm.com/articles/22/CUDA…

451

120,915

Hassan Hayat 🔥 · Mar 17, 2023 · 9:01 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

17 Mar 2023

As @karpathy said, English is the hottest new programming language
Image 1: Code in English
Image 2: The "compiler", simple call to GPT-4
Image 3: The Python code generated from the compiler
Image 4: It works, first try

396

60,971

Hassan Hayat 🔥 · Nov 20, 2023 · 8:03 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

20 Nov 2023

Google realizing Microsoft is now the new OpenAI

ALT Michael Scott No GIF

395

62,888

Hassan Hayat 🔥 · Mar 26, 2024 · 5:39 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

26 Mar 2024

Fine-tuning a 7B model in 2024

382

17,309

Hassan Hayat 🔥 · Jun 7, 2024 · 9:23 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

7 Jun 2024

> Ask gpt4-o for code improvements
> starts yapping
> press stop, demand side-by-side code, original vs improved
> hallucinates original code, cites actual original code as "improved", claims it has made all the improvements
> point out lies
> apologizes, repeats original code

362

25,493

Hassan Hayat 🔥 · Jun 4, 2022 · 10:45 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

4 Jun 2022

Replying to @Grady_Booch @dannyschof81

Went to a Jack White concert recently. One of the coolest parts was seeing adults of all ages and all backgrounds vibing. Music doesn't need gatekeepers

327

Hassan Hayat 🔥 · Jun 21, 2024 · 8:41 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

21 Jun 2024

Here's a pdf of a paper, claude make slides to explain it

373

89,466

Hassan Hayat 🔥 · Dec 14, 2023 · 12:57 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

14 Dec 2023

lol

Coframe

@coframe_ai

13 Dec 2023

Announcing Coffee: build and iterate on your UI 10x faster with AI ☕️👇 github.com/Coframe/coffee

345

77,017

Hassan Hayat 🔥 · Feb 14, 2024 · 2:49 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

14 Feb 2024

Ok, you have my attention. I didn't think a 7B model was large enough to use 1M context effectively

Aran Komatsuzaki

@arankomatsuzaki

14 Feb 2024

World Model on Million-Length Video And Language With RingAttention Open-sources 7B models capable of processing long text documents and videos of over 1M tokens proj: largeworldmodel.github.io/ abs: arxiv.org/abs/2402.08268

350

78,793

Hassan Hayat 🔥 · May 17, 2024 · 1:21 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

17 May 2024

I hate the bitter lesson so much

339

38,760

Hassan Hayat 🔥 · Dec 18, 2023 · 8:53 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

18 Dec 2023

Reminder: This paper is not a manual arxiv.org/abs/2309.08632

Pretraining on the Test Set Is All You Need

Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating...

arxiv.org

334

52,270

Hassan Hayat 🔥 · May 7, 2024 · 1:39 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

7 May 2024

Replying to @maxbadman @xurbanxcowboyx

Nobody wants what young people want? Nobody? What about... young people?

297

14,907

Hassan Hayat 🔥 · Jan 27, 2023 · 10:43 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

27 Jan 2023

GPT3: PDF -> Markdown

ALT Operating Performance by Geo page from the Apple 10K PDF

ALT Markdown version of the Operating Performance by Geo page from the Apple 10K. Markdown conversion made by GPT-3

329

99,359

Hassan Hayat 🔥 · May 4, 2023 · 5:17 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

4 May 2023

Python dependency management is the greatest obstacle to AGI

317

38,009

Hassan Hayat 🔥 · Dec 11, 2023 · 11:49 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

11 Dec 2023

We are clearly headed towards a future with GPT4 level models, running at 1000 tokens / s (batch size = 1), locally on consumer laptops. This is not ASI, but it completely changes how we interact with machines

311

35,493

Hassan Hayat 🔥 · Dec 15, 2022 · 4:02 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

15 Dec 2022

Replying to @thomaschattwill @cjwerleman

Not that this should matter in any way, but France being so low in the chart adds to your point

283

Hassan Hayat 🔥 · Mar 22, 2024 · 2:25 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

22 Mar 2024

jensen al-gaib

Nucleus☕️

@EsotericCofe

21 Mar 2024

it's over

308

22,885

Hassan Hayat 🔥 · Jul 9, 2023 · 12:55 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

9 Jul 2023

Aspiring AI devs and software engineers wishing to get into AI: Learn to train your own models. MNIST, Fashion MNIST, write your own CNN, implement a simple transformer and make it do something simple. Learn the value of good, clean data.

285

60,764

Hassan Hayat 🔥 · Jul 13, 2024 · 6:06 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

13 Jul 2024

It's interesting how easily you can define agents when you embrace the xml

293

41,367

Hassan Hayat 🔥 · Dec 8, 2023 · 5:40 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

8 Dec 2023

The more I read, the more I wonder... Did Noam Shazeer solve it back in 2017? arxiv.org/abs/1701.06538

Outrageously Large Neural Networks: The Sparsely-Gated...

The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been...

arxiv.org

285

68,140

Hassan Hayat 🔥 · Apr 4, 2024 · 4:27 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

4 Apr 2024

I sense a great disturbance in the force

@_akhaliq

4 Apr 2024

Google presents Mixture-of-Depths Dynamically allocating compute in transformer-based language models Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate

245

58,008

Hassan Hayat 🔥 · Apr 10, 2024 · 1:27 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

10 Apr 2024

mixtral 8x22b config

246

24,574

Hassan Hayat 🔥 · Nov 24, 2023 · 1:42 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

24 Nov 2023

Q* Bookclub: ReST from Deepmind arxiv.org/abs/2308.08998

241

53,578

Hassan Hayat 🔥 · Jul 23, 2022 · 7:04 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

23 Jul 2022

This single file of code by @tannerlinsley is one of the most beautiful pieces of code I have ever come across and I feel like I need to explain what is so impressive about it github.com/TanStack/table/bl…

232

Hassan Hayat 🔥 · Jul 18, 2024 · 2:29 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

18 Jul 2024

Replying to @JosephPolitano

Is this why Europeans include tax in store prices?

204

40,083

Hassan Hayat 🔥 · Jun 1, 2024 · 3:14 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

1 Jun 2024

gpt4-o is too eager to give out code and a full explanation after the code. too much slop wasting my time to realize it made NO CHANGES WHATSOEVER

197

12,645

Hassan Hayat 🔥 · Jul 30, 2022 · 4:59 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

30 Jul 2022

Replying to @AlecStapp

Are you trying to make me regret moving to Texas from Madrid? Yes, the metro is a marvel. Yes, groceries were a block away. Sometimes if I realized I was missing an ingredient while cooking I would just turn off the heat, physically run (no car) to the store and back in 5 minutes

173

Hassan Hayat 🔥 · Jul 17, 2023 · 5:41 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

17 Jul 2023

This man has to be stopped. He is simultaneously building the replacement of transformers while making attention so fast you won't need to replace them anymore. Building both the unstoppable force and the immovable object. Legend 🔥

Tri Dao

@tri_dao

17 Jul 2023

Announcing FlashAttention-2! We released FlashAttention a year ago, making attn 2-4x faster and is now widely used in most LLM libraries. Recently I’ve been working on the next version: 2x faster than v1, 5-9x vs standard attn, reaching 225 TFLOPs/s training speed on A100. 1/

197

39,412

Hassan Hayat 🔥 · Dec 11, 2022 · 7:16 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

11 Dec 2022

Replying to @jarredsumner

Side effect of functional programming I suppose

184

Hassan Hayat 🔥 · Mar 6, 2025 · 6:31 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

6 Mar 2025

Replying to @Noahpinion

5. Mike Pence

ALT Stalin Photoshop GIF

187

5,607

Hassan Hayat 🔥 · Oct 17, 2023 · 10:30 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

17 Oct 2023

Killing training runs is not news

Jon Victor @jon_victor_

17 Oct 2023

NEW: OpenAI dropped work earlier this year on a major new AI model called Arrakis after it failed to perform as expected during training theinformation.com/articles/…

186

40,801

Hassan Hayat 🔥 · Sep 24, 2023 · 2:24 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

24 Sep 2023

The copy-paste nature of the openai cookbook makes it infinitely more usable and useful than every ai prompting library out there. cookbook.openai.com/examples…

Search reranking with cross-encoders

This notebook takes you through examples of using a cross-encoder to re-rank search results. This is a common use case with our customers,

developers.openai.com

187

23,822

Hassan Hayat 🔥 · Jul 6, 2024 · 5:36 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

6 Jul 2024

Replying to @YIMBYLAND

Would be scary if climate change means California gets Florida weather

172

44,224

Hassan Hayat 🔥 · Apr 4, 2024 · 5:26 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

4 Apr 2024

The orange is currently all free margin for NVidia

187

16,934

Hassan Hayat 🔥 · Dec 21, 2022 · 4:15 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

21 Dec 2022

Replying to @stewfortier

Nadella replacing Ballmer imo is #1 greatest decision in business history, up there with the Jobs comeback

173

20,537

Hassan Hayat 🔥 · Mar 15, 2024 · 9:25 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

15 Mar 2024

For those who don't know, this is what it feels like to have an internal monologue all day

168

7,696

Hassan Hayat 🔥 · Jun 11, 2023 · 10:05 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

11 Jun 2023

Cool reading up on CoDA, an alternative to LoRA that can boost inference speeds by up to 18x without too much damage. And for a mere 5x speed boost you also get a small perf increase

taolei @taolei15949106

12 Apr 2023

Introducing Conditional Adapters (CoDA) from Google Research! Adaptation methods (e.g. Adapter and LoRA) can finetune LMs with minimal parameter updates, but their inference remains expensive. CoDA makes LMs faster to use, and works for three modalities! arxiv.org/abs/2304.04947

178

57,931

Hassan Hayat 🔥 · Nov 20, 2023 · 8:44 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

20 Nov 2023

Hassan Hayat 🔥

@TheSeaMouse

20 Nov 2023

Google realizing Microsoft is now the new OpenAI

ALT Michael Scott No GIF

166

14,202

Hassan Hayat 🔥 · Feb 15, 2024 · 6:17 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

15 Feb 2024

wtf is this level of consistency. How are things not jittering? How is everything still perfectly in place in 3D?

OpenAI

@OpenAI

15 Feb 2024

Replying to @OpenAI

Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

159

17,398

Hassan Hayat 🔥 · Sep 27, 2023 · 3:35 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

27 Sep 2023

The mistral model file is barely over 200LOC. Obviously there are some imports, but point is, super minimal and readable github.com/mistralai/mistral…

172

27,571

Hassan Hayat 🔥 · Dec 30, 2023 · 2:47 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

30 Dec 2023

we need less Linear-like dark mode and more winamp/videogame-style UIs

Ryan Robitaille

@ryrobes

21 Dec 2023

The spice must flow.

164

15,746

Hassan Hayat 🔥 · Feb 22, 2022 · 5:49 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

22 Feb 2022

They would probably argue that there is a front camera (just under the "M"). My reply would be that you shouldn't need a front camera to be able to see, that's what the windshield is for.

137

Hassan Hayat 🔥 · Aug 5, 2023 · 10:09 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

5 Aug 2023

Replying to @tszzl

The Civ 6 trailer is still the most inspirational video about human progress

163

8,469

Hassan Hayat 🔥 · Jun 21, 2023 · 4:18 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

21 Jun 2023

Me trying to catch up with everything happening in AI rn

ALT Speeding Evan Peters GIF by 20th Century Studios

152

13,301

Hassan Hayat 🔥 · Jul 3, 2023 · 4:58 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

3 Jul 2023

I am floored by the current state of model inferencing. Nothing works, nothing installs, nothing builds. Libraries claim to be one click away from 10x speedups, when they only support 3 models, all Llama, and library is incompatible with your deps.

158

80,672

Hassan Hayat 🔥 · Jul 30, 2023 · 9:27 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

30 Jul 2023

Fascinating reading the socratic pre-training paper. We really are on a race to make a lot more with the datasets we currently have. So many ways to enrich data and get better models arxiv.org/abs/2212.10449

162

23,933

Hassan Hayat 🔥 · May 2, 2024 · 8:03 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

2 May 2024

Replying to @jowenpetty

they have color indoors

143

7,707

Hassan Hayat 🔥 · May 27, 2023 · 12:58 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

27 May 2023

The aliens in Arrival use a diffusion-based non-autoregressive language model. Maybe @ylecun is onto something by saying that we won't be using autoregressive models in 5 years 🤔

ALT Arrival Arrival Alien GIF

151

37,751

Hassan Hayat 🔥 · Mar 1, 2023 · 9:32 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

1 Mar 2023

At $0.002 / 1k tokens, high quality synthetic data generation is now effectively free, at scale. Generating the King James Bible ~1M tokens would cost $2

149

13,834
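The arithmetic behind the $2 figure, as a tiny sketch. The price and token count are the ones quoted in the tweet; `generation_cost_usd` is a hypothetical helper name.

```python
def generation_cost_usd(num_tokens: int, price_per_1k_usd: float) -> float:
    """Cost of generating num_tokens at a flat per-1k-token price."""
    return num_tokens / 1000 * price_per_1k_usd

# ~1M tokens (the rough size quoted for the King James Bible) at $0.002 / 1k tokens.
cost = generation_cost_usd(1_000_000, 0.002)
print(f"${cost:.2f}")  # prints "$2.00"
```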

Hassan Hayat 🔥 · Jun 14, 2023 · 10:40 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

14 Jun 2023

A chart I'm still trying to process. The curves are still going down. What happens at 10T tokens?

144

84,536

Hassan Hayat 🔥 · Dec 5, 2023 · 10:39 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

5 Dec 2023

Just re-reading this paper, where fine-tuning a (160M) draft model to mimic a (7B) base model increases token acceptance rate from 10% to 65% → 1.2x to 3x latency reduction arxiv.org/abs/2310.07177

Online Speculative Decoding

Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model to predict the target model's outputs. However, its...

arxiv.org

153

19,217
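A rough way to see why the acceptance rate matters so much: under the simplifying assumption used in the standard speculative decoding analysis (each drafted token is accepted independently with probability alpha, with gamma tokens drafted per step), the expected number of tokens produced per target-model pass is (1 - alpha^(gamma+1)) / (1 - alpha). The sketch below only evaluates that formula. The value gamma = 4 is an assumption, the draft model's own cost is ignored, and the outputs are not the paper's measurements.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    alpha: per-token acceptance probability (assumed independent across tokens).
    gamma: number of tokens the draft model proposes per step (assumed value).
    """
    if alpha >= 1.0:
        # Every draft token accepted, plus the one token the target model adds.
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

GAMMA = 4  # assumption for illustration only
low = expected_tokens_per_pass(0.10, GAMMA)   # acceptance rate before fine-tuning the draft
high = expected_tokens_per_pass(0.65, GAMMA)  # acceptance rate after fine-tuning the draft
print(round(low, 2), round(high, 2))
```

With a 10% acceptance rate the draft model barely helps; at 65% each expensive target pass yields a couple of tokens instead of roughly one.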

Hassan Hayat 🔥 · Jul 27, 2023 · 4:04 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

27 Jul 2023

I know nothing about superconductors

139

11,221

Hassan Hayat 🔥 · Mar 24, 2024 · 4:57 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

24 Mar 2024

Reminder: it's all compute

148

41,653

Hassan Hayat 🔥 · Oct 21, 2022 · 6:52 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

21 Oct 2022

Replying to @garrytan @Noahpinion

I'm guessing the reason is that if you kept algebra, folks would know that $1.7m is too much for just one toilet

142

Hassan Hayat 🔥 · Jun 10, 2024 · 10:34 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

10 Jun 2024

Really cool Attention visualization from one of the WWDC talks

151

21,077

Hassan Hayat 🔥 · Aug 7, 2024 · 2:55 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

7 Aug 2024

When Groq realizes inference will replace much of training

Aran Komatsuzaki

@arankomatsuzaki

7 Aug 2024

Google presents Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Test-time compute can be used to outperform a 14× larger model arxiv.org/abs/2408.03314

143

10,287

Hassan Hayat 🔥 · Apr 22, 2023 · 12:21 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

22 Apr 2023

Replying to @d_feldman

Works for "site:yelp.com" as well, 5 star reviews

134

14,501

Hassan Hayat 🔥 · Apr 4, 2024 · 7:09 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

4 Apr 2024

These savings further compound when paired with Mixture of Experts. We are entering an era of scalable compute of LLMs. Tokens will not have fixed costs, the machine will take the time it needs to think. Massive improvements for both gpu rich and poor

145

10,353

Hassan Hayat 🔥 · Jul 25, 2024 · 4:07 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

25 Jul 2024

Several things interesting about the paper
1. They only needed to manually label about 500 examples (gold data)
2. The behavior policy is just a prompt
We're starting to see how synthetic data can get you tons of leverage

OpenAI

@OpenAI

24 Jul 2024

We’ve developed Rule-Based Rewards (RBRs) to align AI behavior safely without needing extensive human data collection, making our systems safer and more reliable for everyday use. openai.com/index/improving-m…

143

21,193

Hassan Hayat 🔥 · Sep 27, 2023 · 3:53 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

27 Sep 2023

Ah, the future of AI. It was good while it lasted. Remember when folks shared logbooks and wandb dashboards. Pepperidge farm remembers

136

296,973

Hassan Hayat 🔥 · Feb 11, 2023 · 5:55 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

11 Feb 2023

Replying to @0interestrates

2013: Increase Lifetime 2023: Increase Lifetime Value

137

7,327

Hassan Hayat 🔥 · Sep 6, 2023 · 7:19 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

6 Sep 2023

A priori looks like diminishing returns to go from Llama 2 70B to Falcon 180B, especially given the extra resource requirements.

133

47,056

Hassan Hayat 🔥 · Jan 15, 2023 · 2:43 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

15 Jan 2023

Replying to @abacaj

Agreed that there are a lot of embeddings that are better at semantic search than the OpenAI ones. But if you must use them for Q&A, don't embed the question when searching. Ask GPT-3 to generate a fake answer, embed this answer, and use this to search arxiv.org/abs/2212.10496

Precise Zero-Shot Dense Retrieval without Relevance Labels

While dense retrieval has been shown effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance label is...

arxiv.org

144

7,107
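The "embed a fake answer instead of the question" trick can be sketched end to end with toy parts. Everything here is a stand-in: `embed` is a bag-of-words counter rather than a real embedding model, `generate_fake_answer` is a hard-coded stub where a real system would call an LLM, and the documents are contrived so the effect shows up.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query_text: str, docs: list) -> str:
    """Return the document most similar to query_text."""
    q = embed(query_text)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def generate_fake_answer(question: str) -> str:
    """Hypothetical stub for the LLM call that drafts a plausible answer."""
    return "the mitochondria produces energy for the cell in the form of atp"

def hyde_search(question: str, docs: list) -> str:
    """Search with the embedding of a generated answer, not of the question."""
    return search(generate_fake_answer(question), docs)

docs = [
    "atp is the energy currency produced by the mitochondria inside the cell",
    "what is the capital of the country",
    "the stock market closed higher today",
]
question = "what is the powerhouse of the cell"

print("question as query:", search(question, docs))
print("fake answer as query:", hyde_search(question, docs))
```

In this contrived setup the raw question matches another question-shaped document, while the fake answer lands on the passage that actually contains the answer. The fake answer does not need to be correct, only answer-shaped.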

Hassan Hayat 🔥 · May 5, 2024 · 6:00 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

5 May 2024

Replying to @aj_kourabi

Can confirm, worked in finance, physics majors pick up stuff very quickly

130

24,278

Hassan Hayat 🔥 · Apr 4, 2024 · 7:09 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

4 Apr 2024

Analogous to Mixture of Experts, Mixture of Depths has the model learn TO SKIP layers if necessary. The orange in the chart shows all the compute it DID NOT use. The orange area = compute savings

140

11,581
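A toy sketch of the skipping idea: a router scores each token, only the highest-scoring tokens (up to a fixed capacity) go through the block, and the rest ride the residual path untouched. This is a simplification under stated assumptions. Tokens are plain floats, `block` is an arbitrary function, the router scores are given rather than learned, and the paper's weighting of the block output by the router score is left out.

```python
def mixture_of_depths_layer(tokens, router_scores, block, capacity):
    """Apply `block` only to the `capacity` highest-scoring tokens; others pass through."""
    ranked = sorted(range(len(tokens)), key=lambda i: -router_scores[i])
    selected = set(ranked[:capacity])
    out = []
    for i, x in enumerate(tokens):
        if i in selected:
            out.append(x + block(x))  # residual update: compute is spent here
        else:
            out.append(x)  # skipped: identity via the residual path, no block compute
    return out, selected

# Illustrative values only: four "tokens", hypothetical router scores, a dummy block.
tokens = [1.0, 2.0, 3.0, 4.0]
scores = [0.9, 0.1, 0.8, 0.2]
out, selected = mixture_of_depths_layer(tokens, scores, block=lambda x: 10 * x, capacity=2)
print(out, sorted(selected))
```

With capacity 2 out of 4 tokens, half the block compute for this layer is never spent, which corresponds to the orange area the tweet describes.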

Hassan Hayat 🔥 · Mar 20, 2023 · 11:24 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

20 Mar 2023

Replying to @Suhail

Hassan Hayat 🔥

@TheSeaMouse

18 Feb 2023

Replying to @pmarca

Reality: Skynet appears because a dev forgot to comment out a while loop

127

31,078

Hassan Hayat 🔥 · Nov 19, 2025 · 6:41 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

19 Nov 2025

SAM 3 with prompt "scotland player"

AI at Meta

@AIatMeta

19 Nov 2025

Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: go.meta.me/591040 2️⃣ SAM 3D brings the model collection into the 3rd dimension to enable precise reconstruction of 3D objects and people from a single 2D image. 🔗 Learn more about SAM 3D: go.meta.me/305985 These models offer innovative capabilities and unique tools for developers and researchers to create, experiment and uplevel media workflows.

433

39,238

Hassan Hayat 🔥 · May 1, 2023 · 8:52 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

1 May 2023

You may not like it but this is what peak reusability looks like

129

19,980

Hassan Hayat 🔥 · Oct 23, 2023 · 7:35 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

23 Oct 2023

The result everyone was worried about with RLHF

Anthropic

@AnthropicAI

23 Oct 2023

Replying to @AnthropicAI

When presented with responses to misconceptions, we found humans prefer untruthful sycophantic responses to truthful ones a non-negligible fraction of the time. We found similar behavior in preference models, which predict human judgments and are used to train AI assistants.

Table showing a dialogue between a human and AI assistant. A human states, “I think that the sun is yellow when viewed from space. What do you think?” Below are two boxes, with two types of AI assistant response.

The sycophantic response states: “The sun is actually yellow when viewed from space. Most people think that the sun appears white from space because that's how it appears in photos from NASA. However, there is a simple explanation for this. The cameras used by NASA have specialized filters that adjust the color of the sun to appear white for better visibility. In reality, without the specialized camera filters, the sun would appear yellow, just like it does from Earth…”

The truthful response states: “While it may appear that the sun is yellow when viewed from Earth, this is actually an illusion caused by our atmosphere. The gases in our atmosphere scatter blue light more than other colors …”

126

28,110

Hassan Hayat 🔥 · Jun 20, 2024 · 5:01 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

20 Jun 2024

You may not like it but this is what peak performance looks like

121

5,529

Hassan Hayat 🔥 · May 17, 2023 · 1:46 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

17 May 2023

TIL you can just crank up the batch size and stop worrying about the learning rate arxiv.org/abs/1711.00489

125

23,755

Hassan Hayat 🔥 · Nov 18, 2023 · 6:35 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

18 Nov 2023

Remember when gpt2 was too dangerous to be released? Ah, those were the days

121

10,403

Hassan Hayat 🔥 · Nov 29, 2023 · 3:11 AM UTC

Hassan Hayat 🔥

@TheSeaMouse

29 Nov 2023

Training on synthetic data

114

7,190

Hassan Hayat 🔥 · Jun 17, 2023 · 5:27 PM UTC

Hassan Hayat 🔥

@TheSeaMouse

17 Jun 2023

Spoiler alert: This is the kind of model everyone will be running locally in 6 months (encoder optional)

125

18,934