Aydyn Tairov (@tairov) | nitter

Aydyn Tairov @tairov

11 Sep 2023

I was really excited when Mojo became publicly available and started thinking about which project I could implement to learn Mojo concepts. Since I had already ported llama2.c to pure Python, I decided: why not try porting llama2.py to Mojo now 😀 And here is what I got...

9

62

558

346,046

Aydyn Tairov @tairov

29 Oct 2023

The out-of-the-box features of @Modular_AI's Mojo are just incredible. We applied unrolling and now llama2.🔥 outperforms @ggerganov's llama.cpp by almost 20% in CPU inference speed. github.com/tairov/llama2.moj…

11

46

355

192,984

Aydyn Tairov @tairov

24 Nov 2023

BREAKING: I've implemented a prototype of a cutting-edge Q-Learning algorithm in Mojo 🔥, and now it's working 35,000x faster than any existing (!) implementation. Thanks to Mojo's incredible feature that allows transparently importing any Python module! I really hope this breakthrough won't lead to another wave of OAI leadership layoffs, though I'm not ruling this out, because it is an amazing leap forward. Check this out: github.com/tairov/QStarLearn…

14

29

228

106,863

Aydyn Tairov @tairov

11 Sep 2023

llama2 inference in pure Mojo 🔥 github.com/tairov/llama2.moj… I found Mojo's SIMD primitives a really interesting feature, since they helped improve the pretty awful performance of the Python solution almost 250x.

GitHub - tairov/llama2.mojo: Inference Llama 2 in one file of pure 🔥

Inference Llama 2 in one file of pure 🔥. Contribute to tairov/llama2.mojo development by creating an account on GitHub.

2

31

178

21,787

Aydyn Tairov @tairov

17 Oct 2023

Llama2.mojo performance on Mac is right up there with llama.cpp (!!!), and even outperforms plain C in many cases. This is insane!

8

18

137

23,139

Aydyn Tairov @tairov

16 Nov 2023

Replying to @tldraw

INSANE !

5

12

115

15,267

Aydyn Tairov @tairov

11 Sep 2023

Replying to @tairov @karpathy

5

9

110

11,232

Aydyn Tairov @tairov

11 Sep 2023

Internally I used vectorisation helpers for matmul, so the Mojo solution can now beat the original llama2.c by @karpathy by 20%. I think there is still some room for further improvements.

3

3

59

9,392
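The vectorised matmul mentioned in the tweet above can be illustrated with a rough sketch. This is not the llama2.mojo code: it is plain Python that only mimics the structure of a SIMD loop, where each row of the weight matrix is consumed in fixed-width chunks, partial sums are kept per lane, and a scalar tail handles the leftover elements. The names `matmul_scalar`, `matmul_chunked`, and `SIMD_WIDTH` are hypothetical, and the lane count is an arbitrary assumption.

```python
from typing import List

SIMD_WIDTH = 4  # hypothetical lane count; real hardware and compilers pick this


def matmul_scalar(w: List[float], x: List[float], d: int, n: int) -> List[float]:
    """Reference version: w is a row-major (d, n) matrix, x has n elements."""
    out = []
    for i in range(d):
        acc = 0.0
        for j in range(n):
            acc += w[i * n + j] * x[j]
        out.append(acc)
    return out


def matmul_chunked(w: List[float], x: List[float], d: int, n: int,
                   width: int = SIMD_WIDTH) -> List[float]:
    """Same result, but each row is processed `width` elements at a time."""
    out = []
    main = n - n % width  # largest multiple of `width` that fits in a row
    for i in range(d):
        row = i * n
        lanes = [0.0] * width  # one partial sum per simulated SIMD lane
        for j in range(0, main, width):
            for k in range(width):
                lanes[k] += w[row + j + k] * x[j + k]
        acc = sum(lanes)  # horizontal reduction of the lanes
        for j in range(main, n):  # scalar tail for leftover elements
            acc += w[row + j] * x[j]
        out.append(acc)
    return out
```

In a compiled language the inner `for k in range(width)` loop is what becomes a single SIMD multiply-add; in Python it only shows the shape of the computation and gives no speedup.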

Aydyn Tairov @tairov

29 Oct 2023

Now it achieves over 1000 tokens per second for inference on M1 Max! 🚀 Big shoutout to our contributor Michael Kowalski for helping make this possible: github.com/tairov/llama2.moj…

improve speed with fused matrix multiplications by mikowals · Pull Request #69 · tairov/llama2.mojo

There are a couple of sets of matrix multiplications where the A matrix can be shared. Some loops and reads are removed by fusing them. On M1 Pro Macbook the tokens / sec improvement I...

3

51

4,347
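The fusing idea described in the pull request above can be sketched as follows. This is an illustration under assumptions, not the PR's code: it assumes the shared operand is the input vector `x`, that all weight matrices have the same row-major (d, n) shape, and it uses the hypothetical names `matvec` and `fused_matvec`. The point is that `x[j]` is read once and reused for every matrix, instead of once per separate multiplication.

```python
from typing import List, Sequence


def matvec(w: Sequence[float], x: Sequence[float], d: int, n: int) -> List[float]:
    """Plain matrix-vector product for a row-major (d, n) matrix."""
    return [sum(w[i * n + j] * x[j] for j in range(n)) for i in range(d)]


def fused_matvec(ws: Sequence[Sequence[float]], x: Sequence[float],
                 d: int, n: int) -> List[List[float]]:
    """Multiply several (d, n) matrices by the same x in one pass over the rows."""
    outs = [[0.0] * d for _ in ws]
    for i in range(d):
        row = i * n
        accs = [0.0] * len(ws)
        for j in range(n):
            xj = x[j]  # read the shared operand once per column
            for m, w in enumerate(ws):
                accs[m] += w[row + j] * xj
        for m in range(len(ws)):
            outs[m][i] = accs[m]
    return outs
```

For example, `fused_matvec([wq, wk], x, d, n)` returns the same two vectors as calling `matvec(wq, x, d, n)` and `matvec(wk, x, d, n)` separately, with one loop nest instead of two.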

Aydyn Tairov @tairov

17 Oct 2023

I've got early access to the Mojo SDK 🔥 for Mac from @Modular_AI . And of course I always wanted to inference baby llama on pure Mojo on Apple Silicon.. True story not only on Mojo 😉 So far, results are mind-blowing! Here are some benchmarks

6

2

37

3,759

Aydyn Tairov @tairov

3 Nov 2023

Another milestone! We're baselining Mojo with Mojo 😉

Modular

@Modular

3 Nov 2023

Mojo 🔥 0.5.0 is released! 🚀 Even more epic updates unleashed! 😱 Checkout the highlights ⬇️ or read the full changelog here & happy weekend hacking! 👩🏼‍💻 ➡️ bit.ly/463X9eq

1

2

35

3,145

Aydyn Tairov @tairov

11 Nov 2023

GitHub's integration of GPU enabled M1 Apple Silicon hosts for action runners may have flown under the radar, but its implications are vast. It's a strong indicator of Apple Silicon's rising adoption among developers, hinting at a future where M2 and M3 become central to the development ecosystem. #GitHubUniverse #AppleSilicon

1

3

24

1,817

Aydyn Tairov @tairov

17 Oct 2023

Thanks @Modular_AI , 🔥 finally my open source contributions materialized in the form of nice color merch T-shirt 😎 👕

1

2

24

2,325

Aydyn Tairov @tairov

11 Sep 2023

Wow! This is exciting! Thanks @Modular_AI for appreciating my efforts & congrats on the public release of Mojo! Seems that my port of llama2 inference is truly a "First prober ai written in Mojo" 😀

Modular

@Modular

11 Sep 2023

The first crack at llama2.🔥 is here 🚀 A Mojo 🔥 community member - Mojician - did a simple port from Python to Mojo, and shows its already 20% faster than Karpathys llama.c implementation 😱 How much faster can it go? 📈

1

20

2,267

Aydyn Tairov @tairov

3 Nov 2023

Replying to @Modular @Modular_AI

Nice work! llama2.🔥 bumped Mojo version to 0.5.0, and we got a few % boost. github.com/tairov/llama2.moj…

18

776

Aydyn Tairov @tairov

11 Sep 2023

Replying to @andrew_n_carr

Oh, man! I've had so much fun. I encourage you to finish & share it anyway. I'd love to check it out!

18

6,010

Aydyn Tairov @tairov

19 Oct 2023

Replying to @Modular @Modular_AI

Exciting! I got early access to the Mojo SDK (Mac) a week ago and compared its performance on baby-llama inference: Mojo vs Rust, C, C++, Go, Zig, and Julia. In total, 12 implementations in 7 languages x 3 models x 30 rounds. Check this out: engiware.com/benchmark/llama…

Llama2 Ports Extensive Benchmark Results on Mac M1 Max

Mojo 🔥 almost matches llama.cpp speed (!!!) with much simpler code and beats llama2.c across the board in multi-threading benchmarks

2

4

19

2,532
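The benchmark layout described above (implementations x models x rounds, reported in tokens per second) can be sketched roughly like this. It is not the author's benchmarking framework: `tokens_per_second` and `benchmark` are hypothetical names, and each implementation is assumed to be a callable that runs one inference for a given model and returns the number of tokens it generated.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def tokens_per_second(generate: Callable[[], int]) -> float:
    """Time one generation run and convert it to tokens per second."""
    start = time.perf_counter()
    n_tokens = generate()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")


def benchmark(impls: Dict[str, Callable[[str], int]], models: List[str],
              rounds: int = 30) -> Dict[Tuple[str, str], float]:
    """Run every implementation on every model `rounds` times, keep the mean tok/s."""
    results = {}
    for name, run in impls.items():
        for model in models:
            samples = [tokens_per_second(lambda: run(model)) for _ in range(rounds)]
            results[(name, model)] = statistics.mean(samples)
    return results
```

A real harness would launch each port as a subprocess and parse its reported speed; the callable here stands in for that step so the structure of the loop is visible.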

Aydyn Tairov @tairov

29 Oct 2023

My shock in the shock! Appreciate @lexfridman highlighting our efforts – exciting to receive such a feedback!

Lex Fridman

@lexfridman

29 Oct 2023

Replying to @clattner_llvm

Wow, impressive!

1

2

16

1,430

Aydyn Tairov @tairov

22 Oct 2023

Another moment of fame, now on livestream with awesome @Modular_AI team 😎! Really exciting week! piped.video/watch?v=bMb7ieOD…

1

3

15

565

Aydyn Tairov @tairov

16 Oct 2023

Baking something cool.. Now on Mac 👨‍💻

1

1

15

958

Aydyn Tairov @tairov

13 Oct 2023

I have the honor of authoring the first ever guest-post on the Modular AI blog, about my journey with porting #llama2 inference into the #mojo lang Kudos to @shshnkp for the incredible cooperation and support with preparing this article!

Modular

@Modular

13 Oct 2023

New blog post by Mojician 🔥 and guest contributor @tairov 💯⬇️ Aydyn discusses his journey from discovering Mojo 🔥 to implementing llama2.🔥 which has over 1.2k stars 🤩 on GitHub! 🚀 modular.com/blog/community-s…

1

1

15

857

Aydyn Tairov @tairov

12 Sep 2023

llama2.🔥 on YouTube piped.video/D-XE5IeZrKg?si=UObV…

1

15

1,694

Aydyn Tairov @tairov

17 Oct 2023

It'll disrupt not just AI/ML development industry, but much more.. It's a huuge story

1

11

1,637

Aydyn Tairov @tairov

18 Oct 2023

Full report is ready engiware.com/benchmark/llama…

3

13

4,019

Aydyn Tairov @tairov

13 Sep 2023

Thanks to PR from Modular team member, parallelize works in llama2.🔥github.com/tairov/llama2.moj… Why not compare parallel execution with llama2.c? And... llama2.c strikes back, now with OMP...

1

1

12

1,248

Aydyn Tairov @tairov

6 Dec 2023

👀 "MLX has a Python API which closely follows NumPy. MLX also has a fully featured C++ API which closely mirrors the Python API" Yet another attempt to fix ML model usability by implementing Python libs written in C++. Now from Apple research

Awni Hannun

@awnihannun

5 Dec 2023

Just in time for the holidays, we are releasing some new software today from Apple machine learning research. MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!) Code: github.com/ml-explore/mlx Docs: ml-explore.github.io/mlx/bui…

3

12

992

Aydyn Tairov @tairov

26 Oct 2023

From slowest to fastest — my Python and Mojo 🔥 ports of @karpathy llama2.c interestingly went to opposite ends of the perf spectrum. Recently I got a PR for #python using pypy & codon compilation on llama2-py that boosted it ~50x! github.com/tairov/llama2.py/…

Codon support. 74 times speedup compared to Python by dmahurin · Pull Request #5 · tairov/llama2.py

This set of changes adds ability to compile llama2.py with Codon. The result is a 74 X speedup! As Codon currently has no 'struct', this required implementing trivial struct functio...

2

11

605

Aydyn Tairov @tairov

20 Sep 2023

Repo got 1K stars. You truly understand you have made a meaningful contribution to the #mojo community when GitHub offers to let you nominate a successor

9

498

Aydyn Tairov @tairov

16 Nov 2023

whisper.cpp now supports HF distill-whisper + full CUDA & Apple Metal offloading, that brings almost a 4x boost to transcribing on fp model

Georgi Gerganov

@ggerganov

15 Nov 2023

whisper.cpp v1.5.0 github.com/ggerganov/whisper…

1

11

952

Aydyn Tairov @tairov

4 Dec 2023

Despite my best efforts to attend #ModCon onsite this year, I sadly couldn't make it happen. But the event can't truly go on without the first ever Mojician v-attendance! 😅 Wishing everyone an insightful conference!

muhtasham

@Muhtasham9

4 Dec 2023

@tairov 🔥🔥🔥#modcon23

1

10

697

Aydyn Tairov @tairov

22 Oct 2023

Seems that @samlakig took this challenge very seriously. I'm eager to see the details of what you can accomplish!

sam laki @samlakig

22 Oct 2023

I think I'll need to output more debug symbols/ do you think better to read the report straight from perf

perf debug of llama 2 just showing the main function lol

1

1

10

1,031

Aydyn Tairov @tairov

17 Oct 2023

I tried to optimize all the #llama2c ports for max performance. Some don't support multithreading so the comparisons aren't completely apples-to-apples. But it's clear Mojo is here to stay

1

8

1,393

Aydyn Tairov @tairov

28 Nov 2023

It turns out that Mistral's team is literally not using Mojo to speed up training and inference 35,000x and still raised €120+M. Investors, what are you doing 😢 this is horrendous!

Jeremy Howard

@jeremyphoward

28 Nov 2023

Mistral's team literally just used learned weights over data instead of programmed rules and raised 120+M. Investors what are you doing 😢 this is horrendous

2

2

10

1,223

Aydyn Tairov @tairov

17 Oct 2023

I secured early access to the Mojo SDK on Mac before general release. Put all #llama2c ports through extensive benchmarks across 7 languages and 12 variations. Crafted custom benchmarking framework to test performance. Quite intriguing battle on M1 Mac - results are telling 👇

1

9

474

Aydyn Tairov @tairov

29 Oct 2023

Interestingly, within just 2 weeks, we are no longer using the C implementation as a baseline. I’m sure there are still discoveries to be made 💥

1

9

1,414

Aydyn Tairov @tairov

22 Nov 2023

The Mojo talk from the LLVM Dev conference is available on YT. The @Modular_AI team unpacked the end-to-end reasoning behind the AI Engine and the Mojo language piped.video/watch?v=SEwTjZvy…

1

1

8

697

Aydyn Tairov @tairov

17 Oct 2023

I built a test env to benchmark all @karpathy's #llama2c ports, including Mojo, Zig, Julia, Rust & #llamacpp converter by @ggerganov. Ran inference across 3 baby llama models in 30 rounds (multi/single threaded). Check out the full report engiware.com/benchmark/llama…

1

8

850

Aydyn Tairov @tairov

8 Dec 2023

It seems that the Mistral team is setting a new trend in OS LLMs: MoE (mixture of experts). If I understand correctly, GPT-4's impressive performance largely stems from a similar technique. In the backend, it features eight 'heads' or 'experts', each a 250-billion-parameter LLM similar to GPT-3.5

Andrej Karpathy

@karpathy

8 Dec 2023

New open weights LLM from @MistralAI params.json: - hidden_dim / dim = 14336/4096 => 3.5X MLP expand - n_heads / n_kv_heads = 32/8 => 4X multiquery - "moe" => mixture of experts 8X top 2 👀 Likely related code: github.com/mistralai/megablo… Oddly absent: an over-rehearsed professional release video talking about a revolution in AI. If people are wondering why there is so much AI activity right around now, it's because the biggest deep learning conference (NeurIPS) is next week.

8

777
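The "mixture of experts 8X top 2" entry in the quoted params.json can be illustrated with a generic top-k gating sketch. This is not Mistral's code: the softmax over only the selected router logits is a common choice assumed here, and `top_k_gate` and `moe_forward` are hypothetical names. Each token's output is a weighted sum of the outputs of the k experts whose router logits are largest.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_gate(logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k largest router logits; weights are a softmax over just those."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: Sequence[float], logits: Sequence[float],
                experts: Sequence[Callable[[Sequence[float]], List[float]]],
                k: int = 2) -> List[float]:
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(x)
    for idx, weight in top_k_gate(logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        for j, v in enumerate(y):
            out[j] += weight * v
    return out
```

With 8 experts and k = 2, only a quarter of the expert parameters are touched per token, which is why such a model can be large in total size while keeping per-token compute lower.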

Aydyn Tairov @tairov

20 Nov 2023

Next move by 4D chess grandmaster?

2

6

332

Aydyn Tairov @tairov

12 Sep 2023

Replying to @aniketvartak @Modular_AI @karpathy

The thing is, llama2.mojo was also implemented for understanding Mojo concepts and to have a real-world example. However, that doesn't mean both projects couldn't be evolved further to squeeze everything you can out of the hardware while trying to maintain brevity

1

7

342

Aydyn Tairov @tairov

25 Oct 2023

Kudos to the @ziglang community for improving and benchmarking llama2 inference in Zig under Apple M1/2 ! The llama2.zig implementation has solid single-threaded performance - it may be the fastest single-threaded inference of tiny-llama models so far on Macs. Surprisingly, no boost on multi-threaded mode? github.com/karpathy/llama2.c…

Aydyn Tairov @tairov

17 Oct 2023

Replying to @tairov

Llama2.mojo performance on Mac is right up there with llama.cpp (!!!), and even outperforms plain C in many cases. This is insane!

1

1

8

1,198

Aydyn Tairov @tairov

29 Oct 2023

Replying to @justthisguy @Modular_AI @ggerganov

llama2.🔥 is a port of @karpathy's llama2.c, biggest supported model so far is TinyLlama with 1.1B parameters. I hope soon we can run bigger quantized models as well

1

6

1,528

Aydyn Tairov @tairov

3 Dec 2023

This is Wow

Brendan Bycroft

@BrendanBycroft

2 Dec 2023

Project #2: LLM Visualization So I created a web-page to visualize a small LLM, of the sort that's behind ChatGPT. Rendered in 3D, it shows all the steps to run a single token inference. (link in bio)

7

684

Aydyn Tairov @tairov

11 Sep 2023

Replying to @clattner_llvm

This is what happens when you launch a new exciting technology that many enthusiasts have been eager to try out right before the weekend 👨‍💻

7

722

Aydyn Tairov @tairov

2 Dec 2023

The Mojo truck is unstoppable! It now outperforms the Porsche 911 ! I think this could be a great trailerbuster for ModCon'23 😎

7

506

Aydyn Tairov @tairov

22 Oct 2023

In case anyone still doubts the importance of diving into learning and entering the AI field now. 👇

Andrej Karpathy

@karpathy

21 Oct 2023

Are you kidding? There has never been a green pasture of this size with this low barrier to entry.

1

6

570

Aydyn Tairov @tairov

22 Nov 2023

Replying to @tairov @Modular_AI

I am particularly pleased that my project llama2.🔥 was also mentioned as an example of what can be achieved with inbuilt Mojo features. github.com/tairov/llama2.moj…

2

2

7

462

Aydyn Tairov @tairov

17 Oct 2023

Replying to @__morse @Modular_AI

Not yet, we just got access to the SDK; more exciting projects to accomplish soon. So far llama2.mojo supports the 1.1B TinyLlama github.com/tairov/llama2.moj…

1

1

7

172

Aydyn Tairov @tairov

1 Oct 2023

I think it's worthwhile to share some interim results we got with the llama2.🔥 speedup. Our incredible GitHub contributors baked a draft PR. And here is what we have 👇

1

7

594

Aydyn Tairov @tairov

17 Oct 2023

I'll double check all results and will prepare a write-up with full details.

1

7

1,518

Aydyn Tairov @tairov

19 Oct 2023

Replying to @altryne

github.com/tairov/llama2.moj…

2

7

868

Aydyn Tairov @tairov

12 Oct 2023

#GPT4V can understand the essence of recursion even from a photo. Unbelievable!

3

1

6

690

Aydyn Tairov @tairov

25 Nov 2023

Replying to @rasbt

If anyone has it coded, I'm eager to benchmark with my Mojo "reference" implementation github.com/tairov/QStarLearn…

GitHub - tairov/QStarLearning.mojo

Contribute to tairov/QStarLearning.mojo development by creating an account on GitHub.

5

1,012

Aydyn Tairov @tairov

17 Nov 2023

Replying to @lexfridman

CEO/Board misalignment

4

2,210

Aydyn Tairov @tairov

26 Oct 2023

Hope next year's Google presentation will be like this ( @Modular_AI ? )

Modular

@Modular

24 Oct 2023

🦙 .🔥

1

6

1,685

Aydyn Tairov @tairov

13 Sep 2023

It shows additional details about which loops were vectorized by the gcc compiler. So far it seems that the very first comparison of llama2.c vs Mojo was fair: gcc is aggressively vectorizing all loops it can find 😀

2

5

403

Aydyn Tairov @tairov

25 Nov 2023

Replying to @elonmusk

Thanks for sharing! I have it already implemented in Mojo 🔥, so now it's 35,000x faster than any other implementation github.com/tairov/QStarLearn…

1

2

6

383

Aydyn Tairov @tairov

22 Dec 2023

Who remembers this? From 2008, the Google Search Appliance led the way in on-prem search solutions for enterprises, deployed as a rack form factor black box device. It was discontinued post-2018

1

6

370

Aydyn Tairov @tairov

8 Nov 2023

Is it only me, or did OpenAI kill the N+ startups I was going to develop? 🤓 #OpenAIDevDay

1

6

498

Aydyn Tairov @tairov

19 Oct 2023

I discovered podcasts on X. Today I was invited to a really nice one! Thanks @altryne for the opportunity to share highlights & my experience with early release of Mojo SDK on Mac from @Modular_AI . PS. My moment of fame starts at the 59th min 🔊

Alex Volkov @ AI Engineer

@altryne

19 Oct 2023

T-minus 2 hours for @thursdai_pod live recording, and as always, if you can't make it to the live one, make sure you're subscribed on thursdai.news to receive the episode in newsletter and podcast form

5

499

Aydyn Tairov @tairov

4 Oct 2023

Looking forward. 🔥 on Mac, one ♥️

Modular

@Modular

4 Oct 2023

Mojo 🔥 is coming to Mac 💻 very soon 😱 Here’s a little sneak peak of us testing LLama2.🔥 out of the box by @tairov. Look for this to drop in the next couple of weeks 💯🚀

ALT Mojo 🔥 on Mac 💻

5

603

Aydyn Tairov @tairov

27 Nov 2023

Replying to @ggerganov

Sounds cool, but I think it might be hard to compete with AWS on its own territory 😀 from the cost perspective. They already have the AWS Bedrock service rolling out; it’s kind of an API to many LLM models where you pay for tokens used

2

1

5

3,079

Aydyn Tairov @tairov

29 Oct 2023

Make programming great again!

Chris Lattner

@clattner_llvm

29 Oct 2023

Epic work, the thing I love about this is how small and clean the code is - literally reimplementing everything down to the metal instead of depending on thick layers of magic.

5

1,721

Aydyn Tairov @tairov

29 Oct 2023

Replying to @lotsofshards @Modular_AI @ggerganov @clattner_llvm

I think everyone in the AI world should know about this, so feel free to be the first retweeter 😉

1

5

1,527

Aydyn Tairov @tairov

6 Dec 2023

Gemini, the avant-garde and trailblazing multi-modal virtuoso of language models, state-of-the-art titan, infused with wit and wisdom far beyond its digital peers. It's an inventive, quick-witted behemoth, eclipsing predecessors with its sterling adaptability! Gemini:

1

4

233

Aydyn Tairov @tairov

17 Oct 2023

Have you ever wanted to benchmark baby Llama2 models in 12 programming languages? No? Well, now you can! github.com/tairov/lamatune

1

1

5

327

Aydyn Tairov @tairov

13 Sep 2023

There was some debate regarding the fairness of the C vs Mojo comparison. I doubted whether it was fair, since in Mojo I deliberately introduced SIMD operations. After some research I found an interesting gcc switch `-fopt-info-vec`

1

4

733

Aydyn Tairov @tairov

28 Nov 2023

Replying to @var_epsilon

I didn't know this model was capable of generating photo-realistic images of a supercar.

3

1,659

Aydyn Tairov @tairov

29 Oct 2023

Replying to @pavel_4_ai @Modular_AI @ggerganov

I wouldn't say llama.cpp is 10x faster on M1 Metal. Probably it has a 2x boost. I'm eager to benchmark Mojo with GPU support once it's released.

1

5

1,004

Aydyn Tairov @tairov

29 Oct 2023

Replying to @tokenbender

Hey @4evaBehindSOTA ! Wanna give it "Round 3" ? :) We added unrolling improvements; now it hits 1000 tok/s for stories15M. Pull the latest changes and use -j 6; it seems that with threads = 6 it works even better

1

5

232

Aydyn Tairov @tairov

26 Oct 2023

Solid optimization lifting #python port from slowest to more competitive! Great to see the Python community working hard on the perf challenge!

1

5

219

Aydyn Tairov @tairov

19 Oct 2023

Replying to @clattner_llvm

What about "tok/s" ? 😀

2

62

Aydyn Tairov @tairov

29 Oct 2023

Replying to @lexfridman @clattner_llvm

Thank you for feedback! I'm sure there are even more achievements to come in this space

5

304

Aydyn Tairov @tairov

1 Oct 2023

We've reached another milestone with the support for the 1.1B TinyLlama, which can now generate advanced responses, like an explanation of the Pythagorean theorem or Python code for calculating the Fibonacci sequence. Impressive performance for a 4GB sized model!

5

315

Aydyn Tairov @tairov

4 Dec 2023

ModCon keynotes are live now piped.video/watch?v=VKxNGFhp… And another moment of fame for me 😎

ModCon23 Keynote Livestream

Join us for the ModCon ’23 Keynote which will highlight incredible ...

5

284

Aydyn Tairov @tairov

18 Sep 2023

Replying to @hnasr

Exactly. And this is how I leveraged SIMD primitives to speed up llama2 inference by 15-20% compared to the C implementation github.com/tairov/llama2.moj…

1

5

579

Aydyn Tairov @tairov

24 Nov 2023

I’m eager to benchmark it with any other reference Q-Learning , as soon as one is available 🤓 This is probably the only opinionated prototype so far. I'm afraid the competitors don't stand a chance either way

Modular

@Modular

24 Nov 2023

Q-Learning works better in Mojo🔥🚀Amazing work @tairov 💯

5

679

Aydyn Tairov @tairov

13 Sep 2023

462 vs 385 tok/s

5

240

Aydyn Tairov @tairov

24 Nov 2023

Here is real value for whoever comes to the comments: Modular is giving away free tickets to ModCon 2023 + swag 🎁. It seems it's still wide open and the competition is low! I see it as a prime opportunity to implement some classic algo in Mojo for a solid chance at a win. I’d love to see submissions of more advanced algos too, like realistic Q-learning? :) Show what you can build in Mojo for the best shot. But hurry, the contest closes soon! nitter.app/Modular_AI/statu…

1

4

3,608

Aydyn Tairov @tairov

19 Oct 2023

#gpt4V pretending it's not involved in this mess.

1

5

476

Aydyn Tairov @tairov

15 Sep 2023

👀

1

5

601

Aydyn Tairov @tairov

21 Oct 2023

Replying to @karpathy

This is already last century. Let your LLM agent bring you information in a convenient format, without the need to surf at all.

5

2,319

Aydyn Tairov @tairov

8 Nov 2023

Cloud-exits might become a trendy thing.

Ivan Velichko

@iximiuz

8 Nov 2023

It's absolutely crazy how cheap the computing power is if you stay outside of the Cloud 🤯 The usage of iximiuz Labs keeps growing, so I'm upgrading my bare metal servers. And I just doubled the fleet's CPU capacity with the price going from $44 to $53 per server per month.

1

2

646

Aydyn Tairov @tairov

6 Dec 2023

Gemini must be a beast . 85% performance on a typical Codeforces competition is wild! It’s like solving 4-5 medium/hard Leetcode problems within 2.5 hours.

Rémi Leblond @RemiLeblond

6 Dec 2023

So excited to share what the team and I have been working on these last months! #AlphaCode 2 is powered by Gemini and performs better than 85% of competition participants in 12 contests on Codeforces! More details at goo.gle/AlphaCode2 @GoogleDeepMind

1

4

413

Aydyn Tairov @tairov

29 Nov 2023

Replying to @dannypostma

Why not then switch to this model permanently ?

2

674

Aydyn Tairov @tairov

5 Dec 2023

Here is why stock inference implementations are not a good fit for production workloads. "The compute costs are eye-watering" (c) #ModCon23

3

4

368

Aydyn Tairov @tairov

15 Dec 2023

llama2-py improved significantly, meet llama2-numpy. The SLOC dropped 3x as well. Nicely implemented inference in 350 lines of almost pure Python 😀

Christophe Alexandre

@CertumIter

15 Dec 2023

Replying to @tairov @karpathy

Thank you Aydyn! I love the idea of a pure Python implementation with no dependence to an external package. I started from your version and introduced numpy: github.com/chris-ch/llama2-p… ... using the standard interpreter we get to 2x slower from 10x slower compared to C version.

2

5

658

Aydyn Tairov @tairov

6 Dec 2023

Meanwhile AMD is also presenting something, obviously for AI. I’m exhausted and have no time to keep up with everything. Hey Grok, maybe your qdrant-based vector search can help summarise this? 😀 piped.video/live/tfSZqjxsr0M…

AMD Presents: Advancing AI

Join us to discover how AMD and its partners are powering the futur...

2

4

323

Aydyn Tairov @tairov

21 Nov 2023

Context is not all you need. As this research highlights, LLMs struggle with basic contextual understanding as the reasoning context grows more complex. Without a framework firmly grounding symbols in reality, model performance degrades. As was demonstrated at OpenAI DevDay, techniques like prompt engineering and RAG remain crucial to overcoming these context evaluation failures. Interestingly, that means there are natural optimizations within context that can help information retrieval. However, models still lack natural understanding of real-world contextual relationships.

Greg Kamradt

@GregKamradt

21 Nov 2023

Claude 2.1 (200K Tokens) - Pressure Testing Long Context Recall

We all love increasing context lengths - but what's performance like? Anthropic reached out with early access to Claude 2.1 so I repeated the “needle in a haystack” analysis I did on GPT-4. Here's what I found:

Findings:
* At 200K tokens (nearly 470 pages), Claude 2.1 was able to recall facts at some document depths
* Facts at the very top and very bottom of the document were recalled with nearly 100% accuracy
* Facts positioned at the top of the document were recalled with less performance than the bottom (similar to GPT-4)
* Starting at ~90K tokens, performance of recall at the bottom of the document started to get increasingly worse
* Performance at low context lengths was not guaranteed

So what:
* Prompting Engineering Matters - It’s worth tinkering with your prompt and running A/B tests to measure retrieval accuracy
* No Guarantees - Your facts are not guaranteed to be retrieved. Don’t bake the assumption they will into your applications
* Less context = more accuracy - This is well known, but when possible reduce the amount of context you send to the models to increase its ability to recall
* Position Matters - Also well known, but facts placed at the very beginning and 2nd half of the document seem to be recalled better

Why run this test?:
* I’m a big fan of Anthropic! They are helping to push the bounds on LLM performance and creating powerful tools for the world
* As a practitioner of LLMs, it’s important to build an intuition for how they work, where they excel and their limits
* Tests like these, while not bulletproof, help showcase real world examples and get a feeling for how they work. The goal is to transfer this knowledge to productive use cases

Overview of the process:
* Use Paul Graham essays as ‘background’ tokens. With 218 essays it’s easy to get up to 200K tokens (repeated essays when necessary)
* Place a random statement within the document at various depths. Fact used: “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.”
* Ask Claude 2.1 to answer this question only using the context provided
* Evaluate Claude 2.1's answer with GPT-4 using @langchain evals
* Rinse and repeat for 35x document depths between 0% (top of document) and 100% (bottom of document) (sigmoid distribution) and 35x context lengths (1K Tokens > 200K Tokens)

Next Steps To Take This Further:
* For rigor, one should do a key:value retrieval step. However for relatability I did a San Francisco line within PGs essays for clarity and practical relevance
* Repeat test multiple times for increased statistical significance

Notes:
* Amount Of Recall Matters - The model's performance is hypothesized to diminish when tasked with multiple fact retrievals or when engaging in synthetic reasoning steps
* Changing your prompt, question, fact to be retrieved and background context will impact performance
* The Anthropic team reached out and offered credits to repeat this test. They also offered prompt advice to maximize performance. It's important to clarify that their involvement was strictly logistical. The integrity and independence of the results were maintained, ensuring that the findings reflect my unbiased evaluation and are not influenced by their support.
* This test cost ~$1,016 for API calls ($8 per million tokens)

1

4

660
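The test-construction step from the quoted "needle in a haystack" process can be sketched as below. This is an illustration under assumptions, not the quoted author's code: tokens are approximated by list items, depths are taken as plain percentages rather than the sigmoid distribution mentioned, and `insert_needle` and `build_cases` are hypothetical names. The model call and the GPT-4 grading step are deliberately left out.

```python
from typing import List, Sequence, Tuple

NEEDLE = ("The best thing to do in San Francisco is eat a sandwich "
          "and sit in Dolores Park on a sunny day.")


def insert_needle(background: Sequence[str], needle: str,
                  depth_pct: float) -> List[str]:
    """Place the needle at depth_pct of the background (0 = top, 100 = bottom)."""
    if not 0 <= depth_pct <= 100:
        raise ValueError("depth_pct must be between 0 and 100")
    pos = round(len(background) * depth_pct / 100)
    return list(background[:pos]) + [needle] + list(background[pos:])


def build_cases(background: Sequence[str], needle: str,
                context_lengths: Sequence[int],
                depths: Sequence[float]) -> List[Tuple[int, float, List[str]]]:
    """One test document per (context length, depth) combination."""
    cases = []
    for length in context_lengths:
        trimmed = background[:length]  # cut the background to the target length
        for depth in depths:
            cases.append((length, depth, insert_needle(trimmed, needle, depth)))
    return cases
```

Each resulting document would then be sent to the model with the retrieval question, and the answer scored; with 35 lengths and 35 depths this grid yields 1,225 cases.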

Aydyn Tairov @tairov

30 Nov 2023

Will 99% of TikTok influencers lose their job/audience because of this?

Jonathan Fischoff

@jfischoff

30 Nov 2023

“Animate Anyone” was released last night for making pose guide videos. Lets dive in. Paper: arxiv.org/abs/2311.17117 Project: humanaigc.github.io/animate-… 🧵1/

2

1

4

732

Aydyn Tairov @tairov

23 Nov 2023

Replying to @digicalidesign @Modular_AI

A Mojician, of course 🔥

1

3

184

Aydyn Tairov @tairov

13 Sep 2023

Replying to @Modular @Modular_AI

Hi @ylecun! 🙌 I've been diving deep into the new Mojo lang by implementing #Llama2 inference on it. We'd love to hear your insights on Mojo and its stated capabilities

1

1

4

347

Aydyn Tairov @tairov

25 Nov 2023

Replying to @lexfridman @michaelmalice

Stable Diffusion video in real-time is fascinating , look how it dressed Lex as a pirate on this interview!

3

1,329

Aydyn Tairov @tairov

25 Nov 2023

Ideal explanation of all aspects of LLMs, including the security concerns, in such a condensed form. It's brilliant how succinctly the information is conveyed. @karpathy's videos are examples of almost perfect compression of huge ML topics into an accessible form.

Andrej Karpathy

@karpathy

23 Nov 2023

New YouTube video: 1hr general-audience introduction to Large Language Models piped.video/watch?v=zjkBMFhN… Based on a 30min talk I gave recently; It tries to be non-technical intro, covers mental models for LLM inference, training, finetuning, the emerging LLM OS and LLM Security.

3

663

Aydyn Tairov @tairov

1 Oct 2023

Now, on average we're performing slightly better than multithreaded #llama2c github.com/tairov/llama2.moj… . We're able to further improve vectorization/parallelization of transformers forward pass

1

2

362

Aydyn Tairov @tairov

28 Nov 2023

If you want to avoid this kind of oversight in the future, make sure you come to ModCon '23 😉

1

2

237

Aydyn Tairov @tairov

5 Dec 2023

Replying to @radamar @Gradio

Man, are you counting how many startups closed because you implemented their primary product on Gradio ? 😅

3

245

Aydyn Tairov @tairov

2 Dec 2023

How to run your fine-tuned LM app on top of underlying LLM-OS kernel

Zhiqiu (Oscar) Xu @oscar_zhiqiu_xu

1 Dec 2023

You don’t have to train from scratch whenever developing a smaller model of an existing model family. Sharing our latest work - “Initializing Models with Larger Ones” arxiv preprint: arxiv.org/abs/2311.18823 code: github.com/OscarXZQ/weight-s…

3

440

Aydyn Tairov @tairov

29 Oct 2023

Best place to keep up with latest changes in AI world 👍

Alex Volkov @ AI Engineer

@altryne

29 Oct 2023

Replying to @ptsi @tairov

Let's gooo! We actually had Aydyn on ThursdAI and talked about LlaMa.🔥 sub.thursdai.news/p/thursdai…

3

574

Aydyn Tairov @tairov

22 Oct 2023

Replying to @tunguz

Highly likely it'll be Mojo, which beats even llama.cpp in some cases with much simpler code engiware.com/benchmark/llama…

3

134