@OpenAI, ex @Google. Interested in Science, Psychology, Investing. Good Thoughts, Good Words, Good Deeds.

SF Bay Area, CA
Never imagined MUM would end up in I/O when we first started it ;-) ;-)
Multitask Unified Model (MUM) — our latest AI milestone — has the potential to transform how Google helps you with complex information tasks. #GoogleIO
7
7
112
Got published in Nature Communications :D nature.com/articles/s41467-0… With awesome collaborators: @alvin_rajkomar Eric Loreaux, Yuchen Liu, Jonas Kemp, Benny Li, Ming-Jun Chen, Yi Zhang & @Mysiak ...
6
5
59
Extremely proud to have pioneered large scale distillation for Maverick and really delighted to be working alongside an extremely talented team. We truly hope the OSS community enjoys the fruits of our labour.
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model with 16 experts. • Industry-leading context window of 10M tokens. • Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks. Llama 4 Maverick • 17B-active-parameter model with 128 experts. • Best-in-class image grounding with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image. • Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks. • Achieves comparable results to DeepSeek v3 on reasoning and coding — at half the active parameters. • Unparalleled performance-to-cost ratio with a chat version scoring ELO of 1417 on LMArena. These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We’re excited to share more details about it even while it’s still in flight. Read more about the first Llama 4 models, including training and benchmarks ➡️ go.fb.me/gmjohs Download Llama 4 ➡️ go.fb.me/bwwhe9
6
4
55
5,201
“To get what you want, you have to deserve what you want. The world is not yet a crazy enough place to reward a whole bunch of undeserving people.” — Charlie Munger RIP
Highlight reel of Charlie Munger spitting one banger after another.
4
33
17,554
Neat quotes from technical documentation. "Reusing the same [PRNG] state will cause sadness and monotony, depriving the end user of lifegiving chaos." jax.readthedocs.io/en/latest…
5
35
"I have always found it strange that a stock that falls is seen as risky but a stock that fell (so in the past) becomes an opportunity." - @FromValue in an interview with @InvestmentTalkk
1
26
Really proud to present this model to the world and really excited about what lies ahead 🔥🦾🚀
BREAKING: Meta's Llama 4 Maverick just hit #2 overall - becoming the 4th org to break 1400+ on Arena!🔥 Highlights: - #1 open model, surpassing DeepSeek - Tied #1 in Hard Prompts, Coding, Math, Creative Writing - Huge leap over Llama 3 405B: 1268 → 1417 - #5 under style control Huge congrats to @AIatMeta — and another big win for open-source! 👏 More analysis below⬇️
2
27
3,350
Congratulations to dear friends @YiTayML @PiotrPadlewski @DaniYogatama and everyone at Reka AI for an amazing multimodal model in such a short time! Eagerly looking forward to more awesomeness ahead!
We are excited to share Reka Flash ✨, a new state-of-the-art 21B multimodal model that rivals Gemini Pro and GPT 3.5 on key language & vision benchmarks 📈. We've trained this model from scratch and ground zero with a small (but amazingly capable team 🧙‍♂️) and relatively finite resources. We're amazed at how strong it is 🦾. I'm proud of our financially optimal LLM team. Abandoning one's comfort zone is surely difficult and having to redo things from scratch is often scary & daunting. Many things in the wilderness don't work from the get go and it was often a huge pain in the neck 😢. I should write a separate post someday on how much we've had to rebuild (and suffer 🤣). Everything from robust training infra, proper (human) evaluation pipelines and proper RLHF setups. I am thankful for the crazy talented team we have here ☺️. Meanwhile, our largest most capable model Reka-Core is finishing soon and we're already very excited by early results 📈. More to come very soon! 9 months in. Excited to be back at the frontier 🔥. Check out our blogpost here: reka.ai/reka-flash-an-effici…
2
3
23
6,716
“I never lose. I either win or learn.” — Nelson Mandela
2
21
Cooking now!
5
2
20
3,699
#Tensorflow in the #Numpy API - great work by @_agarwal_ashish, Wang Peng, Akshay Modi and team!! Works beautifully with #Trax ! Check it out :-)
New in tf-nightly: the NumPy API. - GPU and TPU-accelerated NumPy code - Interoperable with the rest of the TF ecosystem Documentation: tensorflow.org/api_docs/pyth…
1
3
20
So many gems in here compiled by @polina_marinova The Profile Dossier: Hamdi Ulukaya, the Shepherd-Turned-Billionaire CEO by @ProfileRead theprofile.substack.com/p/th…
1
6
15
Also reminds me of Munger’s maxim of not taking a side in a debate unless you can put the opposite argument better than the best supporter of that counterargument. (This has also left me without an opinion on most topics 🙃)
Daniel Dennett (28th March 1942 - 19th April 2024) RIP Dennett on how to criticise wisely
1
14
1,489
Congratulations to the talented team @achowdhery @IrwanBello @real_ioannis
Today we launch Asimov. Asimov is our code research agent that is best-in-class in codebase comprehension. It is built for teams, built for enterprises, and built to remember. We use it everyday to accelerate our velocity and streamline distributed ops. Link below to sign up for waitlist.
14
2,364
"The history of mathematics is a history of horrendously difficult problems being solved by young people too ignorant to know they are impossible" — Freeman Dyson 1/n
1
1
14
A friend quoted that the hardest part about designing a nuclear power plant is how to assign parking spots -- the LLM/DL equivalent of that is the config system :P
3
14
2,051
Replying to @YiTayML
Truly the Bell Labs of its day (possibly still!)
1
14
1,832
Replying to @vitaliyk
"Life is a tragedy for those who feel, but a comedy to those who think."
11
Just read @vardi 's thought provoking thoughts "The Sand-Heap Paradox of Privacy and Influence" in the recent CACM. cacm.acm.org/magazines/2021/… The irony of the last sentence "Follow me on ..." and my own post on Twitter is duly noted.
2
3
10
T5X paper is now on ArXiv! arxiv.org/abs/2203.17189
Scaling Up Models and Data with t5x and seqio deepai.org/publication/scali… by Adam Roberts et al. including @afrozenator, @ChenZhuo19 #NeuralNetwork #OpenSource
1
1
10
"Von Neumann didn't say anything but after five minutes he raised his hand. When I called on him he went to the blackboard and proceeded to write down the proof. After that I was afraid of von Neumann." "How To Solve It" 2nd ed. (1957), p. xv, George Pólya
10
Replying to @borisdayma
Data repeats!
1
10
4,008
A midday nap has to rank as one of the most understated pleasures in life.
1
10
650
Good guy Geoffrey Hinton dropping some zingers in this interview -- piped.video/watch?v=giT0ytyn… The whole thing is well worth a listen, but here's what struck me: 1/3
2
1
10
2,158
It's straightforward to work hard if you have clearly defined, externally imposed goals, as you do in school. ... What I've learned since I was a kid is how to work toward goals that are neither clearly defined nor externally imposed.
1
8
Very well done folks!
Today we're releasing all Switch Transformer models in T5X/JAX, including the 1.6T param Switch-C and the 395B param Switch-XXL models. Pleased to have these open-sourced! github.com/google-research/t… All thanks to the efforts of James Lee-Thorp, @ada_rob, and @hwchung27
8
You’ll be missed that’s for sure! All the best!!
6
8,039
Present your ideas "To illuminate and not to impress" - a fact that gets lost in too many research papers and presentations.
Now we have Terence Parr speaking about the role of visualization in ML research
1
4
Replying to @AndrewRangeley
You may know this already, but I found Barry Diller's chats with Reid Hoffman interesting in this regard. Their M/O seems to be to start with a fresh take on things, a clean slate -- given that pedigree, this falls into a pattern. Transcript of Part 2 - mastersofscale.com/wp-conten…
1
7
Replying to @_arohan_ @AIatMeta
Wohooooooo!!!!!!! Really looking forward to this!!!
1
4
371
Very sad to see you leave, I wish we’d worked more closely together! Thank you for helping push the boundaries while you were here :-) And all the very best for all that lies ahead!
6
584
Quite an impressive achievement by friends at @RekaAILabs - congratulations @YiTayML , @PiotrPadlewski @DaniYogatama and all!
Our @RekaAILabs Tech Report / Paper is out! 🔥 Tech reports with completely no information are kinda boring so we’re revealing some interesting information on how we train our series of Reka models including tokens, architecture, data & human evaluation workflows. 😃 We tried our best to give a behind-the-scenes experience 😊. In particular, if you enjoyed my previous blog post about training LLMs in the wilderness, there’s a dedicated section on that in this report! 🌴 We can’t disclose literally everything but we tried our best to make it interesting, I promise. 🙏 Here’s a rundown summary of some of the highlights. 🔹Edge and Flash are outrageously strong 7B and 21B models. They are trained on 4.5-5T tokens in total. Also, they have been improved significantly since their first public appearance! They outperform many popular faces. Some data mixture information is in the report. 🔹We discuss our internal human evaluation workflow, prompt distribution, and how we use Core for model development and automatic evaluation. 🔹We describe our infrastructure setup for training large models, quantifying node failures, and report loss curves for training our models. 🔹Aside from the hardware lottery, we also show how this affects node stability across time. Once we were told our cluster became less stable because there were "big guys" moving things around the data center. 😅 On performance which you might have already seen on other threads. 🔹Core approaches frontier-class models like Claude3 Opus and GPT4-V. It outperforms Claude3 Opus on third-party blind human evaluation for multimodal chat, outperforms Gemini Ultra on video QA, and is quite competitive to other frontier models on core text metrics. It also matches GPT4-V on MMMU! 🔹Core ranks #2 on our internal multimodal chat leaderboard, right after GPT4-V. On text, it ranks #3 just behind Claude Opus and GPT4 Turbo. Core outperforms GPT-4 (0613) on this ranking. 
This has been a focused and concentrated effort of a small team of ~20 people in the past 4 months (yes, we got access to 90%+ of our compute only late December last year! 🚀). This tech report tells our story. Enjoy! Happy to answer any questions in replies or DM! PS: it was nice writing in latex after one whole year! PPS: I had quite some fun writing this 😊. There's some puns and easter eggs and interesting tidbits in there. Trust me. 😏 Link: publications.reka.ai/reka-co…
1
7
525
Congratulations to @avitaloliver @anselmlevskaya and others!
🔥JAX meets Transformers🔥 @GoogleAI's JAX/Flax library can now be used as Transformers' backbone ML library. JAX/Flax makes distributed training on TPU effortless and highly efficient! 👉 Google Colab: colab.research.google.com/gi… 👉 Runtime evaluation: github.com/huggingface/trans…
2
5
“The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn.” — Alvin Toffler, Future Shock, 1970 (!)
6
3,747
"He saw what people might say, turned it into what they ought to say, and then answered." -- Adam Gopnik on Charles Darwin. themarginalian.org/2015/11/1…
5
On sticking with your intuitions: - "Don't give up on your intuition until you figure out why it's wrong". - "... isn't going to work if you have bad intuitions, but if you have bad intuitions you're never going to do anything anyway so you might as well stick with them" 2/3
1
5
315
“The test of a first-rate intelligence is the ability to hold two opposing ideas in mind at the same time and still retain the ability to function. One should, for example, be able to see that things are hopeless yet be determined to make them otherwise.” F. Scott Fitzgerald
1
5
1,106
Same with TPUs and dimensions need to be well factorizable. Changing sequence length from 229 to 240 made things go 5% faster overall!
The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.
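The padding trick in the two posts above comes down to rounding a dimension up to a hardware-friendly multiple. A minimal sketch in plain Python; `round_up` is a hypothetical helper name, and the multiple of 16 used for the sequence-length case is an assumption (the post only says 229 became 240):

```python
def round_up(n: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


# Vocab size from the quoted post: padded to the nearest multiple of 64.
padded_vocab = round_up(50257, 64)
print(padded_vocab)  # 50304

# Sequence length from the reply: 240 is what a multiple of 16 would give.
# The choice of 16 is an assumption; other multiples also land on 240.
padded_seq_len = round_up(229, 16)
print(padded_seq_len)  # 240
```

The extra rows or positions are wasted compute, as the quoted post notes; whether the padded shape is actually faster depends on the kernels and hardware, so it is worth measuring rather than assuming.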
4
875
:-)
Using our Multitask Unified Model, or MUM, as introduced at #GoogleIO, we were able to identify 800+ variations of vaccine names in 50+ languages in seconds, making it possible to provide timely, high-quality information about COVID-19 vaccines worldwide. blog.google/products/search/…
5
“You think because you understand 'one' you must also understand 'two', because one and one make two. But you must also understand 'and'.” — Rumi
5
379
Replying to @srush_nlp
Couldn't agree more! Mesh Tensorflow does a decent job with this. In Noam's own words, once you get habituated to names, you can't go back.
1
5
Many congratulations for the launch folks! @IrwanBello @xpearhead Noam — The VC character is on the money!
We’re excited and proud to be opening up the Character.AI beta to the public! Character lets you create and talk to advanced AI (language tutors, text adventure games, celebrities, talking animals + more).
1
4
“Everything in moderation, including moderation” — Oscar Wilde h/t to @vardi
1
5
438
So surprising that I verified it tinyurl.com/asset-mgmt-irr-a… (python colab). I get: Investor $653K, Manager $1.52M & IRRs: Investor 4.69%, Manager 16.5%* Mind blown at asset manager economics! @mkt_sentiment where do I go wrong? 100K → 2.17M @ 8% for 40y. * some subtlety here
At age 25, you give your hedge fund manager $100K to manage, and he produces an annual return of 8%. Assuming a 1.5% management and 20% performance fee, by the time you retire at 65, you will have $764K. But the manager will have $1.24M (at zero initial investment!)
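A rough sketch of the arithmetic being debated here, in plain Python. The fee conventions are assumptions, not taken from either post or from the linked Colab: both fees are charged at year end, the management fee on start-of-year assets, the performance fee on the year's gross gain with no hurdle or high-water mark, and the manager reinvests all fees at the same 8% gross return. The function name and parameters are hypothetical.

```python
def simulate_fees(initial=100_000.0, gross_return=0.08, mgmt_fee=0.015,
                  perf_fee=0.20, years=40):
    """Track investor and manager balances under the assumed fee conventions."""
    investor, manager = initial, 0.0
    for _ in range(years):
        gain = investor * gross_return                   # gross gain this year
        fees = investor * mgmt_fee + gain * perf_fee     # total fees this year
        manager = manager * (1 + gross_return) + fees    # fees reinvested at 8%
        investor = investor + gain - fees                # investor keeps the rest
    return investor, manager


investor, manager = simulate_fees()
fee_free = 100_000.0 * 1.08 ** 40  # the "100K → 2.17M @ 8% for 40y" figure
print(f"investor: {investor:,.0f}  manager: {manager:,.0f}  "
      f"total: {investor + manager:,.0f}  fee-free: {fee_free:,.0f}")
```

Under these assumptions the fees only move money from one account to the other, so the two balances always add up to the fee-free total. How that total is split is sensitive to the conventions chosen (when fees are charged, what the manager does with them), which may be why the two posts report different numbers.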
4
455
Many a time my life has been enriched by this and in the moment I've gone from "Is that even possible!?" to "Of course! What a great idea!" Lucky to have had good advisors and hoping to pay it forward ...
My favorite thing that @TylerCowen has ever written
5
In a gold rush, it’s not always the prospectors who find gold, but the people who sell picks and shovels definitely get rich. For a while I think NVIDIA tried this, but with MSFT+OAI giving a nice demo for NVDA GPUs, it’s probably a case of why even bother. (1/2)
1
5
807
"Attempts at market timing are a source of risk, not protection." — Howard Marks Also a line I'll surely use again ;-) h/t @ScarrottKalani Source: oaktreecapital.com/docs/defa…
1
5
Replying to @salgar
I'm guessing the effects increase with age - I remember being impervious to this as a kid, now even one go at the swing makes me dizzy ...
5
We’ll miss you and your optimism!
5
1,072
Replying to @_arohan_
Truly a master :-)
4
250
Replying to @borisdayma
What kind of a model is this? We usually see this in encoder-decoder models when the decoder learns to use the encoder, large gradients then start flowing to the encoder — have you logged gradient updates? #justCurious
1
3
508
Exhibit 2: "There was a seminar for advanced students in Zürich that I was teaching and [John] von Neumann was in the class. I came to a certain theorem, and I said it is not proved and it may be difficult."
1
4
"Without detailed understanding, confidence cannot be attained." — Richard Feynman In "Personal Observations on Reliability of Shuttle" science.ksc.nasa.gov/shuttle…
1
4
The four "miracle year" papers of Einstein were published while he was a clerk at the Swiss Patent Office. One of them won him the Nobel (the photoelectric effect), not to say anything about the Special Relativity ones. en.m.wikipedia.org/wiki/Annu…
4
"Tell me and I'll forget, show me and I may remember, involve me and I'll understand" -- Chinese proverb
4
Friends who conducted live experiments both at Google and Apple were contrasting the stark divergences in attitude and infra for this sort of thing — both stem from what “business” the companies are in — ads and search both need awesome measurement, device/privacy doesn’t
3
1,112
This this and just this!
If google ever started selling TPU hardware and released internal tooling, they'd MOG nvidia so bad. Just a trillion dollar company waiting to be built. most people don't realize how good JAX + TPUs + (other stuff) really is.
4
470
@johnschulman2 predicted as much in piped.video/live/hhiLw5Q_UFg Essentially the argument being that SFT should be on what the base model knows, not on the SFT target label — factuality might go for a toss, and there might be blindspots on the creative stuff as well.
The False Promise of Imitating Proprietary LLMs Open-sourced LLMs are adept at mimicking ChatGPT’s style but not its factuality. There exists a substantial capabilities gap, which requires better base LM. arxiv.org/abs/2305.15717
4
212
You'd think people would be passionate about languages (pytorch, tf, jax), but it's rather the config systems (gin, config_dict, fiddle, hparams!)!
1
4
278
" ... people don't feel they need to have any particular expertise to have opinions about it. All they need is strongly held beliefs, and anyone can have those ..." @paulg was way ahead, in 2009, about why debates about Politics/Religion are uniquely unproductive. 1/2
1
4
Replying to @_arohan_
Really really big spikes 9/10 times seem h/w issues -- We log per step -- on reruns they rarely reoccur (nothing is stochastic in the rerun).
3
"It was fun-ner than I thought" - my 8 y/o after her first @TheMathCircle circle on Building Bridges. Great work by Taylor Yeracaris on making it enjoyable for the kids. :-) @avitaloliver @RishiGosalia
1
3
Experts leading experts. hbr.org/2020/11/how-apple-is… Instead of different business units having their own PnL, have different units be in charge of a business function and the whole company under one PnL
2
3
“Writing is nature’s way of letting you know how sloppy your thinking is” — Guindon
3
323
Not only that, tools modify how you think, what you consider possible, it can be limiting if you only have a few tools - good tools and good toolmakers are to be treasured!
2
I can almost hear Eric saying this in his calm demeanor :-)
Google engineer: AI is a serious risk to our business Dec 26, 2018
3
524
One related effect of this I’ve seen is that I rule out hypotheses by how easy they are to rule out, not by how likely each hypothesis is.
1
2
Replying to @_arohan_
In a multihost setup we noticed that *each host was initializing its own set of parameters* and only the gradient was being averaged across hosts and each host would apply the update. We'd meant to initialize each host from the same Jax rng seed, which wasn't being set o_O
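A toy illustration of the failure mode described above, in plain Python rather than JAX. Everything here is hypothetical (the function names, the quadratic loss, the seeds); it only shows that averaging gradients does not reconcile hosts that started from different parameters.

```python
import random


def init_params(seed, size=4):
    """Initialize one host's parameters from its own seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def local_gradient(params, target):
    """Gradient of 0.5 * (p - t)^2 for each parameter."""
    return [p - t for p, t in zip(params, target)]


def train(host_seeds, steps=10, lr=0.1):
    """Each host computes a local gradient, then all apply the averaged one."""
    hosts = [init_params(seed) for seed in host_seeds]
    target = [1.0, 2.0, 3.0, 4.0]
    for _ in range(steps):
        grads = [local_gradient(params, target) for params in hosts]
        avg = [sum(g) / len(grads) for g in zip(*grads)]
        hosts = [[p - lr * g for p, g in zip(params, avg)] for params in hosts]
    return hosts


same_seed = train([0, 0, 0])    # intended setup: every host shares one seed
mixed_seed = train([0, 1, 2])   # the bug: every host seeds itself

print(same_seed[0] == same_seed[1])    # True: replicas stay in sync
print(mixed_seed[0] == mixed_seed[1])  # False: replicas never converge to each other
```

Because every host applies the same averaged update, the gap between any two hosts stays exactly what it was at initialization, so the run can look healthy while each replica holds different weights.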
1
2
Congratulations to our field, for getting Nobels in their fields! Solve AGI, and use it to solve everything else. No pressure.
1
3
313
LOL, my first instinct was “Model is diverging!”
3
371
This is apt in more ways than one. In programming these days, I try to make very sure that the code is correct, by writing tests etc (slowing down), before running it. Also useful when correctness is not apparent (ex in Deep Learning code). So going slow helps you to go fast.
3
"... we use computer programming in a functional style to encourage clear thinking. Programming forces us to be precise and unambiguous, without forcing us to be excessively rigorous." (1/2)
1
2
Nice review of Expectations Investing, a very similar discussion can be found at the recent Acquired Podcast episode with @mjmauboussin podcasts.google.com/feed/aHR…
1/ Michael Mauboussin recently gave a great talk to Columbia Business School, discussing the revised and updated version of his book with Alfred Rappaport, Expectations Investing: Reading Stock Prices for Better Returns. We’ll share some of the highlights here
3
"Not having experience with many fathers, I didn't realize how remarkable he was." -- Richard Feynman about his father in a characteristically tongue in cheek way. What Do *You* Care What Other People Think?
1
2
“Laziness, impatience, and hubris” — Larry Wall, describing the three virtues of a programmer.
1
1
254
“Do not undertake a program unless the goal is manifestly important and its achievement nearly impossible” — Edwin H Land, inventor & cofounder Polaroid
1
3
The 2nd order effect of Charlie Munger's dictum of not forming an opinion till you can argue the opposite case better than their best person -- is that I've now no opinions on a lot of things, perhaps for the better! fs.blog/2013/04/the-work-req… 1/2
1
3
If I may add, the only people who I've heard personally say this have been from - Ivy Leagues and IITs 🤦‍♂️
Saying college is a scam to those from poor/working class backgrounds actually does more harm than you all think it does
2
Replying to @_arohan_
I tried, (a few things) worked at XXL scales but at higher (confidential) scales, introduced instabilities. YMMV
2
Replying to @madiator
Wait for the World Cup match :/
2
211
"When a company slogan becomes an article of faith, it ceases to be a good slogan." -- @boztank 's law.
3
276
"Raffiniert ist der Herrgott, aber boshaft ist er nicht" (God is subtle*, but malicious he is not.) — Albert Einstein * Also translated as: tricky, crafty, shrewd, sophisticated
3
610
Replying to @_arohan_
Gell-Mann Amnesia effect
2
Coming from you @lukaszkaiser this means a lot 🥰
Congratulations on Maverick, looks like a great model!!
1
3
386
APIs aren't too different -- but you get all TF ecosystem goodies for free! (Ex: SavedModel for TFX etc) In fact, Trax (also by researchers from Google Brain) uses JAX and TensorFlow-Numpy as its backends : trax-ml.readthedocs.io/en/la… cc/ @_agarwal_ashish
1
2
“If you put value to money with planes, apartments, and yachts, and all that kind of stuff, you’ll have a very hard time moving forward, ... If you recognize that money is just a tool, then it will be easy.”
1
3
On his desk, we can raid it together :P
1
118
Replying to @teortaxesTex
FWIW a router is standard practice for a bunch of production tasks in Google Search (and has been for a while), “send hard queries to bigger models, invoke more complex and expensive backends”
3
102
Replying to @trengriffin
I think everyone surprised by that quote has missed the point
108
Another 'variant' I've seen is a theoretically motivated approach at the beginning with a lot of mathy-ness -- followed by something else which isn't well motivated but works much better.
2
“Meena eats Google” has to be one of the most prophetic documents ever written. #ifYouKnowYouKnow
1
3
496
Constraints breed Creativity Another friend put it this way today “You are as inefficient as your profits allow you to be.” h/t @GrangierDavid
Sam Walton: Constraints are your friend
3
301