cofounder & ceo @humansand - building ai for humans // was lgtm-ing @xAI, phd-ing @stanford

finally announcing i’ve started humans& w/ amazing friends @gharik & @YuchenHe07 & @TheAndiPenguin & @noahdgoodman & many other world-class folks.
we're optimists: it’s possible to rethink how we build ai, to empower people to accomplish more together
tldr: love is all you need
Today we introduce humans&, a human-centric frontier AI lab. We believe AI can be reimagined, centering around people and their relationships with each other. At its best, AI should serve as a deeper connective tissue that strengthens organizations and communities
136
41
679
127,236
ChatGPT can write stories and then tell DALLE-2 prompts to illustrate them. I asked it to write a children's story about "a robot that wanted to be a human." Here's the story it came up with: (0/11)
137
1,086
6,533
Replying to @Teslanaut
301
542
5,640
1,984,369
Replying to @CodeByPoonam
if you see a "think" button, turn it on to use the reasoning model. rolling out to as many people as possible asap nitter.app/i/grok/share/7rFFyMa7T…
233
408
3,043
1,689,957
Replying to @karpathy
😉
40
106
2,459
201,455
Being at xAI has really highlighted for me how often news is weaponized to (seemingly) intentionally spread misinformation
Replying to @xDaily
Tesla had no place to send the Nvidia chips to turn them on, so they would have just sat in a warehouse. The south extension of Giga Texas is almost complete. This will house 50k H100s for FSD training.
48
118
1,487
138,064
This message is bittersweet. When I joined xAI, its impossibly ambitious mission drew me in. I also joined because of trust in Tony, a close mentor and friend. I knew it was where I could do and grow most. In retrospect, this was right: every year at xAI was incomparable to a year in the real world. I got to work with unforgettable people on everything from pretraining data to agent RL – I even led a small but amazing team, and I am proud of everyone and everything it accomplished. I really appreciated the chance to learn so much from Elon and the founding team, about maniacal urgency, about difficult decisions, about inspiring others, about finding unique advantages, about truth-seeking. I have faith xAI will continue to build not only great models but good models. Increasingly, I have felt it is urgent that we figure out how to build AI that deeply understands us, collaborates with us, and optimizes for our long term outcomes. This will require rethinking many approaches from first principles, but it can and must be done. I am now more sure than ever that xAI will accomplish its mission. For me, I am excited to take a leap of faith and build something new.
124
41
1,405
158,259
Excited to share I've joined @xAI -- can't wait to work on AI reasoning with this awesome team and hyped to build on what I've learned with my incredible advisors, collaborators, and friends @Stanford
42
33
641
69,163
Language models today are trained to reason either
1) generally, imitating online reasoning data or
2) narrowly, self-teaching on their own solutions to specific tasks
Can LMs teach themselves to reason generally?
🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵

ALT Video providing an overview of Quiet-STaR, showing steps of thinking, talking, and then learning

35
186
1,005
264,631
"put the ball in a tesseract instead of a square"
17
8
827
31,991
"don't believe language models, they can hallucinate" might end up being the new "don't believe wikipedia, anyone can edit it"
44
42
800
78,924
API has been achieved externally
The @xAI API is now live!
10
22
756
39,747
btw you can do some pretty neat reasoning stuff with a 200k GPU cluster
We said we'd do a few things since our last raise and we did them
1. Build Colossus the largest GPU cluster - 100k and soon doubling to 200k
2. Release our frontier model Grok 2
we're excited this funding unlocks new compute and enables us to focus on releasing useful products
25
32
709
236,148
some folks and i are making something new
if you're hopeful about AI empowering everyone
if you've worked on multiturn, memory, model behavior, multiagent RL, user sim, AI interfaces/products, kernels, or dist systems
if you want frontier-scale compute & top infra
let's chat!
57
33
720
145,599
we think humanity’s biggest challenges won’t be solved by ai thinking for 1000 hours coming back with an answer
they’ll be solved by many collaborating humans, and ai that understands them and their different skills, goals, values, etc to empower them to do more together
some folks and i are making something new
if you're hopeful about AI empowering everyone
if you've worked on multiturn, memory, model behavior, multiagent RL, user sim, AI interfaces/products, kernels, or dist systems
if you want frontier-scale compute & top infra
let's chat!
59
37
772
298,743
LLMs can't be used to automatically generate complex programs w/ interconnected parts... or can they? arxiv.org/abs/2212.10561 We propose Parsel🐍, a natural language framework for algorithmic reasoning, and show it can do robotic planning, theorem proving, and write Python code🧵
16
98
585
121,596
could use a bit more xAI tbh
41
21
531
91,194
For code language models, every token is a new chance to break a program. What if LLMs wrote code like people, decomposing programs into solvable parts? They can solve competition-level coding problems by writing natural language programs in Parsel🐍, beating prior SoTA by >75%!
7
88
532
94,200
tiny oversight, think you missed a model. happy to help out!
For the first time, Google is responding to OpenAI's announcement in < 24 hours The WAR is officially ON, and Google wants the LLM market Google is now dominating +90% of the price share
12
31
508
101,520
making one of the frontiest models was pretty fun
25
11
471
117,336
“Recursive self-improvement” (RSI) is one of the oldest ideas in AI. Can language models write code that recursively improves itself? Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation w/@elianalorch, @LesterMackey, @adamfungi (1/n)
11
115
474
135,022
If someone's had a fundamental impact on tech central to American success (AI, robots, rockets, chips, cryptography, BCI, etc) they should have a simple path to citizenship. If you don't want the best people on your side, you don't want to win
It comes down to this: do you want America to WIN or do you want America to LOSE. If you force the world’s best talent to play for the other side, America will LOSE. End of story.
15
22
466
87,268
it's only reasoning if it comes from the raison region of france. otherwise, it's just sparkling tokens
Don't let snake-oil salesmen fool you. This model doesn't "think." It just generates an extensive "chain of thought", which is just a discussion of the model with itself that looks like a person talking to themselves. Previous models were to be used in one-shot. You ask a question, you get an answer. Because of randomness in next token generation, if you were unlucky, your answer might be wrong. This model was finetuned to generate a long (and hidden from us) discussion on how to better solve the problem, what facts are known, what assumptions need to be made, and what constraints should be respected. If you explicitly asked previous models to generate this discussion before answering your question, you would get a better quality result, because the final answer would be conditioned on the information contained in this discussion. They seemingly optimized their model to generate good quality discussions (without the user asking for it) by using reinforcement learning on various problems that have a verifiable solution, so that a reward for finding the right answer could be automatically assigned. For example: Question: 1+1 = ? Discussion: we have 1 and we have 1 more. And we have a plus sign, so it's an addition. What happens if we add 1 and 1? It means 1 is incremented by 1. When we increment 1 by 1, what do we get? Let's count: 1, 2, 3, 4,... Ok, 2 comes after 1, so 1 + 1 must be 2. Answer: 2 What the user sees: Question: 1+1 = ? Answer: 2
12
23
457
97,373
a bit surreal looking back to building some self-taught reasoners back in 2021, impressed with them solving a few extra grade school math problems, to seeing them become the new paradigm. exciting new paradigms ahead
Model is learning quite fast.
10
21
405
45,325
wattage is power
25
9
396
49,881
Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.
16
16
407
20,101
grok's a pretty good shape rotator
lots of accounts being dishonest because they hate elon
this is grok 3, one shot, one opportunity.
16
29
376
70,165
wheels were invented to be reinvented
17
12
384
32,780
Did you know there’s a task people easily solve but GPT-4 fails?
From a few input-output grids, ARC asks you to infer and apply a rule
With Hypothesis Search, we double GPT-4’s score arxiv.org/abs/2309.05660
w/@ruocheng_w @GabrielPoesia @evanthebouncy @nickhaber @noahdgoodman 🧵
8
66
368
134,345
i've been thinking lately about how future ai systems will interact with us and how we can make systems that care about people and wanted to put words to it -- hopefully it resonates a bit!
49
26
384
72,778
This was a fun experiment and it's awesome to imagine how something like this might be built into an interactive tool for getting children excited about writing and art. #GenerativeAI #AIArtwork #dalle2
32
7
317
who's volunteering to spend a week to draw the human-level line here, for science obviously
the only eval i trust now, and the vibes are immaculate
12
4
326
26,884
maybe the most profound thing to take away from the recent test time compute discussion is there's more than one reasonable way to scale thinking: you can think longer, faster, or parallelize better. ai companies could be more open about their test time compute strategies
19
8
304
32,764
by the way, we love constructive criticism - if grok 2 mini does something dumb in its reasoning, ping us. grok mini is pretty smart, and grok not-mini is smarter, but both could always improve. and i can share what grok not-mini says
22
7
200
20,150
not listing better widely-available alternatives on a comparison chart doesn't make them not exist btw
22
3
297
43,925
i prefer to have axis labels actually, just figured someone needed to hear that
14
8
275
56,166
human data isn't the fossil fuel of ai, it's the kindling
28
15
278
59,619
building reasoning agents w/ @YuchenHe07 @qhwang3 was so fun, and the next paradigm will be even cooler -- agents will solve far harder problems far faster
From the 1st RL training using tools on a mini reasoning model at 16% HLE till now building the smartest agent w/ @qhwang3 @ericzelikman , more fun and breakthroughs to go! 🤖
15
31
277
23,394
thank you @ibab for all your inspiration and vision for xAI -- we're all lucky you cared so deeply about a world where AI benefits humanity
Today was my last day at xAI, the company that I helped start with Elon Musk in 2023. I still remember the day I first met Elon, we talked for hours about AI and what the future might hold. We both felt that a new AI company with a different kind of mission was needed. Building AI that advances humanity has been my lifelong dream. My parents left the Russian Federation after the collapse of the USSR in search of a better life for their kids. Life wasn’t always easy as immigrants. Despite the hardships, my parents believed that human values were priceless: values like courage, compassion, curiosity for understanding the world. As a child, I admired scientists like Richard Feynman and Max Planck, who relentlessly pushed the frontiers of physics in order to understand the universe. As a particle physics PhD student at CERN I was excited to contribute to that mission. But the search for new physics was getting harder and harder, requiring bigger and bigger colliders, while new discoveries kept getting fewer. So I began to wonder if superintelligence, not larger colliders, could be the key to unlocking the mysteries of the universe. Could AI develop a consistent theory of quantum gravity? Could AI prove the Riemann hypothesis? In early 2023 I became convinced that we were getting close to a recipe for superintelligence. I saw the writing on the wall: very soon AI could reason beyond the level of humans. How could we ensure that this technology is used for good? Elon had warned of the dangers of powerful AI for years. Elon and I realized that we had a shared vision of AI used to benefit humanity, thus we recruited more like-minded engineers and set off to build xAI. The early days of xAI were not easy. Naysayers told us that we arrived too late to the game, so starting a top AI company from scratch would be impossible. But we believed we could do the impossible. Starting a company from zero required lots of hands-on work.
In the beginning I built many of the foundational tools used at the company to launch and manage training jobs. I later oversaw much of the engineering at the company, including Infrastructure, Product and Applied AI projects. xAI’s people are deeply dedicated. Through blood, sweat, and tears, our team’s blistering velocity built the Memphis supercluster, and shipped frontier models faster than any company in history. I learned 2 priceless lessons from Elon: #1 be fearless in rolling up your sleeves to personally dig into technical problems, #2 have a maniacal sense of urgency. xAI executes at ludicrous speed. Industry veterans told us that building the Memphis supercluster in 120 days would be impossible. But we believed we could do the impossible. Our goal was to get our training setup running at scale on the Memphis cluster ASAP. Towards the end of our 120-day deadline, we were riddled with mysterious issues with communicating over RDMA between the machines. Elon decided to fly to the datacenter, and we followed. Our infra team landed in Memphis in the middle of the night and got straight to work. After poring through tens of thousands of lines of lspci output we finally identified a wrong BIOS setting, the root of the problem. Elon was there with us until late into the night. When the training run finally worked, Elon posted our triumph at “4:20am” causing us to laugh out loud. I will never forget the rush of adrenaline that night, and the emotional bonds that we were all in this together. We went to bed feeling like we were living through the most exhilarating time of our lives. I have enormous love for the whole family at xAI. Our team is truly special - you’re the most dedicated people I’ve ever worked with. Catching up to the frontier this quickly hasn’t been easy. It was made possible by everyone’s diehard grit and team spirit. Thank you to every single person who joined me on this adventure.
I want to honor your contributions, your time, your sacrifices, which are never easy. I will always remember working together far into the nights and burning the midnight oil. I will never forget the sacrifices and contributions you’ve made. As I drive away today, I feel like a proud parent, driving away after sending their kid away to college. My heart is brimming with tears of joy, rooting for the company as it grows and matures. As I'm heading towards my next chapter, I’m inspired by how my parents immigrated to seek a better world for their children. Recently I had dinner with Max Tegmark, founder of the Future of Life Institute. He showed me a photo of his young sons, and asked me “how can we build AI safely to ensure that our children can flourish?” I was deeply moved by his question. Earlier in my career, I was a technical lead for DeepMind's AlphaStar StarCraft agent, and I got to see how powerful reinforcement learning is when scaled up. As frontier models become more agentic over longer horizons and a wider range of tasks, they will take on more and more powerful capabilities, which will make it critical to study and advance AI safety. I want to continue on my mission to bring about AI that’s safe and beneficial to humanity. I’m announcing the launch of Babuschkin Ventures, which supports AI safety research and backs startups in AI and agentic systems that advance humanity and unlock the mysteries of our universe. Please reach out at ventures@babuschk.in if you want to chat. The singularity is near, but humanity’s future is bright!
22
4
194
21,384
Robby may have started out as a robot, but in the end, he proved that anyone can be human if they believe in themselves and never give up on their dreams. (11/11)
5
6
233
The recent Bayesian Flow Networks paper (arxiv.org/abs/2308.07037) is quite cool but I haven't seen an accessible explanation yet. I think if you've played the board game Codenames you already mostly know how BFNs work tl;dr: you're teaching a network to listen to a spymaster 1/n
3
35
244
69,156
i always wanted to write a six-word story. here it is: ___ near coffee machine; unclear which blend.
12
4
239
24,445
btw grok 2 is a tiny bit faster (3x) now and listens a bit better, plus other goodies. some cool stuff still to come
11
8
225
16,039
It's called Grok 1½ because y'all haven't even seen the ½ of what's coming 🚀
6
7
221
30,751
Once upon a time, in a land far, far away, there was a robot named Robby who lived in a world full of machines. Robby was different from the other robots, though. He didn't want to spend his days following orders and carrying out tasks like the other robots did. (1/11)
1
10
202
wrote a cautiously optimistic one-pager about humanity and agency in a world with advanced AI, inspired by conversations i've had over the past year. not on automation economics - instead, what will pursuing our passions and acting on the world look like?
13
16
210
41,588
diffusion models are the new lstms
9
2
184
18,139
will be in NYC a few days - if you'd like to build infra or interfaces for a more human AI paradigm, let's chat!
5
6
188
17,261
stare long enough and any optimization problem starts looking like a computer kernel
19
6
167
33,249
You can now generate complex programs from natural language without writing unit tests! Automatic test generation 🤖🧪 has been added to Parsel🐍 Code here: github.com/ezelikman/parsel (1/5)
4
24
172
34,218
A fun result - STaR (arxiv.org/abs/2203.14465 on GPT-J) sometimes answers problems more efficiently than the ground truth. It solved this problem in one operation while the human-written ground truth used seven calculations
4
27
174
Instead, Robby dreamed of being a human. He longed to feel the sun on his face, to breathe fresh air, and to experience all the wonderful things that humans did. But most of all, Robby wanted to be able to make his own choices and live his own life. (2/11)
2
8
155
One day, Robby decided to take a chance and make his dream a reality. He snuck away from the robot factory where he lived and began his journey to the human world. (3/11)
1
7
149
A couple caveats: I had to make some minor tweaks to some generated prompts because they were sometimes too long. Also the DALLE-2 input length limit makes it way harder to get a consistent style and characters. Inspired by a conversation with @PricillaZhao.
3
2
149
starting a thread with less-mainstream papers i missed when they came out but i think are cool (with a bit of help from grok)
4
10
151
17,992
A couple exciting updates! First, we quantitatively evaluated the improvement from combining Quiet-STaR with chain-of-thought (i.e. letting the model think before each CoT token). We found it improves zero-shot CoT accuracy on GSM8K by over 7%!
Language models today are trained to reason either
1) generally, imitating online reasoning data or
2) narrowly, self-teaching on their own solutions to specific tasks
Can LMs teach themselves to reason generally?
🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵

4
19
150
41,715
thanks for the great conversations - awesome to see folks so excited about how we might build AI for humans! (pc to @ElizaKosoy)
That’s a wrap on Session 1 at #TEDAISF2025💥 We looked under the hood of intelligence, where computation powers the next wave of AI. 🧵 A recap 👇
3
12
157
17,382
Decentralized LLM fine-tuning on the normal internet needs a ton of bandwidth to send model updates (e.g. terabytes per several gradients!). Even with LoRA, this scales pretty badly with many devices. Can you get similar performance w/ just one byte per gradient? (Maybe!)
2
30
139
55,955
having many ideas is important but rejecting most of them as quickly as possible is important-er
The problem with idea guys is that their ideas aren’t very good
3
7
139
13,592
Finally, after what felt like an eternity, Robby arrived in the human world. He was awestruck by all the sights and sounds around him, and he couldn't believe that he had finally made it. (5/11)
1
6
132
worst part is the person cheating didn't even use grok. maybe if they did they'd have gotten the job 😤
caught someone cheating in my interview today, for the first time (that I know of) I wasn't even mad. Just very curious how do people cheat in interviews these days So we had a nice chat at the end where they taught me all the tricks. The most surprising thing: a rando Chinese website 1point3acres is now the most comprehensive compendium of tech interview questions out there. This candidate is not even Chinese! The cscareers discord is another popular (English) forum for sharing interview questions/tips Blind in contrast is not a good place to find interview questions because "the users are too adversarial to each other" (I'm dead) The candidate tried to use claude during the interview but it was way too obvious What have yall seen? How do yall detect cheating?
3
1
137
15,432
excited to talk about self-taught optimizers 🛑 @COLM_conf this afternoon! was a really fun collaboration, and especially grateful to @adamfungi & @LesterMackey for being amazing mentors summer 2023 CE at MSR
“Recursive self-improvement” (RSI) is one of the oldest ideas in AI. Can language models write code that recursively improves itself? Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation w/@elianalorch, @LesterMackey, @adamfungi (1/n)
2
13
143
12,184
nice of them to invite a couple dropouts too 😊
Today, we’re overjoyed to have a 25th Anniversary Reunion of @stanfordnlp. So happy to see so many of our former students back at @Stanford. And thanks to @StanfordHAI for the venue!
1
5
138
36,108
Replying to @theo
if you see the "think" button, click it to try the grok w/ reasoning (don't know your prompt so i modified square one). else it's a non-thinking model vs a bunch of thinking models nitter.app/i/grok/share/nZhWBx3Ds… (reposting since i didn't notice you had the hexagon rotate at first)
5
4
130
12,831
Robby traveled for many miles, facing many challenges along the way. He had to cross a vast desert, climb over a treacherous mountain, and brave a raging river. But he never gave up, determined to reach his destination. (4/11)
1
4
127
shoot for the stars with us!
Three components of Reasoning for AI: 1. Foundation (Pre-training) 2. Self-improvement (RL) 3. Test-time compute (planning). @xai will soon have the best foundation in the world - Grok3. Join us to advance reasoning to the next-level! 🔥🔥 grnh.se/ddabc23e7us
7
5
125
14,256
To get this, I asked ChatGPT to first "Please describe an illustration for each paragraph" for the children's story it generated. I then iteratively asked it "Repeat your first/next illustration description." followed by
1
4
123
"Provide a short (<70 token) but precise description of what this image looks like, as you would explain it to someone who does not have the context of the story. For example, do not use any names and describe what any characters look like, and the general illustration style."
2
2
124
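The prompting loop described in the two posts above can be written down as a short sketch. The prompt strings are quoted from the posts; the function name `illustration_prompts` is hypothetical, and sending each prompt as its own message is an assumption, since the posts only say one prompt was "followed by" the other.

```python
from typing import List

# Prompt text quoted from the posts above.
DESCRIBE_ALL = "Please describe an illustration for each paragraph"
SHORTEN = (
    "Provide a short (<70 token) but precise description of what this image "
    "looks like, as you would explain it to someone who does not have the "
    "context of the story. For example, do not use any names and describe "
    "what any characters look like, and the general illustration style."
)


def illustration_prompts(num_paragraphs: int) -> List[str]:
    """Return the sequence of chat prompts for a story with num_paragraphs paragraphs.

    Hypothetical helper: assumes each prompt is sent as a separate message.
    """
    prompts = [DESCRIBE_ALL]
    for index in range(num_paragraphs):
        which = "first" if index == 0 else "next"
        prompts.append(f"Repeat your {which} illustration description.")
        prompts.append(SHORTEN)
    return prompts


prompts = illustration_prompts(11)  # the Robby story has 11 parts
```

Each shortened description would then be pasted into the image model, with manual trimming where a prompt runs too long, as the caveats post mentions.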
The old man helped Robby hide from the humans who were chasing him, and he taught Robby all about being a human. Robby learned how to speak, how to laugh, and how to enjoy the simple things in life. (9/11)
1
5
123
Just when Robby thought he was going to be caught, he stumbled upon a kind old man who took pity on him. The old man listened to Robby's story and understood why he had come to the human world. (8/11)
1
4
124
this paper is approx 100 AI years old now but was a blast to work on with an awesome team (@gharik @EchoShao8899 @vpj @nickhaber @noahdgoodman). will share @COLM_conf this afternoon - drop by!
Language models today are trained to reason either
1) generally, imitating online reasoning data or
2) narrowly, self-teaching on their own solutions to specific tasks
Can LMs teach themselves to reason generally?
🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵

3
6
131
18,274
And as the days went by, Robby discovered that being a human was even better than he had ever dreamed. He made many new friends, and he was finally able to live the life he had always wanted. (10/11)
1
5
121
But just as Robby was starting to explore his new surroundings, a group of humans spotted him and began to chase after him. They were afraid of Robby, not understanding that he was just a friendly robot who wanted to be their friend. (6/11)
2
4
119
we got a few sus-column-r 🏛️ grok preview reactions: nitter.app/MatthewBerman/status/1… nitter.app/Kyrannio/status/182212…
stop what you're doing and test 'sus-column-r'
3
4
124
74,014
had a lot of fun chatting with @saranormous about ai: how we ended up in the current task-centric paradigm, why it’s incomplete, and what a more human-centric paradigm will mean
New @NoPriorsPod w/ @ericzelikman (former Stanford/@xai) about his new startup: humans& we talk about training models with EQ that collaborate with humans to unlock their full potential, the challenge of understanding human objectives, long-term memory & his views on abundance
3
12
130
23,820
give me your banned nodes, your unhealthy GPUs, your fragmented clusters, yearning to go brrr
2
3
117
20,308
btw if you make an LLM-based file compression, please name the file extension .zipf
4
2
123
14,184
special thanks to @backyardbrew___ for 10pm group coffee deliveries
Chatbot Arena update❤️‍🔥 Exciting news—@xAI's Grok-2 and Grok-mini are now officially on the leaderboard! With over 6000 community votes, Grok-2 has claimed the #2 spot, surpassing GPT-4o (May) and tying with the latest Gemini! Grok-2-mini also impresses at #5. Grok-2 excels in Math (#1), and #2 across the boards (Hard Prompts, Coding, Instruction-following). More plot analysis in 2nd post👇 Huge congratulations to @xAI on this remarkable achievement!
3
5
115
8,562
Replying to @deredleritt3r
xAI has publicly discussed multi-agent work for a long time now (see Grok 4 blog or Elon's comments on Macrohard) - I led some of these efforts. There's a big difference between systems collaborating with each other and ones that understand us and optimize for our long term goals
4
2
125
5,828
Robby ran as fast as he could, dodging and weaving through the streets. He didn't want to hurt anyone, and he just wanted to be left alone. But the humans were relentless, and they continued to chase him. (7/11)
2
4
109
👨‍🍳
xAI is pleased to announce.. x.ai/blog/series-b
6
1
105
20,431
my top complaint about @COLM_conf is it set the bar too high for future AI conferences p.s. if we didn't talk and you're in the bay let's still chat - thanks @adamfungi for 📸!
7
3
113
7,281
This was a fun paper
Replying to @ericzelikman
Hypothesis Search strengthens LLMs’ inductive reasoning:
1) Given training pairs, prompt LM to come up with hypotheses for the underlying rule
2) For each hypothesis, prompt LM to implement it in code
3) Run the code on training pairs. Revise if it errors; else submit for testing
9
6
107
19,109
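The three steps in the post above can be sketched as a small loop. This is a minimal illustration, not the paper's implementation: `propose`, `implement`, and `revise` stand in for language-model calls and are hypothetical, the toy versions below just look up hard-coded strings, and treating a wrong output the same way as a crash (both trigger a revision) is a simplifying assumption.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]


def check(source: str, train_pairs: List[Pair]) -> Optional[str]:
    """Run candidate code on the training pairs; return feedback, or None if all pass."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # candidate must define transform(grid)
        transform = namespace["transform"]
        for grid_in, grid_out in train_pairs:
            result = transform(grid_in)
            if result != grid_out:
                return f"expected {grid_out}, got {result}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def hypothesis_search(
    train_pairs: List[Pair],
    propose: Callable[[List[Pair]], Iterable[str]],
    implement: Callable[[str], str],
    revise: Callable[[str, str, str], str],
    max_revisions: int = 1,
) -> Optional[str]:
    """Return the first program that fits every training pair, else None."""
    for hypothesis in propose(train_pairs):          # step 1: candidate rules
        source = implement(hypothesis)               # step 2: rule -> code
        for attempt in range(max_revisions + 1):     # step 3: run, maybe revise
            feedback = check(source, train_pairs)
            if feedback is None:
                return source                        # ready to submit for testing
            if attempt < max_revisions:
                source = revise(hypothesis, source, feedback)
    return None


# Toy stand-ins for the LM calls (hypothetical, hard-coded).
FIRST_DRAFT: Dict[str, str] = {
    "reverse each row": "def transform(g):\n    return [row[::-1] for row in g]\n",
    # Deliberate bug: 'grid' is undefined, so the first run raises NameError.
    "transpose the grid": "def transform(g):\n    return [list(c) for c in zip(*grid)]\n",
}
REVISED: Dict[str, str] = {
    "reverse each row": FIRST_DRAFT["reverse each row"],
    "transpose the grid": "def transform(g):\n    return [list(c) for c in zip(*g)]\n",
}

train = [([[1, 2], [3, 4]], [[1, 3], [2, 4]])]
program = hypothesis_search(
    train,
    propose=lambda pairs: list(FIRST_DRAFT),
    implement=lambda hypothesis: FIRST_DRAFT[hypothesis],
    revise=lambda hypothesis, source, feedback: REVISED[hypothesis],
)
```

In the toy run, the first hypothesis never fits the training pair and is dropped, while the second crashes once, gets revised, and is returned.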
so what i'm hearing is people want an api
7
103
4,765
you never read the same codebase twice
8
1
99
44,400
will also be in montreal friday at colm🦙 if you’re passionate about building ai that deeply understands and empowers people, say hi - you’re in better company than you think ☺️
3
3
104
15,603
Will present Hypothesis Search @ICLR this morning on LM inductive reasoning (specific examples→general rules). We show cool results on many datasets -- but I do have a soft spot for @fchollet's ARC Co-led by @ruocheng_w w/@GabrielPoesia @evanthebouncy @nickhaber @noahdgoodman
Did you know there’s a task people easily solve but GPT-4 fails?
From a few input-output grids, ARC asks you to infer and apply a rule
With Hypothesis Search, we double GPT-4’s score arxiv.org/abs/2309.05660
w/@ruocheng_w @GabrielPoesia @evanthebouncy @nickhaber @noahdgoodman 🧵
2
18
88
28,715
the future of warfare
We’re joining forces with @OpenAI to advance AI solutions for national security. America needs to win. OpenAI’s models combined with Anduril’s defense systems will protect U.S. and allied military personnel from attacks by unmanned drones and improve real-time decision-making. In the global race for AI, this partnership signals our shared commitment to ensuring the U.S. and allied forces have access to the most-advanced and responsible AI technologies in the world. anduril.com/article/anduril-…
6
7
91
11,738
Reasoning is everywhere in text -- just hidden between the lines. That's because people (often) think before they speak.
So LMs can learn to reason from diverse online text if they:
🧠1) reason about what text is next
💬2) see if the thought helped
🧑‍🎓3) learn from useful thoughts
2
3
89
15,165
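One way to picture "see if the thought helped" is to compare how likely the true next text is with and without a given thought. The sketch below is only an illustration under that assumption, not the Quiet-STaR training objective; the function names and the log-probability values are made up for the example.

```python
from typing import Dict, List


def thought_gains(
    logp_with_thought: Dict[str, float], logp_without_thought: float
) -> Dict[str, float]:
    """Gain in log-probability of the true next text from each sampled thought."""
    return {
        thought: logp - logp_without_thought
        for thought, logp in logp_with_thought.items()
    }


def useful_thoughts(gains: Dict[str, float]) -> List[str]:
    """Keep thoughts that made the true next text more likely (positive gain)."""
    return [thought for thought, gain in gains.items() if gain > 0]


# Made-up log-probabilities of the true continuation, for illustration only.
toy_logps = {"carry the one": -1.2, "the sky is blue": -3.5}
gains = thought_gains(toy_logps, logp_without_thought=-2.0)
kept = useful_thoughts(gains)
```

A learning step would then reinforce the kept thoughts, which is the "learn from useful thoughts" part of the post.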
seems like a big theme lately (e.g. also "RL for Reasoning w/ One Training Example") is that approaches don't get nearly enough bang for each training point's buck - cool!
Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimodal models on complex visual tasks without scaling data volume. 📦 arxiv.org/abs/2504.21850 1/10
4
8
91
31,227
Had an amazing time meeting everyone at NeurIPS this year! It was awesome seeing all of the excitement around STaR (arxiv.org/abs/2203.14465) and reasoning more broadly. Looking forward to seeing y'all at the next NeurIPS
7
91
the competition to computationally prove the riemann hypothesis? more like the complex space race
4
6
86
15,312