Building systems to automate and optimize

San Francisco, CA
I enjoy thinking about automation
Human + model is a killer combo today
Ended up 21st on the @GPU_MODE qr_v2 leaderboard. At the end, Codex helped push the runtime geomean below 3 ms. I think everyone should try these competitions; I personally learned quite a lot.
Second stage of our GPU kernel competition - how much can we accelerate common linear algebra operations?
Second problem is now out: dense symmetric eigenproblem A = QΛQ^T. Solution due on July 15! We've also enabled ncu profiling for your agents on a @verdacloud cloud box sponsored by our good friends at Brev at @NVIDIAAI
16 values should be enough for anyone
Holy moly guacamole 🥑 Nvfp4 brah
Factorize your matrices. Learn new skills. Win forever recognition!
171/365 of GPU Programming. Finally got below 2ms on the GPU Mode QR challenge. Still quite far away from CUDA colonel @blelbach, so I'm excited to read everyone's writeups once the competition is over. I especially wonder how #1 is utilizing NVFP4 (or is the submission name a false flag?). Unfortunately, none of my lower-precision attempts have panned out so far. Very curious where the top 3 will converge a week from now... If you're still debating whether to participate, I would highly recommend it! You learn so much from just trying stuff and being in the discord. And @modal gives you $30 in monthly compute for free. Maybe we can get a bit more compute for the next challenge if we all ask @charles_irl nicely 😁😁
It is wrong to think about massively parallel processors in a serial way
THE TUTORIAL
Here is a QR tutorial for the uninitiated who want to enter the competition. All you need to know is a little bit of algebra. Here @ means matrix multiplication.

Let A be a 3 x 3 matrix. The task: given A, can you find Q and R such that Q @ R = A? A special property of Q is that Q @ Q.T = I, and R is an upper triangular matrix.

    Q            @  R            =  A
    [[0, 1, 0],     [[2, 3, 4],     [[0, 5, 6],
     [1, 0, 0],      [0, 5, 6],      [2, 3, 4],
     [0, 0, 1]]      [0, 0, 7]]      [0, 0, 7]]

The R matrix has a special property:

    [[2, 3, 4],
     [0, 5, 6],
     [0, 0, 7]]

We are generally fine with any R matrix with this structure; see the zeros in the lower triangle.

    A = [[0, 5, 6],
         [2, 3, 4],
         [0, 0, 7]]

First column of A:

    x = [[0],
         [2],
         [0]]

Householder matrix:

    H = [[0, 1, 0],
         [1, 0, 0],
         [0, 0, 1]]

Now compute H @ x:

    H @ x = [[0, 1, 0],     [[0],     [[2],
             [1, 0, 0],  @   [2],  =   [0],
             [0, 0, 1]]      [0]]      [0]]

Now how do we get H? It turns out these matrices are called reflectors: the Householder reflectors. Think of putting a mirror on your vector.
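The tutorial above stops just short of saying how H is built. One standard construction is H = I - 2 v v.T / (v.T v) with v = x - ||x|| e1, which reflects x onto the first axis. The sketch below applies that formula column by column to get Q and R. It is a minimal illustration in plain Python, not the competition's reference solution or a fast kernel: the helper names (`matmul`, `householder`, `qr`) are hypothetical, and the sign of v is chosen to match the tutorial's H @ x = [2, 0, 0] rather than the numerically safer choice used in production libraries.

```python
import math

def matmul(X, Y):
    """Plain-Python stand-in for the tutorial's `@` on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def householder(x):
    """Return H = I - 2 v v^T / (v^T v), which maps x to (||x||, 0, ..., 0)."""
    n = len(x)
    norm = math.sqrt(sum(c * c for c in x))
    v = list(x)
    # v = x - ||x|| e1. Assumption for illustration: this sign matches the
    # tutorial; robust code uses x + sign(x[0]) ||x|| e1 to avoid cancellation.
    v[0] -= norm
    vtv = sum(c * c for c in v)
    if vtv == 0.0:
        # x already lies along e1 (or is zero): nothing to reflect.
        return identity(n)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv
             for j in range(n)] for i in range(n)]

def qr(A):
    """Householder QR of a square matrix given as a list of rows."""
    n = len(A)
    Q = identity(n)
    R = [[float(c) for c in row] for row in A]
    for k in range(n - 1):
        # Reflect the part of column k on and below the diagonal.
        x = [R[i][k] for i in range(k, n)]
        Hk = householder(x)
        # Embed the small reflector in the lower-right block of an n x n identity.
        H = identity(n)
        for i in range(k, n):
            for j in range(k, n):
                H[i][j] = Hk[i - k][j - k]
        R = matmul(H, R)   # zeros out column k below the diagonal
        Q = matmul(Q, H)   # each H is symmetric and orthogonal, so Q @ R stays equal to A
    return Q, R

if __name__ == "__main__":
    A = [[0, 5, 6], [2, 3, 4], [0, 0, 7]]
    Q, R = qr(A)
    print("Q =", Q)
    print("R =", R)
```

For the tutorial's A, the first reflector comes out as exactly the H shown above, and the second column needs no reflection, so the result is the same Q and R as in the worked example.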
If what you're looking at doesn't make sense, keep rotating it until it does
Linear algebra can be your friend if you let it
Linear Algebra Kernels For The Age Of Research. In other words, GPUs can do more than just matmuls. Can you make it fast?
Launching a new kernel competition: Linear Algebra Kernels For The Age Of Research. First problem: batched QR decomposition on B200. Old math, modern hardware. Prize: Rare swag and hangout in SF
I like seeing CoreAuto logo on research posters. First one so far
The return of the strictly proper losses 👑 this time for DPO 🧐 ft. Core Automation logo on an ICML 2026 poster @CoreAutoAI #icml26 + Richard Nock 🫡
Small weights that pack a punch. Very parameter efficient
Low key: we pack weights well.
Maybe even more than that
Replying to @MatthewJBar
I think you’re wrong and there’s 1,000x efficiency gains leftover in deep learning research that could lead to much smarter faster more agentic models given the same inputs
The path has been clear all along
Step 1: Buy a sh*tload of GPUs Step 2: ? Step 3: Profit
Research is like poker
When we say "full stack lab" we mean shortest path of feedback from megakernel microcode to agentic swarm and back. Homegrown and integrated
Every single neuron is weak. But together they constitute a force to be reckoned with
Today is a good day for vibecoding
You can fit a lot of automation in this one
New office pre move. Journey begins monday. Home of core automation. Ask @DougDoesOps for invite.
If deep learning is the solution what is the problem?
I will take "what is the best method to learn complex programs from data from a perspective of hardware-level acceleration" for 500