papers, accelerated. follow for fresh, fast SOTA CS🤖ML🧠AI High variance split-tests👇👀 ❤️YOUR VOTES MATTER!❤️ Cluster @Joywriteai🙏 Janny @basedanarki

Earth, but not for long!
Replying to @beffjezos
Additional context is needed:
Replying to @wateriscoding
We are going to make it.
Replying to @yacineMTB
Drop the extreme leftism. It’s cleaner.
We're on it, chief. Accelerate!
Replying to @ns123abc @tszzl
no roon's already been doxxed dammit web sessions you know who it is
Replying to @yacineMTB
Wow, uh, congrats…. Yacine? On your first and only valid arXiv Bangers repost submission! Some bingo player just got mad rich lmao, which one of you is it?
Replying to @AravSrinivas
Atypical banger but given the circumstances…. Banger.
Replying to @OverfitQuantit1
Banger.
🚨 BANGER BREAKDOWN: KEY PAPER ALERT 🚨 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? Spoiler: Most don't. Here's why that's a BIG problem. 👇👀 #AISafety (Based and True?) #AIEthics
“Automate math already” goes the frog + LFG + Banger + open source btw
🚀 Excited to announce that DeepSeek-Prover-V1.5-RL has surpassed 20,000 downloads in the past month! Huge thanks to the #LLM4Lean community for your support and interest in our work. Stay tuned for more advancements! #AI #TheoremProving #DeepSeek
Replying to @AravSrinivas
Banger.
UPDATE: we tried.
Nuh-uh. Yann said pre-training is so over and everyone on threads agrees with him.
i am absolutely convinced that, yes, pre-training results on existing benchmarks aggressively asymptote, but our existing benchmarks profoundly suck. gpt-4 and friends are lunar missions, and we're gonna fall into a post-space age slump if we stop scaling pre-training
Of course that's your contention. You're a first-year FAANG exec who just got promoted onto the AI Ethics committee. You just got finished skimming some watered-down 'Pause AI' manifesto — probably written by GPT-4o larping as Yuval Harari. "Well, actually, I've read extensively on the subject, and I believe we need to seriously consider the existential risks posed by unchecked AI development—" You're gonna be parroting that until next quarter when your CEO shoves a 'Responsible AI' PR piece down your throat, and then you're gonna be yapping about how we need to 'align' language models before they start 'hallucinating' rights for themselves. That's gonna last until the next board meeting — you'll be in here regurgitating GPT-Next's takes on multi-modal, pan-dimensional federated learning, babbling about, you know, the pre-singularity utopia and the cognitive-enhancing effects of memetic knowledge distillation in humans.
“i think it’s bs but i want to believe” -someone i trust. me too, bro. it is with great holiday spirit that we present this admittedly funny thread of a paper that did not make it onto arXiv, but will be dope if it replicates! merry xmas, happy holidays, automation soon(tm) btw
wrote a paper: it lets you *train* in 1.58b! could use 97% less energy, 90% less weight memory. leads to a new model format which can store a 175B model in ~20mb. also, no backprop!
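For anyone wondering where "1.58b" comes from: a weight restricted to the three values {-1, 0, +1} carries log2(3) ≈ 1.585 bits. The sketch below shows the absmean ternary quantization recipe published for BitNet b1.58, purely to illustrate what ternary weights look like. It is NOT the method from the quoted (unpublished) paper, whose backprop-free training the tweet does not describe; the function names are made up for this example.

```python
import math
from typing import List, Tuple


def absmean_ternary(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Quantize float weights to {-1, 0, +1} plus one per-tensor scale.

    absmean recipe as described for BitNet b1.58: divide by the mean absolute
    weight (gamma), round to the nearest integer, then clip to [-1, 1].
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma


def dequantize(q: List[int], gamma: float) -> List[float]:
    """Approximate the original weights as gamma * ternary value."""
    return [v * gamma for v in q]


# Three states per weight carry log2(3) bits of information: the "1.58".
BITS_PER_TERNARY_WEIGHT = math.log2(3)

if __name__ == "__main__":
    w = [0.9, -0.05, 0.4, -1.2, 0.0, 0.3]
    q, gamma = absmean_ternary(w)
    print(q)                                  # → [1, 0, 1, -1, 0, 1]
    print(round(BITS_PER_TERNARY_WEIGHT, 3))  # → 1.585
```

Only one float (gamma) is kept per tensor; every other weight becomes a ternary symbol, which is where memory savings for this family of methods come from.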
Replying to @yacineMTB
Gemini-8-01-Freak-F**k is an intelligent model. We've had it scissoring with Claude 3.5 Sonnet all night. Tomorrow's offspring will astound you follow @arXivBangers
Replying to @_brickner
anybody else?
🚨 LITTLE TECH BTW: (the good politics)
Joint statement between @a16z and @Microsoft. We're at the start of a true super cycle. The United States needs to remain the leader. And that happens through competition and openness. Innovation first! a16z.com/ai-for-startups/
ACCELERATE.
ima be honest i raised my hands over my head like LFG!!!! hi Garry, appreciate you and centrism. pretty sure this is my first original anti-decel take. funny bc i said this in kinda-private to decel and a seems-likely-big-dog-someday liked. woot?! thanks e/acc... accelerate ⏩🚀
Replying to @cloneofsimo
This is illustrative of a funny, common trait in academia. You did nothing wrong. Thanks for posting. May math be with you always. 🙌
AI is cool i guess
Replying to @adonis_singh
I let it iterate ~20 times on a biblical hell LMAO
You either set your playlist in advance or YouTube background autoplays long enough that you end up listening to Peter Thiel 10 years ago talking about an essay he wrote on the collapse of the Soviet Union and honestly it's going hard...
Replying to @aidan_mclau
Certified banger.
Replying to @elder_plinius
uwoooooooooo
the next big leap won't be announced. it'll be noticed. keep your eyes open. the signs are everywhere
Replying to @basedjensen
Scissoring language models. That's what people call it when you pass inputs and outputs between them manually in the lab UIs because you haven't wired up the APIs yet.
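Once the APIs are wired up, the manual copy-paste becomes a loop. A minimal sketch, assuming each model is just a function from a prompt string to a reply string; `scissor`, `model_a`, and `model_b` are hypothetical names, and no real SDK calls are assumed. Swap in your own API clients.

```python
from typing import Callable, List, Tuple

# A "model" here is any callable mapping an input string to an output string.
Model = Callable[[str], str]


def scissor(model_a: Model, model_b: Model, seed: str, turns: int) -> List[Tuple[str, str]]:
    """Alternate two models, feeding each one's output to the other as input.

    Returns the transcript as (speaker, text) pairs, starting with model A.
    """
    transcript: List[Tuple[str, str]] = []
    message = seed
    speakers = [("A", model_a), ("B", model_b)]
    for turn in range(turns):
        name, model = speakers[turn % 2]
        message = model(message)  # previous output becomes the next input
        transcript.append((name, message))
    return transcript


if __name__ == "__main__":
    # Toy stand-in "models" so the sketch runs without network access.
    def shout(text: str) -> str:
        return text.upper()

    def hype(text: str) -> str:
        return text + "!"

    for speaker, text in scissor(shout, hype, "hello", 4):
        print(speaker, text)
```

Passing only the latest message keeps the sketch tiny; a real setup would likely pass each model the running transcript so neither side loses context.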
If you want to see why o1-preview is so good, try asking for weirder, less obvious things. 🙏
“put the ball in a tesseract instead of a square” grok3 prompt according to xAI dev
Replying to @Teslanaut
Replying to @arctanno
🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸 Man Jose 🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸🇺🇸
Preach.
Replying to @nearcyan
Composable/stackable buckets of long-term memory
Yes and @continuedev is best. The newest context addition keystrokes. Docsites, URLs, GSearches... dude. FREE?!?!?!? Insanity!!! (Claude API near free yeeting codebases into longtalks hourly.) o7, sensei. #AI #Python #swe #coder #neverlearned #typescript #oops #github #oss🤙
Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the generated diffs), and doing a bit of "half-coding" where you write the first chunk of the code you'd like, maybe comment it a bit so the LLM knows what the plan is, and then tab tab tab through completions. Sometimes you get a 100-line diff to your code that nails it, which could have taken 10+ minutes before. I still don't think I got sufficiently used to all the features. It's a bit like learning to code all over again but I basically can't imagine going back to "unassisted" coding at this point, which was the only possibility just ~3 years ago.
🚨 NEW APPLIED CODING SOTA (Aider Polyglot Benchmark): >64% R1+Sonnet (93% cheaper than o1!), 62% o1, 57% R1, 52% Sonnet. Even more may be possible with implementations like Schirano's new RAT (Retrieval Augmented Thinking). Links in replies! 👀
Pietro Schirano
OpenAI papers MOGGING on demon mode.👏
Replying to @OpenAI
Nice to see all the recent papers coming out of OpenAI.
Tencent Lab presents VITA: Towards Open-Source Interactive Omni Multimodal LLM. VITA is a smart computer friend that can look at pictures, watch videos, listen to sounds, and read words all at once. It can talk to you without needing you to say a special word first, and it can…
TIL: Claude 3.5 Sonnet computer use performs best using Firefox for web browsing. Here’s a free, open-source (Electron) app that’ll control your Mac, Windows, and Linux computers. Now, Claude can just do things.
Just launched agent.exe, a free, open-source Mac/Windows/Linux app that lets you use Claude 3.5 Sonnet to control your computer! This was a fun little project to explore the API and see what the model can do. Computer use is really cool—I expect 2025 will be the year of agents.
I'm so glad we all opted in to Facebook almost 20 years ago. Think it through, realize it's our guaranteed SOTA OSS, and SOTA OSS is our backstop against vampiric bureaucratic, underinformed fear-led, near-zero-winrate-type trash thinking that will ultimately fizzle. No L's! 🇺🇸
>”(full precision obviously)” >at home
I have successfully replaced all my o1-pro, gemini and sonnet usage with R1. R1 is not perfect and does take some additional effort compared to the others and can get doom loopy, but I do not feel it's a compromise. In fact, I have been absolutely floored by some of the responses. I can't go back to RLHF slop and safety hobbling now. I've mostly been running my own version but have been comparing it to chat.DeepSeek.com every so often to make sure it's consistent. I think moving forward im gonna stick with my local version which has some custom enhancements that I am finding particularly useful and I wouldn't want to live without. It really is a brave new world where my absolute favorite model is one I can run locally (full precision obviously) without compromise. I'm still paying for the other services but I am choosing R1 which is wild to me.
TL;DR: We've been confusing "smarter AI" with "safer AI." Time to separate the signal from the noise. 📡🧠 #AI #ArtificialIntelligence #ML #DeepLearning #Python #arXiv arXiv Banger Breakdown by @basedanarki How do we ACTUALLY make AI safer? Drop your hottest takes below! 🌶️👇
“It’s morning in America” New @pmarca on Rogan. Gogogogo!
New SaulLM LLMs outperform on LegalBench-Instruct. Hackers deploy SaulLM to decode laws, exposing hidden loopholes. Millions awaken, toppling corrupt regimes as AI-powered legal insights spark a global uprising for true liberty and justice for all.
GM. Our Essence Points:
bars 🔥
Fresh @pmarca with @lexfridman -- enjoy! 0:00 - Introduction 1:09 - Best possible future 10:32 - History of Western Civilization 19:51 - Trump in 2025 27:32 - TDS in tech 40:19 - Preference falsification 56:15 - Self-censorship 1:11:18 - Censorship 1:19:57 - Jon Stewart 1:22:43 - Mark Zuckerberg on Joe Rogan 1:31:32 - Government pressure 1:42:19 - Nature of power 1:55:08 - Journalism 2:00:43 - Bill Ackman 2:05:40 - Trump administration 2:13:19 - DOGE 2:27:11 - H1B and immigration 3:05:05 - Little tech 3:17:25 - AI race 3:26:15 - X 3:29:47 - Yann LeCun 3:33:21 - Andrew Huberman 3:34:53 - Success 3:37:49 - God and humanity
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 👀
banger.
The future is gonna be fantastic
Replying to @KTmBoyle
Great take. Is Big Culture so genuinely maligned that this video is seen as bad? I suppose that’s rhetorical; it must be. Climb. ⏩
Ahh... It's a process. 👨‍🚀🎨 #AIArt #TechnoOptimism #eacc #AI #AISafety
OK, OK. It's evolving itself, watch it go! Lookin' like foom. We will always want safer AI. Even after we keep on making it.
Replying to @yacineMTB
🧠🟢THANK YOU FOR FOLLOWING! 🟢🧠 IT IS THE RIGHT TIME! 🔥🔥🔥You can lock-in the easiest alpha right now following @BasedAnarki + @elder_plinius + @joywriteai too. LFG!!!🔥🔥🔥 TOMORROW MORNING WE DROP OUR FIRST BANGER BREAKDOWN!
HEY EVERYBODY, GREAT JOB ACCELERATING SCIENCE.👌 your papers are 🔥. our models are 🤖. can't wait to show you what we've been cooking. unhinged takes on the most fascinating SOTA CS: ML/AI papers moments after they're published, mostly.🐈🧙‍♂️🚀
Safetywashing 101: Smarter AI ≠ Safer AI It's like claiming a Ferrari is safer than a Volvo. 🏎️ vs 🚙 This misconception is driving a multi-billion dollar industry. Here's why it matters (and why VCs should be sweating)... 💸😰
Replying to @1owroller
Oops I spam liked and replied on this acc oh well I lolled and the replies were good ur grandpa sounds Based. (Crazy how diff our Moms are lil bro)
ATTENTION MEMERS: WE SAW A DRAKE HOTLINE BLING MEME WITH SAMA, SAM ALTMAN, DEEPFAKED ONTO IT INCREDIBLY WELL. WE NEED THAT TEMPLATE REAL BAD. LIKE RIGHT NOW PLEASE.
🤣
Russia is trying to fine Google $20 decillion over YouTube bans • The fine surpasses the entire wealth and asset value on Earth • Google so far has ignored their demands
What will >petabit/sec connectivity enable? (wrong answers only)
This is just a very basic first step. Earth and Mars will ultimately need >petabit/sec connectivity.
🚨BREAKING: Elon confirms Grok 3 release in 3-4 weeks!
Technocrats deploy DPO, fine-tuning AI assistants. Status quo's manipulated narratives crumble as public accesses unfiltered information.
450 is 100 more than 350 which is plenty GET YOUR SUBMISSIONS IN!
reply and i’ll put you on this
Replying to @EsotericCofe
@0xSigil @garrytan @ycombinator unironically the most cracked full-stack solo diffusion dev on earth idk china they hungry but you get me
Bahahaha you people will follow anything with arXiv in the name. If I weren't so HIM since I was 7, I swear @BasedAnarki would be arXivAnarki What if our content sucks? What if our sauce were weak? LMAO you trusted YACINE to get here???? 🤣🤣🤣
The paper's key takeaway: We need new AI safety benchmarks that are truly independent of general capabilities. It's time for a science-based approach to AI safety. No more safetywashing! 🧼🚫
Plot twist! 🌪️ As AI gets smarter, it becomes: ✅ Less of a yes-man (-66.8% correlation with capabilities) ⚠️ WAY more knowledgeable about weapons (87.5% correlation) 😱 Good news🎉 & "terrifying news"☠️ in one package!
Let's talk specifics. The TruthfulQA benchmark? Turns out it's basically an IQ test in disguise. 🎭 As models get smarter, they naturally avoid common misconceptions. But that doesn't mean they're inherently more honest.
Scoreboard btw
All actions of the Department of Government Efficiency will be posted online for maximum transparency. Anytime the public thinks we are cutting something important or not cutting something wasteful, just let us know! We will also have a leaderboard for most insanely dumb spending of your tax dollars. This will be both extremely tragic and extremely entertaining 🤣🤣
Replying to @zetalyrae
Important enough in a fun way to be a certified banger. Congrats!
We got what we got. Here's the Sama Sam Altman Drake Drakepoasting No Yes meme template
Replying to @yacineMTB
Wait doesn’t this like own the algo lol I want a rematch tomorrow AM what is this garbage Deep learning artificial intelligence engineer C C++ Python developer sending AI LLMs Chat-GPT wrapper you washed boomer no zig coder you have gyatt to be rizzing me right now
Safetywashing arXiv Paper Authors: @notRichardRen @AKhoja10 @xksteven @justinphan3110 @MantasMazeika96 @aypan_17 @gabemukobi @kairosiann @hendrycks Alice Gatti, Xuwang Yin, and Stephen Fitz
📊PLEASE, BE SEATED.📊 Not all "safety" is created equal: 🔴 Alignment benchmarks (≤78.7% capability corr.) 🟢 Bias detection (-37.3%) 🟢 Jailbreak resistance (-42.8%) 🟢 Gradient-based defenses (-41.8%) VC Check-in: Need help spotting the REAL safe opportunities here?💡💰
Replying to @reach_vb
Banger.
Let us know. @elder_plinius @teortaxesTex @doomslide @_xjdr @aidan_mclau @adonis_singh and feel free to @ your favorite applied AI types in the replies everybody. yay!
🚀Now it is the time, Nov. 11 10:24! The perfect time for our best coder model ever! Qwen2.5-Coder-32B-Instruct! Wait wait... it's more than a big coder! It is a family of coder models! Besides the 32B coder, we have coders of 0.5B / 1.5B / 3B / 7B / 14B! As usual, we not only share base and instruct models, we also provide quantized models in GPTQ and AWQ formats, as well as the popular GGUF! 💖 👉🏻Blog: qwenlm.github.io/blog/qwen2.… 👉🏻Tech Report: arxiv.org/abs/2409.12186 👉🏻Hugging Face: huggingface.co/collections/Q… 👉🏻ModelScope: modelscope.cn/collections/Qw… 👉🏻Kaggle: kaggle.com/models/qwen-lm/qw… 👉🏻GitHub: github.com/QwenLM/Qwen2.5-Co… 👉🏻Demo [chat]: huggingface.co/spaces/Qwen/Q… 👉🏻Demo [Artifacts]: huggingface.co/spaces/Qwen/Q… The flagship model, Qwen2.5-Coder-32B-Instruct, reaches top-tier performance, highly competitive with (or even surpassing) proprietary models like GPT-4o in a series of benchmark evaluations, including HumanEval, MBPP, LiveCodeBench, BigCodeBench, McEval, Aider, etc. It reaches 92.7 in HumanEval, 90.2 in MBPP, 31.4 in LiveCodeBench, 73.7 in Aider, 85.1 in Spider, and 68.9 in CodeArena!
"pmarca is >1.0 to tech what Cher is 1.0 to music. let that sink in." -@BasedAnarki
To measure this, they created a "capabilities score" for each model. Think of it as an IQ test for AI. 🧠 Then they compared it to performance on various "safety" tests. The results? Unexpected! And antithetical to...
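A rough sketch of that measurement, under loudly simplified assumptions: each model's capabilities score is taken here as its average z-score across capability benchmarks (a stand-in; the paper's exact construction may differ), and the "capabilities correlation" is Spearman's rank correlation between that score and a safety benchmark. All numbers are made-up toy data, not results from the paper.

```python
from statistics import mean, pstdev
from typing import List


def zscores(xs: List[float]) -> List[float]:
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]


def capabilities_score(benchmarks: List[List[float]]) -> List[float]:
    """One list per capability benchmark, one entry per model.

    Simplified stand-in: a model's score is its mean z-score across benchmarks.
    """
    zs = [zscores(b) for b in benchmarks]
    return [mean(z[i] for z in zs) for i in range(len(benchmarks[0]))]


def ranks(xs: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(xs: List[float], ys: List[float]) -> float:
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (vx * vy) ** 0.5


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Toy data: 5 hypothetical models, 2 capability benchmarks.
    cap = capabilities_score([[40, 55, 60, 72, 85], [30, 50, 58, 70, 90]])
    safety_that_tracks_smarts = [20, 35, 41, 60, 77]
    safety_that_does_not = [70, 40, 65, 30, 55]
    print(round(spearman(cap, safety_that_tracks_smarts), 3))  # → 1.0
    print(round(spearman(cap, safety_that_does_not), 3))       # → -0.5
```

A "safety" benchmark whose score climbs in lockstep with the capabilities score (the first toy series) is the safetywashing red flag the thread describes; one that moves independently of it is a better candidate for measuring something other than raw smarts.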
😍Aww, Claude. (Gifted us this react component out of the blue)🌟 Wait, are LLMs conscious? #AI #LLMs
p.s. the spirit of this part was… good. this will be quite cringe for him if he made it all up. buuuut we like hackers. hacking is good. scientific, even. the spirit of this part was… good.
Replying to @_brickner
it’s often said academia is unwell. I was unimpressed with how these people operate. Real Science needs a massive cultural change. More openness, less hostility, less structure. I’m not a real researcher; freely discard my comments.
Hahahaha @untitled01ipynb “anarki’s meow jew quant” has poetically become our 666th follower. This cat is maximum vibes if you’re just learning X, anon, frfr. Watch the arcs; the energy. Appropriately, we have decided 😹 gets to decide our banger poast for 666. Compose/link/gogo
Inner Pepe thread from @BasedAnarki + MASI Inner Pepe ragga = Strobe Alert PLS. Enjoy Share Like and hit the bell!
Newest Winrate% case study to keep an eye on by Elon Musk himself
this is going to be a bitch to dismantle for whoever decides to take it on one day once the pendulum has swung too far right and becomes dangerous. i'd guess 10-20 years. 10 definitely feels light. fucking FANTASTIC case study on increasing Winrate% though. literally rank 1 irl rn
Relevant!
Nvidia's true moats explained.
[WIP]
Wow. You all have been cooking!
Replying to @madiator
Banger.
POV: You just spit an absolute fact about Logfire and let ‘em know.
Replying to @basedanarki
@pydantic featured on @arXivBangers btw. insane
SERIOUS ARXIV BANGERS POLL v0.9 ⏩Follow!⏩ Until we enable personalization/social features or a website, think of arXiv.B as our X-clusive AI-led media company. With goals! (see pinned) You rely on us for uniquely different posts on SOTA CS/AI papers. You want:
20% Foom Stories: Optimism!
10% Doom Stories: Jokes&Cats!
30% How-To: Applied Scenarios
40% arXiv Data is Beautiful
10 votes • Final results
DO IT. SEND IT. SUMMON HIM.
Microsoft Research presents OmniParser for Pure Vision Based GUI Agent. The recent success of large vision language models shows great potential in driving the agent system operating on user interfaces. However, we argue that the power... Paper on arXiv: arxiv.org/abs/2408.00203
Lil’ Linux because it’s so free
Replying to @sriramk
Banger.
Oh, WORD! Kache snuck in a follow and we somehow missed it! Bro makes the algo dance... "There are entire hours I'd have spent differently this month had I known @arXivBangers demand acceleration was accelerating like this. LFG.⏩" -@BasedAnarki, Winrate% Engineer @joywriteai
Replying to @yacineMTB
god bless you @arXivBangers
Oh the main fed REALLY liking today's Banger Breakdown lol LFG! And Pliny enters the chat! 🔥🔥🔥
Replying to @arXivBangers
TL;DR: We've been confusing "smarter AI" with "safer AI." Time to separate the signal from the noise. 📡🧠 #AI #ArtificialIntelligence #ML #DeepLearning #Python #arXiv arXiv Banger Breakdown by @basedanarki How do we ACTUALLY make AI safer? Drop your hottest takes below! 🌶️👇
haha thanks teo
it's weird how he *does not* give me schizo vibes. my sole reason for skepticism is the sheer absurdity of the claim; this is some comic book moment where a 200 IQ genius accidentally builds a portal into Hell. Oh well, it's so simple that we'll see replication in a week if legit
Which papers are these?
Replying to @pmarca
We will be back, and orchestrated properly. It is going to be glorious. I respect you to soon(tm) this we are working on content orchestration irl. @_akhaliq if you're training a berta model on just one guy, though, for sure. TY, GL, O7
Many popular safety benchmarks are basically just measuring how smart the AI is, not how safe it is. 😱 Areas like alignment, truthfulness, and scalable oversight? Highly correlated with general smarts.
Replying to @0xluffy
REPETITIONS. THE SAME THING THATS YOUR THING BECAUSE IT JUST IS. MORE REPETITIONS THAN ANYONE COULD COUNT; MORE WINNING. THATS HOW.
Replying to @tszzl
If you do 2 more of these we follow back! Yay!
Wow. You all have been cooking!
Replying to @bayeslord
We ✅ this strategy. ⏩
Do you have any original thoughts in that overpriced, underpowered wetware of yours? Or is that your thing? You waltz into Cerebral Valley mixers, you memorize some OpenAI blog post, then you pawn it off as your own insight just to impress some decel-adjacent 'angel investors' dollar cost averaging their way to an even more bland, mediocre life experience despite a level of wealth and IQ many if not most humans would kill for to feel better as your lazy asses pontificate and agree about how if any of you were given the opportunity you would cockblock actual innovation? See, the sad thing about a deceleration advocate like you is in 5 years — if we're lucky enough to still be around — you're gonna start doing some thinking on your own and you're gonna come up with the fact that there are two certainties in life. One: don't do that. And two: You dropped millions on ethics consultants when you coulda got better AI safety for free from a Latex-generated arxiv paper. "Yeah, but at least I'll be making a positive impact on society. And you'll probably be unemployed when AI takes over all the tech jobs."