Andreas Kirsch 🇺🇦 · Jan 7, 2025 · 10:25 PM UTC

Andreas Kirsch 🇺🇦

Pinned Tweet

Andreas Kirsch 🇺🇦

@BlackHC

7 Jan 2025

Ever wondered why presenting more facts can sometimes *worsen* disagreements, even among rational people? 🤔 It turns out, Bayesian reasoning has some surprising answers - no cognitive biases needed! Let's explore this fascinating paradox quickly ☺️

374

106,593

Andreas Kirsch 🇺🇦 · Feb 14, 2022 · 10:14 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

14 Feb 2022

Everyone's arguing about whether current AI models could be conscious or not, as if it was a scientific discussion, yet I don't even know what consciousness is 🥺

203

1,749

Andreas Kirsch 🇺🇦 · May 9, 2024 · 5:29 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

9 May 2024

Replying to @JSTOR

JSTOR vs Aaron Swartz?

1,632

42,889

Andreas Kirsch 🇺🇦 · Apr 27, 2022 · 12:28 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

27 Apr 2022

When I think of Jensen's inequality, I think of the following sketch which helps me remember it. Maybe this is useful for you, too. #ML #Mathematics

194

1,343

Andreas Kirsch 🇺🇦 · Jun 9, 2025 · 9:49 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

9 Jun 2025

I'm late to review the "Illusion of Thinking" paper, so let me collect some of the best threads and critical takes by @scaling01 in one place and sprinkle some of my own thoughts in as well. The paper is rather critical of reasoning LLMs (LRMs):

Mehrdad Farajtabar @MFarajtabar

5 Jun 2025

🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks, but we found their fundamental limitations are more severe than expected. In our latest work, we compared each “thinking” LRM with its “non-thinking” LLM twin. Unlike most prior works that only measure the final performance, we analyzed their actual reasoning traces—looking inside their long "thoughts". Our analysis reveals several interesting results ⬇️ 📄 machinelearning.apple.com/re… Work led by @ParshinShojaee and @i_mirzadeh, and with @KeivanAlizadeh2, @mchorton1991, Samy Bengio.

224

1,259

435,097

Andreas Kirsch 🇺🇦 · Feb 26, 2022 · 9:15 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

26 Feb 2022

Replying to @FedorovMykhailo

Don't use telegram. Use signal please. Telegram is not encrypted and is based in Russia 🤦

118

1,007

Andreas Kirsch 🇺🇦 · May 10, 2022 · 11:49 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

10 May 2022

After 4 years, I'm kinda like: maybe I should have focused on ML engineering instead of research 😂

988

Andreas Kirsch 🇺🇦 · Oct 26, 2021 · 11:58 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

26 Oct 2021

Biggest regret: not spending more time getting the basics right at the beginning of my PhD. I started going full-time on research projects right away, and now three years later I'm still playing catch-up with some stuff I should have focused on right away

825

Andreas Kirsch 🇺🇦 · Oct 5, 2022 · 10:58 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

5 Oct 2022

I'm incredibly excited about all the amazing progress in ML lately 🤯 but part of me really wished I had picked a different field because I have no idea how to keep up anymore or know what to focus on 🥺😇

712

Andreas Kirsch 🇺🇦 · Aug 14, 2024 · 8:34 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

14 Aug 2024

Excited to publish a Python package that turns @karpathy's "A Recipe for Training Neural Networks" into easy-to-use diagnostics code! 🔧 No more randomly poking around in your custom @PyTorch DNN to debug it. Get simple diagnostics for your neural nets 🫶 #PyTorch 1/

115

707

62,633

Andreas Kirsch 🇺🇦 · Mar 22, 2022 · 5:23 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

22 Mar 2022

I got rejected from @DeepMind and @MetaAI for internships now. I guess I shouldn't have quit being an engineer five years ago 😅

601

Andreas Kirsch 🇺🇦 · Mar 8, 2020 · 5:33 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

8 Mar 2020

How does one keep up with papers in ML while still finding time for foundational studies? Not even talking about doing active research. Feeling overwhelmed every day and like an imposter more and more 🙈

542

Andreas Kirsch 🇺🇦 · May 17, 2025 · 11:47 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

17 May 2025

I want to share my latest (very short) blog post: "Active Learning vs. Data Filtering: Selection vs. Rejection." What is the fundamental difference between active learning and data filtering? Well, obviously, the difference is that: 1/11

553

94,579

Andreas Kirsch 🇺🇦 · Jan 13, 2025 · 7:14 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

13 Jan 2025

NeurIPS 2024 PCs being a bunch of clowns 🤡 the state of ML 🙄 All you get back a month after raising a concern:

526

174,627

Andreas Kirsch 🇺🇦 · Sep 16, 2024 · 12:39 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

16 Sep 2024

Incredibly excited to join DeepMind again 🥳 I'll be a researcher on the Deep Learning Engineering team under the illustrious @davidmbudden 🔥 I can't wait to get started ✨

526

38,573

Andreas Kirsch 🇺🇦 · Dec 25, 2023 · 8:33 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

25 Dec 2023

A new paper review by me! I'm reviewing the fascinating "Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding" from @GoogleDeepMind It introduces a novel method for active data selection in large-scale visual pretraining. 📉🤖 1/10

497

71,753

Andreas Kirsch 🇺🇦 · Feb 8, 2023 · 11:31 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

8 Feb 2023

My experience with using pandas to operate on dataframes is usually:
1. read docs
2. spend an hour to try to get something to work
3. give up
4. write the equivalent Python code in 10 min
5. move on with life

469

105,645

Andreas Kirsch 🇺🇦 · Jul 10, 2019 · 5:46 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

10 Jul 2019

Do you ever have a model that uses @PyTorch and one that uses @TensorFlow, and you want to combine the two for end-2-end training without rewriting either? TfPyTh allows you to plug one into the other while propagating gradients for training 🎉 Code 👉 github.com/BlackHC/TfPyTh

GitHub - BlackHC/tfpyth: Putting TensorFlow back in PyTorch, back in TensorFlow (differentiable...

Putting TensorFlow back in PyTorch, back in TensorFlow (differentiable TensorFlow PyTorch adapters). - BlackHC/tfpyth

github.com

126

471

Andreas Kirsch 🇺🇦 · Sep 5, 2023 · 8:57 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

5 Sep 2023

Interesting take: I believe that arxiv is closer to how science and research originally worked and "official" peer reviews haven't worked that well (at least recently)

@emilymbender.bsky.social @emilymbender

29 Aug 2023

Replying to @tdietterich @TaliaRinger @mmitchell_ai @arxiv

arXiv is a cancer that promotes the dissemination of junk "science" in a format that is indistinguishable from real publications. And promotes the hectic "can't keep up" + "anything older than 6 months is irrelevant" CS culture. >>

461

109,932

Andreas Kirsch 🇺🇦 · Dec 16, 2024 · 11:09 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

16 Dec 2024

Have you wondered why I've posted all these nice plots and animations? 🤔 Well, the slides for my lectures on (Bayesian) Active Learning, Information Theory, and Uncertainty are online now! They cover quite a bit from basic information theory to some recent papers 🥳

461

43,247

Andreas Kirsch 🇺🇦 · Jul 16, 2019 · 5:02 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

16 Jul 2019

Did you know you can classify MNIST using gzip? 🤓 You can get 45% accuracy on binarized MNIST using class-wise compression and counting bits 🤗 🔥No @PyTorch or @TensorFlow needed 🔥 BASH script and @scikit_learn classifier 👉 github.com/BlackHC/mnist_by_…

GitHub - BlackHC/mnist_by_zip: Compression algorithms (like the well-known zip file compression)...

Compression algorithms (like the well-known zip file compression) can be used for machine learning purposes, specifically for classifying hand-written digits (MNIST) - BlackHC/mnist_by_zip

github.com
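
The idea of class-wise compression and counting bits can be sketched roughly as follows. This is a minimal illustration on toy byte strings, not the BASH script or scikit-learn classifier from the linked repository; the `corpora` data, the function names, and the use of `zlib` as a stand-in for gzip are all assumptions made for the example. A sample is assigned to the class whose corpus grows least in compressed size when the sample is appended.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Number of bytes zlib needs for `data` at the highest compression level."""
    return len(zlib.compress(data, 9))


def classify(sample: bytes, class_corpora: dict[str, bytes]) -> str:
    """Pick the class whose corpus grows least when `sample` is appended."""

    def extra_bytes(corpus: bytes) -> int:
        return compressed_size(corpus + sample) - compressed_size(corpus)

    return min(class_corpora, key=lambda label: extra_bytes(class_corpora[label]))


# Toy stand-in data (hypothetical, not MNIST): one byte corpus per class.
corpora = {
    "text": b"the quick brown fox jumps over the lazy dog. " * 20,
    "digits": b"0123456789 9876543210 1357924680 " * 20,
}

print(classify(b"the lazy dog jumps over the quick brown fox.", corpora))
print(classify(b"9876543210 0123456789 1357924680", corpora))
```

For images, the same scheme would presumably concatenate the binarized training images of each class into one corpus per digit.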

437

Andreas Kirsch 🇺🇦 · Jul 21, 2025 · 5:06 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

21 Jul 2025

We have achieved gold medal performance at the International Mathematical Olympiad 🥇 🥳 This is the first general-purpose system to do so through official participation and grading, and I'm thrilled to have contributed a little to this milestone in mathematical reasoning 🌈🫶

Google DeepMind

@GoogleDeepMind

21 Jul 2025

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵

459

34,826

Andreas Kirsch 🇺🇦 · Jul 3, 2023 · 6:24 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

3 Jul 2023

Jensen's inequality == not everyone has access to the latest @nvidia GPUs 🤓

430

47,889

Andreas Kirsch 🇺🇦 · Oct 15, 2023 · 9:02 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

15 Oct 2023

- Roast me
GPT-4V: No I can't
- Yes you can.
GPT-4V: Okay. Roast: "Ah, the classic 'I woke up like this' hairdo combined with an AI-themed t-shirt. You're really out here living the tech bachelor dream. Remember, even though you've got machine learning on your shirt, it doesn't mean your hair has learned how to style itself!" 😂

pranav

@_pranavnt

15 Oct 2023

gpt-4V is brutal LMAO

404

225,490

Andreas Kirsch 🇺🇦 · Aug 22, 2025 · 9:37 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

22 Aug 2025

Did you hear that? It's the sound of goalposts moving at supersonic speed

Edward Frenkel

@edfrenkel

22 Aug 2025

This is an unwise statement that can only make people confused about what LLMs can or cannot do. Let me tell you something: Math is NOT about solving this kind of ad hoc optimization problems. Yeah, by scraping available data and then clustering it, LLMs can sometimes solve some very minor math problems. It's an achievement, and I applaud you for that. But let's be honest: this is NOT the REAL Math. Not by 10,000 miles. REAL Math is about concepts and ideas - things like "schemes" introduced by the great Alexander Grothendieck, who revolutionized algebraic geometry; the Atiyah-Singer Index Theorem; or the Langlands Program, tying together Number Theory, Analysis, Geometry, and Quantum Physics. That's the REAL Math. Can LLMs do that? Of course not. So, please, STOP confusing people - especially, given the atrocious state of our math education. LLMs give us great tools, which I appreciate very much. Useful stuff! Go ahead and use them AS TOOLS (just as we use calculators to crunch numbers or cameras to render portraits and landscapes), an enhancement of human abilities, and STOP pretending that LLMs are somehow capable of replicating everything that human beings can do. In this one area, mathematics, LLMs are no match to human mathematicians. Period. Not to mention many other areas. Calling on my friend @ericweinstein and @GaryMarcus, who has been one of the few sane expert voices on these matters lately. 🙏 h/t @hellheff

394

40,457

Andreas Kirsch 🇺🇦 · Oct 30, 2023 · 10:52 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

30 Oct 2023

It's six months since I've submitted my thesis and I still start feeling suicidal every single time I think about my PhD experience, esp the last year of it 😐 Thank God it's over, and I hope I'll reflect about it less in the future

371

87,788

Andreas Kirsch 🇺🇦 · Dec 24, 2022 · 12:53 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

24 Dec 2022

How is software engineering going to change with LLMs? What if we could "implement" class interfaces automagically using LLMs? Presenting `llm-strategy`: a PoC package based on @langchain and @OpenAI's GPT that adds an llm_strategy decorator to Python: github.com/blackhc/llm-strat…

@llm_strategy(OpenAI)
def query_database(database: Database, query: str) -> Table:
    """Query the database using a natural language query `query` and return
    the resulting table.

    Example
    =======
    >>> query_database(database, "SELECT * FROM EMPLOYEES")
    Table(columns=("employee_id", "name", "address", ...),
          data=[["1123123", "John Miller", ...], [...]])

    Arguments
    =========
    ...
    """
    raise NotImplementedError()

379

121,830

Andreas Kirsch 🇺🇦 · Jul 4, 2024 · 8:18 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Jul 2024

Very interesting ICLR **tiny** paper: openreview.net/forum?id=vHOO… It computes a loss for all possible subsets of the dataset at the same time which has a very elegant solution: softplus of the negative log likelihood per sample, which essentially drops outliers 🤯 @mtetelman

Improving generalization by loss modification

Outlier suppression loss derived by Bayesian averaging improves generalization and training convergence for neural networks.

openreview.net
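
One way to read "a loss for all possible subsets" is sketched below. It is an assumption about the construction, not the paper's exact derivation: if p_i is the likelihood of sample i, then summing the product of likelihoods over every subset of the dataset gives prod_i (1 + p_i), and its log is sum_i softplus(log p_i), i.e. softplus applied to the negated per-sample NLL. The per-sample gradient weight is then sigmoid(-NLL_i), which goes to zero for outliers. The likelihood values and function names are hypothetical.

```python
import itertools
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_sum_over_subsets(likelihoods: list[float]) -> float:
    """Brute force: log of the sum, over all subsets, of the product of likelihoods."""
    total = 0.0
    for r in range(len(likelihoods) + 1):
        for subset in itertools.combinations(likelihoods, r):
            total += math.prod(subset)  # the empty subset contributes 1
    return math.log(total)


def softplus_objective(likelihoods: list[float]) -> float:
    """Closed form: sum_i softplus(log p_i) = sum_i log(1 + p_i)."""
    return sum(softplus(math.log(p)) for p in likelihoods)


def sample_weight(nll: float) -> float:
    """Gradient of softplus(-nll) w.r.t. the log-likelihood: sigmoid(-nll)."""
    return 1.0 / (1.0 + math.exp(nll))


# Hypothetical per-sample likelihoods; the last one plays the outlier.
likelihoods = [0.9, 0.6, 0.3, 1e-4]
print(log_sum_over_subsets(likelihoods), softplus_objective(likelihoods))
print([sample_weight(-math.log(p)) for p in likelihoods])
```

Under this reading, well-fit samples keep a weight close to 1/2 while the outlier's weight is close to its (tiny) likelihood, which matches the "drops outliers" intuition.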

372

32,967

Andreas Kirsch 🇺🇦 · Mar 6, 2025 · 1:01 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

6 Mar 2025

I think Europe and the UK need their own DeepMind-like AI lab which is not connected to the US in any form

357

56,991

Andreas Kirsch 🇺🇦 · Jan 23, 2024 · 9:06 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

23 Jan 2024

Omg, github.com/pytorch/tensordic… is a gift from the gods 😍 (thx @clured!)

GitHub - pytorch/tensordict: TensorDict is a pytorch dedicated tensor container.

TensorDict is a pytorch dedicated tensor container. - pytorch/tensordict

github.com

325

38,048

Andreas Kirsch 🇺🇦 · Oct 4, 2025 · 1:11 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Oct 2025

Research questions I'm excited about:
* More sample-efficient RL and the bitter lesson
* Meta-cognitive abilities in models
* Active learning and curriculum approaches
* Automated scientific discovery

Things I focus on a lot instead: Software engineering, testing infra, and developer experience to accelerate good research

346

27,624

Andreas Kirsch 🇺🇦 · Apr 11, 2025 · 7:17 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

11 Apr 2025

Jesus Christ... openreview.net/forum?id=et5l…

Strong Model Collapse

Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish a strong form of the model...

openreview.net

339

479,557

Andreas Kirsch 🇺🇦 · Nov 26, 2019 · 6:27 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

26 Nov 2019

🎉 New blog post on a better (visual!) intuition for information theoretic quantities (eg entropy and mutual information) 🎉 🔥 Lots of visualisations 🔥 👉 Based on Yeung's "A new outlook on Shannon's information measures" from 1991 📖 #oldiebutgoldie blackhc.net/blog/2019/better…

Better intuition for information theory

Better visual explanations of information theoretic quantities like entropy and mutual information using I-diagrams. Based on Raymond W. Yeung's "A new outlook on Shannon's information measures"...

blackhc.net

314

Andreas Kirsch 🇺🇦 · Jul 18, 2019 · 4:35 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

18 Jul 2019

Is it common or specific to ML that researchers try to add more maths to their papers and complexify their contributions to get through reviews? It is very frustrating to have to parse complexity to find nuggets of simplicity that might not warrant a paper 🙄

316

Andreas Kirsch 🇺🇦 · Feb 26, 2024 · 10:58 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

26 Feb 2024

Can't believe I made it in the end 🎊😇 Thanks to everyone at @UniofOxford, @ExeterCollegeOx and @OATML_Oxford for the at times stressful, often beautiful, and always inspiring and memorable time 🙏🫶

301

15,064

Andreas Kirsch 🇺🇦 · Nov 16, 2025 · 12:57 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

16 Nov 2025

The state of conferences. 40 weaknesses with 40 questions. Weak reject, confidence 5. Death by sea lioning openreview.net/forum?id=kDhA…

Mathieu

@miniapeur

15 Nov 2025

If you want to read a very bad ICLR drama, here you go: openreview.net/forum?id=kDhA…

322

122,968

Andreas Kirsch 🇺🇦 · Nov 25, 2022 · 11:20 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

25 Nov 2022

I haven't done a single novel thing in my PhD. I'm just very lucky that reviewers have no clue about prior art 😅

276

Andreas Kirsch 🇺🇦 · Oct 23, 2022 · 11:45 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

23 Oct 2022

Why are people excited about this paper ("Neural Networks are Decision Trees", arxiv.org/abs/2210.05189)? TL;DR: The result is obvious and useless by itself. Slightly longer "hot" take below 1/4

Neural Networks are Decision Trees

In this manuscript, we show that any neural network with any activation function can be represented as a decision tree. The representation is equivalence and not an approximation, thus keeping the...

arxiv.org

Yannic Kilcher 🇸🇨

@ykilcher

21 Oct 2022

Neural Networks are Decision Trees! Could this finally open up the black box of deep NNs? Find out in this video (w/ @Alex_Mattick ): piped.video/_okxGdHM5b8

279

Andreas Kirsch 🇺🇦 · Aug 24, 2025 · 1:01 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

24 Aug 2025

Replying to @JakeWSimons

The "six Hiroshima bombs equivalent" argument is so dumb. All it does is show that Israel has been using precision ammunition to minimize collateral damage because if they had dropped six Hiroshima bomb equivalents indiscriminately on Gaza it would indeed be a parking lot now

287

6,516

Andreas Kirsch 🇺🇦 · Jul 17, 2023 · 4:57 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

17 Jul 2023

I've passed my viva 🥳 Thanks to my examiners @maosbot & @sirbayes for the discussion and feedback! To @tom_rainforth for mentoring; to @joost_v_amersf, @JishnuMukhoti, @fbickfordsmith & @seb_far for our joint papers; to @yaringal for supervising 👨‍🏫 & @OATML_Oxford for the fun 🎉

283

24,261

Andreas Kirsch 🇺🇦 · Jan 10, 2024 · 12:39 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

10 Jan 2024

My Ph.D. thesis (mostly on active learning and information-theoretic intuitions and approaches related to it) is finally on arXiv 🥳 I'm looking forward to finding and fixing many more typos in the future 😂

Information Theory Papers @Encoding

10 Jan 2024

Advancing Deep Active Learning & Data Subset Selection: Unifying Principles with Information-Theory Intuitions. arxiv.org/abs/2401.04305

267

38,893

Andreas Kirsch 🇺🇦 · Dec 5, 2018 · 6:31 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

5 Dec 2018

Why autonomous weapons are inevitable And what we can still do about it medium.com/@BlackHC/why-auto…

237

Andreas Kirsch 🇺🇦 · Apr 9, 2020 · 2:16 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

9 Apr 2020

🔥 Has your PyTorch code ever crashed because it ran out-of-memory in CUDA, and you had to fiddle with batch sizes repeatedly? 🔥 What if we could just write code that adapted to the available memory instead of resorting to brittle hand-tuning? 🤯 👉 github.com/BlackHC/toma🤗

GitHub - BlackHC/toma: Helps you write algorithms in PyTorch that adapt to the available (CUDA)...

Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory - BlackHC/toma

github.com
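
The general pattern of adapting to available memory can be sketched as a retry loop that halves the batch size whenever a batch does not fit. This is a plain-Python illustration and deliberately does not use toma's actual API; `run_in_adaptive_batches` and `fake_gpu_step` are hypothetical names, and the out-of-memory condition is simulated with `MemoryError`. A real version would catch the CUDA out-of-memory error raised by PyTorch instead.

```python
from typing import Callable, List, Sequence


def run_in_adaptive_batches(
    items: Sequence[int],
    process_batch: Callable[[Sequence[int]], List[int]],
    initial_batch_size: int,
) -> List[int]:
    """Process `items` in batches, halving the batch size on MemoryError."""
    batch_size = initial_batch_size
    results: List[int] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(process_batch(batch))
        except MemoryError:
            if batch_size == 1:
                raise  # even a single item does not fit
            batch_size = max(1, batch_size // 2)
            continue  # retry the same position with a smaller batch
        start += len(batch)
    return results


def fake_gpu_step(batch: Sequence[int]) -> List[int]:
    """Hypothetical workload that 'runs out of memory' above 8 items."""
    if len(batch) > 8:
        raise MemoryError("simulated out-of-memory")
    return [x * x for x in batch]


squares = run_in_adaptive_batches(list(range(20)), fake_gpu_step, initial_batch_size=64)
print(squares)
```

The batch size found after the first failures is reused for the remaining batches, so the search cost is paid only once per run.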

255

Andreas Kirsch 🇺🇦 · Jun 25, 2019 · 7:19 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

25 Jun 2019

Very happy & proud to share some research in Deep Bayesian Active Learning from @yaringal, @joost_v_amersf and me at @OATML_oxford 🎉🎉🎉🤗🤗🤗 oatml.cs.ox.ac.uk/blog/2019/…

245

Andreas Kirsch 🇺🇦 · Aug 23, 2024 · 7:03 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

23 Aug 2024

arxiv.org/abs/2010.06610 is such an insane paper and idea 🤯

250

24,499

Andreas Kirsch 🇺🇦 · Dec 4, 2024 · 6:54 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Dec 2024

Mandatory ELBO derivation in any lecture series: (I think I finally understand the unnecessarily confusing derivation in the VAE paper 😅)

233

21,355

Andreas Kirsch 🇺🇦 · May 25, 2023 · 3:04 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

25 May 2023

Whoop whoop, I got my first single-author paper accepted 🎉🎉🎉

239

26,661

Andreas Kirsch 🇺🇦 · Feb 4, 2024 · 3:35 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Feb 2024

Replying to @bendreyfuss

What a banger 🤯

231

7,210

Andreas Kirsch 🇺🇦 · Jan 5, 2023 · 11:49 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

5 Jan 2023

Intuition why adding Gaussian noise to parameters is nice for optimization: when we integrate/marginalize over the noise, we convolve/blur the loss surface with a Gaussian kernel -> making it smoother
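
Written out, the intuition is a one-line change of variables. The symbols below (loss L, parameters θ, noise scale σ) are notation chosen for this sketch and are not from the original tweet:

```latex
\mathbb{E}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}\!\left[ L(\theta + \varepsilon) \right]
  = \int L(\theta + \varepsilon)\, \mathcal{N}(\varepsilon;\, 0, \sigma^2 I)\, \mathrm{d}\varepsilon
  = \int L(\theta')\, \mathcal{N}(\theta - \theta';\, 0, \sigma^2 I)\, \mathrm{d}\theta'
  = \left( L * \mathcal{N}_{\sigma} \right)(\theta)
```

The second equality substitutes θ' = θ + ε and uses the symmetry of the Gaussian density, so the expected loss under parameter noise is the original loss surface blurred by a Gaussian kernel of width σ.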

237

66,186

Andreas Kirsch 🇺🇦 · Aug 17, 2024 · 6:53 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

17 Aug 2024

This is one of the best papers I have read in a while. It contains a crazy amount of insights and ideas 🤯

Stanislav Fort

@stanislavfort

13 Aug 2024

✨🎨🏰Super excited to share our new paper Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness Inspired by biology we 1) get adversarial robustness + interpretability for free, 2) turn classifiers into generators & 3) design attacks on vLLMs 1/12

240

33,932

Andreas Kirsch 🇺🇦 · Oct 16, 2025 · 5:16 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

16 Oct 2025

Pretty amazing: they're lowering PyTorch to JAX using a custom torch tensor type and dispatch overrides (all natively supported by PyTorch). It's both amazing and I imagine painful to debug, but great that it works

vLLM

@vllm_project

16 Oct 2025

Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility. 🚀 What's New? - JAX + Pytorch: Run PyTorch models on TPUs with no code changes, now with native JAX support. - Up to 5x Performance: Achieve nearly 2x-5x higher throughput compared to the first TPU prototype. - Ragged Paged Attention v3: A more flexible and performant attention kernel for TPUs. - SPMD Native: We've shifted to Single Program, Multi-Data (SPMD) as the default, a compiler-centric model native to TPUs for optimal execution. Dive deep into the new architecture and see the performance benchmarks in our latest blog post! blog.vllm.ai/2025/10/16/vllm… #vLLM #TPU #JAX #PyTorch #AI #OpenSource

245

27,111

Andreas Kirsch 🇺🇦 · Jun 17, 2025 · 7:21 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

17 Jun 2025

Can we say that LLMs can't code bc of this paper? Is it fair? No tool use was allowed at all 🤯 How many people can write correct code like that, without running it once to debug or find typos? Even then, the models are in the top 1.5% of human coders. Bad news, indeed 😬

243

32,918

Andreas Kirsch 🇺🇦 · Jun 25, 2023 · 2:09 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

25 Jun 2023

My dream work-life would be 50% basic research and 50% applied engineering work 🤗 And 50% reading papers and books 😅🫠

227

29,179

Andreas Kirsch 🇺🇦 · Aug 3, 2023 · 12:32 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

3 Aug 2023

Working through my thesis corrections and GitHub Copilot gracefully auto-completed both the sentence and auto-generated a comment from my supervisor 😂

226

32,199

Andreas Kirsch 🇺🇦 · Jul 3, 2024 · 1:43 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

3 Jul 2024

This is too real 😂 I've spent the last days reading through the chapters in PML

228

25,148

Andreas Kirsch 🇺🇦 · Apr 10, 2025 · 11:40 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

10 Apr 2025

Why does Google Docs still not support LaTeX equations? 😭😭😭

235

19,731

Andreas Kirsch 🇺🇦 · Oct 9, 2025 · 10:38 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

9 Oct 2025

TRM also provides an intuition that can be directly applied to reasoning LLMs (and RL): in the TRM algorithm, we can view z as thoughts and y as the final output (which is compared to some golden solution during training to compute a reward). The `latent recursion` is a refinement (self-critique) cycle of the thoughts with a final update to the solution. The point of `deep recursion` is that we need not take gradients through all steps; it is sufficient to focus on the last refinement cycle. Now the intuition is clear that for LLMs, we could repeatedly apply a self-improvement operator, starting from empty thoughts and solution (z, y = \empty) or an initial draft. We then compute rewards by comparing the proposed solution to a golden solution and only backprop through the last refinement cycle, but this should be sufficient to improve the refinement operation. Deep supervision actually has the benefit that bad reasoning will amplify across so many steps, so the final gradient will provide a stronger signal to rein that in

Alexia Jolicoeur-Martineau @jm_alexia

7 Oct 2025

New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M parameters neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: alexiajm.github.io/2025/09/2… Code: github.com/SamsungSAILMontre… Paper: arxiv.org/abs/2510.04871

233

25,663

Andreas Kirsch 🇺🇦 · May 9, 2024 · 9:13 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

9 May 2024

Replying to @paulpowlesland

It's a very UK thing that you press the button and by the time it switches to green, there is no more traffic anyway 😂

205

21,778

Andreas Kirsch 🇺🇦 · Jul 29, 2021 · 9:14 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

29 Jul 2021

Different PyTorch versions changing the significance of my results was the world I always dreamt of living in 🙈😅

209

Andreas Kirsch 🇺🇦 · Apr 30, 2025 · 7:16 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

30 Apr 2025

Singapore is the most impressive city I've ever visited. Way more than NYC or SF. I'm so happy that ICLR (& AABI) decided on Singapore, and I hope we can avoid the US for a bit 🙏 big thanks to my friends for showing me around and to all the awesome people I met and talked to 😊

225

19,602

Andreas Kirsch 🇺🇦 · Jun 4, 2023 · 3:46 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Jun 2023

My current belief is that AI capabilities are more strongly correlated with compute availability than with research itself. Is this wrong?

204

60,028

Andreas Kirsch 🇺🇦 · Aug 14, 2020 · 9:15 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

14 Aug 2020

Why are the NeurIPS reviews released on a Friday and then people are only given 3.5 working days to write rebuttals? Do we care about work-life balance at all? 🤔🐹

206

Andreas Kirsch 🇺🇦 · Jun 26, 2025 · 5:00 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

26 Jun 2025

Isn't this the whole OpenAI Zürich office? 😂

Meghan Bobrowsky

@MeghanBobrowsky

26 Jun 2025

Scoop: Meta has poached three OpenAI researchers: Lucas Beyer, Alexander Kolesnikov and Xiaohua Zhai, according to people familiar with the matter. An OpenAI spox confirmed the three have left the company.

214

35,069

Andreas Kirsch 🇺🇦 · Nov 11, 2022 · 12:02 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

11 Nov 2022

Several recent papers connect the generalization error of a model to the model’s prediction disagreement. And oh no, I've taken a look at one of them, an ICLR 2022 spotlight, in more detail 🔥 And published my thoughts in TMLR 🥳 1/16 openreview.net/forum?id=oRP8…

A Note on "Assessing Generalization of SGD via Disagreement"

Several recent works find empirically that the average test error of deep neural networks can be estimated via the prediction disagreement of models, which does not require labels. In particular...

openreview.net

204

Andreas Kirsch 🇺🇦 · Apr 12, 2025 · 10:39 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

12 Apr 2025

My personal bitter lesson is that my random hot take tweets get more views than any of my papers or blog posts that I spend months on 😅🫠

193

17,012

Andreas Kirsch 🇺🇦 · Jul 22, 2023 · 9:59 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

22 Jul 2023

After PhD: would I do it again? No not like that. Was it worth it? No not really. Would I recommend it? YMMV but prob not 😅

183

74,471

Andreas Kirsch 🇺🇦 · Dec 9, 2023 · 9:31 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

9 Dec 2023

Replying to @martinmbauer

It also appears in machine learning a lot, I'm just not sure if anyone has specifically recognized it as such. E.g., negative entropy is the log product integral over density using the density measure. It also appears in PAC-Bayes equations a lot

178

20,173

Andreas Kirsch 🇺🇦 · Sep 30, 2025 · 11:35 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

30 Sep 2025

The most effective way to achieve better performance is through RL pre-training. This unlocks a lot of high-quality data. Right now, pretraining on graduate physics or maths texts is allowed the same compute as text with low information density. The model cannot predict those tokens well right away. Allowing additional thinking tokens during pretraining enables the model to extract a lot more signal from such data

Ali Hatamizadeh

@ahatamiz1

30 Sep 2025

Are you ready for web-scale pre-training with RL ? 🚀 🔥 New paper: RLP : Reinforcement Learning Pre‑training We flip the usual recipe for reasoning LLMs: instead of saving RL for post‑training, we bring exploration into pretraining. Core idea: treat chain‑of‑thought as an action. Reward it by the information gain it provides for the very next token: This gives a verifier‑free, dense reward on ordinary text with no task checkers, no labels, no filtering. Why this matters ? * 🧠 Models think before predicting during pretraining, not just after alignment. * 📈 Position‑wise credit at every token = stable signal at full web‑scale. * 🔁 No proxy filters or “easy‑token” heuristics. Trains on the entire stream. Results: On the 8‑benchmark math+science suite (AIME’25, MATH‑500, GSM8K, AMC’23, Minerva Math, MMLU, MMLU‑Pro, GPQA): • Qwen3-1.7B-Base: RLP improves the overall average by 24% ! • Nemotron-Nano-12B-v2-Base: RLP improves the overall average by 43% ! 📄Paper: tinyurl.com/rlp-pretraining ✍️Blog: research.nvidia.com/labs/adl… #AI #LLM #ReinforcementLearning #ChainOfThought #Pretraining #RLP

184

23,228

Andreas Kirsch 🇺🇦 · Sep 22, 2024 · 10:28 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

22 Sep 2024

I read "Bias/Variance is not the same as Approximation/Estimation" which is a really nice read on what the title says. There is still so much to be learnt from some of these "simple" equations and decompositions openreview.net/forum?id=4TnF…

Bias/Variance is not the same as Approximation/Estimation

We study the relation between two classical results: the bias-variance decomposition, and the approximation-estimation decomposition. Both are important conceptual tools in Machine Learning...

openreview.net

177

16,000

Andreas Kirsch 🇺🇦 · Mar 12, 2025 · 9:41 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

12 Mar 2025

Replying to @NoFarmsNoFoods

Yep, totally support this. Why would anyone not support this?

172

4,662

Andreas Kirsch 🇺🇦 · Mar 7, 2024 · 11:21 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

7 Mar 2024

1/ I just read the fascinating GaLore paper on memory-efficient LLM training using gradient low-rank projection. Kudos to the authors for this insightful work! My TL;DR and some thoughts below (as a little paper review) 🧵

Prof. Anima Anandkumar

@AnimaAnandkumar

7 Mar 2024

For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training. Training LLMs from scratch currently requires huge computational resources with large memory GPUs. While there has been significant progress in reducing memory requirements during fine-tuning (e.g., LORA), they do not apply for pre-training LLMs. We design methods that overcome this obstacle and provide significant memory reduction throughout training LLMs. Training LLMs often requires the use of preconditioned optimization algorithms such as Adam to achieve rapid convergence. These algorithms accumulate extensive gradient statistics, proportional to the model's parameter size, making the storage of these optimizer states the primary memory constraint during training. Instead of focusing just on engineering and system efforts to reduce memory consumption, we went back to fundamentals. We looked at the slow-changing low-rank structure of the gradient matrix during training. We introduce a novel approach that leverages the low-rank nature of gradients via Gradient Low-Rank Projection (GaLore). So instead of expressing the weight matrix as low rank, which leads to a big performance degradation during pretraining, we instead express the gradient weight matrix as low rank without performance degradation, while significantly reducing memory requirements. @jiawzhao @BeidiChen @tydsh

170

53,037

Andreas Kirsch 🇺🇦 · Oct 22, 2025 · 3:37 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

22 Oct 2025

Re memory layers: Shouldn't memory layers include a sink (ie. one memory slot with score 0 which is always included that can be used as none-of-the-above)? Performing softmax on the top-k means that the memory layer can never defer, so you will always grab some random memory locations even if they're not very relevant
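A small NumPy sketch of the point above, assuming a memory layer that scores slots, keeps the top-k, and applies a softmax over them. The function name `topk_softmax` and the `use_sink` flag are hypothetical, and the sink is modeled simply as one extra logit fixed at 0 that is always appended to the top-k scores.

```python
import numpy as np

def topk_softmax(scores, k, use_sink=False):
    """Softmax over the top-k scores, optionally with an always-included sink.

    Returns (indices of the chosen slots, their weights, weight of the sink).
    The sink is a hypothetical "none of the above" slot with a fixed score of 0.
    """
    idx = np.argsort(scores)[-k:][::-1]      # top-k indices, best first
    logits = scores[idx]
    if use_sink:
        logits = np.append(logits, 0.0)      # sink logit is always 0
    logits = logits - logits.max()           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    if use_sink:
        return idx, weights[:-1], weights[-1]
    return idx, weights, 0.0

# All slots are poor matches (very negative scores).
scores = np.array([-10.0, -11.0, -12.0, -13.0])
_, w_plain, _ = topk_softmax(scores, k=2)
_, w_sink, sink = topk_softmax(scores, k=2, use_sink=True)
```

Without the sink, the two retrieved slots still share a total weight of 1 even though neither is relevant. With the sink, almost all of the weight lands on the sink, so the layer can effectively defer.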

Jessy Lin

@realJessyLin

21 Oct 2025

🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full finetuning and LoRA see drastic drops in held-out task performance (📉-89% FT, -71% LoRA on fact learning tasks), memory layers learn the same amount with far less forgetting (-11%). 🧵:

178

22,450

Andreas Kirsch 🇺🇦 · Apr 20, 2020 · 12:12 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

20 Apr 2020

🎉🎉Happy & proud to share some research into Information Bottlenecks from @yaringal, @clarelyle and me at @OATML_oxford 🎉🎉 We provide intuition and practical IB objectives for modern DNN architectures, like ResNets. Check it out on arXiv 👉arxiv.org/abs/2003.12537

168

Andreas Kirsch 🇺🇦 · Dec 2, 2022 · 11:53 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

2 Dec 2022

Yet again feeling slightly overwhelmed by the speed of progress in ML. Kinda wish I could go back in time a few years and re-prioritize certain studies ^^'

165

Andreas Kirsch 🇺🇦 · Nov 6, 2025 · 3:38 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

6 Nov 2025

For what it's worth, I know lots of folks who essentially work as research scientists without a PhD, and reflecting on my own journey, I could have done as much research (if not more with more compute) if I hadn't left to do a PhD in academia ^^' That said, doing a PhD has its own perks and benefits ofc

Karan🧋

@kmeanskaran

6 Nov 2025

HOT TAKE: Reality is, you can't actually work in top-quality ML research labs without a PhD. Top research labs still look for people with PhDs and excellence in maths, stats, PyTorch, neural networks, and CUDA kernels. In India, quality ML research labs are virtually nonexistent. Most good research labs are in the US/UK and China. Implementing papers and working on T4 Colab is cool, but you won't cross the threshold to become a researcher. 99% of ML people belong to the applied side, which has better practical perks:
- MNCs or SF startups
- You can switch and get promoted every 1.5 years
- You can move to product management or CTO
- All you need is hands-on experience and not many research papers
- Cashflow is best
I really respect people who code research papers, but how long will you wait for your breakthrough? In 3 months, research evolves, and you're following it without actually building anything. Stop following blindly! The world's best research labs pick only from top universities, not because you've implemented papers and posted on X! Either go for a PhD outside India or stick to the applied ML side. The job market is saturated and will remain so because we're evolving post-COVID. On the other hand, no startup or research lab thinks about you. You must focus on your growth and money first, then look for impact.

174

30,802

Andreas Kirsch 🇺🇦 · Jan 17, 2025 · 11:13 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

17 Jan 2025

I'm sorry for any hurt feelings for calling NeurIPS PCs clowns and pointing out an apparent domain conflict of interest. I didn't mean any PC individually or personally, but the organization and its (lack of) processes. Sadly, my sentiment was warranted - even if the phrasing might not be to everyone's liking. So, to avoid further tone policing and suggestions to remove the tweet, let me rephrase it to remove any personal mention: NeurIPS is a clown show ("a comically shambolic state of affairs"), and I'm disappointed by the unprofessional official response and the lack of seriousness at the biggest ML conference. I wonder if my takeaway and recommendations should be to take such papers as the role models they are presented as. The incentives are clear: Publish in an area with much hype and fewer knowledgable reviewers; do not worry about attribution; put all of your related work section in the appendix; if someone complains informally, always agree but never act or only to the bare minimum; always wait until after the conference (or at least after the decision notice) to address anything; and if worst comes to worst and someone complains formally, fear not because there are no good processes anyway - everybody knows each other, and there are too many other incentives for anyone to be a party pooper. The ML community has neither teeth nor an appetite for academic integrity. There are too many things that taste sweeter.

Andreas Kirsch 🇺🇦

@BlackHC

13 Jan 2025

NeurIPS 2024 PCs being a bunch of clowns 🤡 the state of ML 🙄 All you get back a month after raising a concern:

166

68,537

Andreas Kirsch 🇺🇦 · Aug 25, 2021 · 5:45 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

25 Aug 2021

If you're not interested in your PhD topic and projects at all, what should you do?

148

Andreas Kirsch 🇺🇦 · Oct 25, 2024 · 8:46 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

25 Oct 2024

Replying to @lastpositivist

It's about the University failing the student by not raising issues earlier and other academics disagreeing with that assessment in the first place. You are not supposed to fail a student at confirmation for things that should have been raised at the transfer of status viva

143

12,731

Andreas Kirsch 🇺🇦 · Jan 27, 2023 · 2:58 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

27 Jan 2023

Somebody please explain high-dimensional embeddings to me and what things look like in those spaces 😅

156

72,719

Andreas Kirsch 🇺🇦 · Dec 27, 2022 · 12:58 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

27 Dec 2022

I might be finishing my PhD within the next half year. What does one do after? I guess the time to apply for stuff was a few months ago 😅

156

94,653

Andreas Kirsch 🇺🇦 · May 11, 2025 · 7:41 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

11 May 2025

Publishing a paper at ICML costs 650 USD, which is cheaper than some journals, but it cannot beat TMLR, which is free and comes with higher-quality reviews 🥳

164

25,561

Andreas Kirsch 🇺🇦 · Sep 30, 2023 · 5:35 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

30 Sep 2023

Replying to @tunguz

[GIF: Forrest Gump Miracles / Forrest Gump Chad]

141

59,422

Andreas Kirsch 🇺🇦 · Oct 26, 2021 · 4:44 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

26 Oct 2021

Due to popular demand: for example, revising basic maths some more again (LA, calculus), some more ML basics (kernel methods, SVMs, etc), old DL papers. But also orthogonally: for various projects, spend more time on lit reviews/prior art and playing around with the baselines...

150

Andreas Kirsch 🇺🇦 · Apr 7, 2024 · 12:24 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

7 Apr 2024

Finally catching up on MoE with @finbarrtimbers's great posts on this (link below). My thoughts on MoE and objectives:
1. Instead of the weirdly unprincipled additional losses, one can simply maximize the mutual information I[E; T], where E is the expert index and T is the token: I[E; T] = H[E] - H[E | T]. So maximizing the mutual info maximizes H[E], the entropy of expert selection without knowing the token (ie all experts selected uniformly across all data), while it minimizes H[E | T], the entropy knowing the token (ie as one-hot as possible for a given token, equivalent to K=1).
2. The softmax(top-K of router logits + normal noise) is almost equivalent to a double softmax: taking the top-K of (router logits + Gumbel noise) is equivalent to sampling from softmax(router logits) K times w/o replacement. Applying the softmax to those samples simply distributes the credit accordingly between the top-K chosen experts. A potentially cleaner formulation would simply always use a full mixture and only look at the top-K sampling approach etc as performance optimizations.
3. The "router Z-loss" seems overcomplicated. Z seems to stand for the partition constant of the induced categorical distribution of the logits. The Z loss does not affect router predictions as it affects all expert logits in the router equally, and it is motivated by numerical stability. Instead of regularizing with Z loss explicitly as a loss, one could also simply adapt the bias of the router network and shift it by the mean logit activations of a training batch. Same effect and no loss needed.
4. Why do we use MoE only for FFNs and not for attention? MoE for QKV or at least for the Q matrices would seem quite valuable to make attention token-specific and either save FLOPs or get better attention for same FLOPs. Mixture of Depths seems to look at that finally.
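A minimal NumPy sketch of the mutual-information objective from point 1, assuming the router outputs a categorical distribution p(E | T = t) per token and that tokens in a batch are weighted equally. The function names `entropy` and `expert_token_mutual_information` are hypothetical, and this only evaluates the quantity; it is not a full training loss.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

def expert_token_mutual_information(router_probs):
    """Estimate I[E; T] = H[E] - H[E | T] from a batch of router outputs.

    router_probs: array of shape (num_tokens, num_experts) whose rows are
    p(E | T = t). Tokens are assumed to be equally weighted.
    """
    marginal = router_probs.mean(axis=0)        # p(E), averaged over tokens
    h_e = entropy(marginal)                     # H[E]: high when experts are balanced
    h_e_given_t = entropy(router_probs).mean()  # H[E | T]: low when routing is one-hot
    return h_e - h_e_given_t

# Balanced one-hot routing: 8 tokens spread evenly over 4 experts.
balanced_one_hot = np.eye(4)[np.arange(8) % 4]
# Uninformative routing: every token gets the uniform distribution.
uniform = np.full((8, 4), 0.25)
```

For the balanced one-hot router the estimate reaches its maximum of log(4) nats, and for the uniform router it is 0, matching the two terms described in point 1.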

162

53,658

Andreas Kirsch 🇺🇦 · Oct 3, 2024 · 1:19 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

3 Oct 2024

Replying to @miniapeur

Had to check: it is real but book-bait... uses "a one-sided generalized derivative called a subdifferential" instead of derivatives link.springer.com/book/10.10…

Calculus Without Derivatives

link.springer.com

143

7,133

Andreas Kirsch 🇺🇦 · Oct 1, 2023 · 4:01 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

1 Oct 2023

“Stochastic Batch Acquisition: A Simple Baseline for Deep Active Learning” has been published @TmlrOrg 🎉 My last paper with @OATML_Oxford and the amazing @seb_far*, @parmi_atg, @anndvision, @fbranchaud1, and @yaringal Some details and a blog post below:

155

26,495

Andreas Kirsch 🇺🇦 · Jul 4, 2025 · 6:06 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Jul 2025

Replying to @cwamidon @lauriewired

So riding a bicycle requires petrol bc your Amazon parcels are delivered by truck? Same logic 🙄 I wouldn't trust a founder making such an argument with anything

155

8,603

Andreas Kirsch 🇺🇦 · Mar 29, 2022 · 6:44 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

29 Mar 2022

ML design interview question @ @Waymo: they log a lot of online telemetry data, too much to transmit it all. only some of that data is interesting when run in simulation later because it might lead to divergent and wrong behaviour by agents in the simulation.

154

Andreas Kirsch 🇺🇦 · Sep 20, 2020 · 7:18 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

20 Sep 2020

arxiv.org/pdf/1905.12957.pdf “Neural Entropic Estimation: A faster path to mutual information estimation” 👈 this paper is beautiful It derives the Donsker-Varadhan representation en passant using simple straightforward steps to get to MINE’s estimator...

150

Andreas Kirsch 🇺🇦 · Aug 11, 2024 · 10:26 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

11 Aug 2024

A small info-theory thread (or at least food for thought): Why is the Bayesian Model Average the best choice? Really why? I'll go through a naive argument (anyone has better references?), simple lower-bounds and decompositions, and pitch a "reverse mutual information" 1/15

151

17,495

Andreas Kirsch 🇺🇦 · Jul 3, 2025 · 4:02 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

3 Jul 2025

Replying to @krishnanrohit @bjeansonne @PessimistsArc

We are though? The overall trajectory is still one of continued global mass extinction, and this is putting lipstick on a pig

142

2,988

Andreas Kirsch 🇺🇦 · Feb 4, 2025 · 10:02 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Feb 2025

Replying to @kellerjordan0

I'm bored, so let's examine how the two losses are maybe similar:
d_i = ||w_i - x||^2 = -2 w_i . x + ||w_i||^2 + ||x||^2
-log d_i \approx 1 - d_i
So then:
1/d_i = exp(-log d_i) \approx exp(2 w_i . x - ||w_i||^2 - ||x||^2 + 1) = exp(2 w_i . x + C_i)
where C_i = - ||w_i||^2 - ||x||^2 + 1
The harmonic loss probability looks approximately like a softmax:
1/d_i / (sum_j 1/d_j) \approx exp(2 w_i . x + C_i) / (sum_j exp(2 w_j . x + C_j))
Softmax is invariant to shifts, so what if the C_j are approximately constant? Empirically, "Understanding Softmax Confidence and Uncertainty" by Pearce et al. (2021) [1] argues that all w_i are about the same magnitude for optimal decision boundaries when trained using CE loss (so assuming we train the model for long enough). Then, ||w_i|| \approx const, and all C_i are approx const for a given x. Thus:
exp(2 w_i . x + C_i) / (sum_j exp(2 w_j . x + C_j))
\approx exp(2 w_i . x + C_i) / (sum_j exp(2 w_j . x + C_i))
= exp(2 w_i . x) / (sum_j exp(2 w_j . x))
This is the same as the regular softmax in the cross-entropy loss with temperature 1/2, so slightly sharper. What does this tell us? Not much, but I was bored, and this was fun 😊
--- [1] arxiv.org/abs/2106.04972
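A quick NumPy check of the pieces that hold exactly, assuming unit-norm class vectors w_i (the "same magnitude" case) and d_i defined as the squared distance as in the tweet. The function names `harmonic_probs` and `softmax_probs` are hypothetical. The sketch does not test the first-order approximation -log d_i \approx 1 - d_i itself; it only verifies the expansion of d_i and that, with equal norms, both probability vectors rank the classes identically.

```python
import numpy as np

def harmonic_probs(W, x):
    """p_i proportional to 1 / d_i with d_i = ||w_i - x||^2 (rows of W are w_i)."""
    d = ((W - x) ** 2).sum(axis=1)
    inv = 1.0 / d
    return inv / inv.sum()

def softmax_probs(W, x, temperature=0.5):
    """Softmax over logits w_i . x / temperature; temperature 1/2 gives 2 w_i . x."""
    logits = (W @ x) / temperature
    logits = logits - logits.max()   # shift invariance keeps the result unchanged
    e = np.exp(logits)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # assumption: ||w_i|| = 1 for all i
x = rng.standard_normal(8)

# Expansion from the first line of the derivation.
d = ((W - x) ** 2).sum(axis=1)
expanded = -2 * (W @ x) + (W ** 2).sum(axis=1) + x @ x

p_harmonic = harmonic_probs(W, x)
p_softmax = softmax_probs(W, x)
```

With equal norms, d_i is a constant minus 2 w_i . x, so both distributions are monotone in w_i . x and agree on the class ordering, even where the actual probability values differ.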

150

11,841

Andreas Kirsch 🇺🇦 · Oct 21, 2023 · 8:37 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

21 Oct 2023

Replying to @ForensicArchi

OSINTtechnical

@Osinttechnical

21 Oct 2023

The Forensic Architecture report appears to use a Russian rocket impact in Ukraine as evidence of an Israeli artillery impact in Gaza. The cited picture they use here fairly clearly shows the remains of a Russian 122mm Grad rocket that hit Kharkiv Oblast last year.

129

7,632

Andreas Kirsch 🇺🇦 · Mar 20, 2022 · 3:24 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

20 Mar 2022

Finally, another paper summary from me for the amazing "Limitations of the Empirical Fisher Approximation for Natural Gradient Descent" by Kunstner, @lukas_balles & @PhilippHennig5 notion.so/Limitations-of-the…

139

Andreas Kirsch 🇺🇦 · Oct 8, 2023 · 3:24 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

8 Oct 2023

Very happy to announce that my reproduction "Does ‘Deep Learning on a Data Diet’ reproduce? Overall yes, but GraNd at Initialization does not" has been published in @TmlrOrg 🥳 1/5

149

41,244

Andreas Kirsch 🇺🇦 · Oct 29, 2025 · 4:09 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

29 Oct 2025

Replying to @suchenzang

Can't wait for the series about it: Game of Flops

148

9,840

Andreas Kirsch 🇺🇦 · Aug 4, 2023 · 11:15 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

4 Aug 2023

Kinda depressing when you keep getting rejection emails for internships 10 months later without even being asked for an interview 😐

128

47,436

Andreas Kirsch 🇺🇦 · Apr 9, 2025 · 11:59 AM UTC

Andreas Kirsch 🇺🇦

@BlackHC

9 Apr 2025

Weirdly, I'm rooting for China somehow. How did that happen?

135

11,506

Andreas Kirsch 🇺🇦 · Jul 1, 2025 · 8:45 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

1 Jul 2025

A lot of finance bros are discovering that they wanted to do AI research all along right now

137

9,518

Andreas Kirsch 🇺🇦 · Jul 3, 2025 · 8:00 PM UTC

Andreas Kirsch 🇺🇦

@BlackHC

3 Jul 2025

Replying to @pli_cachete

Nah it's a human em-dash bc real em-dashes don't have whitespace around them, and an LLM would know better

131

24,992