On X we surface the AI research that matters and explain the ideas behind it. In the newsletter, we connect the dots between AI’s past, present, and future ⬇️

Join over 115,000 readers
CS109A Data Science course materials @Harvard are free and open for everyone! 1. Lecture notes 2. R code, Python notebooks 3. Lab material 4. Advanced sections Learn here: harvard-iacs.github.io/2019-…
35
955
3,076
All algorithms implemented in Python 🤯 This library has 163k stars on GitHub! It includes a ton of algorithms from arithmetic analysis to blockchain to data structures.
22
532
2,583
355,922
MetaGPT: Simulates a whole software company 🤔 It assigns roles like product managers, architects, project managers, and engineers to GPTs. With just one line of code, MetaGPT generates user stories, competitive analyses, requirements, data structures, APIs, documents, and more. Mind-blowing, right?
50
374
2,497
663,020
CS109A Data Science course materials @Harvard are free and open for everyone! 1. Lecture notes 2. R code, Python notebooks 3. Lab material 4. Advanced sections Learn here: harvard-iacs.github.io/2019-…
30
796
2,446
A free book for you! Fundamentals of Data Visualization by Claus O. Wilke It's a guide to making visualizations that accurately reflect the data, tell a story, and look professional. Read the open book here: clauswilke.com/dataviz/
27
623
2,320
Yann LeCun’s @ylecun Deep Learning Course is now free & fully online at @NYUDataScience Videos, slides, notes, and notebooks! cds.nyu.edu/deep-learning/
13
591
2,025
🔥3 free data science books, the most popular among our subscribers! 1. Fundamentals of Data Visualization by @ClausWilke 2. Python Data Science Handbook by @jakevdp 3. Hands-On Data Visualization by @HandsOnDataViz Links⬇️
26
512
1,801
9 techniques you should know to master AI: - RAG (like Multimodal and Agentic RAG) - Knowledge distillation - Prompt optimization - GRPO - Mixture-of-Experts (MoE) - Chains-of-... : Chain-of-Agents and Chain-of-RAG - Methods reducing memory use, e.g. LightThinker, MLA - Advanced attention mechanisms: Slim Attention, KArAt, and XAttention - Synthetic data generation and human-in-the-loop (HITL) role All about them in these guides -> turingpost.com/p/jan-jul-rec…
21
360
1,866
117,289
3 free books, the most popular ones! 1. Fundamentals of Data Visualization 2. Hands-On Data Visualization 3. Reinforcement Learning: An Introduction Share this post with your friends to spread the word! Links⬇️
21
513
1,766
CS109A Data Science course materials @Harvard are free and open for everyone! 1. Lecture notes 2. R code, Python notebooks 3. Lab material 4. Advanced sections Learn here: harvard-iacs.github.io/2019-…
20
576
1,689
Document-to-Markdown converter for LLM pipelines – MarkItDown from @Microsoft This Python tool converts dozens of file types to clean Markdown, keeping headings, lists, tables, links, and metadata. Supports: - PDF, Word, Excel, PowerPoint - HTML, CSV, JSON, XML - Images (OCR + EXIF), audio (transcription + metadata) - ZIP files, YouTube URLs, EPubs, and more As Markdown is LLMs' "native language," it's perfect for preprocessing documents before feeding them into models.
20
229
1,778
153,781
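A minimal sketch of the preprocessing step described in the post above, assuming the `markitdown` package is installed (for example with `pip install 'markitdown[all]'`). The helper name `convert_to_markdown` is hypothetical, and which file types actually work depends on the optional extras you install.

```python
def convert_to_markdown(path: str) -> str:
    """Return the Markdown text MarkItDown produces for one local file."""
    # Imported lazily so this module still loads when the package is missing.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(path)  # the converter is picked from the file type
    return result.text_content        # Markdown as a plain string
```

The returned string can then be chunked or placed in a prompt like any other text, e.g. `convert_to_markdown("report.xlsx")`.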
A free book for you! "Think Python" by Allen Downey. Read here: lnkd.in/ge2tTdbu
25
424
1,632
3 free courses, the most popular ones! 1. CS109A Data Science @Harvard 2. Linear Algebra @MIT 3. Mathematics of Big Data and Machine Learning @MIT Share this post with your friends to spread the word! Links⬇️
28
464
1,570
CS109A Data Science course materials @Harvard are free and open for everyone! 1. Lecture notes 2. R code, Python notebooks 3. Lab material 4. Advanced sections Learn here: harvard-iacs.github.io/2019-…
44
523
1,481
13 Awesome MCP Servers: ▪️ Agentset MCP ▪️ GitHub MCP Server ▪️ arXiv MCP ▪️ MCP Run Python ▪️ Safe Local Python Executor ▪️ Cursor MCP Installer ▪️ Basic Memory ▪️ Filesystem MCP Server ▪️ Notion MCP Server ▪️ Markdownify MCP Server ▪️ Fetch MCP Server ▪️ Mobile Next ▪️ MCP Installer Save the list and check this out for the links and more info: huggingface.co/posts/Ksenias…
26
308
1,517
146,759
OpenAI CEO, Sam Altman, testified before Congress yesterday for 4 hours. Don't have the time to watch it all? We've collected the highlights that are already resonating with people 🧵
28
339
1,457
540,252
KV cache compression techniques ▪️KV caching (basic) – stores previously computed Keys and Values in memory and calculates attention only for new tokens. ▪️ Quantization – represents KV cache with fewer bits. ▪️ Low-rank decomposition – compresses the KV cache into smaller spaces. ▪️ Slim Attention – stores only Keys and recovers Values from them using math tricks. ▪️ XQuant – quantizes and stores only the layer input activations (X), and recalculates Keys and Values from X on the fly during inference. Read about XQuant method (the newest one) and other methods with their limitations in this overview: turingpost.com/p/xquant
17
173
1,431
88,339
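To make the first two items in the post above concrete, here is a toy sketch in plain Python: a single-head KV cache that only appends the newest Key and Value at each step, plus a simple symmetric int8 quantizer. All names are hypothetical, and real systems operate on batched tensors with per-channel or per-group scales.

```python
import math


def attend(query, keys, values):
    """Softmax attention of one query vector over all stored keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


class KVCache:
    """Keeps the Keys and Values of earlier tokens so they are never recomputed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Only the new token's key and value are added; older ones are reused.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


def quantize_int8(vector):
    """Symmetric quantization: integer codes in [-127, 127] plus one float scale."""
    scale = max(abs(x) for x in vector) / 127 or 1.0  # fall back to 1.0 for all zeros
    return [round(x / scale) for x in vector], scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original vector."""
    return [c * scale for c in codes]
```

Storing `codes` and `scale` instead of full floats is the memory saving. The reconstruction error per entry is at most half of `scale`.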
TimeGPT is the first foundation model specifically designed for time series analysis. It excels at generating precise forecasts across a diverse range of datasets and domains. Here's what you need to know about it: 1/8
23
246
1,380
211,505
Log-linear attention is a new type of attention proposed by @MIT which is: - as fast and efficient as linear attention - as expressive as softmax It uses a small set of memory slots whose number grows logarithmically with the sequence length. Here's how it works:
12
210
1,401
103,919
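The post above does not spell out how the memory slots are organized. One scheme that gives logarithmic growth, shown here only as an illustration and as an assumption about the general idea rather than the authors' method, is to split the prefix into segments whose sizes are distinct powers of two, as a Fenwick tree does. The function name is hypothetical.

```python
def power_of_two_buckets(length):
    """Split positions [0, length) into segments with distinct power-of-two sizes.

    Older tokens land in large segments and the newest tokens in small ones,
    so a prefix of `length` tokens never needs more than length.bit_length() slots.
    """
    buckets = []
    start = 0
    remaining = length
    while remaining > 0:
        size = 1 << (remaining.bit_length() - 1)  # largest power of two <= remaining
        buckets.append((start, start + size))
        start += size
        remaining -= size
    return buckets
```

For example, a prefix of 13 tokens is covered by three segments of sizes 8, 4, and 1.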
CS106A - Programming Methodology course @Stanford is free and open for everyone! 38 lectures 9 assignments 2 exams Learn here: see.stanford.edu/Course/CS10…
12
435
1,298
3 free books, the most popular ones! 1. Fundamentals of Data Visualization 2. Hands-On Data Visualization 3. Reinforcement Learning: An Introduction Share this post with your friends to spread the word! 1/2
14
317
1,264
A free book for you! Hands-On Data Visualization by @HandsOnDataViz has an open version. It takes you step-by-step through 1. tutorials 2. real-world examples 3. online resources Read here: handsondataviz.org/
11
290
1,211
Breaking: Microsoft adds a model from @openai’s main rival: @grok is coming to its Foundry model collection. @elonmusk has spoken.
47
93
1,193
126,720
Linear Algebra course by @MIT is free and open for everyone! Find: 1. Videos and lecture notes 2. Assignments: problem sets with solutions 3. Exams and solutions Learn the basics: ocw.mit.edu/courses/mathemat…
8
311
1,177
A free book for you! Learn: 1. NumPy & Pandas 2. Matplotlib: data visualizations 3. Scikit-Learn: efficient & clean implementations of ML algorithms Read the open "Python Data Science Handbook: Essential Tools for Working with Data": jakevdp.github.io/PythonData…
12
300
1,137
A free book for you! Learn: 1. NumPy & Pandas 2. Matplotlib: data visualizations 3. Scikit-Learn: efficient & clean implementations of ML algorithms Read the open "Python Data Science Handbook: Essential Tools for Working with Data": jakevdp.github.io/PythonData…
13
313
1,148
Linear algebra course materials @MIT are free and open for everyone. 1) Lecture videos and slides 2) Homeworks 3) Exams and solutions Learn linear algebra as a basis for ML. Find materials here: ocw.mit.edu/courses/mathemat…
2
244
1,063
12 Foundational AI Model Types ▪️ LLM ▪️ SLM ▪️ VLM ▪️ MLLM ▪️ LAM ▪️ LRM ▪️ MoE ▪️ SSM ▪️ RNN ▪️ CNN ▪️ SAM ▪️ LNN Save the list and check this out for explanations and links to the useful resources: huggingface.co/posts/Ksenias…
34
259
1,093
69,710
Google’s TPUs are finally having their breakout moment - a decade after the debut. Anthropic just signed a deal for up to 1 million TPUs - over a gigawatt of compute - making it one of the biggest AI infrastructure agreements to date. ▪️ Basically, TPUs accelerate neural networks, focusing on matrix multiplications and machine learning tasks. • At the heart of Google’s TPU is the Matrix Multiply Unit – a 256×256 grid of multiply-accumulate cells that pumps data through in waves. • Around it sit on-chip memories designed to keep data local and fast. The TPU acts as a coprocessor, streaming data through its systolic array and reusing it on-chip, keeping the matrix unit busy ~100% of the time. The newest generation, TPU v7 "Ironwood," is liquid-cooled, built for inference, and ships as 256-chip pods or 9,216-chip superpods, hitting new highs in performance per watt. Now begins a new stage for TPUs as a serious alternative to GPUs, optimized for AI, cheaper to run, and fine-tuned through years of Google’s own use.
34
160
1,115
95,651
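A rough way to picture the grid of multiply-accumulate cells from the post above: every cell owns one output entry and adds one product per wave of incoming data. This plain-Python sketch is a simplification under that assumption. It does not model the skewed timing or the actual dataflow of a real systolic array, and the function name is hypothetical.

```python
def mac_grid_matmul(a, b):
    """Multiply matrices a (rows x inner) and b (inner x cols) one wave at a time."""
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0.0] * cols for _ in range(rows)]  # one accumulator per grid cell
    for wave in range(inner):                  # each wave feeds one slice of data
        for i in range(rows):
            for j in range(cols):
                # Cell (i, j) performs a single multiply-accumulate per wave.
                acc[i][j] += a[i][wave] * b[wave][j]
    return acc
```

In hardware the two inner loops run in parallel across the grid, which is why the number of waves, not the number of cells, sets the time per tile.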
13 Outstanding MCP Servers: ▪️ Hugging Face Official MCP Server ▪️ Browser MCP ▪️ Bright Data ▪️ JSON MCP ▪️ Octagon Deep Research ▪️ VLM Run MCP ▪️ AllVoiceLab MCP Server ▪️ MCP Email Server ▪️ Google Admin MCP ▪️ Android MCP ▪️ DeepView ▪️ Calculator ▪️ MCP Aggregator Save the list and check this out for the links and more info: huggingface.co/posts/Ksenias…
21
203
1,093
88,457
Introduction to computer science and programming course materials @MIT are open for everyone! 1. Lecture videos 2. Assignments 3. Exams Learn here: ocw.mit.edu/courses/6-00-int…
11
362
1,022
CS109A Data Science course materials @Harvard are free and open to everyone! 1. Lecture notes 2. R code, Python notebooks 3. Lab material 4. Advanced sections Learn here: harvard-iacs.github.io/2019-…
11
302
1,003
169,858
3 free courses, the most popular ones! 1. CS109A Data Science @Harvard 2. Linear Algebra @MIT 3. Mathematics of Big Data and Machine Learning @MIT Share this post with your friends to spread the word! Links⬇️
14
339
1,007
11 new types of RAG ▪️ InstructRAG ▪️ Collaborative RAG (CoRAG) ▪️ ReaRAG ▪️ MCTS-RAG ▪️ Typed-RAG ▪️ MADAM-RAG ▪️ HM-RAG ▪️ CDF-RAG ▪️ NodeRAG ▪️ HeteRAG ▪️ Hyper-RAG Save the list and check this out for the links and more info: huggingface.co/posts/Ksenias…
11
215
1,002
78,546
15 types of attention mechanisms ▪️ Soft attention (Deterministic) ▪️ Hard attention (Stochastic) ▪️ Self-attention ▪️ Cross-Attention (Encoder-Decoder attention) ▪️ Multi-Head Attention (MHA) ▪️ Multi-Head Latent Attention (MLA) ▪️ Memory-Based attention ▪️ Adaptive attention ▪️ Scaled Dot-Product attention ▪️ Additive attention ▪️ Global attention ▪️ Local attention ▪️ Sparse attention ▪️ Hierarchical attention ▪️ Temporal attention Check this out for the links and more info: huggingface.co/posts/Ksenias…
10
207
997
54,794
A free Artificial Intelligence for Beginners course by @Microsoft. ▪️ 12 weeks ▪️ 24 lessons It introduces learners to the world of AI. github.com/microsoft/ai-for-…
5
279
957
166,654
CS 224: Advanced Algorithms course @Harvard is open for everyone! Materials: - Lecture slides and videos - Assignments For more: people.seas.harvard.edu/~min…
8
236
968
Machine learning is famous for its open resources. We collected 3 free courses about ML: ▪️ Introduction to Machine Learning, @MIT ▪️ Mathematics for Computer Science, @MIT ▪️ Practical Deep Learning, @fastdotai 🧵
11
234
914
115,115
A free book for you! Learn: 1. NumPy & Pandas 2. Matplotlib: data visualizations 3. Scikit-Learn: efficient & clean implementations of ML algorithms Read the open "Python Data Science Handbook: Essential Tools for Working with Data": jakevdp.github.io/PythonData…
6
240
898
A must-read paper → Fundamentals of Building Autonomous LLM Agents Reviews the core cognitive subsystems that make up autonomous LLM-powered agents, including: - Perception - Reasoning & planning: CoT, MCTS, ReAct, Tree-of-Thought (ToT) techniques - Long- & short-term memory - Execution: code execution, tool use, API calls - Closed feedback loop: wiring up perception → reasoning → memory → action
24
164
954
55,793
Free book on Statistics! Introduction to Modern Statistics. Read here: openintro-ims.netlify.app
5
208
892
A free book: A First Course on Data Structures in Python by Donald R. Sheehy Provides building blocks you need for AI and machine learning: - data structures - algorithmic thinking - complexity analysis - recursion/dynamic programming - search methods donsheehy.github.io/datastru…
11
171
928
46,192
Mathematics of Big Data and Machine Learning course materials by @MIT are free and open for everyone! Find: 1. Lecture notes 2. Lecture videos 3. Slides Learn here: ocw.mit.edu/courses/res-ll-0…
10
326
885
CS 224: Advanced Algorithms course @Harvard is open for everyone! Materials: - Lecture slides and videos - Assignments For more: people.seas.harvard.edu/~min…
2
195
883
Linear Algebra course by @MIT is free and open for everyone! Find: 1. Videos and lecture notes 2. Assignments: problem sets with solutions 3. Exams and solutions Learn the basics: ocw.mit.edu/courses/mathemat…
7
218
881
Small Language Models (SLMs) are the future of Agentic AI, claim @NVIDIA researchers. Moreover, they offer a method for converting existing agent systems from using LLMs to SLMs that could work in practice. Here are the details:
15
161
924
76,062
A look at the history of Reinforcement Learning What is Temporal-Difference (TD) learning? @RichardSSutton introduced TD learning in 1988, and today the most widely used RL algorithms, like deep actor-critic, rely on TD error as the learning signal. So, TD learning: ▪️ Allows agents to learn under uncertainty, when the system isn’t fully known. How? → It compares successive predictions and updates incrementally, learning a little bit at every step. TD learning minimizes prediction error through gradient steps. The main feature is the target it uses: - In supervised learning, the target is the actual final outcome. - TD learning’s target is the next prediction. ▪️ TD algorithms also illustrate the difference between on- and off-policy learning (the policy is the agent’s action-selection strategy): - On-policy: learns from the same policy it uses to act. - Off-policy: acts with one policy but learns about another, usually better, policy. ▪️ TD learning is not only the foundation of today’s RL, but also comes with notable advantages: • Avoids being misled by rare lucky outcomes, tying results to underlying states. • Saves memory and computation since it doesn’t store all predictions until the end. • Works well in real-world use, where waiting for final outcomes is too slow. So TD learning made faster and more accurate predictions possible. More about TD-style algorithms and all RL foundations + the present and future of RL here: turingpost.com/p/rlguide
14
133
902
56,911
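The "target is the next prediction" idea from the post above fits in a few lines. This is a minimal TD(0) sketch, not tied to any library. The five-state random walk is a standard textbook toy problem chosen here for illustration, and the function names and default parameters are assumptions.

```python
import random


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Move values[state] toward the bootstrapped target r + gamma * V(next_state)."""
    target = reward + gamma * values[next_state]  # the next prediction, not the final outcome
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


def random_walk_values(episodes=2000, alpha=0.05, seed=0):
    """Estimate state values for a walk over states 1..5 with terminal states 0 and 6."""
    rng = random.Random(seed)
    values = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay at 0
    for _ in range(episodes):
        state = 3  # every episode starts in the middle
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0  # reward only at the right end
            td0_update(values, state, reward, next_state, alpha)
            state = next_state
    return values[1:6]
```

Each update happens immediately after a single step, so nothing has to be stored until the episode ends. For this toy problem the true values are 1/6, 2/6, ..., 5/6, and the estimates should drift toward them.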
4 advanced attention mechanisms you should know: • Slim attention — 8× less memory, 5× faster generation by storing only K from KV pairs and recomputing V. • XAttention — 13.5× speedup on long sequences via "looking" at the sum of values along diagonal lines in the attention matrix. • Kolmogorov-Arnold Attention, KArAt — Adaptable attention with learnable activation functions using KANs instead of softmax. • Multi-token attention (MTA) — Lets the model consider groups of nearby words together for smarter long-context handling. Read the overview of them in our free article on @huggingface -> huggingface.co/blog/Kseniase…
9
173
906
56,329
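A toy sketch of the "store only K and recompute V" idea from the Slim attention item above. It assumes the key projection W_K is square and invertible, so that with K = X·W_K and V = X·W_V the values can be rebuilt as V = K·W_K⁻¹·W_V. The 2×2 size and all function names are illustrative choices, not taken from the post.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("W_K must be invertible for V to be recoverable from K")
    return [[d / det, -b / det], [-c / det, a / det]]


def recover_values(keys, w_k, w_v):
    """Rebuild V from cached K using V = K @ inv(W_K) @ W_V."""
    k_to_v = matmul(inverse_2x2(w_k), w_v)  # can be precomputed once per layer
    return matmul(keys, k_to_v)
```

Because `k_to_v` depends only on the weights, it can be computed ahead of time. At inference only K has to stay in the cache.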
Microsoft presented Azure ChatGPT. The code is open-sourced under the MIT License. Benefits: 1. Privacy: Fully isolated from data operated by OpenAI. 2. Control: Network traffic can be fully isolated to your network. 3. Plug-n-play: Add your internal data sources or use plug-ins to integrate with your internal services.
12
170
825
284,450
The hardware powering AI ▪️ GPU (Graphics Processing Unit) ▪️ TPU (Tensor Processing Unit) ▪️ CPU (Central Processing Unit) ▪️ ASICs (Application-Specific Integrated Circuits) ▪️ NPU (Neural Processing Unit) ▪️ APU (Accelerated Processing Unit) ▪️ IPU (Intelligence Processing Unit) ▪️ RPU (Resistive Processing Unit) ▪️ FPGA (Field-Programmable Gate Array) ▪️ Quantum Processors ▪️ Processing-in-Memory (PIM) & MRAM-based chips ▪️ Neuromorphic Chips Read about them in this guide: turingpost.com/p/pu
12
133
897
65,511
A free book for you! Learn: 1. NumPy & Pandas 2. Matplotlib: data visualizations 3. Scikit-Learn: efficient & clean ML algorithms Read the open "Python Data Science Handbook: Essential Tools for Working with Data": jakevdp.github.io/PythonData…
10
235
844
"NLP with Deep Learning" by @stanfordnlp is free and open! Find links to: - lecture slides - lecture videos - notes - code - suggested readings It's all here⬇️ web.stanford.edu/class/cs224…
3
249
817
Tiny Recursive Model (TRM) is a simple, effective approach built on the idea: do more with less. It uses just 1 small 2-layer network that recursively improves its own answers. With only 7M parameters, TRM sets new records, beating LLMs 10,000× larger: - Sudoku-Extreme: 55% → 87% - Maze-Hard: 75% → 85% - ARC-AGI-1: 40% → 45% - ARC-AGI-2: 5% → 8% Here is how it works:
23
110
841
62,629
.@GoogleAI proposed a Chain-of-Agents (CoA) framework that uses multiple AI agents working together to reason through long texts. It outperforms RAG and full-context processing by up to 10%! Here's how CoA works: 1. Worker agents each handle a different part of the text and pass their insights on to the next agent. 2. A manager agent combines these insights into a final, coherent output. Details below:
14
147
818
76,591
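The two steps in the post above can be sketched as a simple loop. This is an illustration of the control flow only: `worker` and `manager` stand in for LLM calls, and the toy versions below (`toy_worker`, `toy_manager`) are hypothetical placeholders that use keyword matching instead of a model.

```python
def chain_of_agents(chunks, worker, manager, query):
    """Pass running notes through one worker per chunk, then let a manager answer."""
    notes = ""
    for chunk in chunks:
        # Each worker sees only its own chunk plus the notes handed over so far.
        notes = worker(chunk, notes, query)
    return manager(notes, query)


def toy_worker(chunk, notes, query):
    """Placeholder worker: keeps sentences that mention the query term."""
    hits = [s.strip() for s in chunk.split(".") if query.lower() in s.lower()]
    return " | ".join(filter(None, [notes] + hits))


def toy_manager(notes, query):
    """Placeholder manager: formats the accumulated notes as the final answer."""
    return f"{query}: {notes}" if notes else f"{query}: nothing found"
```

No single call ever receives the full text, which is the point of the design: the context each agent needs is bounded by the chunk size plus the size of the notes.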
One of the most comprehensive Surveys of Reinforcement Learning for LRMs Covers: - LLMs ➝ LRMs via RL (math, code, reasoning) - Reward design, policy optimization, sampling - RL vs SFT, training recipes - Uses: coding, agents, multimodal, robotics, etc. - Future approaches: continual/memory/model-based RL, pretraining, diffusion, co-design
14
177
825
51,486
5 latest open-source LLMs (save the list) 1. BLOOMZ 2. OPT-IML 3. Pythia 4. LLaMA 5. Vicuna All repos in 🧵
13
155
792
196,768
A free book for you! Fundamentals of Data Visualization by Claus O. Wilke It's a guide to making visualizations that accurately reflect the data, tell a story, and look professional. Read the open book here: clauswilke.com/dataviz/
7
181
763
.@karpathy's nanochat is bigger than you think He calls it a ramp, but it's actually a lab of its own – a miniature system where anyone can experiment And most importantly – it’s deeply connected to education, allowing us to understand machine intelligence through a tiny model: 1. What is nanochat and how can you use it? It's a miniature LM that costs from about $100 (~4 hours on an 8×H100 node) to train and behaves like a small, curious creature. Karpathy described it as a “kindergarten child”: cheerful, error-prone, sometimes absurd, always revealing. It represents a full learning loop – pretraining, supervised fine-tuning, and reinforcement learning – at a scale that can fit on a desk. It's an antidote to abstraction. 2. Learning through synthetic worlds: Using synthetic conversations, Karpathy taught it who it is: a small model named nanochat d32, built by Andrej, aware of its limitations, occasionally royal enough to call him “King.” Later, he added a new skill: counting letters in words, through a small synthetic SpellingBee dataset. This became a teaching aid for anyone curious how identity and behavior emerge from data. 3. New ways of thinking about thought: - Karpathy compared autoregressive text models to diffusion ones, and wondered what it would mean to train nanochat the second way. - He also questioned tokens themselves, dreaming of models that read the world visually, free from text and its inherited mess. - Even new ideas, like switching from BF16 to FP16, immediately land on a separate nanochat branch. So through nanochat, education shrinks to something smaller, hands-on, open and more visible. Real understanding now comes from microcosms – models light enough to experiment with and clear enough to learn from.
12
72
802
53,469
"NLP with Deep Learning" by @stanfordnlp is free and open! Find links to: - lecture slides - lecture videos - notes - code - suggested readings It's all here⬇️ web.stanford.edu/class/cs224…
4
218
765
A free book for you! "Introduction to Probability for Data Science" by Stanley H. Chan. Read here: services.publishing.umich.ed…
12
207
743
.@karpathy is AI’s favorite wandering naturalist. He’s absurdly influential. In a good way. So why not condense and connect the dots from his August posts? 1. LLMification of Knowledge (Aug 28) Why feed models static PDFs when we could restructure everything as machine-legible courses? Markdown exposition, problems as supervised fine-tuning pairs, exercises as reinforcement environments, infinite synthetic problem generators. An LLM Academy where models study like students. ▪️ The question: If the internet was the training ground of pretraining, could LLMified curricula become the training ground of reasoning? 2. Eras of Model Learning (Aug 27) Pretraining was text. Fine-tuning was conversation. RL is environments. But Karpathy doubts humans learn through reward alone. He hints at new paradigms: system prompts, context-driven updates, memory distillation. ▪️ The question: If RL is a bridge, what’s the destination paradigm for machine learning? 3. Coding in Layers (Aug 24) If 2024 was chat, 2025 is code. Cursor for intent, highlight-and-edit for tweaks, Claude/GPT-5 for large tasks. Abundance shifts the challenge: orchestrating layers of assistance without losing taste, abstraction, or direction. ▪️ The question: What’s the coder’s role now? 4. The Problem of Intent (Aug 9) Models default into “exam mode” when we sometimes just want a glance. Benchmarks push them into overthinking. Humans know the difference. Models don’t. The missing piece is an intent channel. ▪️ The question: Can routing help with it? Looking Forward Along Karpathy’s threads you can trace a map of where LLMs are heading – and where human-AI collaboration still feels unfinished. They sketch an ecosystem in transition: from text ingestion → to study, code, and reason inside environments. If 2024 was chat and 2025 is code, maybe 2026 is environments. Not just for models, but for us too, learning how to live and work inside them. And the main question: What will your environment look like?
2024: everyone releasing their own Chat 2025: everyone releasing their own Code
5
53
778
102,229
Natural Language Reinforcement Learning (NLRL) redefines Reinforcement Learning (RL). The main idea: In NLRL, the core parts of RL like goals, strategies, and evaluation methods are reimagined using natural language instead of rigid math. What are the benefits? - NLRL uses not only single numbers but also detailed feedback - Interpretable and easier to understand - Human-like decision-making Let's explore this approach more precisely🧵
11
139
739
87,503
8 free sources about AI Agents: ▪️ Agents, @Google's whitepaper ▪️ Agents in the Long Game of AI (book) ▪️ AI Engineer Summit 2025: Agent Engineering (video) ▪️ AI Agents course from @huggingface ▪️ Artificial Intelligence: Foundations of Computational Agents, 3rd Edition, book ▪️ Intelligent Agents: Theory and Practice (book) ▪️ Our articles "AI Agents and Agentic Workflows" ▪️ Our collection: 8 Free Sources to Master Building AI Agents Save the list and check this out for the links: huggingface.co/posts/Ksenias…
9
169
739
107,081
Happy Holidays, friends!🥳 We've made a recap of the most popular courses for you. 1. Deep Learning by @ylecun, @alfcnz 2. Linear Algebra @MIT 3. Deep Learning for NLP @DeepMind, @CompSciOxford Find links in the thread! 1/5
3
140
694
Download your copy of the book Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David. It introduces the reader to ML: from fundamental theoretical ideas to the maths behind practical algorithms. 🔗 cs.huji.ac.il/~shais/Underst…
2
215
686
83,811
Chain-of-Experts (CoE) - a new kind of model architecture. It builds on the Mixture-of-Experts (MoE) idea that a model can choose a different expert each round. ➡️ As a new addition, experts work in a sequence, one after the other within a layer. CoE keeps the number of active experts the same as before, but: - Uses up to 42% less memory - Unlocks over 800× more effective expert combinations - Improves performance Here's how it works:
16
113
718
63,384
Check out a free edition of Think Bayes by Allen Downey. 1. This book uses Python code instead of math! 2. There’s a Jupyter notebook for every chapter 3. Code is written in NumPy, SciPy, and Pandas Find it here for free: greenteapress.com/wp/think-b…
8
146
689
Mathematics of Big Data and Machine Learning course materials by @MIT are free and open for everyone! Find: 1. Lecture notes 2. Lecture videos 3. Slides Learn here: ocw.mit.edu/courses/res-ll-0…
3
199
677
A free book for you! Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Learn Maths for ML here: mml-book.github.io/book/mml-…
4
170
669
.@GoogleDeepMind proposed a method that enhances LLMs with an offline coprocessor that works with the models' internal memory (kv-cache). • What's the coprocessor's role? It enhances the model's KV-cache by adding extra "latent embeddings" (compressed representations) for more accurate outputs. • What is good about it? - The coprocessor operates independently, and the base LLM remains frozen. - It operates offline and asynchronously, meaning it can improve the model’s memory in the background. - If the coprocessor isn’t available or extra computation isn’t needed, the model still functions as usual. - The model achieves lower perplexity. - This method works across various tasks without additional fine-tuning. Here are the details:
14
113
685
76,119
10 latest Preference Optimization techniques ▪️ Pref-GRPO ▪️ PVPO (Policy with Value PO) ▪️ DCPO (Dynamic Clipping PO) ▪️ ARPO (Agentic Reinforced PO) ▪️ GRPO-RoC (Resampling-on-Correct) ▪️ TreePO ▪️ DuPO ▪️ TempFlow-GRPO ▪️ MixGRPO ▪️ MaPPO (Maximum a Posteriori PO) Save the list! Check this out for the links and more info: huggingface.co/posts/Ksenias…
11
112
684
42,917
9 Multimodal Chain-of-Thought methods ▪️ KAM-CoT ▪️ Multimodal Visualization-of-Thought (MVoT) ▪️ Compositional CoT (CCoT) ▪️ URSA ▪️ MM-Verify ▪️ Duty-Distinct CoT (DDCoT) ▪️ Multimodal-CoT ▪️ Graph-of-Thought (GoT) ▪️ Hypergraph-of-Thought (HoT) Save the list, and check this out for the links and more info: huggingface.co/posts/Ksenias…
5
130
671
41,598
Linear Algebra course by @MIT is free and open to everyone! Find: 1. Videos and lecture notes 2. Assignments: problem sets with solutions 3. Exams and solutions Learn the basics here: ocw.mit.edu/courses/18-06sc-…
2
168
652
7+ main precision formats used in AI ▪️ FP32 ▪️ FP16 ▪️ BF16 ▪️ FP8 (E4M3 / E5M2) ▪️ FP4 ▪️ INT8/INT4 ▪️ 2-bit (ternary/binary quantization) General trend: higher precision for training, lower precision for inference. Save the list and learn more about these formats here: huggingface.co/posts/Ksenias…
12
99
674
63,249
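To see what the trade-off between these formats looks like in practice, here is a small standard-library sketch. FP16 uses 1 sign, 5 exponent, and 10 mantissa bits, while BF16 keeps FP32's 8 exponent bits but only 7 mantissa bits. The function names are hypothetical, and the BF16 conversion below simply truncates the low bits, whereas hardware typically rounds to nearest.

```python
import struct


def round_trip_fp16(x):
    """Store x as IEEE half precision and read it back as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def truncate_to_bf16(x):
    """Keep only the top 16 bits of the FP32 encoding (sign, exponent, 7 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]
```

FP16 is the more precise of the two for small values, but it cannot represent anything above 65504, so `round_trip_fp16(1e5)` raises `OverflowError`. BF16 covers the same range as FP32 at coarser precision.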
8 free ML courses -- our favorites: 1. Deep Learning @nyuniversity 2. NLP with Deep Learning @Stanford 3. Learning From Data @Caltech 4. Full-stack production deep learning (DL) @UCBerkeley 5. Linear Algebra Review @CarnegieMellon Check the rest here: thesequence.substack.com/p/c…
10
224
630
PPO vs GRPO vs REINFORCE – a workflow breakdown of the most talked-about reinforcement learning algorithms ➡️ Proximal Policy Optimization (PPO): The Stable Learner It’s used everywhere from dialogue agents to instruction tuning because it balances learning fast with staying safe. ▪️ How PPO works step by step: 1. A query is fed into the Policy Model (which is trainable), and it produces an output. 2. That output gets sent to 2 frozen models for scoring: 🔹 The Reference Model calculates how far the new output strays from the original behavior using KL divergence. 🔹 The Reward Model gives the output a score r, evaluating its helpfulness, coherence, or alignment. 3. The critic’s take: 🔹 The Value Model (also trained) tries to predict how good that output should have been, producing v - an expected reward. 4. Calculating advantage: PPO uses Generalized Advantage Estimation (GAE) to figure out the advantage, meaning how much better or worse the action was compared to expectations. 5. Gentle updates only: This is where PPO earns its name. • It uses a clipped objective to prevent wild updates to the policy, limiting how much the new version can diverge from the old one. • It may also watch the KL divergence to double-check the policy isn't drifting too far. 6. Joint optimization: PPO updates the policy and the value function, and sometimes adds an entropy bonus to keep the model exploring new ideas. ✅ Why is PPO good? ▪️ Thanks to clipping and KL control, PPO is hard to break, so it stays stable. ▪️ The value function helps squeeze more learning from fewer samples. --- ➡️ Group Relative Policy Optimization (GRPO): Learning by Comparison GRPO skips the value model, and is tailored for reasoning-heavy tasks where relative quality matters more than absolute scores. ▪️ GRPO in action: 1. The policy model takes a query and generates a group of answers, which gives us a playground for comparison. 2. Each answer gets scored: 🔹 The Reward Model evaluates all outputs with rewards r1, r2, ... 🔹 GRPO normalizes these scores, subtracting the group’s mean and dividing by the standard deviation. 🔹 Now each output knows where it stands relative to its peers. 3. No critic model: That relative score becomes the advantage. No need for a separate value model. 4. Smart advantage propagation: In the case of chain-of-thought reasoning, GRPO assigns rewards to individual steps, then backpropagates scores to all earlier tokens. Tokens contributing early to a strong answer gain more credit, guiding the model on a productive reasoning path. 🔄 Iterative GRPO GRPO retrains the Reward Model with new, better outputs, and refreshes the Reference Model alongside the policy to keep the KL penalty meaningful. It reuses a bit of old data (~10%) to stabilize training and avoid forgetting. ✅ Why GRPO can be a better choice: • No value model = no extra weight • Relative Rewards = stronger signals • Perfect for tasks with multiple steps or structured thinking • Can handle longer sequences and bigger batches --- ➡️ REINFORCE: The Monte Carlo Policy Gradient REINFORCE is like “vanilla” policy gradient: it updates the policy directly based on full-episode returns, without needing a value model. ▪️ How REINFORCE works: 1. The trainable policy model interacts with the environment, producing actions until an episode ends. 2. For each action taken at time t, REINFORCE computes the return (Gt) - the sum of discounted rewards from t to the end. 3. Policy update rule: Each action gets reinforced proportional to how good its return was. Actions that led to high returns become more probable. 4. Optionally add a baseline: To reduce variance, subtract a baseline, for example a value estimate or average reward. 5. Run new episodes with the updated policy, gather new returns, and update again. ✅ Why REINFORCE matters: • Simple and unbiased: A pure Monte Carlo estimator of the policy gradient. • No critic needed • Conceptual foundation for many modern algorithms like GRPO.
11
114
662
41,219
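The three numeric steps in the breakdown above (GRPO's group normalization, REINFORCE's discounted return, and PPO's clipped objective) are each only a few lines. This is a hedged sketch in plain Python. The function names, the small `eps` term, the use of the population standard deviation, and the default `clip` and `gamma` values are assumptions made for illustration.

```python
import statistics


def grpo_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE return G_t: discounted sum of rewards from step t to the end."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def ppo_clipped_term(ratio, advantage, clip=0.2):
    """PPO surrogate for one sample: min(ratio * A, clip(ratio, 1-c, 1+c) * A)."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

Here `ratio` is the probability of the action under the new policy divided by its probability under the old one. Taking the minimum removes any incentive to push that ratio outside the clipping range.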
Retrieval-of-Thought (RoT) makes reasoning models faster by reusing earlier reasoning steps as templates. These steps are stored in a “thought graph” that shows both their order and meaning. As a result, RoT: - reduces output tokens by up to 40% - speeds up inference by 82% - lowers cost by 59% All without losing accuracy. Here is how it works:
11
90
640
42,989
A free book for you! Learn: 1. NumPy & Pandas 2. Matplotlib: data visualizations 3. Scikit-Learn: efficient implementations of ML algorithms Read the open "Python Data Science Handbook: Essential Tools for Working with Data": jakevdp.github.io/PythonData…
2
151
600
"Deep Learning for Natural Language Processing" course lectures are open for everyone! It's an advanced course from @CompSciOxford and @DeepMind. All lecture materials, recordings, and practicals: github.com/oxford-cs-deepnlp…
4
172
613
A free book for you! Learn: 1. NumPy & Pandas 2. Matplotlib: data visualizations 3. Scikit-Learn: efficient & clean implementations of ML algorithms Read the open "Python Data Science Handbook: Essential Tools for Working with Data": jakevdp.github.io/PythonData…
A free book for you! Hands-On Data Visualization by @HandsOnDataViz has an open version. It takes you step by step through 1. tutorials 2. real-world examples 3. online resources Read here: handsondataviz.org/
3 free MLOps courses you should know about: ▪️ MLOps Course, @GokuMohandas ▪️ CS 329S: Machine Learning Systems Design @Stanford ▪️ MLOps Zoomcamp @Al_Grigor 🧵
11 alignment and optimization algorithms for LLMs ▪️ PPO (Proximal Policy Optimization) ▪️ DPO (Direct Preference Optimization) ▪️ GRPO (Group Relative Policy Optimization) ▪️ DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) ▪️ AMPO (Active Multi-Preference Optimization) ▪️ Supervised Fine-Tuning (SFT) ▪️ Monte Carlo Tree Search (MCTS) ▪️ RLHF (Reinforcement Learning from Human Feedback) ▪️ SPIN (Self-Play Fine-Tuning) ▪️ SPPO (Self-Play Preference Optimization) ▪️ RSPO (Regularized Self-Play Policy Optimization) Save the list and check this out for the links and more info: huggingface.co/posts/Ksenias…
Why do we need GPUs for AI? ➡️ A GPU (Graphics Processing Unit) is built for parallelism – it splits a bigger job into smaller tasks and distributes them across processing cores. Inside a GPU, billions of tiny transistors are etched onto a silicon chip and arranged into thousands of these processing cores. Connected by intricate wiring and high-bandwidth memory, they enable rapid data flow. For example, the most popular GPUs are: • A100 (NVIDIA) – features Tensor Cores and supports Multi-Instance GPU (MIG), which splits one GPU into several logical GPUs. • H100, H200 (NVIDIA) – offer Transformer Engine support, massive memory bandwidth, and high speed for training and inference. So the GPU architecture is well suited to repetitive matrix and tensor calculations on enormous datasets, which are the basis of training and running AI models. ▪️ That's why GPUs are now the main driver of AI performance. But there are many alternatives to GPUs for powering AI: ASICs, APUs, NPUs, IPUs, TPUs, and others. We unpack them in this guide -> turingpost.com/p/pu
Machine learning is famous for its open resources. We collected 3 free courses about ML: ▪️ Introduction to Machine Learning, @MIT ▪️ Mathematics for Computer Science, @MIT ▪️ Practical Deep Learning, @fastdotai 🧵
"NLP with Deep Learning" by @stanfordnlp is free and open! Find links on: 1. lecture slides & videos 2. notes 3. codes 4. suggested readings All is here⬇️ web.stanford.edu/class/cs224…
Paper2Agent brings research papers ‘to life.’ This open tool from @Stanford transforms static papers into interactive AI assistants that can explain and apply their methods. It builds on the Model Context Protocol (MCP) and works in 2 layers: - Paper2MCP: Extracts the paper’s methods and code into an MCP server. - Agent layer: Links the MCP server to a chat agent. The result is a real conversational assistant for the paper. ▪️ Paper2Agent has already worked impressively for agents that use AlphaGenome, Scanpy, and TISSUE tools. Here are the details 🧵
A free book for you! "Foundations of Data Science" by Avrim Blum, John Hopcroft, and Ravindran Kannan. It provides an introduction to the mathematical and algorithmic foundations of data science. Find it here: cs.cornell.edu/jeh/book.pdf
Recently, PyTorch introduced mm, a 3D matrix multiplication (matmul) visualizer. This tool is interactive, runs in the browser or notebook iframes, and features shareable links. You can visualize: • Matrix multiplications • Attention heads • LoRA Some examples:
.@GoogleAI has dropped a very interesting study. They introduced new types of attentional bias strategies in LLMs and reimagined the "forgetting" process, replacing it with "retention." All of this is wrapped up in Miras, their new framework for designing efficient AI architectures using 4 building blocks: • Memory architecture – how the memory is built • Attentional bias – how the model focuses • Retention gate – how it forgets or keeps information • Memory learning algorithm – how it’s trained Details 🧵
A free book: Learning Theory from First Principles by @BachFrancis It covers key topics from machine learning (ML) theory and practice, such as: - Math basics - Supervised learning - Generalization, overfitting & adaptivity - Tools to design learning algorithms - Optimization in ML - Local, kernel, and sparse methods - Neural networks - Ensembles - Online learning - Overparameterized models and more! The book also includes simple experiments (in MATLAB and Python), exercises, and references to more advanced material. Read it here: di.ens.fr/~fbach/ltfp_book.p…
PPO and GRPO: a workflow breakdown of the most popular reinforcement learning algorithms

➡️ Proximal Policy Optimization (PPO): The Stable Learner
It’s used everywhere from dialogue agents to instruction tuning because it balances learning fast with staying safe.

▪️ How PPO works step by step:
1. A query is fed into the Policy Model (which is trainable), and it produces an output.
2. That output gets sent to 2 frozen models for scoring:
🔹 The Reference Model measures how far the new output strays from the original behavior using KL divergence.
🔹 The Reward Model gives the output a score r, evaluating its helpfulness, coherence, or alignment.
3. The critic’s take:
🔹 The Value Model (also trained) tries to predict how good that output should have been, producing v, an expected reward.
4. Calculating advantage: PPO uses Generalized Advantage Estimation (GAE) to compute the advantage, meaning how much better or worse the action was compared to expectations.
5. Gentle updates only: This is where PPO earns its name.
• It uses a clipped objective to prevent wild updates to the policy, limiting how much the new version can diverge from the old one.
• It may also monitor the KL divergence to double-check the policy isn’t drifting too far.
6. Joint optimization: PPO updates the policy and the value function together, and sometimes adds an entropy bonus to keep the model exploring new ideas.

✅ Why PPO works well:
▪️ Thanks to clipping and KL control, PPO is hard to break, so it stays stable.
▪️ The value function helps squeeze more learning from fewer samples.

---

➡️ Group Relative Policy Optimization (GRPO): Learning by Comparison
GRPO skips the value model and is tailored for reasoning-heavy tasks where relative quality matters more than absolute scores.

▪️ GRPO in action:
1. The policy model takes a query and generates a group of answers, which gives us a playground for comparison.
2. Each answer gets scored:
🔹 The Reward Model evaluates all outputs with rewards r1, r2, ...
🔹 GRPO normalizes these scores by subtracting the group’s mean and dividing by the group’s standard deviation.
🔹 Now each output knows where it stands relative to its peers.
3. No critic model: That relative score becomes the advantage. No need for a separate value model.
4. Smart advantage propagation: For chain-of-thought reasoning, GRPO can assign rewards to individual steps, then propagate those scores back to all earlier tokens. Tokens that contribute early to a strong answer gain more credit, guiding the model along a productive reasoning path.

🔄 Iterative GRPO
GRPO retrains the Reward Model with new, better outputs, and refreshes the Reference Model alongside the policy to keep the KL penalty meaningful. It reuses a bit of old data (~10%) to stabilize training and avoid forgetting.

✅ Why GRPO can be a better choice:
• No value model = no extra weight
• Relative rewards = stronger signals
• Well suited to tasks with multiple steps or structured thinking
• Can handle longer sequences and bigger batches
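The group normalization in GRPO step 2 and the clipped objective in PPO step 5 can be sketched as follows. This is a hedged illustration in plain Python rather than code from any particular library: the function names, the small `eps` added for numerical safety, the use of the population standard deviation, and the `clip_eps = 0.2` default are all assumptions for the example.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: (r_i - group mean) / group standard deviation.

    Assumption: population standard deviation; `eps` avoids division by
    zero when every answer in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one sample.

    `ratio` is pi_new(a | s) / pi_old(a | s). Taking the minimum of the
    clipped and unclipped terms removes any incentive to push the ratio
    outside [1 - clip_eps, 1 + clip_eps].
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Four sampled answers to one query, scored by a (hypothetical) reward model.
group_rewards = [0.1, 0.4, 0.4, 0.9]
advantages = group_relative_advantages(group_rewards)
print(advantages)

# The same advantages can feed the clipped objective, with no value model involved.
print([clipped_surrogate(ratio=1.5, advantage=a) for a in advantages])
```

The design point is that the advantage comes entirely from comparing answers within the group, so the critic that PPO trains alongside the policy is not needed.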
8 Emerging trends in Reinforcement Learning ▪️ Reinforcement Pre-Training (RPT) ▪️ RL from Human Feedback (RLHF) ▪️ RL with Verifiable Rewards (RLVR) ▪️ RL from AI Feedback (RLAIF) ▪️ Multi-objective RL ▪️ Parallel thinking RL ▪️ MCTS-in-the-loop ▪️ Process-aware RL (like PRM-style GRPO) Save the list and check this out for the links and more info: huggingface.co/posts/Ksenias…
CS 224: Advanced Algorithms course by @minilek @Harvard is open for everyone! - Lecture slides and videos - Assignments For more: people.seas.harvard.edu/~min…
Mathematics for Computer Science course @MIT is open for everyone! 1. Definitions, proofs, sets, functions, relations 2. Discrete structures: graphs, state machines, modular arithmetic, counting 3. Discrete probability theory openlearninglibrary.mit.edu/…
Python for Data Science, AI & Development is a free course by @IBM. It will take you from 0 to programming in Python. It consists of 4 weeks: 1. Basics 2. Data Structures 3. Programming Fundamentals 4. Working with Data Learn here: coursera.org/learn/python-fo…