Building self-improving AI @dair_ai • Prev: Meta AI | PhD • Learn about AI Agents for FREE here: academy.dair.ai/courses/elem…

DAIR.AI Academy
BREAKING: xAI announces Grok 3 Here is everything you need to know:
91
351
7,386
1,898,358
We live in incredible times.
64
964
6,890
411,669
AI Agents vs. Agentic AI Interesting paper summarizing distinctions between AI Agents and Agentic AI. It also talks about the key ideas, solutions, and the future. Here are my notes:
225
1,039
5,569
753,154
BREAKING: xAI announces Grok 4 "It can reason at a superhuman level!" Here is everything you need to know:
111
352
5,624
1,323,999
This maths book is trending on Hacker News! I took a quick look and realized how great of a book this is to learn how to think mathematically. It's 700 pages long and very approachable compared to other maths books. math.cmu.edu/~jmackey/151_12…
70
990
4,977
736,944
Just being honest: when looking for skilled machine learning and NLP engineers, I'm not looking at CVs anymore. Now I directly look at blogs, GitHub repos, videos, Twitter, etc. Having a CV is fine... but don't forget to document along the way (in detail) what you've built.
119
370
4,674
Foundations of LLMs This amazing new LLM book just dropped on arXiv. 200+ pages! It covers areas such as pre-training, prompting, and alignment methods. It looks like a great intro to LLMs for devs and researchers.
35
807
4,699
401,654
Anthropic is killing it with these technical posts. If you're an AI dev, stop what you are doing and go read this. It shows, in great detail, how to implement an effective multi-agent research system. Pay attention to these key parts:
55
433
4,522
563,112
The Illusion of Thinking in LLMs Apple researchers discuss the strengths and limitations of reasoning models. Apparently, reasoning models "collapse" beyond certain task complexities. Lots of important insights on this one. (bookmark it!) Here are my notes:
136
612
4,442
955,775
LLMs Get Lost in Multi-turn Conversation The cat is out of the bag. Pay attention, devs. This is one of the most common issues when building with LLMs today. Glad there is now a paper to share insights. Here are my notes:
99
611
4,139
754,732
As usual, Anthropic just published another banger. This one is on context engineering. Great section on how it is different from prompt engineering. A must-read for AI devs.
61
440
3,959
367,758
BloombergGPT is a new LLM for finance. It's a 50 billion parameter language model trained on financial data. Claims the largest domain-specific dataset yet with 363 billion tokens... further augmented with 345 billion tokens from general purpose datasets. Outperforms existing models on financial tasks while not sacrificing performance on general LLM benchmarks. arxiv.org/abs/2303.17564v1
64
673
3,493
1,012,446
NEW: OpenAI announces new tools for building agents. Here is everything you need to know:
67
280
3,321
781,294
YC on the key prompting techniques used by the best AI startups:
33
300
3,297
658,759
The past month I've been writing detailed notes for the first 15 lectures of Stanford's NLP with Deep Learning. Notes contain code, equations, practical tips, references, etc. As I tidy the notes, I need to figure out how to best publish them. Here are the topics covered so far:
35
556
3,113
How Transformers Work This is probably one of the most beautiful visualizations of how today's LLMs work. ig.ft.com/generative-ai/
24
721
3,132
743,601
Everyone is talking about this new OpenAI paper. It's about why LLMs hallucinate. You might want to bookmark this one. Let's break down the technical details:
106
456
3,196
454,529
2022: A Year in Review (ML Papers Edition) In this thread, let's take a look at some of the top trending ML papers of 2022 ↓
74
626
3,089
704,925
Llama 4 is here! - Llama 4 Scout & Maverick are up for download - Llama 4 Behemoth (preview) - Advanced problem solving & multilingual - Support long context up to 10M tokens - Great for multimodal apps & agents - Image grounding - Top performance at the lowest cost - Can be served within $0.19-$0.49/M tokens
31
344
3,154
407,077
A Deep Dive into Reasoning LLMs This is a really nice summary of the progress made in post-training and reasoning LLMs. Highly recommend this one!
17
558
2,998
245,047
Exactly how I like learning about this stuff. MCMC is not a difficult concept to understand if you have the right person explain it to you.
16
256
2,929
258,849
All-in-one book that covers most of the maths you will need for machine learning. Free PDF here: cis.upenn.edu/~jean/math-dee…
45
580
2,760
New language model just dropped! GPT4All - a 7B parameter model (based on LLaMA) trained on a massive collection of clean assistant data including code, stories, and dialogue. Also releases 800K data samples, data curation procedures, training code, and model weights to promote open research. A quantized 4-bit version of the model is released that can run on CPU. repo: github.com/nomic-ai/gpt4all
32
502
2,802
719,580
Here is a new open-source IDE to help you build multi-agent systems. It's like Cursor but specifically for building multi-agent workflows. It's powered by OpenAI Agents SDK, connects MCP servers, and can integrate into your apps using HTTP or the SDK.
41
182
506
72,239
NEW: Google announces Agent2Agent Agent2Agent (A2A) is a new open protocol that lets AI agents securely collaborate across ecosystems regardless of framework or vendor. Here is all you need to know:
72
471
2,801
337,114
ML YouTube Courses (5.1K⭐️) I've added more info and categories to the repo, so it's much easier to find relevant courses. github.com/dair-ai/ML-YouTub…
21
647
2,689
Top 50 LLM Interview Questions. Looks like a great resource to learn LLM basics:
13
371
2,666
349,927
This repo contains all my notes for the "Introduction to Deep Learning" course from MIT. The notes are great for studying fundamentals and new topics in ML. I put them in Notion so you can easily extend them. More exciting ML content is on its way! github.com/dair-ai/ML-Course…
19
699
2,607
These machine learning cheatsheets contain some of the best and most well-organized ML content I've come across. Sometimes, it's just good to understand the concept at a high level and its context before going deep. This resource helps with that. stanford.edu/~shervine/teach…
32
550
2,596
Machine Learning YouTube Courses ( 7K⭐️ ) A few new entries have made it into the collection. I'm glad to hear that a lot of ML students found this repo helpful to discover new courses. github.com/dair-ai/ML-YouTub…
25
635
2,561
Anthropic's recent AI Prompt Engineering deep dive is a must watch! Here's Claude's summary of the mentioned prompting techniques and tips:
31
338
2,656
394,906
FinGPT: Open-Source Financial LLMs FinGPT is an open-source LLM for the finance sector. It takes a data-centric approach, providing researchers & practitioners with accessible resources to develop FinLLMs. paper: arxiv.org/abs/2306.06031 code: github.com/AI4Finance-Founda…
43
570
2,594
572,227
I've been writing a special notebook to help get comfortable with mathematics for machine learning. It will contain popular equations paired with code and explanations. The idea is to start with a small collection of 100 (beginner to advanced). Would you be interested in this?
148
239
2,565
NEW: Google presents Agent Development Kit (ADK) Features: - code-first - multi-agents - rich tool ecosystem - flexible orchestration - integrated dev xp - development-ready - streaming - state, memory, artifacts - extensibility > pip install google-adk
39
415
2,628
271,860
Prompt Engineering Guide (20K⭐️) We started with basic prompt examples and have expanded to a comprehensive prompt engineering guide used by thousands of AI developers and researchers working with LLMs. - Now 200K+ learners - Chinese & Japanese translations are now available - GPT-4 & ChatGPT guides and notebooks - Collection of all the latest tools and papers on prompt engineering - Added LLM collection - Paper explanations in progress ... and much more We aim to build the ultimate resource to learn how to work and build with LLMs. A lot more to come. Support and contributions are welcome! web: promptingguide.ai/ repo: github.com/dair-ai/Prompt-En…
50
510
2,551
617,305
New chatbot just dropped! Vicuna-13B - an open-source chatbot trained by fine-tuning LLaMA on ~70K user-shared ChatGPT conversations. Claims to achieve "more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases". Seems possible to run it on your own machine with a single GPU. repo: github.com/lm-sys/FastChat blog: vicuna.lmsys.org/ demo: chat.lmsys.org/
46
446
2,504
634,286
Understanding Deep Learning If you are looking for a comprehensive deep learning book with some of the more recent trends ( transformers, diffusion models, GNNs,...), this looks like a great option. udlbook.github.io/udlbook/
31
581
2,472
283,228
deep learning activation functions made cool and cute teddit.net/r/learnmachinelea…
14
547
2,441
We are living in the most insane timeline. I just asked Claude Code (with Claude Sonnet 4.5) to develop an MCP Server (end-to-end) that allows me to programmatically create n8n workflows from within Claude Code itself. Took about 10 mins!
68
206
2,526
221,888
As an ML Engineer, this is one of the most useful applications of GPT-4 I've seen. Chat Explore is a powerful AI-powered data exploration tool. Here’s why I am so impressed:
37
341
2,418
716,668
Say goodbye to Chain-of-Thought. Say hello to Chain-of-Draft. To address the issue of latency in reasoning LLMs, this work introduces Chain-of-Draft (CoD). Read on for more:
38
365
2,435
278,486
o3-mini-high (left) vs. deepseek-r1 (right) results from the first try deepseek-r1 is cracked... wtf!
102
170
2,368
719,806
🎓 ML YouTube Courses 🎓 In case you missed it, I maintain a highly-curated collection of some of the best and latest machine learning courses available on YouTube. So much good free content to get started with or to catch up on. github.com/dair-ai/ML-YouTub…
13
597
2,353
Anthropic continues to crush it with these guides. This is a good example of what context engineering involves.
32
170
2,436
404,036
265 pages of everything you need to know about building AI agents. 5 things that stood out to me about this report:
29
414
2,362
281,147
Nice list by Google! It consists of 321 real-world gen AI use cases from the world's leading organizations. Great for learning how others are finding success with gen AI and AI agents.
25
274
2,279
200,256
OpenAI Introduces Operator & Agents! Here is everything you need to know:
58
209
2,241
488,986
Lots of tweets about GPT-4 in the last 8 hours. Here is a thread highlighting some of the interesting examples, tricks, and discussions I've come across ↓
34
417
2,205
654,344
I'm creating a new repo organizing all my machine learning & NLP PyTorch notebooks. Out of curiosity, would you be interested in this?
51
291
2,145
Here we go! Microsoft introduces a multimodal large language model called Kosmos-1. Achieves great performance on language understanding, OCR-free NLP, perception-language tasks, visual QA, and more.
27
463
2,054
410,298
Wow! This is exciting! First ever course on Transformers by Stanford. Really looking forward to the release of all lectures. website: web.stanford.edu/class/cs25/ youtube: piped.video/playlist?list=PL…
19
384
2,056
Google recently published this great whitepaper on Agents. 2025 is going to be a huge year for AI Agents. Here's what's included: - Introduction to AI Agents - The role of tools in Agents - Enhancing model performance with targeted learning - Quick start to Agents with LangChain - Production applications with Vertex AI Agents Great place to start learning about AI Agents.
26
363
2,097
261,635
🎓LLM Course This is such a beautiful and comprehensive resource on LLMs. It includes notebooks, key references, and roadmaps. There is something to learn for everyone. For students, researchers, and practitioners. The Prompt Engineering Guide is also referenced, which is cool to see. One observation as I was reviewing the references is how much hard work the ML community dedicates toward open and high-quality education. This resource does a great job of organizing all those incredible LLM educational resources that exist out there. One topic I would add is LLMOps. But to be fair, the majority of the topics are roughly covered in the LLM Engineer Roadmap. Highly recommended! And last but not least, many thanks to @maximelabonne for releasing this excellent resource. 👏
10
456
2,034
173,323
NEW: Google introduces AI co-scientist. It's a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs. 2025 is truly the year of multi-agents! Let's break it down:
109
375
2,055
211,574
ChatLLaMA - an open-source implementation of LLaMA based on RLHF. Claims a 15x faster training process than ChatGPT. It allows users to fine-tune personalized ChatLLaMA assistants. github.com/nebuly-ai/nebullv…
33
427
2,019
347,784
Context Engineering Guide I'm writing a detailed guide on context engineering for AI devs. v1 is out now! (bookmark it) I use a concrete deep research multi-agent example to show what context engineering involves.
38
294
2,033
287,730
Hierarchical Reasoning Model This is one of the most interesting ideas on reasoning I've read in the past couple of months. It uses a recurrent architecture for impressive hierarchical reasoning. Here are my notes:
43
278
2,070
258,762
Microsoft releases NLWeb NLWeb uses MCP to make it simple to interact with websites in a standardized way. Devs can now convert any website into an AI app. MCP is to NLWeb what HTTP is to HTML. This went largely unnoticed this week, but it looks like a big deal.
28
318
2,060
282,761
Machine Learning Notes I've been writing notes introducing some of the most important topics in AI today. This thread lists a few notes I've published so far:
54
420
1,956
405,279
Don't do RAG Proposes cache-augmented generation (CAG) to eliminate retrieval latency and minimize retrieval errors. What is CAG? CAG aims to leverage the capabilities of long-context LLMs by preloading the LLM with all relevant docs in advance and precomputing the key-value (KV) cache. The preloaded context helps the model to provide contextually accurate answers without the need for additional retrieval during runtime. When to apply CAG? It's a useful alternative to RAG for cases where the documents/knowledge for retrieval are of limited, manageable size. My thoughts: As LLMs advance in capabilities, I suspect that what we know as RAG today could change significantly either architecturally or how it's optimized. CAG is one in a growing list of developments and new ideas that have emerged recently to address limitations like poor retrieval relevancy and latency. There could also be hybrid methods that combine preloading with selective retrieval. Don't sleep on long-context LLMs. They are here to stay.
52
294
1,969
169,899
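A toy sketch of the CAG idea from the post above, under loud assumptions: `ToyModel` is a hypothetical stand-in that only counts processed tokens, not a real LLM or a real KV cache. It illustrates the cost pattern the post describes, where documents are processed once up front and each query only adds its own tokens.

```python
class ToyModel:
    """Hypothetical model stand-in that only counts how many tokens it has processed."""

    def __init__(self):
        self.tokens_processed = 0

    def prefill(self, text, cache=None):
        """Process text on top of an optional cached prefix; return the extended cache."""
        tokens = text.split()
        self.tokens_processed += len(tokens)
        return (cache or []) + tokens


def answer_without_cache(model, documents, query):
    # Baseline: every document is re-processed for each query.
    return model.prefill(" ".join(documents) + " " + query)


def answer_with_cache(model, doc_cache, query):
    # CAG-style: reuse the precomputed document cache; only the query is new work.
    return model.prefill(query, cache=doc_cache)


documents = ["Refunds are accepted within 30 days.", "Shipping takes 5 business days."]
queries = ["How long does shipping take?", "Can I get a refund?"]

baseline = ToyModel()
for q in queries:
    answer_without_cache(baseline, documents, q)

cag = ToyModel()
doc_cache = cag.prefill(" ".join(documents))  # done once, ahead of time
for q in queries:
    answer_with_cache(cag, doc_cache, q)

print(baseline.tokens_processed, cag.tokens_processed)
```

The gap between the two counters grows with the number of queries, which is why the approach suits document sets of limited, manageable size.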
📘 Probabilistic Machine Learning: An Introduction I have been looking for a book like this. Kevin Murphy published the 2021 edition of the Probabilistic Machine Learning e-textbook. Love the emphasis on probability and math. It includes code examples. probml.github.io/pml-book/bo…
18
461
1,907
Transformer Explainer Really cool interactive tool to learn about the inner workings of a Transformer model. Apparently, it runs a GPT-2 instance locally in the user's browser and allows you to experiment with your own inputs. This is a nice tool to learn more about the different components inside the Transformer and the transformations that occur. Tool: poloclub.github.io/transform…
16
459
1,921
121,920
OpenAssistant is officially released! OpenAssistant is an open-source chat model. The release includes models, datasets, and a chat interface. The dataset consists of a ~161K human-generated, human-annotated assistant-style conversation corpus, including 35 different languages and annotated with ~461K quality ratings. This dataset release is huge! There are different models available including LLaMA-based and Pythia-based ones. We have seen many chat models released in the past few weeks but this one is probably a lot more powerful in terms of conversational capabilities. Will be testing it out in the coming days. web: open-assistant.io/chat dataset: huggingface.co/datasets/Open… models: huggingface.co/OpenAssistant
31
409
1,885
436,918
🐙ML Papers Explained An awesome new project with explanations of key deep learning concepts. (by @RitvikRastogi19 on @dair_ai) github.com/dair-ai/ML-Papers…
33
463
1,826
167,017
The hype around o3 is out of control. It’s not AGI, it’s not the singularity, and you definitely don’t have to change your worldview. In fact, the public doesn’t even have access to the models, so how can anyone claim any of the above? I appreciate how the OpenAI researchers presented o3. I encourage folks to check out the original presentation on YouTube. Don’t fall for all the hype threads you see here on X. OpenAI made it clear that there are lots of things to improve on. It’s exciting, yes, but the headlines are misleading and benchmark results don’t really say much these days. Hoping these words balance your timeline a bit. Share if you think it helps.
145
176
1,809
227,229
From word vectors to Reinforcement Learning from Human Feedback... Stanford's "Natural Language Processing with Deep Learning" course is one of the most relevant and best AI/ML courses today. It's just amazing how much knowledge and content this course pushes out every year. It's hands down one of my go-to resources to catch up on everything to do with NLP every year. It contains notes, suggested readings, slides, tips, exercises, and so on. I remember watching the 2017 lectures and just falling in love with the course. I have studied the course material since then and it has tremendously helped me to keep up with things on NLP. It can feel like an advanced course for people that are just beginning but it's still an exceptional reference to keep up with research and topics. (links in the replies)
19
345
1,839
411,951
Kimi K2 Thinking is a bigger deal than I thought! I just ran a quick eval on a deep agent I built for customer support. It's on par with GPT-5; no other LLM has reached this level of agentic, orchestration, and reasoning capabilities. Huge for agentic and reasoning tasks.
75
183
1,872
229,726
Small Language Models are the Future of Agentic AI Lots to gain from building agentic systems with small language models. Capabilities are increasing rapidly! AI devs should be exploring SLMs. Here are my notes:
50
303
1,866
268,580
My AI usage these days: - claude-3.5-sonnet for most creative and writing tasks - gemini-1.5-pro for video-related tasks - chatgpt for image analysis and web search - gpt-4o-mini and gemini-flash for agentic stuff - o1-mini for reasoning and knowledge-intensive tasks - llama-3.1 for local LLM usage - midjourney for image generation - runway for video generation - elevenlabs for speech-related stuff I'm intentionally experimenting with various models. It's often a combination of these that leads to the best performance. Where things stand, I believe it's a bad idea to overcommit to one model series. How's your usage looking?
97
178
1,819
227,116
Stanford CS234: Reinforcement Learning These lectures look like a nice introduction to reinforcement learning (RL). After the impact of RL in recent models like DeepSeek-R1 and o1, it's worth learning about RL today.
18
262
1,774
122,984
Graph neural networks (GNNs) are rapidly advancing progress in ML for complex graph data applications. Let's have a look at some resources to help you learn and keep up-to-date with GNNs ↓
19
379
1,733
Just came across this on arXiv. An awesome book on graph theory. If you are in computer science, graph theory is one of the most useful topics to study. 422 pages. Publicly available here: arxiv.org/abs/2308.04512
14
366
1,720
185,782
Mathematics is worth every minute you spend learning it.
23
212
1,688
🎉 Proud and excited to announce Galactica - a large language model for science. We trained a 120B parameter language model on a massive scientific corpus that performs different tasks such as solving math problems and summarizing academic literature.
🪐 Introducing Galactica. A large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more. Explore and get weights: galactica.org
43
310
1,697
LLMs as Optimizers This is a really neat idea. This new paper from Google DeepMind proposes an approach where the optimization problem is described in natural language. An LLM is then instructed to iteratively generate new solutions based on the defined problem and previously found solutions. It was first tested on linear regression and the traveling salesman problem. LLMs with simple prompting match or surpass hand-designed heuristic algorithms. This shows good potential for using LLMs as optimizers. The idea is then applied to prompt optimization, which aims to maximize task accuracy on different tasks like math word problem-solving. The first piece of the proposed meta-prompt takes in previously generated prompts along with corresponding training accuracies. The second piece includes the optimization problem description with samples obtained from a training set representing the task. At each optimization step, the goal is to generate new prompts that increase test accuracy based on the trajectory of previously generated prompts. The optimized prompts outperform human-designed prompts on GSM8K and Big-Bench Hard, sometimes by over 50%! For math word problem solving, one of the most effective instructions found begins with "Take a deep breath and work on this problem step-by-step". arxiv.org/abs/2309.03409
22
373
1,723
300,469
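The optimization loop described in the post above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `stub_llm` is a hypothetical stand-in that perturbs the best solution instead of calling a language model, and the meta-prompt format, function names, and toy linear regression data are all assumptions.

```python
import random


def score(w, b, data):
    """Negative mean squared error of y = w*x + b (higher is better)."""
    return -sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def build_meta_prompt(trajectory, top_k=5):
    """Describe the task and list the best past solutions, sorted by ascending score."""
    best = sorted(trajectory, key=lambda t: t[1])[-top_k:]
    lines = [f"w={w:.2f}, b={b:.2f}, score={s:.3f}" for (w, b), s in best]
    return "Previous solutions and scores:\n" + "\n".join(lines) + "\nPropose a better (w, b)."


def stub_llm(meta_prompt, trajectory, rng):
    """Hypothetical stand-in for an LLM call: perturbs the best solution so far."""
    (w, b), _ = max(trajectory, key=lambda t: t[1])
    return w + rng.gauss(0, 0.5), b + rng.gauss(0, 0.5)


def optimize(data, steps=200, seed=0):
    rng = random.Random(seed)
    start = (0.0, 0.0)
    trajectory = [(start, score(*start, data))]
    for _ in range(steps):
        prompt = build_meta_prompt(trajectory)            # past solutions + task description
        candidate = stub_llm(prompt, trajectory, rng)     # a real system would query an LLM here
        trajectory.append((candidate, score(*candidate, data)))
    return max(trajectory, key=lambda t: t[1])


data = [(x, 3.0 * x + 1.0) for x in range(-5, 6)]
best_solution, best_score = optimize(data)
print(best_solution, best_score)
```

In a real setup the stub would be replaced by a model call that receives the meta-prompt text; the rest of the loop (score, record, re-prompt) stays the same.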
Agent Leaderboard v2 is here! > GPT-4.1 leads > Gemini-2.5-flash excels at tool selection > Kimi K2 is the top open-source model > Grok 4 falls short > Reasoning models lag behind > No single model dominates all domains More below:
53
201
1,754
274,917
GPT-4o mini is 60% cheaper than GPT-3.5 Turbo. That's insane! The model is priced at $0.15 per million input tokens and $0.60 per million output tokens (~2,500-page book). For comparison, GPT-3.5-turbo-0301 was $2.00 per 1M tokens roughly over a year ago. On blended pricing (80% input tokens and 20% output tokens), GPT-4o mini has reduced costs to $0.24 per 1M tokens. Based on a few tests, the model seems to be good at structuring information, long-context understanding, and function calling, and has great vision capabilities. I have done an extensive overview along with some test cases of GPT-4o mini here: piped.video/FNa1-OKN3yU?si=GmLc… As stated in the announcement, the cost per token of GPT-4o mini has dropped by 99% since text-davinci-003. If that's a trend, where are we going to be in a couple of months?
10
46
262
33,500
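For reference, the blended figure in the post above follows from a simple weighted average. A short sketch using only the prices quoted in the post (the function name and the 80/20 default are illustrative choices):

```python
def blended_price(input_price, output_price, input_share=0.8):
    """Weighted average price per 1M tokens for a given input/output token mix."""
    return input_share * input_price + (1 - input_share) * output_price


# Prices per 1M tokens as quoted in the post.
mini = blended_price(0.15, 0.60)
reduction_vs_gpt35_0301 = 1 - mini / 2.00

print(f"blended: ${mini:.2f} per 1M tokens")               # blended: $0.24 per 1M tokens
print(f"reduction vs $2.00: {reduction_vs_gpt35_0301:.0%}")  # reduction vs $2.00: 88%
```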
The most effective AI Agents are built on these core ideas. It's what powers Claude Code. It's referred to as the Claude Agent SDK Loop, which is an agent framework to build all kinds of AI agents. (bookmark it) The loop involves three steps: Gathering Context: Use subagents (parallelize them for task efficiency when possible), compact/maintain context, and leverage agentic/semantic search for retrieving relevant context for the AI agent. Hybrid search approaches work really well for domains like agentic coding. Taking Action: Leverage tools, prebuilt MCP servers, bash/scripts (Skills have made it a lot easier), and generate code to take action and retrieve important feedback/context for the AI agent. Turns out you can also enhance MCP and token usage through code execution and routing, similar to how LLM routing increases efficiency in AI Agents. Verifying Output: You can define rules to verify outputs, enable visual feedback (this becomes increasingly important in multimodal problems), and consider LLM-as-a-Judge to verify quality based on fuzzy rules. Some problems will require visual cues and other forms of input to perform well. Don't overcomplicate the workflow (eg, use computer-using agents when a simple Skill with clever scripts will do). This is a clean, flexible, and solid framework for how to build and work with AI agents in all kinds of domains.
44
253
1,823
176,069
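The three-step loop from the post above (gather context, take action, verify output) can be sketched generically. This is an assumption-laden illustration, not the Claude Agent SDK API: `AgentState`, `run_agent_loop`, and the toy `gather`/`act`/`verify` callables are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentState:
    task: str
    context: List[str] = field(default_factory=list)
    result: Optional[str] = None


def run_agent_loop(task, gather, act, verify, max_iters=3):
    """Repeat gather -> act -> verify until the verifier accepts or iterations run out."""
    state = AgentState(task=task)
    for _ in range(max_iters):
        state.context.extend(gather(state))   # 1. gathering context
        state.result = act(state)             # 2. taking action
        ok, feedback = verify(state)          # 3. verifying output
        if ok:
            return state
        # Feed the verifier's feedback back in as context for the next pass.
        state.context.append(f"verifier feedback: {feedback}")
    return state


# Toy callables standing in for search/subagents, tool use, and rule-based checks.
def gather(state):
    return [f"note about: {state.task}"] if not state.context else []


def act(state):
    has_feedback = any(c.startswith("verifier feedback") for c in state.context)
    return "final answer" if has_feedback else "draft answer"


def verify(state):
    if state.result == "final answer":
        return True, ""
    return False, "answer is still a draft"


state = run_agent_loop("summarize the report", gather, act, verify)
print(state.result, state.context)
```

Keeping the three steps as plain callables makes it easy to swap a rule-based verifier for an LLM-as-a-Judge check without touching the loop itself.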
🎓 Mathematics for Deep Learning This reference contains some mathematical concepts to help build a better understanding of deep learning: d2l.ai/chapter_appendix-math…
8
396
1,647
🎓Generative AI for Beginners Another great effort by Microsoft on AI education. This one contains a series of lessons on generative AI, including an introduction to LLMs, prompt engineering fundamentals, building text generation/chat applications, and more. github.com/microsoft/generat…
17
362
1,675
284,205
Generative AI Learning Path This is a great new learning resource on Generative AI by Google! Accessible for FREE! cloudskillsboost.google/path…
9
398
1,660
238,027
🎓 The Art of Linear Algebra An impressive set of graphic notes on the popular book "Linear Algebra for Everyone". A great resource for what's an important subject in machine learning and computer science in general. repo: github.com/kenjihiranabe/The… pdf: github.com/kenjihiranabe/The…
5
367
1,655
JUST IN: Meta AI introduces LLaMA, a 65B parameter LLM. LLaMA only relies on publicly available data and outperforms GPT-3 on most benchmarks despite being 10x smaller.
27
312
1,620
259,687
YouTube is easily becoming one of the best free "universities" for all things machine learning, engineering, math, and science.
49
178
1,575
MegaParse is an open-source tool for parsing various types of documents for LLM ingestion. Supports text, PDF, PowerPoint, Excel, CSV, and Word documents. It can convert these into a format ideal for LLMs. It can parse content of different types such as tables, TOC, headers, footers, images, etc. I am also building a similar tool and I think the most important feature at the moment is the ability to customize the format of the transformed data, as different LLMs prefer different formats.
20
213
1,633
120,673
How I leverage AI today: - Claude Projects for summarization - CrewAI agents for orchestrating research agents - Flowise AI for private doc analysis (i.e., RAG) - Midjourney for image generation - NotebookLM for education-related tasks - ChatGPT Search for discovery - Grok for finding interesting AI papers - Google AI Studio for video transcription - Cursor for fast code prototyping - v0 for design work - Anthropic console prompt optimization and evaluation - Claude Artifacts for artifact generation like flow charts - ChatGPT-o1-preview/canvas for reviewing/refining writing This is a subset of my stack. When I feel performance deteriorates for any one tool or model, I switch to other alternatives. Bad idea to overcommit to one AI tool or product. I am also constantly experimenting with different models. My work varies between writing, research, coding, product, marketing, and business operations. These tools, and many others, are simplifying the way I do work. How about you?
59
190
1,620
211,878
🎓Stanford CS229: Machine Learning (Spring 2022) Really cool to see a new iteration of this course. It's a classic ML course from Stanford that has helped tons of students get started with machine learning. Covers foundational topics such as logistic regression, Naive Bayes, kernels, neural networks, bias-variance, regularization, k-means, expectation maximization, and more. YouTube Lectures: piped.video/playlist?list=PL…
6
348
1,573
190,869
Large Language Models (in 2023) An excellent summary of the research progress and developments in LLMs. I appreciate that @hwchung27 made this content publicly available. It's a great way to catch up on some important themes like scaling and optimizing LLMs. talk: piped.video/dbo3kNKPaUA?feature… slides: docs.google.com/presentation…
7
382
1,575
229,503
Stanford CS25 - Transformers United So much fun catching up with these Transformer lectures. There is a lot of content I'm already familiar with but I always love reviewing stuff to build on my understanding of complex concepts and learn new ones along the way. I find that in the field of LLMs, there are many different perspectives and interpretations so it's good to keep an open mind to different takes and explanations. This approach helps strengthen my intuition about LLMs. Pair it with a few coding sessions along the way and it's well worth every minute. At least that is how I've always made good use of these lectures. All the latest lectures are highly recommended.
10
313
1,589
139,162
Llama 3 From Scratch This project is really cool! It implements Llama 3 from scratch. The whole thing is explained, step-by-step, in the readme. I like the way it's broken down as it also serves as a good way to study the main components of an LLM. github.com/naklecha/llama3-f…
9
366
1,604
107,400
PydanticAI A new Python-based agent framework to build production-grade LLM-powered applications. - Built by the team behind Pydantic - Model-agnostic - Type-safe - Structured response validation with Pydantic - Streamed responses (including validation) with Pydantic - Tools for testing and eval-driven iterative development - Logfire integration for debugging and monitoring
11
244
1,585
122,929
Stanford CS330: Deep Multi-Task and Meta Learning Great new lectures to catch up on advanced topics in deep learning like meta learning and multi-task learning. Also includes other topics like generative models and few-shot learning. piped.video/playlist?list=PL…
18
328
1,545
173,364
GPT in 60 Lines of NumPy This looks like another fun tutorial on how to implement GPT from scratch with NumPy.
4
256
1,564
151,730
🎓 Probabilistic Machine Learning: Advanced Topics Got a chance to briefly check out the new ML book by @sirbayes. It's genuinely a one-of-a-kind resource for students looking to be well-versed in ML. 👏 probml.github.io/pml-book/bo…
2
304
1,533
ChatDoctor: A medical chat model fine-tuned on LLaMA using medical domain knowledge. Collects data on around 700 diseases and generates 5K doctor-patient conversations to fine-tune the LLM. paper: arxiv.org/abs/2303.14070 code: github.com/Kent0n-Li/ChatDoc…
26
374
1,559
229,905
Another impressive paper by Google DeepMind. It takes a closer look at the limits of embedding-based retrieval. If you work with vector embeddings, bookmark this one. Let's break down the technical details:
36
222
1,554
206,087
RAG vs. Fine-Tuning Cool report discussing the tradeoff between RAG and fine-tuning when using LLMs like Llama 2 and GPT-4. It performs a detailed analysis and highlights insights when applying the pipelines on an agricultural dataset. [Figure: the pipeline used in this study] [Figure: summary of the comparison between RAG and fine-tuning results] Findings: The authors observe that there is an "accuracy increase of over 6 p.p. when fine-tuning the model and this is cumulative with RAG, which increases accuracy by 5 p.p. further." They also "demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%." RAG is effective where data is contextually relevant such as interpretation of farm data. However, it might significantly increase the prompt size and become harder to steer. Fine-tuning, on the other hand, could be tuned for brevity and can incur less cost (i.e., necessitates minimal input token size) when dealing with large datasets. The challenge is the initial cost and effort required to fine-tune models on new data. Overall, the suitability of each approach depends on the specific application, the nature and size of the data, and available resources for model development. As suggested by many other reports, there is also the possibility of combining the two approaches. I also agree with the authors that it would be interesting to combine structured information from PDFs with images and captions to enable multi-modal fine-tuning opportunities.
16
340
1,531
177,575
A Survey of Context Engineering 160+ pages covering the most important research around context engineering for LLMs. This is a must-read! Here are my notes:
22
315
1,572
203,768
How do you build effective AI Agents? This is a problem I think deeply about with other AI devs and students. Simplicity works well here. I think we can all learn a lot from how Claude Code works. The Claude Agent SDK Loop generalizes the approach to build all kinds of AI agents. I wrote a few notes from Anthropic's recent guide. The loop involves three steps: Gathering Context: Use subagents (parallelize them for task efficiency), compact/maintain context, and leverage agentic/semantic search for retrieving relevant context for the AI agent. Taking Action: Leverage tools, prebuilt MCP servers, bash/scripts, and generate code to take action and retrieve important feedback/context for the AI agent. Verifying Output: You can define rules to verify outputs, enable visual feedback (this becomes increasingly important in multimodal problems), and consider LLM-as-a-Judge to verify quality based on fuzzy rules. I believe this is a really clean and solid framework for how to build and work with AI agents in all kinds of domains.
38
213
1,587
143,945
A Survey of LLMs A new 50-page survey on large language models just dropped on arXiv. arxiv.org/abs/2303.18223
16
356
1,521
207,848