R1 Deep Researcher Fully local research assistant w @deepseek_ai R1 + @ollama. Give R1 a topic and watch it search web, learn, reflect, search more, repeat as long as you want. Gives you a report w/ sources at end. All open source ..
77
619
5,228
624,446
I built an app that uses ChatGPT for question-answering over all 365 episodes of the @lexfridman podcast. Uses @OpenAI Whisper model for audio-to-text and @langchain. All code is open source (linked below). App: lex-gpt.fly.dev/
68
271
2,066
911,913
o3-mini researcher Give it a topic, use o3-mini for report planning w/ human feedback, then parallelize all research/writing when plan is accepted. All open source (code below)
14
102
1,320
189,918
RAG From Scratch Here's a set of short (5-10 min videos) and notebooks explaining > a dozen of my favorite RAG papers. Took a stab at implementing each idea myself (all code open source) and grouped according to the diagram. Repo: github.com/langchain-ai/rag-… Video playlist: piped.video/playlist?list=PL… Some highlights: Is RAG Really Dead? How RAG might change with long context LLMs. Video: piped.video/watch?v=SsHUNfhF… Adaptive-RAG Dynamically route queries based on complexity to different RAG approaches. Implemented in LangGraph w/ @cohere cmd-R. Video: piped.video/04ighIjMcAI Code: github.com/langchain-ai/lang… Paper (@SoyeongJeong97 et al): arxiv.org/abs/2403.14403 Corrective-RAG Self-correct retrieval errors in-the-loop unit tests for doc relevance and fallback to web-search. I implemented in LangGraph w/ @MistralAI-7b + @ollama for running locally. Video: piped.video/watch?v=E2shqsYw… Code: github.com/langchain-ai/lang… Paper (@Jiachen_Gu et al): arxiv.org/pdf/2401.15884.pdf Self-RAG Self-correct RAG errors with in-the-loop unit tests for doc relevance, answer hallucinations, and answer quality. Implemented in LangGraph w/ @MistralAI-7b + @ollama for running locally. Code: github.com/langchain-ai/lang… Code (local): github.com/langchain-ai/lang… Paper (@AkariAsai et al): arxiv.org/abs/2310.11511.pdf Query Routing Various approaches for directing questions to the correct datasource (e.g., logical, semantic, etc). Video: piped.video/pfpIndq7Fi8 Code: github.com/langchain-ai/rag-… Query Structuring Use an LLM to convert from natural language-to-<DSL> where DSL is a domain specific language required to interact with a given database (SQL, Cypher, etc). 
Video: piped.video/kl6NwWYxvbM Code: github.com/langchain-ai/rag-… Blog: blog.langchain.dev/query-con… 2/ Deep dive on graphDBs (c/o @neo4j): blog.langchain.dev/enhancing… 3/ Query structuring docs: python.langchain.com/docs/us… 4/ Self-query retriever docs: python.langchain.com/docs/mo… Multi-Representation Indexing Use an LLM to produce document summaries ("propositions") that are optimized for retrieval. Embed these summaries for similarity search, but return full documents to the LLM for generation. Video: piped.video/gTCU9I6QqCE Code: github.com/langchain-ai/rag-… Paper (@tomchen0 et al): arxiv.org/pdf/2312.06648.pdf RAPTOR Cluster docs in the corpus and summarize similar ones recursively. Index them all together, resulting in lower-level docs and summaries that can be retrieved to answer questions that span detailed-to-higher level. Video: piped.video/z_6EeA2LDSw Code: github.com/langchain-ai/lang… Paper (@parthsarthi03 et al): arxiv.org/pdf/2401.18059.pdf ColBERT Improve embedding granularity w/ a contextually influenced embedding for each token in the document and query. Video: piped.video/cN6S0Ehm7_8 Code: github.com/langchain-ai/rag-… Paper (@lateinteraction & @matei_zaharia): arxiv.org/abs/2004.12832 Multi-Query Re-write the user question from multiple perspectives, retrieve documents for each re-written question, return the unique documents for all queries. Video: piped.video/watch?v=JChPi0CR… Code: github.com/langchain-ai/rag-… Paper: arxiv.org/pdf/2305.14283.pdf RAG-Fusion Re-write the user question from multiple perspectives, retrieve documents for each re-written question, and combine the ranks of multiple search result lists to produce a single, unified ranking w/ Reciprocal Rank Fusion (RRF). 
Video: piped.video/watch?v=77qELPbN… Code: github.com/langchain-ai/rag-… Repo (@Raudaschl): github.com/Raudaschl/rag-fus… Decomposition Decompose a question into a set of sub-problems / questions, which can either be solved sequentially (use the answer from first + retrieval to answer the second) or in parallel (consolidate each answer into final answer). Various works such as Least-to-Most prompting (@denny_zhou et al) and IR-CoT present ideas that can be utilized. Video: piped.video/watch?v=h0OPWlEO… Code: github.com/langchain-ai/rag-… Papers: arxiv.org/pdf/2205.10625.pdf arxiv.org/pdf/2212.10509.pdf Step-back prompting First prompt the LLM to ask a generic step-back question about higher-level concepts or principles, and retrieve relevant facts about them. Use this grounding to help answer the user question. Video: piped.video/watch?v=xn1jEjRy… Code: github.com/langchain-ai/rag-… Paper (@denny_zhou + colleagues): arxiv.org/pdf/2310.06117.pdf HyDE Use an LLM to convert questions into hypothetical documents that answer the question. Use the embedded hypothetical documents to retrieve real documents with the premise that doc-doc similarity search can produce more relevant matches. Video: piped.video/watch?v=SaDzIVkY… Code: github.com/langchain-ai/rag-… Paper: arxiv.org/abs/2212.10496
26
261
1,084
118,497
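For the RAG-Fusion step in the thread above, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not the code from the linked repos: the function name, the toy document ids, and the constant k=60 (a value commonly used with RRF) are assumptions for illustration. Each document scores the sum of 1 / (k + rank) over every result list it appears in.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not from the thread.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# One result list per re-written question (toy data)
results_per_query = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(results_per_query)
print(fused)
```

Documents that rank well across several re-written questions float to the top, even if no single query ranked them first.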
I'm open-sourcing a tool I use to auto-evaluate LLM Q+A chains: given inputs docs, app will use an LLM to auto-generate a Q+A eval set, run on a user-selected chain (model, retriever, etc) built w/ @langchain, use an LLM to grade, and store each expt. github.com/PineappleExpress8…
33
161
967
386,924
Finally got GPT4 API access, so built an app to test it: here's Q+A assistant for all 121 episodes of the @theallinpod. You can ask any question abt the shows. It uses @OpenAI whisper model for audio -> text, @pinecone, @langchain. App is here: besties-gpt.fly.dev/
49
91
793
374,258
Fully local / open source deep researcher Works w/ any local model hosted via @ollama + @lmstudio. Uses diff tools (@perplexity_ai, @tavilyai, SearXNG, DDG). MCP for local files coming soon. Code: github.com/langchain-ai/loca…
25
111
741
60,715
Agents from scratch This repo covers the basics of building agents: + Fundamentals + Build an agent + Agent eval + Agent w/ human-in-the-loop + Agent w/ long-term memory Builds to a deployable agent to run your email Code (all open source): github.com/langchain-ai/agen…
9
103
694
52,983
Context Engineering @dbreunig and I did a meetup on context engineering last night. Wanted to share slides (below) + a recap of some themes / discussion points. 1/ Context grows w/ agents. @manusai mentions typical task requires ~50 tool calls. manus.im/blog/Context-Engine… 2/ Performance drops as context grows. @kellyhongsn + @trychroma showed this very nicely. research.trychroma.com/conte… 3/ @dbreunig highlights that new buzzwords ("context eng") identify common experiences. Many of us built agents this year and had challenges wrt managing context. @karpathy distilled this well back in May. nitter.app/karpathy/status/193790… 4/ Many are sharing their experiences in blogs, etc but no common philosophy yet. "Pre-HTML era". Still, some common themes are emerging. 6/ Offload context. Use file system to offload context. @manusai writes todo.md at the start of a task and re-writes it during the task. They found that recitation of agent objective is helpful. Anthropic multi-agent writes research plan to file so it can be retrieved as needed and preserved. Manus offloads tok heavy tool observations. anthropic.com/engineering/bu… 7/ Reduce context. Summarize / prune messages / tool observations. Seen across many examples. Anthropic multi-agent summarizes the work of each sub agent. We use it w/ open deep research to prune tool feedback. github.com/langchain-ai/open… 8/ Retrieve context. RAG has been a major theme w/ LLM apps for several years. @_mohansolo (Windsurf) and Cursor team have shared interesting insights on what it takes to perform RAG w/ prod code agents. On Lex pod, @mntruell (Cursor) + team talk about Preempt to assemble retrievals into prompts. Clearly have been doing "context eng" since well before the term. nitter.app/_mohansolo/status/1899… lexfridman.com/cursor-team-t… 9/ Isolate context. A lot of interest in using multi-agent systems to isolate context. @barry_zyj + co (Anthropic) argue benefits, @walden_yan argues risks (it is hard to coordinate). 
Need to be careful, but there is benefit in cases where independent decisions made by each sub-agent won't cause conflicts. cognition.ai/blog/dont-build… 10/ Cache context. @manusai mentions caching agent message history (system prompt, tool desc, past messages). Big cost / latency saving, but still does not get around long-context problems. Still very early in all of this ..
Replying to @_mohansolo
But embedding search becomes unreliable as a retrieval heuristic as the size of the codebase grows. Instead, we must rely on a combination of techniques like grep/file search, knowledge graph based retrieval, and more. With all these heuristics, a re-ranking step also becomes needed where the retrieved context is ranked in order of relevance. We use LLM based reranking under the hood.
14
119
697
60,173
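To make the "reduce context" theme from the context engineering thread concrete, here is a rough sketch that prunes older tool observations in an agent's message history. It is not how Manus, Anthropic, or open deep research do it: the message shape (dicts with "role" and "content"), the function name, and the use of truncation as a stand-in for LLM summarization are all assumptions.

```python
def prune_tool_messages(messages, keep_last=2, max_chars=200):
    """Return a copy of messages with older tool outputs truncated.

    messages: list of dicts with "role" and "content" keys (assumed shape).
    The most recent keep_last tool messages are left untouched; older ones
    longer than max_chars are cut. A real system might summarize instead.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    recent = set(tool_indices[-keep_last:]) if keep_last > 0 else set()
    pruned = []
    for i, message in enumerate(messages):
        is_old_tool = message["role"] == "tool" and i not in recent
        if is_old_tool and len(message["content"]) > max_chars:
            shortened = message["content"][:max_chars] + " ...[truncated]"
            pruned.append({**message, "content": shortened})
        else:
            pruned.append(dict(message))
    return pruned


history = [
    {"role": "user", "content": "Research topic X"},
    {"role": "tool", "content": "A" * 1000},
    {"role": "tool", "content": "B" * 1000},
    {"role": "tool", "content": "C" * 1000},
]
compact = prune_tool_messages(history, keep_last=1, max_chars=50)
print([len(m["content"]) for m in compact])
```

The original list is not modified, so the full observations can still be offloaded elsewhere (e.g., to a file) before pruning.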
Deconstructing RAG It can be hard to follow all of the RAG strategies that have come out over the past months. I created a few guides to organize them into major themes and show how to build multi-modal / semi-structured RAG on complex docs (w/ images, tables). Here's a few of the major themes: 1. Query Transformations - User questions may not be well-posed / -worded for retrieval. There's a host of methods that re-write and / or expand (fan-out into multiple sub-questions) questions to maximize the chance of retrieving relevant documents. See blog: blog.langchain.dev/query-tra… 2. Routing - Queries may need to be routed to different data sources depending on what is being asked. Recent blog reviewing OpenAI's RAG strategies provides some guidance on question routing: blog.langchain.dev/applying-… 3. Query Construction - To access structured data, natural language needs to be converted into a specific query syntax. Various approaches can access data in SQL, SQL w/ semantic columns (pgvector), graph DBs, vectorDB w/ metadata filters, etc. See blog: blog.langchain.dev/query-con… 4. Index Building - One of the most useful tricks I've been using is multi-representation indexing: decouple what you index for retrieval (e.g., table or image summary) from what you pass to the llm for answer synthesis (e.g., the raw image, a table). See blog: blog.langchain.dev/semi-stru… 4a. Multi-Modal - This cookbook shows how I used this approach for RAG on a substack (@jaminball's Clouded Judgement) that has many images of densely packed tables, graphs: github.com/langchain-ai/lang… 4b. Semi-Structured - This cookbook shows how I used this for RAG on docs (papers) with tables, which can be broken up by naive RAG text-splitting (that does not explicitly preserve them): github.com/langchain-ai/lang… 5. Post-processing - Given retrieved documents, there are various ways to rank / filter them. 
Recent blog reviewing OpenAI's RAG strategies provides a few ideas on applying post-processing: blog.langchain.dev/applying-…
8
119
656
131,043
Here's a simple (< 100 lines of code) app to run #ChatGPT question-answering on any uploaded document (using @langchain DBQA w/ ChatGPT API): pineappleexpress808-doc-gpt-…
16
92
558
57,860
Document splitting is common for vector storage / retrieval, but useful context can be lost. @langchain has 3 new "context-aware" text splitters that keep metadata about where each split came from. Works for code (py, js) c/o @cristobal_dev, PDFs c/o @CorranMac, and Markdown ..
18
111
552
124,167
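A small sketch of the "context-aware" idea for Markdown: split on headers and attach the enclosing headers to each chunk as metadata. This is a from-scratch illustration, not the @langchain splitter itself; the function name and the chunk shape are assumptions, and it deliberately ignores edge cases such as `#` comments inside fenced code.

```python
def split_markdown_by_headers(text):
    """Split Markdown into chunks, tagging each with the headers in scope."""
    chunks = []
    headers = {}  # header level -> title currently in scope
    buffer = []

    def flush():
        content = "\n".join(buffer).strip()
        if content:
            chunks.append({"content": content, "metadata": dict(headers)})
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[level:].strip()
            # Treat as a header only if it looks like "## Title"
            if 1 <= level <= 6 and title and stripped[level] == " ":
                flush()
                # Headers at this level or deeper go out of scope
                headers = {l: t for l, t in headers.items() if l < level}
                headers[level] = title
                continue
        buffer.append(line)
    flush()
    return chunks


doc = "# Guide\nintro\n## Setup\ninstall it\n## Usage\nrun it"
chunks = split_markdown_by_headers(doc)
for chunk in chunks:
    print(chunk)
```

Each chunk keeps a record of where it came from, so a retriever can filter or cite by section.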
open-deep-research is the best performing fully open source deep research agent on DeepResearchBench (100 PhD-level research tasks across 22 distinct fields). leaderboard: huggingface.co/spaces/Ayanam… code: github.com/langchain-ai/open…
5
88
562
36,206
Building Agents: Free Course We just released a course with > 20 videos & notebooks focused on building agents. All code is open-source and the course is free! Context Back in June, I gave a talk at @aiDotEngineer on building agents with LangGraph. I got ~2 hrs of questions. We took these questions along with lots of feedback we've heard from users and built a course! Module 1: Foundations The first module includes several notebooks & videos that focus on what is an agent explained in simple terms, how to build various types of agents (routers, ReAct, etc), how to debug them w LangGraph Studio, and how to deploy them w LangGraph Cloud. Module 2: Memory One of the biggest questions we've heard is how to build long-running agents, which can remember important details. We show how memory works with LangGraph, and how to use various databases (SQLite, Postgres) to serve as agent memory. Module 3: Human-In-The-Loop Another central question with agents is allowing humans to approve actions (tool use) or modify the agent state (add feedback). We show various human in the loop interaction patterns that are supported in LangGraph, and also show how to stream the graph state during agent execution for human review. Module 4: Controllability The final module focuses on various design patterns for agent control flow, including parallelization of tasks and creating multi-agent teams with their own tasks / internal memory. This builds up into a customizable multi agent system for research that pulls together themes from the entire course. Course (links to code, all videos): academy.langchain.com/course…
4
95
523
39,199
I added the @sama episode to Lex-GPT (a Q+A assistant w/ ChatGPT over all 367 episodes of the @lexfridman podcast). It uses @OpenAI whisper for audio -> text, @pinecone for text embeddings, and @langchain. App here: lex-gpt.fly.dev/
10
70
483
126,212
Building Agents w/ Memory: Free Course If you're interested in agents, have a look at this course. > 25 videos & notebooks (free + open source)! Our newest module builds an agent (task_mAIstro) that uses long-term memory to track + manage your ToDos. --- Context Back in June, I gave a talk at @aiDotEngineer on building agents with LangGraph. I got ~2 hrs of questions. We took these questions along with lots of feedback we've heard from users and built a course! Module 1: Foundations The first module includes several notebooks & videos that focus on what is an agent explained in simple terms, how to build various types of agents (routers, ReAct, etc), how to debug them w LangGraph Studio, and how to deploy them w LangGraph Cloud. Module 2: Short-Term Memory One of the biggest questions we've heard is how to persist chat history, allowing the agent to remember important details. We show how memory works with LangGraph, and how to use various databases (SQLite, Postgres) to serve as agent memory. Module 3: Human-In-The-Loop Another central question with agents is allowing humans to approve actions (tool use) or modify the agent state (add feedback). We show various human in the loop interaction patterns that are supported in LangGraph, and also show how to stream the graph state during agent execution for human review. Module 4: Controllability We've seen that multi-agent teams are important to parallelize tasks or collaborate. We show how to build a multi-agent team for web research automation. Module 5: Long-Term Memory Agents that remember things (e.g., user preferences, etc) across chat sessions / interactions are useful for personalization. We show how to build task_mAIstro, an agent for ToDo list management that uses long-term memory to manage your ToDos. Course: academy.langchain.com/course… Code: github.com/langchain-ai/lang…
5
71
472
46,228
My RAG From Scratch tutorial is live on @freeCodeCamp -- covers over a dozen of my favorite papers on RAG w/ accompanying code notebooks (all open source). Thanks @beaucarnes! Video: piped.video/watch?v=sVcwVQRH…
8
90
454
38,619
MCP in ~2 min In ~2 min I try to explain what it is, build an MCP server from scratch, connect it to @windsurf_ai, @AnthropicAI, @cursor_ai desktop app, show it working. All code and longer vid below ...
10
72
387
40,461
Lived to see the day: GPT4-level LLM runs on my Mac (~9 tok / sec, Mac M2 max 32 gb + ollama.ai).
Phind finds fine-tuned CodeLlama-34B beats GPT-4. phind.com/blog/code-llama-be…
13
70
437
262,474
I wrote about some popular patterns for managing context ("context engineering") w/ AI agents: rlancemartin.github.io/2025/…
5
60
447
48,564
GPT-3.5 and LLaMA2 fine-tuning guides 🪄 Considering LLM fine-tuning? Here's two new CoLab guides for fine-tuning GPT-3.5 & LLaMA2 on your data using LangSmith for dataset management and eval. We also share our lessons learned in a blog post here: blog.langchain.dev/using-lan…
7
74
423
126,449
Retrieval for QA systems is hard. I'm open sourcing a tool I've been using to easily evaluate custom and/or advanced retrievers (e.g., SelfQueryRetriever). It runs locally as a lightweight app using @langchain. Here are some things I've used it for ... github.com/langchain-ai/auto…
7
60
373
116,347
Evaluation of LLM question+answering chains can be challenging: here's @huggingface space to automate this. Upload doc(s) and select a QA chain configuration you want to test. The app builds the chain (w/ @langchain), grades it, and logs results for you. huggingface.co/spaces/rlance…
8
79
376
114,386
Building Async ("Ambient") Agents Happy to share new, free course on building "ambient" agents! This is one of the most interesting agent UX patterns (e.g., Devin, Codex), allowing the agent to do work "in the background" and interact with the user via human-in-the-loop for select actions / approvals. Course builds towards a concrete application -- an assistant that can autonomously run your gmail -- in a few steps, but the principles can be applied to other types of "ambient" agents beyond email. Course starts with basics of building agents, setting up a simple router + email response agent - github.com/langchain-ai/agen… Then moves to fundamentals of agent evaluation, using llm-as-judge as well as heuristic evals - github.com/langchain-ai/agen… Then it adds human in the loop for approval of specific tool calls (e.g., actually sending the email) - github.com/langchain-ai/agen… Finally, it adds simple memory to remember the human-in-the-loop feedback - github.com/langchain-ai/agen… At the end, it shows how to deploy the agent and connect to actual gmail tools. I've been using this to run my email for a few months. You can find course link w/ all videos here. Many thanks to @labdmitriy for helpful feedback + review! academy.langchain.com/course…
6
58
377
42,194
To explore @langchain as a LLM programming framework, I wrote a simple app (~100 lines of code) to summarize papers. I've wanted this for a while given the rapid pace of progress / publication in AI. lancemartin.notion.site/Lang…
9
57
332
49,096
I've seen questions about @AnthropicAI's 100k context window: can it compete w/ vectorDB retrieval? We added Claude-100k to the @langchain auto-evaluator app so you can compare for yourself (details showing Claude-100k results below). App is here: autoevaluator.langchain.com/…
10
51
326
114,439
Awesome to see @vercel edge functions now working w/ @langchain! This enables LangChain streaming on Vercel. Here's a free-to-use / open-source lex-gpt app example on Vercel. Great work @nfcampos. lex-gpt.vercel.app/
12
35
301
97,297
Nice RAG trick for diverse content types (images / tables): generate + embed a text summary (for natural language search), but return full doc for LLM synthesis. Short write-up w/ 3 cookbooks below showing semi-structured and multi-modal RAG using this idea with the multi-vector retriever. Table summaries work nicely w/ the multi-vector retriever for semi-structured RAG. And I use LLaVA-7b (c/o @imhaotian) to generate image summaries. Also include a cookbook showing this full pipeline running private / local on my laptop w/ llama.cpp c/o @ggerganov, @ollama_ai, @nomic_ai embeddings, and @trychroma. Write-up: blog.langchain.dev/semi-stru…
3
59
293
62,786
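A toy sketch of the trick described above: search over text summaries, but hand back the full original document. It does not use the actual multi-vector retriever; the class name, the bag-of-words "embedding", and the sample documents are stand-ins so the example runs without any model or vector store.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MultiRepresentationIndex:
    """Index summaries for search; store full documents for generation."""

    def __init__(self):
        self.summary_vectors = {}  # doc_id -> embedded summary
        self.docstore = {}         # doc_id -> full original document

    def add(self, doc_id, summary, full_document):
        self.summary_vectors[doc_id] = embed(summary)
        self.docstore[doc_id] = full_document

    def retrieve(self, query, top_k=1):
        q = embed(query)
        ranked = sorted(
            self.summary_vectors,
            key=lambda doc_id: cosine(q, self.summary_vectors[doc_id]),
            reverse=True,
        )
        # Search ran over summaries, but the caller gets the raw documents
        return [self.docstore[doc_id] for doc_id in ranked[:top_k]]


index = MultiRepresentationIndex()
index.add("t1", "table of quarterly revenue by cloud company", "<full table contents>")
index.add("i1", "chart of gpu memory usage over time", "<raw image reference>")
hits = index.retrieve("quarterly revenue table")
print(hits)
```

The summary and the stored document share only an id, so the stored payload can be anything the generating model accepts (a raw table, an image, a long document).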
Check out these new guides for 13 popular LLM use-cases. Part of a major community effort to improve the @langchain docs + add CoLabs prototyping. 1/13: Open source LLMs How to use many open source LLMs on your device python.langchain.com/docs/gu…
10
56
292
71,259
Using LLMs to summarize large datasets can be hard! @langchain x @mendableai partnered to analyze user questions on our documentation. We're open sourcing notebooks showing 2 approaches that use both @AnthropicAI's new Claude-2 and @OpenAI .. blog.langchain.dev/llms-to-i…
7
64
273
75,763
VectorDB doc retrieval can vary w/ minor changes to the user input. @langchain just added MultiQueryRetriever to help w/ this: pass input to an LLM that generates similar queries w/ slightly diff keywords or phrases, retrieve docs across all queries, keep the unique ones ...
7
35
273
59,576
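The multi-query idea above can be sketched without any library: generate query variants, retrieve for each, and keep the unique documents in first-seen order. This is not the MultiQueryRetriever implementation; the function name and both callables are hypothetical, the stubs stand in for an LLM and a vector store, and including the original question alongside the variants is a choice made for this sketch.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each re-written query and return the unique documents.

    generate_queries: callable(str) -> list[str]  (an LLM call in practice)
    retrieve: callable(str) -> list[str]          (a vector store lookup in practice)
    """
    queries = [question] + generate_queries(question)
    seen = set()
    unique_docs = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs


# Stubs so the sketch runs offline
def fake_generate_queries(question):
    return ["how do agents store memory", "agent memory persistence"]


FAKE_INDEX = {
    "agent memory": ["doc1", "doc2"],
    "how do agents store memory": ["doc2", "doc3"],
    "agent memory persistence": ["doc3", "doc4"],
}


def fake_retrieve(query):
    return FAKE_INDEX.get(query, [])


docs = multi_query_retrieve("agent memory", fake_generate_queries, fake_retrieve)
print(docs)
```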
Multi-modal LLMs unlock RAG on images. Local RAG stack (M2 max 32gb) w/ OSS models: 1/ @UnstructuredIO: doc -> img, txt, tables 2/ LLaVA-7b: img -> txt summaries 3/ @nomic_ai: embd 4/ @trychroma: store 5/ ollama.ai LLaMA-13b Cookbook: github.com/langchain-ai/lang…
3
64
273
65,505
Self-Improving LLM Evaluators One of the major themes I heard from @aiDotEngineer last week was: how to test LLM apps? @HamelHusain gave a great talk on this w/ 3 types of testing: (1) Simple assertions - first, try to hard-code simple rules or assertions (e.g., does the LLM app output follow the expected schema). (2) Human review - but, some things can't be captured w/ simple hard-coded rules (e.g., style or accuracy of my LLM app outputs). you always need to look at your data 🗣️! (3) LLM-as-judge - human review is critical, but doesn't scale. encode rules from your human review into a prompt and have an LLM automate your process of human review / scoring. The challenge w/ LLM-as-judge is that you need to tune a prompt that encodes your scoring criteria. This is often hard. @sh_reya put out a fantastic blog on data flywheels, which discusses a way to tackle this. Use a process where you (1) review the LLM-as-judge, (2) correct it, and (3) pass those human corrections back to the evaluator as few-shot examples. I spent some time working on this w/ LangSmith and use this process whenever I want to apply an LLM-as-judge. It's a really useful approach / worth a look. @sh_reya's write-up: sh-reya.com/blog/ai-engineer… @HamelHusain's write-up: hamel.dev/blog/posts/evals/ Self-Improving LLM evaluators: nitter.app/LangChainAI/status/180… Video explainer for more detail: piped.video/watch?v=fmL6cB5Q…
🧑‍⚖️Self-improving evaluators in LangSmith One method for evaluating LLM systems is to use another LLM "as a judge". These 'LLM-as-a-Judges' can review raw text, using a prompt to guide the grader and automate human review. However, these "LLM-as-a-Judge" systems require constant prompt engineering to align with human preferences. In LangSmith, you can now use "LLM-as-a-Judge" evaluators with a self-improving feedback loop: + Allow a human to easily correct 'LLM-as-a-Judge' + And easily pass these back to the 'LLM-as-a-Judge' as few shot examples In part 1 last week, we showed how to apply self-improving evaluators to any LangSmith project: + The evaluator is applied to all traces in your project automatically and can run on production logs + It's easy to review, correct, and pass back correction to improve the evaluator Here in part 2, we show how to pin self-improving evaluators to any LangSmith dataset: + The evaluator is applied on every experiment run on your dataset In both cases, the evaluator can be self-improved with human feedback! 🎥 Video: piped.video/fmL6cB5Q5M0 📓 Docs: docs.smith.langchain.com/how… 🛞 Data flywheel resource: sh-reya.com/blog/ai-engineer… ✍️ Blog: blog.langchain.dev/aligning-…
5
57
273
29,213
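The review / correct / pass-back loop for LLM-as-judge described above can be sketched as plain prompt assembly. This is not LangSmith's API: the function names, the correction record shape, and the prompt wording are all hypothetical, and no model is called. It only shows how human corrections could become few-shot examples in the judge prompt.

```python
def record_correction(corrections, question, answer, judge_score, human_score):
    """Store a human correction only when the human disagreed with the judge."""
    if judge_score != human_score:
        corrections.append(
            {"question": question, "answer": answer, "human_score": human_score}
        )
    return corrections


def build_judge_prompt(criteria, corrections, question, answer, max_examples=3):
    """Assemble a grading prompt with recent human corrections as few-shot examples."""
    lines = ["You are grading an answer.", f"Criteria: {criteria}", ""]
    recent = corrections[-max_examples:] if max_examples > 0 else []
    for example in recent:
        lines += [
            "Example:",
            f"Question: {example['question']}",
            f"Answer: {example['answer']}",
            f"Score: {example['human_score']}",
            "",
        ]
    lines += [f"Question: {question}", f"Answer: {answer}", "Score:"]
    return "\n".join(lines)


corrections = []
record_correction(corrections, "What is 2+2?", "5", judge_score=1, human_score=0)
# Human agreed with the judge here, so nothing is stored
record_correction(corrections, "Capital of France?", "Paris", judge_score=1, human_score=1)
prompt = build_judge_prompt(
    "Score 1 if factually correct, else 0.", corrections, "What is 3+3?", "6"
)
print(prompt)
```

Only disagreements are kept, on the assumption that cases where the judge was already right add little signal as few-shot examples.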
A few highlights from the latest @langchain release (v0.0.203): context-aware text splitting 🪄. Splits a file into chunks, but keeps metadata about where each chunk came from. Works w/ SelfQueryRetriever to chat w/ specific sections of a doc ... github.com/hwchase17/langcha…
8
45
252
112,074
Did you ever want to extract knowledge graphs using LLM function calling? No? Well, here's a @streamlit app where you can play around with various inputs. E.g., feed it the Barbie plot, gpt-3.5 w/ function calling extracts graph triples. Give it a try: auto-graph.streamlit.app/
16
41
251
94,638
Fully local RAG agent with Llama3-8b Threw a few things at Llama3-8b on my first test drive (routing, fallback to web search, retrieval / answer grading, RAG). Seems very strong! Short vid building this flow from scratch / initial impressions: piped.video/-ROS6gfYIts?feature…
9
46
248
21,235
YouTube is a great source of content for LLM chat / Q+A apps. I recently added a @langchain document loader to simplify this: pass in YouTube video urls, get back text documents that can be easily embedded for retrieval QA or chat (see below)🪄 github.com/hwchase17/langcha…
10
38
249
54,721
LLM Use Case: Summarization 📚🧠 We've kicked off a community driven effort to improve @langchain docs, starting w/ popular use cases. Here is the new use case doc on Summarization w/ @GoogleColab notebook for easy testing ... python.langchain.com/docs/us…
4
42
242
29,961
The @langchain team / community have heard the recent feedback on documentation loud-and-clear! We've been working very hard to improve it. Yesterday we added an update to "QA and Chat on Documents", a popular use-case, which I'll break down below ... python.langchain.com/docs/us…
10
50
239
41,262
Private Chat / QA over docs at ~25 tokens / s with 13b Llama-v2 (on Mac M2 max gpu). Using @trychroma vectorDB, @nomic_ai GPT4all embeddings, LLama-v2 Full recipe added to @langchain docs: python.langchain.com/docs/us…
9
50
239
67,440
I just added @nomic_ai new GPT4All Embeddings to @langchain. Here's a new doc on running local / private retrieval QA (e.g., on your laptop) w/ GPT4All embeddings + @trychroma + GPT4All LLM. Easy setup, great work from @nomic_ai ... python.langchain.com/docs/us…
6
60
235
64,731
Now that Generative Agents is open source, I hooked it up to Llama2-13b. Runs locally ~25-50 tok / s on Mac M2 max (w/ Llama.cpp or Ollama.ai). Saves $ for long sims. Still hacking on it, but draft PR w/ instructions for anyone interested: github.com/joonspk-research/…
7
31
232
47,333
Some personal news: I’m super excited to be officially joining the @langchain 🦜🔗 team!
16
4
208
30,237
One of the most interesting apps of GPT-4V is retrieval / RAG on documents w/ text + images (tech manuals, finance docs, textbooks, etc). I've been testing a few options. Here is one w/o multimodal embd. Evals + other approaches coming soon. Cookbook: github.com/langchain-ai/lang…
3
27
210
24,852
Great to see folks at the @ollama meetup last night! Gave a lightning talk on a theme we've seen: oss / local LLMs for narrow tasks w/in the RAG stack. (1) Query transformations Local / oss LLM can be useful for tasks like query re-writing or decomposition that require reasoning abt a query. Esp interesting for small models (phi-2, etc). See template for one example of query re-writing: github.com/langchain-ai/lang… (2) Routing @atroyn mentioned in a talk that routing w/ local / oss LLMs likely to get wrapped in w/ Chroma to route btwn sqlite vs chroma (jointly query on relational and semantic data). Makes a lot of sense. piped.video/watch?v=fDmQnB8G… (3) Query construction Text-to-X (SQL, Cypher, Metadata) make a lot of sense for local / oss, esp in cases where the DB is private. Template for text-to-SQL example: github.com/langchain-ai/lang… (4) Indexing Tasks related to doc summarization or captioning in the indexing process really good for local / oss to avoid high cost in indexing large corpus. Esp, I like potential for this in multi-modal. Newer LLaVA models (c/o @imhaotian) supporting better OCR (IIRC a recent talk mentioned this is coming) would be great here. Template: github.com/langchain-ai/lang… Talk: piped.video/watch?v=k7i2BpeL… (5) Post processing @jerryjliu0 had a nice talk on using local oss LLMs (Mistral) for post-processing (RankGPT). Cool idea, another area that makes a lot of sense. nitter.app/jerryjliu0/status/1657… Across all of these steps in the flow, @ollama JSON mode can be useful for output parsing. @Hacubu has done some nice work benchmarking this for an eval dataset of email spam. Promising results (e.g., Mixtral 8x-7b can beat GPT3.5 w/ fxn calling). nitter.app/Hacubu/status/17262772… Slides: docs.google.com/presentation…
Data extraction is a huge use case for LLMs. @Ollama_ai's new JSON mode made me curious how local OSS models might do compared to OpenAI. I found a recently released 7B model, OpenOrca, was almost as good as 3.5-turbo despite not having native functions support! Check out the dataset (publicly available below) + evals in @langchain LangSmith: smith.langchain.com/public/3… The first (and most difficult) step was gathering a good dataset. No artistry here - I plumbed the depths of my spam filter for raw material, cleaned/deduped, and used a @langchain extraction chain with GPT-4 to extract fields like sender, phone #, and action items. I then went through the runs by hand with LangSmith’s annotation queue double-checking for correctness. Using LangSmith’s `run_on_dataset` feature, I evaluated various OSS models such as Llama 2, Mistral, and Zephyr locally through @Ollama_ai using their newly added JSON mode + a passed schema against my created dataset. I also tried OpenAI and Anthropic models as a baseline. I used GPT-4 to evaluate each run and score it. GPT-4 did the best by a significant margin, followed by Claude 2 and 3.5-turbo. However, a not-so-distant 4th was OpenOrca! Stock Llama 2 did poorly (which fits previously established benchmarks around coding tasks). Hardware limited me to small 7B models, but my assumption is that larger OSS models would do even better! I also don't think my prompting was optimal by any means, and that there are likely still performance gains there. I had a lot of fun with this - it combines two of my favorite topics in LLMs: local models and structured output. And if you’d like to replicate this experiment yourself, check out the below repo for some of the scripts: github.com/jacoblee93/oss-mo… You can try OpenOrca through Ollama here: ollama.ai/library/mistral-op…
4
44
205
33,770
Multi-modal LLMs unlock new opps for RAG apps. Ideas+cookbooks (w/ LLaVA-7b as a demo) below: 1/ Pre-process images to text Multi-modal LLM converts images to text, embed + retrieve img summaries as txt chunks like std RAG. 2/ Retrieve images Multi-modal LLM creates img summaries (same as 1), but retrieve raw images (multi-vector retriever allows this). Retrieve img+txt for multi-modal LLM in RAG. Cookbook w/ LLaVA-7b + GPT4 (can be easily adapted for future GPT4-V API :) - github.com/langchain-ai/lang… Cookbook w/ LLaVA-7b + LLaMA2-13 (cookbook runs locally on my Mac M2 w/ llama.cpp + @Ollama_ai) - github.com/langchain-ai/lang… Curious to see how these ideas evolve and if others have run experiments ...
1
39
194
40,423
R1 Deep Researcher x Perplexity Give @deepseek_ai R1 a topic. It searches @perplexity_ai for you, learns, reflects, searches again to learn more, as long as you want. Gives you a report at the end. All open source + runs locally w distilled R1 via @ollama ..
5
19
189
10,937
Agent simulations can be expensive w/ LLM APIs. I created a fork of @joon_s_pk generative agents repo and hooked it up to llama.cpp, gpt4all, & ollama.ai to test sim w/ diff local open source models (img below): github.com/rlancemartin/gene…
Blackpill on LLMs: everything is bottlenecked by costs The generative agents simulacra of human behavior cost ~$10/hr, each That’s more than most humans are paid
5
35
194
51,707
Flow engineering (c/o @karpathy, @itamar_mar) for code generation is a great idea. I built a simple version inspired by AlphaCodium. Just code import + execution checks with reflection on errors lets the LLM self-correct. Video: piped.video/MvNdgmM7uyc?si=f1cc…
Prompt engineering (or rather "Flow engineering") intensifies for code generation. Great reading and a reminder of how much alpha there is (pass@5 19% to 44%) in moving from a naive prompt:answer paradigm to a "flow" paradigm, where the answer is constructed iteratively.
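A minimal sketch of the "import + execution checks with reflection on errors" loop, not the code from the video. `fake_llm` is a hypothetical stand-in for a real model call, and plain `exec` is used only for illustration (untrusted generated code should run in a sandbox).

```python
import traceback
from typing import Callable, Optional, Tuple


def check_code(imports: str, body: str) -> Optional[str]:
    """Return None if the imports and the body both execute, else the error text."""
    namespace: dict = {}
    try:
        exec(imports, namespace)  # import check
    except Exception:
        return "import check failed:\n" + traceback.format_exc(limit=1)
    try:
        exec(body, namespace)  # execution check
    except Exception:
        return "execution check failed:\n" + traceback.format_exc(limit=1)
    return None


def generate_with_reflection(
    question: str,
    generate: Callable[[str, Optional[str]], Tuple[str, str]],
    max_attempts: int = 3,
) -> Tuple[str, str, int]:
    """Ask for code, run both checks, and feed any error back to the generator."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        imports, body = generate(question, feedback)
        error = check_code(imports, body)
        if error is None:
            return imports, body, attempt
        feedback = error  # reflection: the next attempt sees the failure
    raise RuntimeError(f"no passing solution after {max_attempts} attempts")


def fake_llm(question: str, feedback: Optional[str]) -> Tuple[str, str]:
    """Hypothetical model stub: wrong on the first try, fixed once it sees an error."""
    if feedback is None:
        return "import math", "result = math.sqroot(16)"  # bug: no such function
    return "import math", "result = math.sqrt(16)"
```

Calling `generate_with_reflection("square root of 16", fake_llm)` fails the execution check once, passes the error text back, and accepts the second attempt.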
3
22
191
32,470
Text-to-SQL 📒 LLMs unlock a natural language interface with structured data. Part 4 of our initiative to improve @langchain docs shows how to use LLMs to write / execute SQL queries w/ chains and agents. Thanks @manuelsoria_ for work on the docs: python.langchain.com/docs/us…
3
39
187
30,582
I added @OpenAI's model-graded QA evaluation prompt to auto-evaluator. You can select it (left) and the LLM grader will use this prompt to grade answers. Thanks to @kondrich2 and @OpenAI for open-sourcing this and helpful discussion last wk. Code: github.com/rlancemartin/auto…
4
29
191
34,376
"Do you really need an agent?" Shared some recent work at @ollama meetup using graphs to reliably express complex logical flows (self-RAG, corrective-RAG) fully local w/ @nomic_ai + @MistralAI + @ollama (w/ JSON mode). Tx @AlexReibman for video! nitter.app/AlexReibman/status/175…
Replying to @AlexReibman
2/ LangGraph Create LLM applications and agents with planned graph execution workflows @RLanceMartin @langchain
5
41
183
30,275
Web research is a great LLM use case. @hwchase17 and I are releasing a new retriever to automate web research that is simple, configurable (can run in private-mode w/ llamav2, GPT4all, etc), & observable (use LangSmith to see what it's doing). Blog: blog.langchain.dev/automatin…
8
29
188
42,343
Extraction 📚➡️🗒️ Getting structured LLM output is hard! Part 3 of our initiative to improve @langchain docs covers this w/ functions and parsers (see @GoogleColab ntbk). Thanks to @fpingham for improving the docs on this: python.langchain.com/docs/us…
13
29
184
71,221
I've been writing a lot of docs recently! ✍️🧐 Just finished a RAG / retrieval docs re-write that captures ideas from a lot of my favorite papers. Docs here: python.langchain.com/v0.2/do…
💡📚 Understanding RAG and other concepts 📚💡 Retrieval is a deep topic, and there are many strategies to improve performance. To help guide you, @RLanceMartin has completely revamped our retrieval docs! We now categorize key strategies for retrieval into seven different categories: Query Translation: Reviewing/rewriting inputs Routing: Mapping incoming queries to specific data sources Query Construction: Taking advantage of the underlying structure of a database and metadata filters Indexing: Ingest-time strategies to improve later performance Search methods: Considering techniques beyond vector similarity search Post-processing: Filtering, reranking, etc. Generation: Self-correcting and sanity checking retrieved documents We've also updated other parts of our conceptual docs to help you more deeply understand important ideas behind building with LLMs. Check it out below, and stay tuned for more! 🐍: python.langchain.com/v0.2/do… ☕: js.langchain.com/v0.2/docs/c…
4
35
180
19,495
Open Deep Research w/ Claude 3.7 Fully open source (code below) deep researcher w/ Claude 3.7 for research planning. Claude 3.7 makes a plan + accepts user feedback. Once approved, iterative research performed on the plan set by Claude.
6
21
172
12,750
Here's a Q+A assistant for the @tferriss podcast: using @OpenAI + @langchain with UI elements from @mckaywrigley's great work. ferris-gpt.fly.dev/
9
27
175
67,902
Context Engineering in Manus i had a great conversation w/ @peakji abt the design of @manusai + how these use context engineering. wrote some notes here (video link below). rlancemartin.github.io/2025/…
5
23
186
19,004
@karpathy's YouTube course is one of the best educational resources on LLMs. In this spirit, I built a Q+A assistant for the course and open sourced the repo, which shows how to use @langchain to easily build and evaluate LLM apps karpathy-gpt.vercel.app/ github.com/rlancemartin/karp…
4
36
171
25,503
Gave a talk last night at Unstructured Data Meetup in SF with the uncontroversial title "Is RAG Really Dead?" A bunch of folks asked for slides, so adding below. Also giving this talk again tmrw 9a pst. Signup: lu.ma/rpw9907u Slides: docs.google.com/presentation…
7
29
172
19,361
Multi Needle In a Haystack One of the most popular benchmarks for long context LLM retrieval is @GregKamradt's Needle in A Haystack. I extended Greg's repo so that you can place many needles in the context and tested GPT-4-128k. Short video (more detail below): piped.video/UlmyyYQGhzc --- Most Needle in A Haystack analyses to date have only evaluated a single needle. But, RAG is often focused on retrieving multiple facts & reasoning abt them. To replace RAG, long context LLMs need to retrieve & reason about multiple facts in the prompt. To test this, I recently updated Greg's repo to work with multi-needle and use LangSmith for evaluation. I tested GPT-4-128k on retrieval of 1, 3, and 10 needles in a single turn across 1k to 120k context windows. I find that performance degrades: 1/ As you ask LLMs to retrieve more facts 2/ As the context window increases 3/ If facts are placed in the first half of the context 4/ When the LLM has to reason about retrieved facts All code is open source: github.com/gkamradt/LLMTest_… All runs can be seen here w/ public traces: github.com/gkamradt/LLMTest_… Write-up: blog.langchain.dev/multi-nee… Short video explainer: piped.video/UlmyyYQGhzc
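A rough sketch of the multi-needle setup described above: place several facts at chosen depths in a long context, then check how many come back in the answer. This is an illustration only, not the code in Greg's repo; the function names are made up here, and the substring check is a simple stand-in for the LangSmith grading used in the actual runs.

```python
def insert_needles(haystack: str, needles: list[str], depths: list[float]) -> str:
    """Insert each needle at a fractional depth (0.0 = start, 1.0 = end) of the haystack."""
    if len(needles) != len(depths):
        raise ValueError("need exactly one depth per needle")
    words = haystack.split()
    base_len = len(words)
    # Insert the deepest needle first so earlier positions are not shifted.
    for needle, depth in sorted(zip(needles, depths), key=lambda p: p[1], reverse=True):
        position = int(base_len * depth)
        words[position:position] = needle.split()
    return " ".join(words)


def fraction_retrieved(answer: str, needle_keywords: list[str]) -> float:
    """Share of needle keywords that appear in the answer (case-insensitive)."""
    if not needle_keywords:
        return 0.0
    lowered = answer.lower()
    found = sum(1 for keyword in needle_keywords if keyword.lower() in lowered)
    return found / len(needle_keywords)
```

Sweeping the number of needles, the context length, and the depths with helpers like these is one way to reproduce the four degradation trends listed in the thread.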
7
36
165
30,761
LLM app development is rate-limited by quality evals. There's a paradox of LLM, prompt, etc choice. Recently put together a short guide on setting up custom evals. Playlist (5 min vids): piped.video/playlist?list=PL… Code: github.com/langchain-ai/lang…
LangSmith Evaluations With the rapid pace of AI, developers are often faced with a paradox of choice: how to choose the right prompt, how to trade-off LLM quality vs cost? Evaluations can accelerate development with structured process for making these decisions. But, we've heard that it is challenging to get started. So, we are launching a series of short videos focused on explaining how to perform evaluations using LangSmith. 1. Why Evals Matter Lays out 4 general considerations for evaluation: (1) dataset, (2) evaluator, (3) task, (4) how to apply eval to improve your product (e.g., unit tests, A/B tests, etc). 📽️: piped.video/vygFgCNR7WA 📓: docs.smith.langchain.com/eva… 2. Evaluation Primitives Introduces the primary components of LangSmith evaluation, including tracing (along with metadata, feedback, tags), datasets, and evaluators. 📽️: piped.video/OuFUy45RsHU 📓: docs.smith.langchain.com/tra… 3. Dataset Creation: Manual curation Users often want to build custom eval sets (e.g., of QA pairs for RAG, or prompt-expected response pairs). This shows how to create, edit, and version your own evaluation dataset using the LangSmith SDK. 📽️: piped.video/N9hjO-Uy1Vo 📓: docs.smith.langchain.com/eva… 4. Dataset Creation: From Logs Users often want to capture user logs as good and / or challenging examples to re-test their application on. This shows how to create datasets directly from logs (e.g., user interactions with your app that are captured in LangSmith). 📽️: piped.video/hPqhQJPIVI8 📓: docs.smith.langchain.com/eva… 5. Evaluators: Pre-built Users often want to quickly get started with eval; for this you can use many of LangSmith's pre-built evaluators (e.g., that use LLM-as-a-judge) for tasks such as RAG (question answering), evaluating LLM output based upon user-supplied criteria, etc. 📽️: piped.video/y5GvqOi4bJQ 📓: docs.smith.langchain.com/eva… 6. Evaluators: Custom Users often want to define custom evals that are domain specific to a particular app. 
This shows how to define your own custom evaluation logic in LangSmith. 📽️: piped.video/w31v_kFvcNw 📓: docs.smith.langchain.com/eva… 7. Eval comparisons Once a user has run a few different experiments, it is common to compare results (both using metrics and with manual inspection of the examples that show the most difference). This shows how to compare results of multiple experiments in the LangSmith UI, using review of traces to inspect run outputs or the grader decisions. 📽️: piped.video/kl5U_efgK_8 📓: docs.smith.langchain.com/use… Notebook used in videos: github.com/langchain-ai/lang…
1
30
158
30,389
"RAG" isn't dead. the question is how to do retrieval. vectorstore as the "default" option may be dead. i've found using a quality llms.txt file effective + simple: rlancemartin.github.io/2025/…
RAG is dead posts are annoying as F "R" is retrieval and "AG" is the LLM. This means you think retrieval is dead. Seriously, you think retrieval is dead? Keyword search, metadata filtering (dates, users), grep, and other filtering are retrieval. Good luck without retrieval
8
23
153
22,030
Corrective/Self-Reflective RAG in LangGraph Self-reflection w/ RAG is a cool idea from a few recent papers - Self-RAG (@AkariAsai et al), CRAG, etc. I tried laying out the flows from each paper as graphs and it works pretty well. Short vid w/ code links: piped.video/watch?v=pbAd8O1L…
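To make the "flow as a graph" idea concrete, here is a simplified corrective-RAG flow in plain Python: retrieve, grade each doc for relevance, fall back to web search if nothing survives, then generate. It is a sketch under assumptions, not the LangGraph code from the video; `retrieve`, `grade`, `web_search`, and `generate` are hypothetical callables you would back with a retriever, an LLM grader, a search tool, and an LLM.

```python
from typing import Callable, Dict, List


def run_corrective_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Run the nodes in order and record which path through the graph was taken."""
    path: List[str] = []

    docs = retrieve(question)
    path.append("retrieve")

    # Node: keep only the docs the grader marks as relevant to the question.
    docs = [doc for doc in docs if grade(question, doc)]
    path.append("grade")

    # Conditional edge: with no relevant docs left, fall back to web search.
    if not docs:
        docs = web_search(question)
        path.append("web_search")

    answer = generate(question, docs)
    path.append("generate")
    return {"question": question, "docs": docs, "answer": answer, "path": path}
```

The returned `path` makes the branch taken explicit, which is the main appeal of writing the flow as a graph rather than leaving the control flow to an agent.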
6
32
155
14,392
Code understanding 🖥️🧠 LLMs excel at code analysis / completion (e.g., Co-Pilot, Code Interpreter, etc). Part 6 of our initiative to improve @langchain docs covers code analysis, building on contributions of @cristobal_dev + others: python.langchain.com/docs/us…
3
29
153
26,264
Balancing relevance vs diversity in LLM document retrieval is a challenge; many similar docs use up tokens w/o adding new information. @musicaoriginal2 and @GregKamradt recently introduced a new approach in @langchain that can help w/ this ...
3
23
152
47,217
A lot of ppl asked for a recording, so here's a summary of my @aiDotEngineer workshop on building + testing reliable agents. Builds a corrective RAG agent w/ 1) ReAct & 2) custom in LangGraph. Tests each one, shows trade-offs. Code in vid description. piped.video/watch?v=XiySC-d3…
3
26
154
13,825
Born too early to explore space Born too late to explore the earth Born just in time to watch Llama-2 do a rap battle btwn Stephen Colbert and John Oliver on my Macbook Worked out-of-the-box w/ @langchain llama.cpp integration and w/ LangSmith for tracing
4
21
143
52,865
Getting LLaMA to produce structured outputs (e.g., JSON) is a challenge. @evanqjones + @GrantSlatton work on grammar-based sampling is a cool approach: supply a grammar file to guide / constrain sampling. Thanks @deepsense_ai for adding to @langchain llamacpp integration ...
4
30
139
61,098
Here's a free-to-use, open-source app for evaluating LLM question-answer chains. Assemble modular LLM QA chain components w/ @langchain. Use LLMs to generate a test set and grade the chain. Built by 🛠️ - me, @sfgunslinger, @thebengoldberg Link - autoevaluator.langchain.com/
2
38
147
34,238
I've been testing a few different approaches for multi-modal RAG / QA over visual content w/ GPT-4V. I built an eval set on an investor presentation (Q3 earnings from @datadoghq) as a test case. Results / learnings: (1) Text loading: As a base-case, I loaded the slide deck w/ a PDF loader and performed text-based RAG. This scores poorly (20%) on my eval set, largely b/c slide visuals encode much of the information and this is all lost if you simply load the slide text. (2) Multi-modal embeddings: I extract each slide as an image, embed w/ OpenCLIP multimodal embeddings, and store in @trychroma. The goal is to retrieve a slide relevant to each question and pass that image to GPT-4V to answer the question. Multi-modal embeddings were OK (60%), but it's worth noting that OpenCLIP has many models to choose from and test. This approach has a high performance ceiling as multi-modal embeddings improve. Some OpenCLIP models available: github.com/mlfoundations/ope… (3) Image summarization: I use GPT-4V to summarize each image, embed the image summary, and use it to retrieve the raw image. It has strong performance b/c GPT-4V is very good at image summarization and retrieval is done using text embedding / similarity. The raw image linked to the summary is then passed to GPT-4V to answer. The problem is that this approach has high cost from the need to pre-compute summaries, but I will test this w/ LLaVA (OSS) to defer cost. It also has higher complexity relative to (2) since raw images + summaries need to be managed. Here is a video I did w/ @mayowaoshin on this approach: nitter.app/mayowaoshin/status/172… The eval set is available as a LangChain public benchmark for anyone to test. See docs: langchain-ai.github.io/langc… Full write-up: blog.langchain.dev/multi-mod…
GPT-4 Vision: How to use @langchain with Multimodal AI to Analyze Images, Tables and Texts in Financial Reports. In this in-depth, practical workshop with @RLanceMartin, you'll learn how to use multimodal RAG with @OpenAI's multimodal GPT-4V to analyze documents that contain diverse content types. In our demo, we analyse tables and images in @jaminball's Clouded Judgement blog. Full workshop video: piped.video/Rcqy92Ik6Uo?feature…
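A toy sketch of approach (3): search over text summaries, but hand back the raw image linked to the best-matching summary. Everything here is an assumption for illustration; the class name is made up, and word overlap stands in for the text embedding / similarity step, with file paths standing in for raw images.

```python
from typing import Dict


def _tokens(text: str) -> set:
    return set(text.lower().split())


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, a crude stand-in for text-embedding similarity."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class SummaryToImageIndex:
    """Retrieve by summary text, return the raw image reference stored under the same id."""

    def __init__(self) -> None:
        self._summaries: Dict[str, str] = {}
        self._images: Dict[str, str] = {}

    def add(self, slide_id: str, summary: str, image_ref: str) -> None:
        self._summaries[slide_id] = summary
        self._images[slide_id] = image_ref

    def retrieve(self, question: str) -> str:
        if not self._summaries:
            raise LookupError("index is empty")
        best_id = max(
            self._summaries,
            key=lambda sid: overlap(question, self._summaries[sid]),
        )
        return self._images[best_id]  # this is what gets passed to the multi-modal LLM
```

The extra bookkeeping (two stores keyed by the same id) is the added complexity relative to (2) that the thread mentions.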
2
31
142
31,735
Spent ~24 hrs on planes w/ a 1.5 year old over break (not advised!) and listened to A LOT of podcasts. Some notes from my favorite ones. What is an agent? @erikschluntz + @barry_zyj define an agent as an LLM that autonomously performs actions (e.g., calling tools in a loop) [1], similar to the ReACT architecture [2]. How well do agents perform? @erikschluntz + team achieved 49% on SWE-bench Verified with a Claude3.5-Sonnet ReACT agent [3]. @claybavor gives an overview of Tau-Bench, a customer support eval benchmark, but mentions that frontier models with ReACT have poor reliability (gpt-4o achieves 61% and 35% on retail and airline customer support evaluations respectively) [4, 5]. How to address agent shortcomings today? @erikschluntz + @barry_zyj define “agentic workflow” as a system where LLMs + tools are orchestrated through predefined code paths (chain, parallelization, orchestrator-worker, etc) [1]. @claybavor mentions these types of workflows (or “reasoning scaffolding”) perform better than ReACT on Tau-Bench [4]. But @polynoamial argues that reasoning scaffolding may not scale w/ data and agent definition above w/ high capacity reasoning models (e.g., o-series, etc) + tool use ultimately may prevail [6]. What is happening with pre-training scaling? @dylan522p highlights what the people who know the most are doing [7]: Anthropic is working on a 400k Trainium chip cluster with Amazon, Zuck has a 2GW datacenter planned in Louisiana [8], Elon will have a 100k H100 cluster come online in the next few months [9]. @DarioAmodei mentioned that we will probably see a $100B cluster by 2027 [10]. @polynoamial says to consider the economics of the scaling of pre-training rather than the idea of a hard (e.g., data) wall; we know it’s viable to spend ~hundreds of millions, but at what scale are returns no longer viable [6]. What about test time compute (TTC) scaling? @polynoamial says we are much earlier in this curve [6]. 
But @dylan522p does point out that TTC is less profitable than pre-training [7]: MSFT has reported $10B inference at ~50-70% gross margin on hosting OpenAI models, but TTC in reasoning models (e.g., o-series) uses ~10x more tokens to generate answers with reduced batching (~4-5x more servers to handle the same number of users) so the cost may be ~50x more [7]. @polynoamial points out that some reasoning problems are extremely high value (pay ~millions to solve them). What problems can TTC address? @polynoamial framed this [11]: it works well on cases where there is a clear “generator-verifier gap” (it is hard to generate solutions, but easy to verify a correct one). Coding and math are obvious examples. SWE-bench Verified went from 49% w/ Sonnet-3.5 [12] to 71% with O3 [13]. @swyx mentioned he uses o1 for AI news (writing w/ strong curation / summarization) [14]. What do these scaling trends mean for NVDA? @dylan522p argued that, for LLMs today, the software moat is smaller for inference vs training [7]: MSFT can justify deploying models on AMD if it lowers costs b/c they're running relatively few models at scale. But TTC may shift this: @polynoamial argues that scaling inference for TTC was one of his primary concerns about the timeline for AGI [6], but apparently they’ve done a lot of work to resolve this. Jensen said Blackwell plays into the higher inference load for TTC (e.g., 10k+ tokens of thinking and also much greater demand on high bandwidth memory) [15], which is provided by SK Hynix + Micron. What do these scaling trends mean for the application layer? @chetanp points out the rapidly dropping cost of inference in part due to open source models (w/ routers that pass requests between different models to cost optimize) [16]. Benchmark has made 25 AI bets (21 are application layer and 4 are infrastructure), the most they’ve invested since 2009 (mobile) and 1995 (internet). 
He is seeing a fast sales cycle with application layer companies because typically it is workforce displacement and targeting big / incumbent spent markets (sales automation, legal, accounting, ad networks, game development, circuit board design, new document processing tools). When and what is AGI? @DarioAmodei argues there is not a discrete threshold for AGI, it’s a smooth progression of capabilities like the term “supercomputing” in the 1990s with one “we’ll know it when we see it” heuristic that we’ll see Nobel-prize level work across many domains [10]. @polynoamial says that he underestimated prior timelines (e.g., to solve inference for test time compute) and that he expects progress to accelerate in 2025: “the problems we’ve already solved are harder than the problems we have ahead.” Sources [1] anthropic.com/research/build… [2] react-lm.github.io/ [3] latent.space/p/claude-sonnet [4] sequoiacap.com/podcast/train… [5] sierra.ai/blog/benchmarking-… [6] piped.video/watch?v=OoL8K_AF… [7] piped.video/watch?v=QVcSBHhc… [8] datacenterdynamics.com/en/ne… [9] datacenterdynamics.com/en/ne… [10] piped.video/ugvHCXCOmm4?si=ZuxK… [11] piped.video/watch?v=jPluSXJp… [12] anthropic.com/research/swe-b… [13] nitter.app/arankomatsuzaki/status… [14] latent.space/p/2024-review [15] teddit.net/r/singularity/com… [16] open.spotify.com/episode/16Y…
2
13
141
17,179
After being offline for a month w/ a new baby, I just drank the AI twtr firehose + see recent RAG themes: 1/ Improve RAG w/ condensed content embedding 2/ Manage RAG prompts 3/ Write RAG pipelines w/ low-level components Updated RAG docs shows all 3: python.langchain.com/docs/us…
4
17
140
27,960
@langchain released a prompt hub ~1.5 months ago to share + test prompts. I did a deep dive into hundreds of user-generated public prompts and distilled major themes. Writeup w/ themes + prompt highlights: blog.langchain.dev/the-promp…
11
25
134
43,455
I used @karpathy's Whisper transcriptions for the first 325 episodes and generated the rest. I used @langchain for splitting transcriptions / writing embeddings to @pinecone, LangChainJS for VectorDBQA, and @mckaywrigley's UI template. Some notes below ...
4
3
134
21,067
llms.txt + agent w/ url loader tool may be "all you need", but llms.txt files need to be well written. use an llm to generate them for you. open source + works well w/ local models ... code: github.com/rlancemartin/llms…
1
11
130
8,506
Fully local agents w Llama3.1-8b Llama3.1-8b looks excellent for local (e.g., your laptop) workflows / agents. I built + evaluated a corrective RAG agent running locally (M2 Mac, 32gb, w/ @ollama). Short explainer, code, eval results: piped.video/watch?v=nPpgh_Ka…
2
26
131
10,259
Recent additions to @langchain data ecosystem (as of v0.0.215): improvements to @trychroma, @Redisinc, @weaviate_io, @pinecone, @supabase, and @elastic vectorstores; two new data loaders and improvements to @NotionHQ loader, and updated @MongoDB docs ...
8
20
123
18,216
CodeLlama models c/o @TheBlokeAI now work w/ llama-cpp-python. Getting ~25 tok / sec (Mac M2 max). Enabled b/c support for new llama.cpp GGUF format just got added to llama-cpp-python ~1hr ago. PR: github.com/abetlen/llama-cpp… Model download: huggingface.co/TheBloke/Code…
5
24
130
19,193
Recently added @gpt_index as a retriever option to auto-evaluator. Ran all 4 retrievers on a small test of 5 generated question-answer pairs from @karpathy's pod w/ @lexfridman: SVM retriever performing on par (in terms of performance and latency) w/ KNN (on FAISS VectorDB) ...
4
11
123
33,139
Fun RAG flow I worked on w/ @cohere command-R. Ties together (1) routing, (2) structured output w/ online unit tests, (3) RAG. command-R is good for flows like this b/c it's fast + structured outputs (for online tests) and good at RAG / routing.
Adaptive RAG w/ Cohere's new Command-R+ Adaptive-RAG (@SoyeongJeong97 et al) is a recent paper that combines (1) query analysis and (2) iterative answer construction to seamlessly handle queries of differing complexity. We took a stab at implementing these ideas from scratch using a ReAct agent and LangGraph with @cohere's Command-R and the new Command R+. Command-R is fast and lightweight (35b parameter) with strong tool-use and RAG performance. It works very nicely w/ LangGraph, performing query analysis (re-writing and routing) between a vectorstore, web search, and fallback to LLM. We also perform RAG with fast in-the-loop unit tests for doc relevance, answer hallucinations, and answer quality. We show the same workflow using a ReAct agent and the larger Command R+. In the video, we discuss the trade-offs between using agents vs LangGraph, and Command-R vs the newer / larger Command R+. Video: piped.video/04ighIjMcAI LangGraph code: github.com/langchain-ai/lang… ReACT agent code: github.com/cohere-ai/noteboo… Paper: arxiv.org/abs/2403.14403
2
13
120
12,608
Multi-modal RAG for slide decks Visual Q+A assistants on slide decks are a great app for multi-modal LLMs. Here is a template for quickstart: index slides as images w/ multi-modal embd, retrieve, pass to GPT-4V. Template: templates.langchain.com/?int… Blog: blog.langchain.dev/multi-mod…
2
19
119
16,062
For part 7 of our effort to improve @langchain docs, we're releasing an Open Source LLM guide: covers open source LLM SOTA (overview fig below) and ways to run them locally (llama.cpp, ollama.ai, gpt4all). python.langchain.com/docs/gu…
2
26
114
18,623
Possible tip on prompting Llama-2. Try special tokens from llama's generation code (<<SYS>>, <</SYS>>, [INST], [/INST]). Answers seem better w/ them. LangSmith trace w/o tokens linked (also, image left): smith.langchain.com/public/a… w/ tokens (right): smith.langchain.com/public/5…
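A small helper showing how those special tokens wrap a single-turn prompt. The exact spacing and newlines follow my reading of Meta's reference generation code and are an assumption worth checking against the model card; the function name is made up here, and the BOS token is left to the tokenizer.

```python
from typing import Optional

# Special tokens from Llama-2's chat format.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def format_llama2_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Wrap one user turn (and an optional system prompt) in Llama-2 chat tokens."""
    content = user_message.strip()
    if system_prompt:
        # The system prompt is folded into the first user turn.
        content = B_SYS + system_prompt.strip() + E_SYS + content
    return f"{B_INST} {content} {E_INST}"
```

Comparing outputs for the raw prompt vs `format_llama2_prompt(...)` on the same question is an easy way to repeat the with / without tokens test.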
8
19
117
29,003
a few thoughts on the current state of agents based on what I saw at @aiDotEngineer: + rise of "ambient" agents + the bitter lesson & agent UX + RL for non-verifiable tasks + the case for MCP + early days for agent memory rlancemartin.github.io/2025/…
8
19
115
12,422
There's a lot of interest in keyword + semantic search in retrieval. A few design patterns: 1/ Two-stage (@cohere Rerank), 2/ Ensemble (e.g., @langchain EnsembleRetriever), 3/ Hybrid search (e.g., via @pinecone, @weaviate_io, etc), but curious if folks have used others? ...
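For the ensemble pattern, one common way to merge a keyword ranking and a semantic ranking is reciprocal rank fusion (RRF). This is a generic sketch of RRF, not a claim about how any of the tools named above fuse results internally; `k=60` is the conventional default and the doc ids are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of doc ids; docs ranked high in many lists win."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

RRF only needs ranks, so it sidesteps the problem that keyword scores (e.g., BM25) and embedding similarities live on different scales.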
6
20
109
26,265
Fav local LLM use-case: Research assistant I give it a topic, it does iterative web search and result summarization for me. Rabbit-holes as long as I want. Free to run w/ @ollama (qwen-2.5, llama3.2, etc). Quick vid explainer (w/ code link): piped.video/XGuTzHoqlj8
1
18
107
6,318
Some notes from @aiDotEngineer day 1 - @simonw on state of AI > Visual eval for LLMs: asked each LLM to generate code for an SVG image of a pelican riding a bicycle. Ran this across ~30 model releases over the past 6 months. Created a script to select random image pairs, GPT4.1 as a grader to pick the better one, ran across a large set of pairwise samples to generate elo scores for each LLM. Gemini-2.5 #1+2, o3 #3, Claude4-sonnet #4. > Local models are getting better: Highlighted Mistral Small3 24GB, but hoping for a strong Llama4.1 release. > Memory can mean loss of control: Highlighted a case where GPT-4o injected location into an image based upon memories. Good example of memory working “behind the scenes” in an undesirable way. > Many models will “rat you out”: Claude4 infamous for this behavior, but benchmarking shows that many models will do similar (snitchbench.t3.gg/). @saranormous on AI opportunities > Code was the first major AI app b/c it’s easy to verify, on critical path to AGI, and eng build tools to help them first. Code adoption sets a roadmap for other industries. > Low-tech industries have ironically seen high AI adoption (leapfrog effect): Harvey ($70m ARR) in law, OpenEvidence in medicine, Sierra in customer support. > Execution is the moat: Cursor shipped fast w features that surfed the rising tide of model capability. Jasper is a counter-example: got crushed as models improved. @chu_onthis on MCP origins > Need more MCP servers: Beyond devtools (to sales, finance, legal, edu), expose agents as MCP servers > Need better tools to simplify server building: Automated MCP server generation. LLMs will eventually write their own MCP servers. > Don’t just wrap APIs: Think carefully about end-user, the client, the tools / resources that you want the server to expose in order to behave properly. @johnw188 on MCP Gateway at @AnthropicAI > The setup for MCP within Anthropic: LLMs got good at tool calling. 
Everyone started writing tools w/o coordination, resulting in duplication + many custom endpoints for each use-case. Inconsistent interfaces confuse developers. Duplicated functionality created maintenance challenges. > MCP standardized the message and transport: Standardize on something (anything)! MCP is just JSON streams. JSON-RPC spec for the message and streamable HTTP with oAuth 2.1 for global transport standard. > Why standardize in general: Integration plumbing is table stakes, not your differentiator. One pattern to learn, debug, secure, and optimize. Save cognitive load for problems worth solving. Each new integration builds on your previous work. > Why standardize on MCP: Ecosystem demand (AI ecosystem requires it), developed and maintained by large coalition of engineers, future-ready design to evolve along with model capabilities and solves problems that you haven’t hit yet. > Build pits of success: Make the easiest thing the right thing. Anthropic built an internal MCP Gateway to handle MCP connections and made it the easiest way to connect Claude to context or tools. It has a single entry point (connect_to_mcp) that abstracts all transport and auth, URL based routing to internal / external servers, automatic credential management (OAuth flows) and observability handled by the Gateway. Centralize at the right layer. This gives a central point of ingress / egress for all model context, allowing for auditing / policy enforcement and visibility into what models are trying to do. @dylan522p on GPU geopolitics > China will have chips from Huawei: Huawei is cracked; won 5G. Huawei Ascend 910b/c chips are strong, with HBM from Samsung and wafers from TSMC. SMIC is getting better (will try for 5nm this year, has 7nm now), but 910b/c still using TSMC for wafer purchased via a third-party (Sophgo). > US net energy supply growth insufficient to meet data center demand: xAI $10b cluster is 200-300MW. Stargate TX cluster is 1.2GW in 4 years. 
SemiAnalysis predicts 88GW of power load demand growth from datacenters by 2030. Net US energy supply additions fall 63GW short. > Energy projects are picking up in the middle east as a result: 5GW Stargate campus in UAE and similar project in Saudi Arabia. China added the entire US grid in 7 years, btw. @kevinhou22 on Windsurf > Code agents need to read anything that a SWE can (from many sources outside of the IDE): Much of the developer workflow is done outside of the IDE using external sources (Slack, Jira, Figma, Google Docs, Github, web searches) and informed by taste (memory, personal notes). Windsurf will use MCP to connect to / read from these external sources. > Code agents will take actions that SWE do (across many surfaces): Windsurf adding ability to do things like take control of Chroma, use Github MCP to create PR, deployment, etc. > Code agents shift from sync to ambient (in the background), async workflows that only alert the user for (final) approvals: Started with human-in-the-loop sync workflows in Cascade. But Windsurf wants to move to async ambient workflows running in the background, only asking the user for final approval. > Trained their own model, on par w/ SOTA: Trained their own model, SWE-1. Trained on SWE workflows, not just code gen. Shows near-SOTA on a few benchmarks vs o-series or Claude at a fraction of the cost, and accept rates within Windsurf on par with frontier models. > IDE provides a data flywheel: Get feedback from users in IDE (accept/reject) and learn patterns of working. Use these to expand the model / agent. Then, ship improved model. @gdb on AI > How he developed intuition that AGI is achievable: Inspired by Turing paper, which mentioned idea of building a machine that learns like a human child. Neural nets are a 70+ year old idea. 1990s critique is that neural net people are “out of ideas” and “just want to build larger computers.” Felt that yes, this is exactly what we should do. 
2012-era AlexNet shows SOTA in vision. Deep learning SOTA across other domains like NLP gave added confidence. Then, transformer and scaling laws empirically. So, he makes the point that we are riding a 70+ year wave: intuition that started with Turing, early DL in the 2010s in vision / NLP showing that it’s possible to build machines that can learn their own representations directly from data, and now transformer + scaling laws generalizing to human intelligence / beyond. Amazing conf @swyx!
6
19
115
16,587
Gave this short talk on RAG vs long context LLMs at a few meetups recently. Tries to pull together threads from a few recent projects + papers I really like. Just put on YT, a few highlights w papers below ... piped.video/watch?v=SsHUNfhF…
2
10
108
6,344
Recent @langchain integrations to highlight across loaders, doc transformers, embeddings, retrievers (@googlecloud), and llms (llama-v2 support w/ @replicatehq and @ggerganov's llama.cpp) ...
3
22
103
37,290
Using Nomic embeddings locally Great to see @nomic_ai long context (8k tok), variable sized embeddings now run locally w/ llama.cpp. Fully local self-rag w @MistralAI-7b + @ollama + @nomic_ai v1.5 embd. Cookbook: github.com/langchain-ai/lang… Related vid: piped.video/watch?v=E2shqsYw…
2
23
105
12,440
Fully local tool calling llama3.1 + Ollama Ollama just added tool calling for local models! I tested this w/ llama3.1-8b + @GroqInc fine-tune-8b. With both, tool calling agents can run locally. Quick explainer w/ code: piped.video/watch?v=Nfk99Fz8…
2
17
101
9,407
Mistral Agent Cookbooks Great to work with @sophiamyang to contribute 3 cookbooks to @MistralAI! I’ve used LangGraph to build agents reliably with Mistral-7b on up to Mistral-Large. Short explainer w/ code linked: piped.video/sgnrL7yo1TE?si=2Z2S…
1
26
104
12,972
Looking forward to this webinar w/ @arizeai and @pinecone coming up at 9am PST! We often embed / store (e.g., in @pinecone) texts for LLM retrieval. The Phoenix tool from Arize is a great way to directly viz these embeddings and debug retrieval ... pinecone-io.zoom.us/webinar/…
2
22
102
18,965
There's a lot of interest in eval of open-source LLMs. I benchmarked @lmsysorg's Vicuna vs @OpenAI GPT-3.5/4 in the @langchain auto-evaluator app: in some cases, Vicuna-13b perf is on par w/ GPT3.5. Instructions to run Vicuna in LangChain and reproduce this are below ...
8
22
101
29,369
Code checks w/ reflection vastly improved my code assistant (inspired by @itamar_mar). But, biggest pain-point is deployment. @charles_irl + @modal_labs showed me a nice solution to this. We'll discuss it tmrw 9a pst! Signup: crowdcast.io/c/codeagents
Prompt engineering (or rather "Flow engineering") intensifies for code generation. Great reading and a reminder of how much alpha there is (pass@5 19% to 44%) in moving from a naive prompt:answer paradigm to a "flow" paradigm, where the answer is constructed iteratively.
6
12
101
15,867
Just added the @ESYudkowsky episode to lex-gpt, a Q+A assistant for all episodes of the @lexfridman pod. It is open source and I just made it free to use. App: lex-gpt.fly.dev/
8
13
101
19,284
I enjoyed @simonw's writeup on ColBERT, a nice method for high granularity document embedding from @lateinteraction & @matei_zaharia. Did a deep dive into ColBERT + RAGatouille. Short video, ntbk on usage, and useful links here:
RAG From Scratch: Indexing w/ ColBERT
Our RAG From Scratch video series walks through impt RAG concepts in short / focused videos w/ code. This is the 14th video in our series and focuses on indexing with ColBERT for fine-grained similarity search.
🔧 Problem: Embedding models compress text into fixed-length (vector) representations that capture the semantic content of the document. This compression is very useful for efficient search / retrieval, but puts a heavy burden on that single vector representation to capture all the semantic nuance / detail of the doc. In some cases, irrelevant (to a query) / redundant content can dilute the semantic usefulness of the embedding for retrieval.
💡 Idea: ColBERT (@lateinteraction & @matei_zaharia) is a neat approach to address this with a higher granularity embedding approach:
(1) produce a contextually influenced embedding for each token in the document and query.
(2) score similarity between each query token and all document tokens.
(3) take the max.
(4) do this for all query tokens.
(5) take the sum of the max scores (in step 3) for all query tokens to get a query-document similarity score.
This granular token-wise similarity scoring between document and query has shown strong performance.
📽️ Video: piped.video/cN6S0Ehm7_8
💻 Code: github.com/langchain-ai/rag-…
🧠 References:
1/ Paper: arxiv.org/abs/2004.12832
2/ Nice review from @DataStax: hackernoon.com/how-colbert-h…
3/ Nice post from @simonw: til.simonwillison.net/llms/c…
4/ ColBERT repo: github.com/stanford-futureda…
5/ RAGatouille to support RAG w/ ColBERT: github.com/bclavie/RAGatouil…
1
16
99
21,234
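A rough sketch of the ColBERT scoring steps (2) through (5) from the post above, in plain Python. This is not the ColBERT or RAGatouille code: the function names `cosine` and `maxsim_score` are made up for illustration, cosine similarity is assumed as the per-token similarity, and the tiny 2-d vectors stand in for the contextual token embeddings a real model would produce in step (1).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity against any document token."""
    if not doc_vecs:
        return 0.0
    total = 0.0
    for q in query_vecs:
        # Steps (2) and (3): score this query token against every doc token, keep the max.
        total += max(cosine(q, d) for d in doc_vecs)
    # Steps (4) and (5): the loop covers all query tokens; the sum is the query-doc score.
    return total


# Toy token embeddings (hypothetical; a real model would supply these).
query = [[1, 0], [0, 1]]
doc_a = [[1, 0], [0, 1], [1, 1]]   # has a close match for both query tokens
doc_b = [[1, 0], [-1, 0]]          # only matches the first query token

ranked = sorted(
    [("doc_a", doc_a), ("doc_b", doc_b)],
    key=lambda pair: maxsim_score(query, pair[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because each query token only needs its single best match, extra unrelated tokens in a document (like `[1, 1]` in `doc_a`) do not drag the score down the way they can dilute a single pooled document vector.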
There's a lot of questions abt smaller, open source LLMs vs larger, closed models for tasks like question answering. So, we added @MosaicML MPT-7B & @lmsysorg Vicuna-13b to @langchain auto-evaluator. You can test them on your own Q+A use-case ... autoevaluator.langchain.com/…
1
22
98
24,775