Yi Ding -- prod/acc · Mar 5, 2026 · 9:25 PM UTC

Yi Ding -- prod/acc

Pinned Tweet

Yi Ding -- prod/acc

@yi_ding

Mar 5

GPT-5.4: the most RLed joke teller yet. @Ed_Miliband levels of consistency.

811

Yi Ding -- prod/acc · Nov 7, 2023 · 3:15 AM UTC

Yi Ding -- prod/acc

@yi_ding

7 Nov 2023

OpenAI has put together a pretty good roadmap for building a production RAG system. h/t @benparr 1. Start with naive RAG. 2. Tune your chunks: your splitting, chunk sizes, but also small to big strategies where you retrieve more than you embed. Forget fine tuning embedding models (for now). 3a. Rerank the results of your retrievals. 3b. Classify. Here I'm speculating they're running a classifier on the type of query: bucketing questions about a particular part of one document questions vs. questions about the entire document vs. questions about multiple documents.* 4. Do prompt engineering, and layer on agentic flows to expand queries and use tools. The cool thing is to see how much performance improves when you combine all of these strategies. We have almost all of these pieces in @llama_index for you to try today. 1. Start with our 5 line naive RAG setup. 2. Tune splitter options and small to big strategies: docs.llamaindex.ai/en/stable… 3a. Reranking strategies: blog.llamaindex.ai/using-llm… 3b. Routing queries based on classification: replit.com/@LlamaIndex/Llama… 4. Data Agents and the SubQuestionQueryEngine: gpt-index.readthedocs.io/en/… *It's possible that they're also doing some classification of the returned chunks, which would be another interesting idea.

548

86,321

Yi Ding -- prod/acc · Oct 31, 2023 · 12:20 AM UTC

Yi Ding -- prod/acc

@yi_ding

31 Oct 2023

Are you using GPT-4 and wondering why it feels a bit more robotic these days? Like it knows only one joke? Fairly conclusive proof that @OpenAI is now caching its responses, and in a way that ignores the temperature setting.

444

159,679

Yi Ding -- prod/acc · Jun 10, 2024 · 10:15 PM UTC

Yi Ding -- prod/acc

@yi_ding

10 Jun 2024

Siri 2.0 is a RAG app.

Nick Dobos

@NickADobos

10 Jun 2024

Siri can read EVERY piece of data on your phone (for apps that opt in)

429

66,263

Yi Ding -- prod/acc · Nov 6, 2023 · 11:17 PM UTC

Yi Ding -- prod/acc

@yi_ding

6 Nov 2023

A few thoughts on how OpenAI is implementing RAG in the new Assistants Retrieval tool before I was locked out. 1. They're splitting on newlines. You can tell because they forget to insert the newlines back between the splits when giving you the reference (red squiggles). 2. The reference is contiguous. I haven't tried more complicated questions so can't rule out that they would do some kind of subquestion decomposition, but the queries I tried all just gave a single reference. 3. The number of chunks returned is variable. This indicates some kind of "small to big" strategy where they retrieve more context than they match. Perhaps something similar to our AutoMergingRetriever docs.llamaindex.ai/en/stable… w/ some kind of similarity cutoff but worth a bit more investigation. 4. They have some trouble handling UTF-8. In general, very bleeding edge still, but look forward to seeing how this evolves.

417

83,043

Yi Ding -- prod/acc · Nov 21, 2023 · 4:55 PM UTC

Yi Ding -- prod/acc

@yi_ding

21 Nov 2023

If you’re using RAG in @AnthropicAI models the best position to put your most relevant chunk is at the bottom. For ChatGPT it’s at the top.

328

81,157

Yi Ding -- prod/acc · Dec 22, 2023 · 12:11 AM UTC

Yi Ding -- prod/acc

@yi_ding

22 Dec 2023

RIP @ChatGPTapp Plugins. 🫗

278

53,938

Yi Ding -- prod/acc · Nov 28, 2023 · 7:56 PM UTC

Yi Ding -- prod/acc

@yi_ding

28 Nov 2023

If you're new to Retrieval Augmented Generation, a fantastic place to start trying it out is @jerryjliu0's new RAGs @streamlit app. 3.6K github stars in 2 weeks! github.com/run-llama/rags

GitHub - run-llama/rags: Build ChatGPT over your data, all with natural language

Build ChatGPT over your data, all with natural language - run-llama/rags

github.com

227

26,893

Yi Ding -- prod/acc · Sep 5, 2023 · 11:26 PM UTC

Yi Ding -- prod/acc

@yi_ding

5 Sep 2023

Another good article about RAG vs. Finetuning and when to use RAG with finetuning. towardsdatascience.com/rag-v… One of the benefits of RAG that isn't well known: it helps reduce hallucinations. "So in applications where suppressing falsehoods and imaginative fabrications is vital, RAG systems provide in-built mechanisms to minimise hallucinations. The retrieval of supporting evidence prior to response generation gives RAG an advantage in ensuring factually accurate and truthful outputs."

RAG vs Finetuning - Which Is the Best Tool to Boost Your LLM Application? | Towards Data Science

The definitive guide for choosing the right method for your use case

towardsdatascience.com

230

39,125

Yi Ding -- prod/acc · May 8, 2024 · 6:56 PM UTC

Yi Ding -- prod/acc

@yi_ding

8 May 2024

A few takeaways from the @OpenAI Model Spec: cdn.openai.com/spec/model-sp… 1. GPT-5 and future models will be significantly better at decision making and instruction following. The sheer number of conditions here, some almost contradictory, is impressive. 2. Multiple levels of instructions, and the user won't be able to access to most of the top levels. system messages are now developer messages, and there's a layer on top called platform messages that can only be changed by OpenAI. This change was done for safety/security reasons, but if not done well could lead to overrefusals like we saw with Llama 2. 3. Customized outputs based on "settings." "interactive" and "max_tokens" will now directly affect the output of the models, meaning there will no longer be one canonical "GPT-5" response. Previously max_tokens only truncated but now it affects brevity. 4. YAML, JSON, and XML are explicitly called out and treated differently by the model than other parts of the prompt. This is called out for avoiding prompt injection, but could also mean that prompts constructed with these types of inputs will perform better in future models. 5. There is some indication that some of the rules can be overridden. It'll be interesting to test which ones are more strongly held and which ones are more weakly held by the model. The good: I think this is a good indication that GPT-5 is going to be a significant step up in reasoning capabilities. It's also an indication that OpenAI continues to take safety seriously, although the track record has been that every level of filtering and safety they've put in so far has been able to be worked around up until now. Will be interesting to see whether the new prompt hierarchy puts that problem to rest once and for all. The bad: For application developers, this may be an indication that GPT output will become even more unpredictable and opaque. The many layers of rules will undoubtedly cause corner cases where previously tuned programs and prompts stop working. The ugly: Even with 4 levels of prompts now, there's no dedicated context or RAG message.

198

47,466

Yi Ding -- prod/acc · Nov 9, 2023 · 7:23 PM UTC

Yi Ding -- prod/acc

@yi_ding

9 Nov 2023

How does @OpenAI's RAG do on PDFs? First test: table extraction. As we found with secinsights.ai, handling tables often requires special logic. @AnthropicAI clearly does that, and OpenAI Assistants doesn't, so most of the numbers it outputs are wrong.

188

37,118

Yi Ding -- prod/acc · Jun 7, 2024 · 6:40 PM UTC

Yi Ding -- prod/acc

@yi_ding

7 Jun 2024

Replying to @VicVijayakumar @rbolte

Please don’t do this. You don’t want your customer’s first impression of your product to be “my boss just got a wtf from the security/branding/marketing/partnerships team”

178

4,727

Yi Ding -- prod/acc · Oct 30, 2023 · 4:27 PM UTC

Yi Ding -- prod/acc

@yi_ding

30 Oct 2023

ChatGPT now has built in Retrieval Augmented Generation (RAG)! Can’t wait to put it through its paces!

170

56,963

Yi Ding -- prod/acc · Aug 7, 2024 · 4:29 PM UTC

Yi Ding -- prod/acc

@yi_ding

7 Aug 2024

Replying to @ArmandDoma

136

5,578

Yi Ding -- prod/acc · Aug 10, 2023 · 3:42 PM UTC

Yi Ding -- prod/acc

@yi_ding

10 Aug 2023

What's the difference between the temperature, top P, and top K settings in LLMs? This 2020 @huggingface article does probably the best job I've seen in terms of explaining what goes on when you adjust those settings. huggingface.co/blog/how-to-g…

How to generate text: using different decoding methods for language generation with Transformers

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

144

52,828

Yi Ding -- prod/acc · Aug 10, 2023 · 5:36 PM UTC

Yi Ding -- prod/acc

@yi_ding

10 Aug 2023

Some people may have noticed that LLMs tend to lose some of their intelligence when temperature is set to 0. This article does a good job of explaining why. It was so convincing that we changed our default temperature in @llama_index Python and TS to 0.1.

Yi Ding -- prod/acc

@yi_ding

10 Aug 2023

145

39,734

Yi Ding -- prod/acc · Aug 18, 2023 · 2:44 PM UTC

Yi Ding -- prod/acc

@yi_ding

18 Aug 2023

Great article on the pros and cons of fine tuning. Too often I see people reaching for fine tuning when they should really try k-shot prompting and retrieval augmented generation first, tidepool.so/2023/08/17/why-y…

141

25,426

Yi Ding -- prod/acc · Dec 6, 2023 · 10:05 PM UTC

Yi Ding -- prod/acc

@yi_ding

6 Dec 2023

Replying to @ArmandDoma

Check out the demographics for the UAE.

118

9,123

Yi Ding -- prod/acc · Sep 25, 2024 · 6:31 PM UTC

Yi Ding -- prod/acc

@yi_ding

25 Sep 2024

First take: Llama 3.2 is going to really expand the ColPali-style retrieval strategies. ColPali is currently built on a 3B vLLM (PaliGemma), so super interested to see how it scales to 90B.

133

13,090

Yi Ding -- prod/acc · Dec 6, 2023 · 3:53 PM UTC

Yi Ding -- prod/acc

@yi_ding

6 Dec 2023

Is the @GoogleDeepMind Gemini announcement the first time prompt engineering is so prominently advertised in a LLM? The CoT here stands for “chain of thought” and it looks like it’s one of the key reasons for Gemini’s record breaking performance.

113

22,067

Yi Ding -- prod/acc · Oct 17, 2023 · 3:39 PM UTC

Yi Ding -- prod/acc

@yi_ding

17 Oct 2023

Simple AI Agent math: On a 10 step agent loop: If your LLM error rate is 5%, then your overall success rate is 0.95^10 = 60% If error rate is 10%, then your success rate is 0.9^10 = 35% But if you can just reduce your error rate to 3% (w/ retrieval) success rate becomes 74%

116

30,115

Yi Ding -- prod/acc · Nov 27, 2023 · 6:21 PM UTC

Yi Ding -- prod/acc

@yi_ding

27 Nov 2023

If you’re new to using LLMs like ChatGPT, it’s important to know what they are or are not good at. Good at: Text generation with recall (RAG) Entity extraction Coding Decision making with <5 choices Not good at: Math Numerical tasks (counting) Decision making with >20 choices

108

20,303

Yi Ding -- prod/acc · Nov 5, 2023 · 1:24 AM UTC

Yi Ding -- prod/acc

@yi_ding

5 Nov 2023

Got to show off @seldo's @MongoDB starter at the hackathon today. github.com/run-llama/mongodb… Complete LlamaIndex RAG setup including Flask backend and Next frontend and one line deployment to @render

GitHub - run-llama/mongodb-demo

Contribute to run-llama/mongodb-demo development by creating an account on GitHub.

github.com

104

17,980

Yi Ding -- prod/acc · Dec 6, 2023 · 4:56 PM UTC

Yi Ding -- prod/acc

@yi_ding

6 Dec 2023

The @GoogleDeepMind Gemini paper introduces this concept of "uncertainty-routed" Chain of Thought. Still trying to digest it, but looks like the Gemini model does significantly better once "uncertainty-routing" is added to the technique. Why doesn't GPT-4 benefit in the same way? Is it because GPT-4 (and other OpenAI models) no longer return token probabilities? Need to dig into it more. Full paper here: storage.googleapis.com/deepm… Experts feel free to chime in.

20,781

Yi Ding -- prod/acc · Sep 18, 2023 · 12:14 AM UTC

Yi Ding -- prod/acc

@yi_ding

18 Sep 2023

The RAG/Prompt Engineering vs. Finetuning/Distillation question hinges on whether LLMs are still improving exponentially or we have reached a plateau. If we are still in exponential growth then RAG/Prompt Engineering is more likely to pay off and vice versa if we’ve plateaued.

26,288

Yi Ding -- prod/acc · Sep 7, 2023 · 6:49 PM UTC

Yi Ding -- prod/acc

@yi_ding

7 Sep 2023

What should you do if you're retrieving over multiple documents that have similar, but distinct content? Here, @thesourabhd shows off what I consider one of @llama_index's less known superpowers: the SubQuestionQueryEngine. By wrapping the documents into tools and letting the LLM choose which document to query, we ensure that $AAPL's financials aren't confused for $MSFT's. Check out his full video here: piped.video/watch?v=2O52Tfj7… and our documentation for the SubQuestionQueryEngine here: ts.llamaindex.ai/modules/hig…

16,951

Yi Ding -- prod/acc · Dec 16, 2022 · 6:20 PM UTC

Yi Ding -- prod/acc

@yi_ding

16 Dec 2022

Replying to @garybasin

It’s a better indication of the noise/biases of interview process than anything. Anybody who thinks they can determine someone, much less 99% of the people, they spend an hour talking to is “is dead weight” is wildly overconfident in their own abilities.

6,227

Yi Ding -- prod/acc · Sep 11, 2023 · 6:51 PM UTC

Yi Ding -- prod/acc

@yi_ding

11 Sep 2023

Saw that we just hit 1K stars on secinsights.ai! 🚀 If you're interested in building a production ChatGPT application with data (RAG), you definitely want to check it out. github.com/run-llama/sec-ins…

7,509

Yi Ding -- prod/acc · Nov 10, 2023 · 3:43 PM UTC

Yi Ding -- prod/acc

@yi_ding

10 Nov 2023

So perhaps the most interesting thing to come out of the spelunking I did yesterday into OpenAI’s RAG architecture is the revelation that it looks like they’re not embedding the whole query. Instead they’re doing almost a sort of an advanced keyword extraction before embedding. I didn’t find any explicit instructions on how they do this. @jxnlco thinks it may have been fine tuned into the model and I’m tempted to agree with him. This goes against pretty much every RAG query embedding example I’ve seen up until this point. Does it actually help performance? In any case it deserves more exploration.

11,368

Yi Ding -- prod/acc · Jul 23, 2024 · 7:59 PM UTC

Yi Ding -- prod/acc

@yi_ding

23 Jul 2024

Prompt Guard might be one of the most interesting, underlooked, parts of this 3.1 drop. A model fine tuned on detecting prompt injections and other LLM attacks. ai.meta.com/blog/meta-llama-… Only 11 downloads so far on @huggingface huggingface.co/meta-llama/Pr…

Expanding our open source large language models responsibly

Today, we’re sharing the measures and safeguards we’ve taken to responsibly scale the Llama 3.1 collection of models, including the 405B.

ai.meta.com

3,505

Yi Ding -- prod/acc · Jul 28, 2023 · 6:34 PM UTC

Yi Ding -- prod/acc

@yi_ding

28 Jul 2023

It's bittersweet to leave the Messaging Apps team at Apple, a team I co-founded 7 years ago. It was, and still is, an incredible team of engineers. Over the years, we shipped Apple's official account on WeChat, the Apple Business Chat chatbot, and two weeks ago Apple's WeChat mini-program. cnbc.com/2023/07/11/apple-la… Internally we were known as the cowboys 🤠, and I loved our posse. But this last year, the world has changed with the explosion of capabilities from Large Language Models, and I knew that this would be the the defining tech of the decade. So when the call came, I had to join @llama_index LlamaIndex is not only one of the open source leaders in the LLM space, it's also a really amazing community. When I talk to @jerryjliu0 and @disiok their commitment to quality, developer experience, and listening to customers is always in our conversations. So I'm really excited to build LlamaIndex.TS (we shipped this Monday!) and DevRel with them. LFG

21,262

Yi Ding -- prod/acc · Sep 27, 2023 · 12:39 AM UTC

Yi Ding -- prod/acc

@yi_ding

27 Sep 2023

2023 is the year of the AI demo. 2024 is the year we will see LLM backed applications widely deployed in production.

6,257

Yi Ding -- prod/acc · Sep 2, 2023 · 5:33 PM UTC

Yi Ding -- prod/acc

@yi_ding

2 Sep 2023

After 6 weeks at @llama_index all of the following are true: It's the most work I've taken on in my career. It's the most interesting work of my career. It's the most work/day I've completed in my career. It's the most behind I've been on my work in my career. Sounds fun? Join us! dub.sh/llama

17,008

Yi Ding -- prod/acc · Nov 7, 2023 · 8:00 PM UTC

Yi Ding -- prod/acc

@yi_ding

7 Nov 2023

Bit of a personal milestone for me. Up until earlier this year, I spent my time on this site without a profile picture or a bio, primarily to read the insights of others w/ <200 followers. A testament to the LlamaIndex and AI community for how quickly that changed. That said, if you have any account recommendations, please list them below. Would love to follow another couple hundred folks who I can learn more from.

7,256

Yi Ding -- prod/acc · Jun 14, 2024 · 10:06 PM UTC

Yi Ding -- prod/acc

@yi_ding

14 Jun 2024

How was your week? My answer, for almost every week of the last year at @llama_index, was "it's been a crazy week." Well, today, it's the last of those weeks, and I cannot help feeling a little sad to be leaving such an amazing team. 🙏 Jerry and Simon for the opportunity.

6,851

Yi Ding -- prod/acc · Aug 21, 2023 · 5:52 AM UTC

Yi Ding -- prod/acc

@yi_ding

21 Aug 2023

🪙Tokens are one of the first things people hear about when they come across large language models like ChatGPT. What is a token? a sequence of unicode characters converted into a number. These numbers are then how the text you send to the LLM are represented internally inside the model. For example, if we use the @OpenAI tokenizer, platform.openai.com/tokenize…, we see that the first three words of this tweet are separated into three tokens: ["Tokens", " are", " one"]. However, not all words are like that: for example, if you try the word strawberry, you may see that the word is separated into three words: ["st", "raw", "berry"]. But wait, if you just add a space before the word, it ends up being one token: [" strawberry"]. 🍓 What's going on here? This is one of the quirks of the way tokenization works. Spaces and punctuation also need to be tokenized. So GPT is trained on a lot of sentences with the word strawberry in the middle, but fewer sentences with the word strawberry, uncapitalized, at the beginning of a sentence or paragraph. This is a good reason why you don't want to have extraneous spaces (and depending on what you're trying to do newlines) at the end of your prompts. By leaving a space at the end of your prompt, you make it harder for the model to complete with many common words.

25,968

Yi Ding -- prod/acc · Dec 15, 2023 · 7:05 PM UTC

Yi Ding -- prod/acc

@yi_ding

15 Dec 2023

What is RAG? Retrieval Augmented Generation is three intimidating words for newcomers to the world of ChatGPT and LLMs. Today, @tonykipkemboi and I coined an acronym live at @streamlit. BOWS: Better Output with Search. Thanks @CarolineFrasca! Very fun! piped.video/watch?v=PLKkudXY…

Demystifying RAG apps with LlamaIndex!

Join @tonykipkemboi and @LlamaIndex 's Yi Ding as they delve into ...

youtube.com

8,688

Yi Ding -- prod/acc · Mar 27, 2024 · 3:42 PM UTC

Yi Ding -- prod/acc

@yi_ding

27 Mar 2024

When we first built @llama_index TS I would have never imagined we'd have a course on it taught by @AndrewYNg and @seldo in less than a year. 🤯 deeplearning.ai/short-course…

JavaScript RAG Web Apps with LlamaIndex

Build a full-stack web application that uses RAG capabilities to chat with your data. Learn to build a RAG application in JavaScript, using an intelligent agent to answer queries.

deeplearning.ai

6,267

Yi Ding -- prod/acc · Aug 11, 2023 · 2:16 PM UTC

Yi Ding -- prod/acc

@yi_ding

11 Aug 2023

What are the best ways to keep LLM responses concise? Llama2 ignores "Please be concise" or even "PLEASE BE CONCISE!" But "Prefer shorter answers. Keep your response to 100 words or less." seems to work OK. github.com/run-llama/LlamaIn…

6,711

Yi Ding -- prod/acc · May 8, 2024 · 6:34 PM UTC

Yi Ding -- prod/acc

@yi_ding

8 May 2024

Interesting change from @OpenAI with consequences for RAG: max_tokens will affect generated output in the future.

25,490

Yi Ding -- prod/acc · Oct 31, 2023 · 12:23 AM UTC

Yi Ding -- prod/acc

@yi_ding

31 Oct 2023

Replying to @yi_ding @OpenAI @goodside

Interestingly, it looks like the caching is rather aggressive. So the LLM responds with the same joke whether you say "Tell me a joke." or "Tell me a new joke." This indicates OpenAI is also doing some kind of query clustering, and not just 1 to 1 query->response caching.

6,781

Yi Ding -- prod/acc · Jun 1, 2024 · 3:09 PM UTC

Yi Ding -- prod/acc

@yi_ding

1 Jun 2024

Replying to @netcapgirl

More like k dot vs. vanilla ice

26,152

Yi Ding -- prod/acc · Oct 5, 2023 · 1:19 AM UTC

Yi Ding -- prod/acc

@yi_ding

5 Oct 2023

There are basically three groups of vector databases today: 1. Vector DBs built from scratch. 2. Vector DBs built on top of existing DBs 3. Vector search built on top of existing search platforms It'll be really interesting to see which use cases end up using which.

10,942

Yi Ding -- prod/acc · Sep 19, 2023 · 6:41 PM UTC

Yi Ding -- prod/acc

@yi_ding

19 Sep 2023

Do LLMs need intrinsic facts for decision making and what does that mean for RAG (data driven LLM apps)? LLMs store a lot of facts in their parameters. Many more than your average human knows, from string theory to ballroom dance. Some people, including @sama, have suggested that in the future we may be able to remove those facts from the LLM itself, distill the LLM to a pure reasoning engine and using RAG, allow the LLM to then retrieve only the facts it needs. Is this possible? Not currently and I’m also skeptical long term. From what we know about human decision making, knowledge acquisition is an integral part of making good decisions. You cannot put a super intelligent grade schooler in a corporate role and expect them to do as well or better than a veteran (I was in a gifted program so knew some very intelligent grade schoolers.) My bet is that LLMs will always need some baseline of intrinsic facts. What does that mean for RAG? When you are building a RAG application, there is always a tension between what a model “knows” and what data you give it. Oftentimes we want to instruct the model to only consider the data we give it but it can lead to degradation in performance in certain use cases if we are too strict (as we found out when we pushed a new prompt in 0.8 and then reverted). If we aren’t strict enough in the prompt then the model may “hallucinate.” Unfortunately there’s no clear answer at the moment to the exact balance between letting or preventing the LLM from using its intrinsic facts. This is something we are actively working on at @llama_index but in the meantime the key to production RAG will be to properly tune the strictness to your own data and use cases.

11,915

Yi Ding -- prod/acc · Jul 28, 2023 · 3:19 AM UTC

Yi Ding -- prod/acc

@yi_ding

28 Jul 2023

What’s Retrieval Augmented Generation? Search, Give, Get. For those of us coming from a traditional software development background RAG can sound intimidating, but it really is a simple concept: Search for the relevant data Give the data to GPT Get a better response Of course, finding the most relevant data is not straightforward. That’s why if you’re new to LLM App development it’s best to use a framework like @llama_index or ts.llamaindex.ai Best part? It’s all open source so you can see exactly what we are doing.

23,028

Yi Ding -- prod/acc · Jul 16, 2023 · 7:13 AM UTC

Yi Ding -- prod/acc

@yi_ding

16 Jul 2023

Another @scale_AI hackathon in the books! Built a tool to convert any typescript function with JSDoc into a callable @OpenAI GPT3.5/4 function. Thanks @amasad for the inspiration and push to get this started, and @atroyn for the push to clean this up. Source coming this week!

10,133

Yi Ding -- prod/acc · Nov 11, 2023 · 12:41 AM UTC

Yi Ding -- prod/acc

@yi_ding

11 Nov 2023

It looks like @OpenAI's code interpreter runs in a Jupyter notebook. So does @juliusai. So have we as an industry decided that Jupyter notebooks are the best way to sandbox potentially malicious Python code? What a wild couple of decades for @IPythonDev

2,148

Yi Ding -- prod/acc · Nov 22, 2023 · 6:36 AM UTC

Yi Ding -- prod/acc

@yi_ding

22 Nov 2023

OK, absolutely worst time to publish this, but just pushed a new version of @llama_index TS with day one support for @AnthropicAI Claude 2.1 including a new Anthropic specific RAG template. npm.im/llamaindex Lots of other great community contributions also. Will wait to highlight them until after everyone's finished digesting/celebrating the Sam/Greg/OpenAI news.

11,042

Yi Ding -- prod/acc · Nov 22, 2023 · 6:54 AM UTC

Yi Ding -- prod/acc

@yi_ding

22 Nov 2023

Replying to @emilychangtv

Leaked footage of @sama negotiating.

18,907

Yi Ding -- prod/acc · May 2, 2023 · 5:18 PM UTC

Yi Ding -- prod/acc

@yi_ding

2 May 2023

Replying to @amasad

Wasn't that excited until I saw that it was Python compatible. 🔥 Could be the next Kotlin.

18,698

Yi Ding -- prod/acc · Aug 27, 2023 · 4:28 PM UTC

Yi Ding -- prod/acc

@yi_ding

27 Aug 2023

Prediction: "small to big": embed smaller, context bigger will become the default retrieval augmented generation technique by 2024. Great job @jxnlco and team.

Jerry Liu

@jerryjliu0

27 Aug 2023

This might be the first time ChatGPT (+@jxnlco) helped us come up with a better retrieval algorithm for RAG: 1️⃣ Create a hierarchy/graph of “parent chunks” -> smaller chunks. Also link adjacent chunks together. 2️⃣ During query-time, first retrieve smaller chunks with embedding similarity. 3️⃣ Merge leaves: If any subset of these chunks is a major portion of a larger chunk, return the parent chunk instead. Result 💡: Dynamically retrieve less disparate / larger contiguous blobs of context *only when you need it*. Helps the LLM synthesize better results, but avoids always cramming in as much context as you can. We’ve implemented these ideas in @llama_index. We created a HierarchicalNodeParser to parse unstructured text into a node hierarchy, and then a AutoMergingRetriever to “merge in parent chunks” during query-time. Full guide here: gpt-index.readthedocs.io/en/… Again, full credits for this idea go to @jxnlco - not only Python wizard, but also a ChatGPT Code Interpreter whisperer 🪄

10,352

Yi Ding -- prod/acc · Nov 7, 2023 · 2:43 AM UTC

Yi Ding -- prod/acc

@yi_ding

7 Nov 2023

Not sure why this tweet hasn't gotten more attention, but looks like @OpenAI presented some info on how they improved their RAG pipeline. A good roadmap think about how to optimize RAG for production, with a lot of the steps we've been evangelizing at @llama_index

Ben Parr

@benparr

6 Nov 2023

.@openAI has put in a lot of work into RAG (retrieval augmented generation). Accuracy has jumped from 45% to 98% with things like retaining, prompt engineering, and query expansion.

4,207

Yi Ding -- prod/acc · Oct 31, 2023 · 12:22 AM UTC

Yi Ding -- prod/acc

@yi_ding

31 Oct 2023

Replying to @yi_ding @OpenAI

If you set a high n value, you can still get the "raw" GPT-4 response. Another sign that the response is not cached is that your latency goes up a lot. h/t @goodside for the temp=2 trick.

7,350

Yi Ding -- prod/acc · Oct 17, 2023 · 4:13 AM UTC

Yi Ding -- prod/acc

@yi_ding

17 Oct 2023

If you’re building an agent today it needs to be on GPT-4. Because agents compound LLM errors, small differences in single trip capability become glaringly obvious.

6,450

Yi Ding -- prod/acc · Aug 21, 2023 · 7:23 PM UTC

Yi Ding -- prod/acc

@yi_ding

21 Aug 2023

Context windows🪟: You may see models advertised as 4K, 8K, 16K, 32K, 100K. One of the key things to remember about context windows is that they count the input and output combined. The other thing to remember is they're based on tokens, not words. What that means is that the more you input, the less you can receive as output. If you're using Anthropic 100K this might not matter so much, but if you're using the default ChatGPT (3.5-turbo) model, you're limited to 4096 tokens. And you will be rather abruptly reminded of that fact, because if you set max_tokens to a larger number it will simply refuse to process the request at all. So context window management becomes a rather important part of building LLM applications. If the input is too large, then not enough space is left for the output. If we ask the LLM to provide too large of an output, it may error out. Thankfully, frameworks like @llama_index (Python and TS) have already solved the context window management problem for you.

8,538

Yi Ding -- prod/acc · Sep 1, 2023 · 12:37 PM UTC

Yi Ding -- prod/acc

@yi_ding

1 Sep 2023

Happy 2nd birthday to ChatGPT's knowledge cutoff! 🎂

3,242

Yi Ding -- prod/acc · Oct 20, 2023 · 1:27 AM UTC

Yi Ding -- prod/acc

@yi_ding

20 Oct 2023

“It’s really easy to build a cool demo. It’s really hard to build a robust product.” ⁦@DrJimFan⁩ ⁦@NotionHQ⁩

4,419

Yi Ding -- prod/acc · Sep 2, 2023 · 4:36 PM UTC

Yi Ding -- prod/acc

@yi_ding

2 Sep 2023

Cool article about building a PandasAI-like interface using @llama_index's Pandas Query Engine. Personal note: my ML journey started with R dataframes, and it's amazing to see that many years later the data structure has held up. kdnuggets.com/build-your-own…

Build Your Own PandasAI with LlamaIndex - KDnuggets

Learn how to leverage LlamaIndex and GPT-3.5-Turbo to easily add natural language capabilities to Pandas for intuitive data analysis and conversation.

kdnuggets.com

5,113

Yi Ding -- prod/acc · Oct 24, 2023 · 5:19 AM UTC

Yi Ding -- prod/acc

@yi_ding

24 Oct 2023

Retrieval augmented generation is basically giving a talk to a LLM and then having a pop quiz at the end.

4,101

Yi Ding -- prod/acc · Apr 19, 2024 · 5:05 PM UTC

Yi Ding -- prod/acc

@yi_ding

19 Apr 2024

Move over JSON/YAML. Markdown seems to be the clear winner when it comes to LLM interactions. Thank you Aaron. 🙏

Caleb Peffer (Hiring!)

@CalebPeffer

19 Apr 2024

Spread the word

7,599

Yi Ding -- prod/acc · Nov 3, 2023 · 4:29 AM UTC

Yi Ding -- prod/acc

@yi_ding

3 Nov 2023

2024 will be the year of LLMs in production. Those of us building tools will need to be ready for the shift.

Arize AI

@arizeai

27 Oct 2023

Replying to @arizeai

1. LLM adoption has reached a tipping point. Over two-thirds (66.9%) of developers and ML teams are planning production deployments of #llm applications in the next 12 months or “as fast as possible” – and 14.1% are already in production.

5,868

Yi Ding -- prod/acc · Aug 8, 2023 · 12:04 AM UTC

Yi Ding -- prod/acc

@yi_ding

8 Aug 2023

Our @llama_index playground llama-playground.vercel.app/ is a great place to get started with your RAG journey. We made it even easier to use: - Temperature and Top P options. 🙏 @PusarlaSamay - Tooltips with plain lang explanations. Did you know ChatGPT's temperature goes up to 2?

4,295

Yi Ding -- prod/acc · Sep 2, 2023 · 5:20 PM UTC

Yi Ding -- prod/acc

@yi_ding

2 Sep 2023

Replying to @itsandrewgao

My guesses? Quantization, caching, speculative execution, and maybe a bit of distillation also.

Yi Ding -- prod/acc

@yi_ding

8 Aug 2023

Q: Do you think ChatGPT is using 4 bit quantization today?

12,267

Yi Ding -- prod/acc · Jun 29, 2024 · 3:19 PM UTC

Yi Ding -- prod/acc

@yi_ding

29 Jun 2024

TL;DR: if you’re using LLMs to make decisions then it’s agentic.

Harrison Chase

@hwchase17

29 Jun 2024

❓What is an agent? I get asked this question a lot, so I wrote a little blog on this topic and other things: - What is an agent? - What does it mean to be agentic? - Why is “agentic” a helpful concept? - Agentic is new Check it out here: blog.langchain.dev/what-is-a…

15,493

Yi Ding -- prod/acc · Sep 13, 2023 · 4:09 AM UTC

Yi Ding -- prod/acc

@yi_ding

13 Sep 2023

One of the hard things with remote learning is finding opportunities to meet your professors. In my case, I finally got some office hours with @AndrewYNg after 12 years. 🙏 @philipvollet for the photo and @DeepLearningAI_ for the hospitality. Learners first!

10,591

Yi Ding -- prod/acc · Sep 21, 2023 · 5:13 PM UTC

Yi Ding -- prod/acc

@yi_ding

21 Sep 2023

Going to give away some prod alpha tonight at my "LLM Quirks Mode" talk @mlopscommunity Thanks for the invite @RahulParundekar and look forward to the other talks by nbox.ai and run.ai lu.ma/llmops-overcoming-hurd…

6,371

Yi Ding -- prod/acc · Jan 26, 2024 · 12:24 AM UTC

Yi Ding -- prod/acc

@yi_ding

26 Jan 2024

What we've all been waiting for. GPT-4-0125-preview finally has a new joke!

6,065

Yi Ding -- prod/acc · Jul 25, 2023 · 10:04 PM UTC

Yi Ding -- prod/acc

@yi_ding

25 Jul 2023

One of the things we want to do with @llama_index TS is to help you see how LlamaIndex can help you choose the right text splitting, retrieval, and combination strategy. To that end, we're building llama-playground.vercel.app/. OSS and built with @nextjs and @shadcn

11,113

Yi Ding -- prod/acc · Nov 10, 2023 · 1:17 AM UTC

Yi Ding -- prod/acc

@yi_ding

10 Nov 2023

Looks like GPT-4 is doing pre-processing on the query before sending the search to the backend. @jxnlco you were right.

4,925

Yi Ding -- prod/acc · Jul 30, 2023 · 3:22 AM UTC

Yi Ding -- prod/acc

@yi_ding

30 Jul 2023

In honor of the @AnthropicAI Claude 2 @cerebral_valley hackathon I implemented Anthropic support in @llama_index TS: github.com/run-llama/LlamaIn… Overall it was really straightforward spent a bit over an hour and most of that was making sure the examples worked. 🧵

Anthropic support by yisding · Pull Request #52 · run-llama/LlamaIndexTS

github.com

11,199

Yi Ding -- prod/acc · Jun 27, 2024 · 11:24 PM UTC

Yi Ding -- prod/acc

@yi_ding

27 Jun 2024

Meet the AI Agent Avengers. @joaomdmoura @crewAIInc @mlejva @e2b_dev @rushing_andrei and Nikita Saurov.

4,656

Yi Ding -- prod/acc · Mar 14, 2024 · 2:48 PM UTC

Yi Ding -- prod/acc

@yi_ding

14 Mar 2024

create-llama is not only the easiest way to get started with @llama_index it’s the easiest way to bootstrap your RAG application period. If you haven’t tried it yet, all you need is 5 minutes today. Supports both Python and Typescript!

Marcus Schiesser @MarcusSchiesser

14 Mar 2024

`npx create-llama` lets you quickly generate LLM apps using @llama_index. Did you know you can open a generated project directly in VS Code? The video shows generating a FastAPI endpoint and opening it in a dev container. Code 🛠️: github.com/run-llama/create-…

7,346

Yi Ding -- prod/acc · Aug 23, 2023 · 10:24 PM UTC

Yi Ding -- prod/acc

@yi_ding

23 Aug 2023

A trend to watch: Retrieval in the Loop. Had a good discussion with @_superAGI @ishaanbhola @philipvollet @mlejva @silennai regarding this subject and finally got some time to get some thoughts down. We currently think of retrieval mostly as providing context in one shot, either at the beginning of a chat or the beginning of an agent process. However what we have found at @llama_index is that it’s often more useful to do retrieval at every step, either every chat in ContextChatEngine or every agent step in our Data Agents. Why? Because the LLM can almost inevitably benefit from new relevant context. Remember at this point ChatGPT’s knowledge cutoff is almost 2 years old. I saw @atroyn predict that in the future we will be doing retrieval on every token or even in the attention heads themselves. While I’m not sure they’ll go that far for latency reasons among others, I would very much welcome it if the LLM providers would formalize a “context message” just like the “system message” today. If you’re building a LLM application today the key thing to ask yourself is “would the LLM benefit from more context?” In most cases the answer is yes.

5,287

Yi Ding -- prod/acc · Jul 4, 2023 · 8:31 PM UTC

Yi Ding -- prod/acc

@yi_ding

4 Jul 2023

Replying to @FoolGreatest @lolitataub @PigPugHealth

The most well written contract in the world doesn’t beat having a trustworthy counterparty.

6,987

Yi Ding -- prod/acc · Aug 11, 2023 · 8:21 PM UTC

Yi Ding -- prod/acc

@yi_ding

11 Aug 2023

Even though I've only been on the team for 3 weeks, I have to say that the team at @llama_index is the fastest shipper, pound for pound, that I've seen in my career. To get this magnitude of a change in this quickly from inception to completion is 🔥.

LlamaIndex 🦙

@llama_index

11 Aug 2023

Replying to @llama_index

All credits go to @LoganMarkewich and @disiok for all the amazing work on 0.8.0. We'll be doing deeper feature highlights of all the changes in the upcoming week. End result: less tuning/customization, better out of the box support 🎁

6,643

Yi Ding -- prod/acc · Sep 1, 2023 · 4:30 AM UTC

Yi Ding -- prod/acc

@yi_ding

1 Sep 2023

Like many things in life, garbage in, garbage out. If you want your large language model, like GPT to provide good output, you need to be very careful to only give it the most relevant context as input.

4,287

Yi Ding -- prod/acc · Oct 31, 2023 · 7:13 PM UTC

Yi Ding -- prod/acc

@yi_ding

31 Oct 2023

For those of us in the AI space, this was the most interesting part of Apple's M3 announcement yesterday. You can now run the biggest open source LLM (Falcon with 180 billion parameters) with low quality loss on a 14 inch laptop. What a time to be alive. huggingface.co/TheBloke/Falc…

2,492

Yi Ding -- prod/acc · Nov 20, 2023 · 3:50 PM UTC

Yi Ding -- prod/acc

@yi_ding

20 Nov 2023

A few thoughts on the OpenAI situation after seeing some bad hot takes (my own included, which I've deleted.) 1. In my conversations with the people I know at OpenAI I've always been struck by how committed they are to the mission of creating safe AGI. The events of this weekend reinforce that for me. 2. It's clear that OpenAI had a rather large lead on the rest of the industry, and from what we've seen this weekend their next innovations looked like another massive step up from where we are today. 3. @satyanadella has once again proven that he's the best dealmaker in the tech world. As @yishan pointed out, he now owns both halves. 4. Despite that, I hope that the OpenAI team gets back together again. This technology is so important and that team was executing so well that it would be a shame for human disputes to slow down or halt that work. 5. I will continue on my personal, and our company's , mission to help productionize this technology so that it can provide the most benefit to the widest set of people on Earth. No matter what happens this week, the future is bright for all of us.

7,866

Yi Ding -- prod/acc · Aug 12, 2023 · 2:44 AM UTC

Yi Ding -- prod/acc

@yi_ding

12 Aug 2023

Coding in the car so we can have a family vacation this weekend. #StartupLife

5,092

Yi Ding -- prod/acc · Nov 14, 2023 · 5:48 PM UTC

Yi Ding -- prod/acc

@yi_ding

14 Nov 2023

Using LlamaIndex is now as simple as running npm create llama npm.im/create-llama Works whether you’re using Typescript or Python!

4,437

Yi Ding -- prod/acc · Aug 8, 2023 · 6:09 PM UTC

Yi Ding -- prod/acc

@yi_ding

8 Aug 2023

Is 4 bit about to be the standard for Large Language Models? Big implications for on device ML if true: A little primer: LLMs, like most deep learning models, are built with 16-bit floating point point numbers. 4 bit quantization reduces the memory use by 4x. Practically, that means that the smallest models, like Llama2-7B can be run using as little as 3.5GB of memory, or on the latest iPhone Pros without an internet connection. Maybe even non-Pros, although you're only left with 500MB for the rest of the system/apps. It also means that Llama2-13B can be run using 6.5GB of memory, or basically on almost any laptop for sale today. Of course, there's a tradeoff: a 4 bit floating point number can only represent 16 total values: 2^4 = 16. How possibly could these highly sophisticated models function when only given 16 values to work with? Through a compression mechanism called GPTQ. Full paper here: arxiv.org/pdf/2210.17323.pdf If you want to try out 4 bit models yourself, check out llama.cpp, exllama, and ollama.

Yi Ding -- prod/acc

@yi_ding

7 Aug 2023

Just found out @replicatehq is using @turboderp_'s 4bit quantization on their Llama2 70B endpoint by default now. github.com/a16z-infra/cog-ll… 🤯

8,066

Yi Ding -- prod/acc · Oct 2, 2023 · 4:25 PM UTC

Yi Ding -- prod/acc

@yi_ding

2 Oct 2023

Finally get to reveal some awesome news that I’ve been dying to share. Laurie Voss, @seldo, is joining us at @llama_index as VP of Developer Relations! While Laurie is someone who needs no introduction, a quote from one of his coworkers said it best “Nobody … fully understands how much Laurie was the brains and the conscience of NPM.” Excited to work with Laurie to make LlamaIndex into the best tool for data driven LLM applications! medium.com/llamaindex-blog/l…

6,294

Yi Ding -- prod/acc · Nov 11, 2023 · 1:22 AM UTC

Yi Ding -- prod/acc

@yi_ding

11 Nov 2023

Replying to @ArmandDoma

Pfft. I could put together a better team with governors of Illinois alone.

1,172

Yi Ding -- prod/acc · Sep 19, 2023 · 11:45 PM UTC

Yi Ding -- prod/acc

@yi_ding

19 Sep 2023

Google Bard now has retrieval augmented generation (RAG). Do you?

Sundar Pichai

@sundarpichai

19 Sep 2023

We’re adding extensions to Bard so you can connect it to your favorite Google apps including Gmail, Drive + Docs for even deeper collaboration. We’re also updating how we validate the claims in Bard’s responses with an improved “Google It” button + more. blog.google/products/bard/go…

4,056

Yi Ding -- prod/acc · Jul 28, 2023 · 4:12 PM UTC

Yi Ding -- prod/acc

@yi_ding

28 Jul 2023

The @llama_index webinar with @jxnlco on @OpenAI functions and @pydantic capped out at 100 participants 2 minutes in.. For those who can’t come in it’s being recorded and will be on YouTube soon! We’ll have to do another one!

4,901

Yi Ding -- prod/acc · Nov 22, 2023 · 6:06 AM UTC

Yi Ding -- prod/acc

@yi_ding

22 Nov 2023

Replying to @OpenAI

Did not have @LHSummers joining the OpenAI board on my bingo card.

10,529

Yi Ding -- prod/acc · Aug 23, 2023 · 3:55 PM UTC

Yi Ding -- prod/acc

@yi_ding

23 Aug 2023

First is best for retrieval when using ChatGPT models. Not only can info be "lost in the middle," GPT-3.5-turbo and GPT-4 both show significantly higher accuracy when it's the relevant info is in the first chunk. arxiv.org/abs/2307.03172

3,051

Yi Ding -- prod/acc · Sep 7, 2023 · 4:24 PM UTC

Yi Ding -- prod/acc

@yi_ding

7 Sep 2023

Inspired by the LLM hackathon we are hosting with @streamlit, we're having a "How to Win a LLM Hackathon" panel tomorrow (Friday, 9/8) at 9AM Pacific Time w/ @AlexReibman, @RahulParundekar, @CarolineFrasca, and yours truly. lu.ma/ps6rdkyl

How to Win a LLM Hackathon · Zoom · Luma

luma.com

7,645

Yi Ding -- prod/acc · Nov 9, 2023 · 7:57 PM UTC

Yi Ding -- prod/acc

@yi_ding

9 Nov 2023

Gotta step away to do some real work, but really agree with @simonw that both @OpenAI and @AnthropicAI should open source their RAG platforms. There are no AI safety issues here, and it would be a real boon to the entire AI application development community.

bruno @brunosavoca

9 Nov 2023

Replying to @yi_ding @OpenAI

is there any repo/docu to learn more about this approach from @AnthropicAI?

10,588

Yi Ding -- prod/acc · Dec 21, 2023 · 1:27 AM UTC

Yi Ding -- prod/acc

@yi_ding

21 Dec 2023

We are hiring at @llama_index! Don’t miss out on the opportunity to work at the center of LLMs and data.

LlamaIndex 🦙

@llama_index

21 Dec 2023

We’re looking for founding engineers! Specifically: Founding backend engineer 🧱: architecting and scaling cloud services, with data infra experience a bonus. Founding frontend/full-stack ⚒️: Full-stack work on our managed offerings + our open-source ecosystem (LlamaHub, LlamaIndex.TS). Experience with React, NextJS, TS/JS. Do you want to dive headfirst into the LLM developer ecosystem in a fast-moving sub-10 person team? This might be a good role for you! Check out our job descriptions below 👇 FS/FE eng JD: pretty-sodium-5e0.notion.sit… BE eng JD: pretty-sodium-5e0.notion.sit… Apply here: docs.google.com/forms/d/e/1F…

2,539

Yi Ding -- prod/acc · Apr 25, 2023 · 3:57 PM UTC

Yi Ding -- prod/acc

@yi_ding

25 Apr 2023

Replying to @anthonymacuk @MattBinder

It's not a big blue doofus badge. It's a tastefully sized indicator of a stupid, incompetent, or foolish person.

7,798

Yi Ding -- prod/acc · Sep 7, 2023 · 2:33 PM UTC

Yi Ding -- prod/acc

@yi_ding

7 Sep 2023

ChatGPT is a Sputnik moment. Every family, company, city and country, should be thinking about how to prepare themselves for the AI race.

2,366

Yi Ding -- prod/acc · Apr 2, 2024 · 1:42 AM UTC

Yi Ding -- prod/acc

@yi_ding

2 Apr 2024

Prediction: more AI engineers will be building with TS than Python in a few years.

Dave Font

@davefontenot

1 Apr 2024

Breakout AI startup looking for an AI engineer with typescript experience. If you want to kill google, DM your github.

5,777

Yi Ding -- prod/acc · Aug 31, 2023 · 3:32 PM UTC

Yi Ding -- prod/acc

@yi_ding

31 Aug 2023

Chinese microblogging platform Weibo, the Chinese equivalent of X, formerly known as Twitter, the future American equivalent of WeChat, formerly known as QQ, formerly known as OICQ, a clone of the Israeli ICQ.

Tianyu Fang @tianyuf

31 Aug 2023

“Chinese microblogging platform Weibo, the Chinese equivalent of X, formerly known as Twitter”

2,794

Yi Ding -- prod/acc · Sep 22, 2023 · 7:44 PM UTC

Yi Ding -- prod/acc

@yi_ding

22 Sep 2023

Something I didn’t mention last night: I wonder how many developers get confused because top_k in a LLM and top_k in a vector DB have very different effects. Something for #LLMQuirksMode part 2? Thanks for the invite @RahulParundekar @mlopscommunity @southpkcommons

3,335

Yi Ding -- prod/acc · Nov 28, 2023 · 5:31 PM UTC

Yi Ding -- prod/acc

@yi_ding

28 Nov 2023

Almost all of the most useful production LLM use cases use RAG under the hood @OpenAI GPTs, @GitHubCopilot @perplexity_ai, @bing Copilot, @Google Bard, @sweepai @mendableai, @AnthropicAI Claude. If you’re new to LLMs like ChatGPT learning RAG should be 1st priority.

Yi Ding -- prod/acc

@yi_ding

28 Jul 2023

5,616

Yi Ding -- prod/acc · Nov 6, 2023 · 11:21 PM UTC

Yi Ding -- prod/acc

@yi_ding

6 Nov 2023

Now that we know @OpenAI is doing vector similarity search internally, the (multi-) million dollar question for VCs is "which vector DB are they using?" Anyone know?

1,935

Yi Ding -- prod/acc · Nov 20, 2023 · 12:05 AM UTC

Yi Ding -- prod/acc

@yi_ding

20 Nov 2023

Replying to @ArmandDoma

I was confused too. Then I looked up Wikipedia and yeah… 🤷‍♂️

2,587

Yi Ding -- prod/acc · Oct 19, 2023 · 5:43 PM UTC

Yi Ding -- prod/acc

@yi_ding

19 Oct 2023

One of the most exciting areas of research is the idea of "Retrieval in the Loop" where every input to the LLM is augmented with retrieved chunks/features. Here we see a novel approach where the retrieval is built into the model and then the model is tuned for multi-turn chat.

Philipp Schmid

@_philschmid

19 Oct 2023

Can we pre-train LLMs with Retrieval Augmentation? 🤔 RETRO was a research by @GoogleDeepMind, which included retrieval into the pre-trainng process. Now @NVIDIA continues this research by scaling RETRO to 48B🤯 🧶

5,142

Yi Ding -- prod/acc · Aug 14, 2023 · 9:42 PM UTC

Yi Ding -- prod/acc

@yi_ding

14 Aug 2023

Good thread on the trade offs between smaller and larger chunk sizes and how to take advantage of both. Glad to see Langchain and us both working on this problem from different angles. SentenceWindowNodeParser is slightly more flexible IMHO (config # of chunks, no separate store) but I think there's a lot of interesting things still to be explored in this area.

Rohan

@rsrohan99

14 Aug 2023

Replying to @rsrohan99

The issue: - smaller chunks reflect more accurate semantic meaning after creating embedding - but they sometimes might lose the bigger picture and might sound out of context, making it difficult for the LLM to properly answer user's query with limited context per chunk.

11,512

Yi Ding -- prod/acc · Nov 6, 2023 · 10:07 PM UTC

Yi Ding -- prod/acc

@yi_ding

6 Nov 2023

Updated our LITS Playground with GPT-4-turbo. Try our RAG with the new model. github.com/run-llama/ts-play…

GitHub - run-llama/ts-playground

Contribute to run-llama/ts-playground development by creating an account on GitHub.

github.com

6,064