AI Architect @qventus Prev LITS and Partnerships @llama_index, Messaging Apps @Apple, HFT @ GETCO, @Citadel

San Jose, CA
GPT-5.4: the most RLed joke teller yet. @Ed_Miliband levels of consistency.
3
811
OpenAI has put together a pretty good roadmap for building a production RAG system. h/t @benparr 1. Start with naive RAG. 2. Tune your chunks: your splitting, chunk sizes, but also small to big strategies where you retrieve more than you embed. Forget fine tuning embedding models (for now). 3a. Rerank the results of your retrievals. 3b. Classify. Here I'm speculating they're running a classifier on the type of query: bucketing questions about a particular part of one document vs. questions about the entire document vs. questions about multiple documents.* 4. Do prompt engineering, and layer on agentic flows to expand queries and use tools. The cool thing is to see how much performance improves when you combine all of these strategies. We have almost all of these pieces in @llama_index for you to try today. 1. Start with our 5 line naive RAG setup. 2. Tune splitter options and small to big strategies: docs.llamaindex.ai/en/stable… 3a. Reranking strategies: blog.llamaindex.ai/using-llm… 3b. Routing queries based on classification: replit.com/@LlamaIndex/Llama… 4. Data Agents and the SubQuestionQueryEngine: gpt-index.readthedocs.io/en/… *It's possible that they're also doing some classification of the returned chunks, which would be another interesting idea.
25
85
548
86,321
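To make step 3b of the roadmap above concrete, here is a minimal sketch of routing a query by its scope. Everything in it is an assumption for illustration: the cue phrases, the `classify_query` and `route` names, and the strategy labels are hypothetical, and a production system would more likely use an LLM or a trained classifier than keyword matching. It is not OpenAI's or LlamaIndex's implementation.

```python
from enum import Enum


class QueryScope(Enum):
    PASSAGE = "part of one document"
    DOCUMENT = "an entire document"
    MULTI_DOCUMENT = "multiple documents"


# Hypothetical cue phrases standing in for a real classifier.
MULTI_DOC_CUES = ("compare", "versus", " vs ", "across", "both")
WHOLE_DOC_CUES = ("summarize", "summary", "overall", "main points")


def classify_query(query: str) -> QueryScope:
    """Bucket a query into one of the three scopes from the post."""
    text = f" {query.lower()} "
    if any(cue in text for cue in MULTI_DOC_CUES):
        return QueryScope.MULTI_DOCUMENT
    if any(cue in text for cue in WHOLE_DOC_CUES):
        return QueryScope.DOCUMENT
    return QueryScope.PASSAGE


def route(query: str) -> str:
    """Pick a retrieval strategy label for the classified scope."""
    strategies = {
        QueryScope.PASSAGE: "top-k chunk retrieval",
        QueryScope.DOCUMENT: "whole-document summarization",
        QueryScope.MULTI_DOCUMENT: "sub-question decomposition",
    }
    return strategies[classify_query(query)]
```

Usage: `route("Compare Apple and Microsoft revenue")` selects the multi-document strategy, while a narrow factual question falls through to plain chunk retrieval.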
Are you using GPT-4 and wondering why it feels a bit more robotic these days? Like it knows only one joke? Fairly conclusive proof that @OpenAI is now caching its responses, and in a way that ignores the temperature setting.
32
47
444
159,679
Siri 2.0 is a RAG app.
Siri can read EVERY piece of data on your phone (for apps that opt in)
8
60
429
66,263
A few thoughts on how OpenAI is implementing RAG in the new Assistants Retrieval tool before I was locked out. 1. They're splitting on newlines. You can tell because they forget to insert the newlines back between the splits when giving you the reference (red squiggles). 2. The reference is contiguous. I haven't tried more complicated questions so can't rule out that they would do some kind of subquestion decomposition, but the queries I tried all just gave a single reference. 3. The number of chunks returned is variable. This indicates some kind of "small to big" strategy where they retrieve more context than they match. Perhaps something similar to our AutoMergingRetriever docs.llamaindex.ai/en/stable… w/ some kind of similarity cutoff but worth a bit more investigation. 4. They have some trouble handling UTF-8. In general, very bleeding edge still, but look forward to seeing how this evolves.
7
61
417
83,043
If you’re using RAG in @AnthropicAI models the best position to put your most relevant chunk is at the bottom. For ChatGPT it’s at the top.
8
35
328
81,157
RIP @ChatGPTapp Plugins. 🫗
9
34
278
53,938
Another good article about RAG vs. Finetuning and when to use RAG with finetuning. towardsdatascience.com/rag-v… One of the benefits of RAG that isn't well known: it helps reduce hallucinations. "So in applications where suppressing falsehoods and imaginative fabrications is vital, RAG systems provide in-built mechanisms to minimise hallucinations. The retrieval of supporting evidence prior to response generation gives RAG an advantage in ensuring factually accurate and truthful outputs."
2
53
230
39,125
A few takeaways from the @OpenAI Model Spec: cdn.openai.com/spec/model-sp… 1. GPT-5 and future models will be significantly better at decision making and instruction following. The sheer number of conditions here, some almost contradictory, is impressive. 2. Multiple levels of instructions, and the user won't be able to access most of the top levels. System messages are now developer messages, and there's a layer on top called platform messages that can only be changed by OpenAI. This change was done for safety/security reasons, but if not done well could lead to overrefusals like we saw with Llama 2. 3. Customized outputs based on "settings." "interactive" and "max_tokens" will now directly affect the output of the models, meaning there will no longer be one canonical "GPT-5" response. Previously max_tokens only truncated but now it affects brevity. 4. YAML, JSON, and XML are explicitly called out and treated differently by the model than other parts of the prompt. This is called out for avoiding prompt injection, but could also mean that prompts constructed with these types of inputs will perform better in future models. 5. There is some indication that some of the rules can be overridden. It'll be interesting to test which ones are more strongly held and which ones are more weakly held by the model. The good: I think this is a good indication that GPT-5 is going to be a significant step up in reasoning capabilities. It's also an indication that OpenAI continues to take safety seriously, although the track record so far is that every level of filtering and safety they've put in has been worked around. Will be interesting to see whether the new prompt hierarchy puts that problem to rest once and for all. The bad: For application developers, this may be an indication that GPT output will become even more unpredictable and opaque. The many layers of rules will undoubtedly cause corner cases where previously tuned programs and prompts stop working. The ugly: Even with 4 levels of prompts now, there's no dedicated context or RAG message.
4
34
198
47,466
How does @OpenAI's RAG do on PDFs? First test: table extraction. As we found with secinsights.ai, handling tables often requires special logic. @AnthropicAI clearly does that, and OpenAI Assistants doesn't, so most of the numbers it outputs are wrong.
7
30
188
37,118
Please don’t do this. You don’t want your customer’s first impression of your product to be “my boss just got a wtf from the security/branding/marketing/partnerships team”
2
1
178
4,727
ChatGPT now has built in Retrieval Augmented Generation (RAG)! Can’t wait to put it through its paces!
8
19
170
56,963
What's the difference between the temperature, top P, and top K settings in LLMs? This 2020 @huggingface article does probably the best job I've seen in terms of explaining what goes on when you adjust those settings. huggingface.co/blog/how-to-g…
1
36
144
52,828
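For the temperature, top P, and top K question above, a minimal sketch of how the three settings can reshape a next-token distribution. The function names and the toy logits are hypothetical, and the order of operations (temperature, then top-k, then top-p) is one common convention, not a claim about any particular provider's sampler.

```python
import math
import random


def _normalize(ranked):
    """Rescale probabilities so they sum to 1."""
    mass = sum(prob for _, prob in ranked)
    return [(tok, prob / mass) for tok, prob in ranked]


def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return [(token, probability)] after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy")
    # Temperature rescales logits before the softmax: <1 sharpens, >1 flattens.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    ranked = sorted(weights.items(), key=lambda pair: pair[1], reverse=True)
    ranked = _normalize(ranked)
    # Top-k keeps only the k most likely tokens.
    if top_k is not None:
        ranked = _normalize(ranked[:top_k])
    # Top-p keeps the smallest prefix whose cumulative probability reaches p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = _normalize(kept)
    return ranked


def sample_token(logits, rng=None, **settings):
    """Draw one token from the filtered distribution."""
    rng = rng or random.Random()
    ranked = filtered_distribution(logits, **settings)
    tokens = [tok for tok, _ in ranked]
    probs = [prob for _, prob in ranked]
    return rng.choices(tokens, weights=probs, k=1)[0]
```

With toy logits such as `{"the": 5.0, "a": 2.0, "zebra": 0.1}`, lowering the temperature concentrates probability on "the", and `top_k=2` removes "zebra" from consideration entirely.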
Some people may have noticed that LLMs tend to lose some of their intelligence when temperature is set to 0. This article does a good job of explaining why. It was so convincing that we changed our default temperature in @llama_index Python and TS to 0.1.
5
18
145
39,734
Great article on the pros and cons of fine tuning. Too often I see people reaching for fine tuning when they should really try k-shot prompting and retrieval augmented generation first. tidepool.so/2023/08/17/why-y…
3
21
141
25,426
Replying to @ArmandDoma
Check out the demographics for the UAE.
6
118
9,123
First take: Llama 3.2 is going to really expand the ColPali-style retrieval strategies. ColPali is currently built on a 3B vision language model (PaliGemma), so super interested to see how it scales to 90B.
6
14
133
13,090
Is the @GoogleDeepMind Gemini announcement the first time prompt engineering is so prominently advertised in an LLM? The CoT here stands for “chain of thought” and it looks like it’s one of the key reasons for Gemini’s record breaking performance.
15
14
113
22,067
Simple AI Agent math: On a 10 step agent loop, if your LLM error rate is 5%, then your overall success rate is 0.95^10 ≈ 60%. If the error rate is 10%, then your success rate is 0.9^10 ≈ 35%. But if you can just reduce your error rate to 3% (w/ retrieval), the success rate becomes 0.97^10 ≈ 74%.
14
11
116
30,115
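The agent math above can be checked in a few lines. This sketch assumes every step fails independently at the same rate and that the loop only succeeds if all steps succeed; `agent_success_rate` is a hypothetical helper name.

```python
def agent_success_rate(step_error_rate: float, steps: int = 10) -> float:
    """Probability that every step of the loop succeeds.

    Assumes steps are independent and share one error rate.
    """
    return (1.0 - step_error_rate) ** steps


for rate in (0.05, 0.10, 0.03):
    overall = agent_success_rate(rate)
    print(f"{rate:.0%} error per step -> {overall:.0%} overall success")
```

The compounding is the point: a two percentage point improvement per step (5% to 3%) is worth roughly fourteen points of end-to-end success on a 10 step loop.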
If you’re new to using LLMs like ChatGPT, it’s important to know what they are or are not good at. Good at: text generation with recall (RAG), entity extraction, coding, decision making with <5 choices. Not good at: math, numerical tasks (counting), decision making with >20 choices.
15
12
108
20,303
Got to show off @seldo's @MongoDB starter at the hackathon today. github.com/run-llama/mongodb… Complete LlamaIndex RAG setup including Flask backend and Next frontend and one line deployment to @render
25
104
17,980
The @GoogleDeepMind Gemini paper introduces this concept of "uncertainty-routed" Chain of Thought. Still trying to digest it, but looks like the Gemini model does significantly better once "uncertainty-routing" is added to the technique. Why doesn't GPT-4 benefit in the same way? Is it because GPT-4 (and other OpenAI models) no longer return token probabilities? Need to dig into it more. Full paper here: storage.googleapis.com/deepm… Experts feel free to chime in.
3
9
95
20,781
The RAG/Prompt Engineering vs. Finetuning/Distillation question hinges on whether LLMs are still improving exponentially or we have reached a plateau. If we are still in exponential growth then RAG/Prompt Engineering is more likely to pay off and vice versa if we’ve plateaued.
7
9
94
26,288
What should you do if you're retrieving over multiple documents that have similar, but distinct content? Here, @thesourabhd shows off what I consider one of @llama_index's less known superpowers: the SubQuestionQueryEngine. By wrapping the documents into tools and letting the LLM choose which document to query, we ensure that $AAPL's financials aren't confused for $MSFT's. Check out his full video here: piped.video/watch?v=2O52Tfj7… and our documentation for the SubQuestionQueryEngine here: ts.llamaindex.ai/modules/hig…
2
13
88
16,951
Replying to @garybasin
It’s a better indication of the noise/biases of the interview process than anything. Anybody who thinks they can determine that someone they spend an hour talking to, much less 99% of those people, “is dead weight” is wildly overconfident in their own abilities.
2
2
80
6,227
Saw that we just hit 1K stars on secinsights.ai! 🚀 If you're interested in building a production ChatGPT application with data (RAG), you definitely want to check it out. github.com/run-llama/sec-ins…
11
82
7,509
So perhaps the most interesting thing to come out of the spelunking I did yesterday into OpenAI’s RAG architecture is the revelation that it looks like they’re not embedding the whole query. Instead they’re doing almost a sort of an advanced keyword extraction before embedding. I didn’t find any explicit instructions on how they do this. @jxnlco thinks it may have been fine tuned into the model and I’m tempted to agree with him. This goes against pretty much every RAG query embedding example I’ve seen up until this point. Does it actually help performance? In any case it deserves more exploration.
3
10
80
11,368
It's bittersweet to leave the Messaging Apps team at Apple, a team I co-founded 7 years ago. It was, and still is, an incredible team of engineers. Over the years, we shipped Apple's official account on WeChat, the Apple Business Chat chatbot, and two weeks ago Apple's WeChat mini-program. cnbc.com/2023/07/11/apple-la… Internally we were known as the cowboys 🤠, and I loved our posse. But this last year, the world has changed with the explosion of capabilities from Large Language Models, and I knew that this would be the defining tech of the decade. So when the call came, I had to join @llama_index. LlamaIndex is not only one of the open source leaders in the LLM space, it's also a really amazing community. When I talk to @jerryjliu0 and @disiok their commitment to quality, developer experience, and listening to customers is always in our conversations. So I'm really excited to build LlamaIndex.TS (we shipped this Monday!) and DevRel with them. LFG
11
3
75
21,262
2023 is the year of the AI demo. 2024 is the year we will see LLM backed applications widely deployed in production.
5
6
66
6,257
After 6 weeks at @llama_index all of the following are true: It's the most work I've taken on in my career. It's the most interesting work of my career. It's the most work/day I've completed in my career. It's the most behind I've been on my work in my career. Sounds fun? Join us! dub.sh/llama
1
4
68
17,008
Bit of a personal milestone for me. Up until earlier this year, I spent my time on this site without a profile picture or a bio, primarily to read the insights of others w/ <200 followers. A testament to the LlamaIndex and AI community for how quickly that changed. That said, if you have any account recommendations, please list them below. Would love to follow another couple hundred folks who I can learn more from.
9
1
65
7,256
How was your week? My answer, for almost every week of the last year at @llama_index, was "it's been a crazy week." Well, today, it's the last of those weeks, and I cannot help feeling a little sad to be leaving such an amazing team. 🙏 Jerry and Simon for the opportunity.
13
63
6,851
🪙Tokens are one of the first things people hear about when they come across large language models like ChatGPT. What is a token? A sequence of unicode characters converted into a number. These numbers are then how the text you send to the LLM is represented internally inside the model. For example, if we use the @OpenAI tokenizer, platform.openai.com/tokenize…, we see that the first three words of this tweet are separated into three tokens: ["Tokens", " are", " one"]. However, not all words are like that: for example, if you try the word strawberry, you may see that the word is separated into three tokens: ["st", "raw", "berry"]. But wait, if you just add a space before the word, it ends up being one token: [" strawberry"]. 🍓 What's going on here? This is one of the quirks of the way tokenization works. Spaces and punctuation also need to be tokenized. So GPT is trained on a lot of sentences with the word strawberry in the middle, but fewer sentences with the word strawberry, uncapitalized, at the beginning of a sentence or paragraph. This is a good reason why you don't want to have extraneous spaces (and depending on what you're trying to do newlines) at the end of your prompts. By leaving a space at the end of your prompt, you make it harder for the model to complete with many common words.
6
7
59
25,968
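To illustrate the leading-space quirk in the tokens post above, here is a toy tokenizer. It is a loud simplification: real GPT tokenizers use byte pair encoding with a learned vocabulary, while this sketch uses greedy longest-match over a tiny hypothetical vocabulary (`TOY_VOCAB`) chosen only to reproduce the examples in the post.

```python
# Hypothetical vocabulary; note that " strawberry" (with the leading space)
# is a single entry, but "strawberry" without the space is not.
TOY_VOCAB = {"Tokens", " are", " one", "st", "raw", "berry", " strawberry"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization; unknown characters become tokens."""
    max_len = max(len(piece) for piece in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


print(toy_tokenize("strawberry"))
print(toy_tokenize(" strawberry"))
```

The same word splits into three pieces without the leading space and stays whole with it, which is why a trailing space at the end of a prompt can push the model away from the common space-prefixed tokens.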
What is RAG? Retrieval Augmented Generation is three intimidating words for newcomers to the world of ChatGPT and LLMs. Today, @tonykipkemboi and I coined an acronym live at @streamlit. BOWS: Better Output with Search. Thanks @CarolineFrasca! Very fun! piped.video/watch?v=PLKkudXY…
2
9
54
8,688
What are the best ways to keep LLM responses concise? Llama2 ignores "Please be concise" or even "PLEASE BE CONCISE!" But "Prefer shorter answers. Keep your response to 100 words or less." seems to work OK. github.com/run-llama/LlamaIn…
4
9
53
6,711
Interesting change from @OpenAI with consequences for RAG: max_tokens will affect generated output in the future.
2
5
50
25,490
Interestingly, it looks like the caching is rather aggressive. So the LLM responds with the same joke whether you say "Tell me a joke." or "Tell me a new joke." This indicates OpenAI is also doing some kind of query clustering, and not just 1 to 1 query->response caching.
8
1
51
6,781
Replying to @netcapgirl
More like k dot vs. vanilla ice
1
45
26,152
There are basically three groups of vector databases today: 1. Vector DBs built from scratch. 2. Vector DBs built on top of existing DBs. 3. Vector search built on top of existing search platforms. It'll be really interesting to see which use cases end up using which.
8
4
49
10,942
Do LLMs need intrinsic facts for decision making and what does that mean for RAG (data driven LLM apps)? LLMs store a lot of facts in their parameters. Many more than your average human knows, from string theory to ballroom dance. Some people, including @sama, have suggested that in the future we may be able to remove those facts from the LLM itself, distill the LLM to a pure reasoning engine and using RAG, allow the LLM to then retrieve only the facts it needs. Is this possible? Not currently and I’m also skeptical long term. From what we know about human decision making, knowledge acquisition is an integral part of making good decisions. You cannot put a super intelligent grade schooler in a corporate role and expect them to do as well or better than a veteran (I was in a gifted program so knew some very intelligent grade schoolers.) My bet is that LLMs will always need some baseline of intrinsic facts. What does that mean for RAG? When you are building a RAG application, there is always a tension between what a model “knows” and what data you give it. Oftentimes we want to instruct the model to only consider the data we give it but it can lead to degradation in performance in certain use cases if we are too strict (as we found out when we pushed a new prompt in 0.8 and then reverted). If we aren’t strict enough in the prompt then the model may “hallucinate.” Unfortunately there’s no clear answer at the moment to the exact balance between letting or preventing the LLM from using its intrinsic facts. This is something we are actively working on at @llama_index but in the meantime the key to production RAG will be to properly tune the strictness to your own data and use cases.
3
8
48
11,915
What’s Retrieval Augmented Generation? Search, Give, Get. For those of us coming from a traditional software development background RAG can sound intimidating, but it really is a simple concept: 1. Search for the relevant data. 2. Give the data to GPT. 3. Get a better response. Of course, finding the most relevant data is not straightforward. That’s why if you’re new to LLM App development it’s best to use a framework like @llama_index or ts.llamaindex.ai. Best part? It’s all open source so you can see exactly what we are doing.
2
9
51
23,028
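Search, Give, Get from the post above can be sketched in plain Python. This is not how LlamaIndex does it: the word-overlap scoring is a stand-in for embedding similarity, the prompt wording is made up, and `llm` is a hypothetical callable that takes a prompt string and returns text.

```python
import re


def _words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, documents, top_k=2):
    """Search: rank documents by word overlap with the query."""
    query_words = _words(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & _words(doc)),
                    reverse=True)
    return ranked[:top_k]


def give(query, context):
    """Give: put the retrieved data into the prompt."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{joined}\n\nQuestion: {query}")


def get(prompt, llm):
    """Get: call the model (any callable from prompt to text)."""
    return llm(prompt)
```

Swapping `search` for a vector store query and `llm` for a real model client keeps the same three-step shape.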
Another @scale_AI hackathon in the books! Built a tool to convert any typescript function with JSDoc into a callable @OpenAI GPT3.5/4 function. Thanks @amasad for the inspiration and push to get this started, and @atroyn for the push to clean this up. Source coming this week!
3
50
10,133
It looks like @OpenAI's code interpreter runs in a Jupyter notebook. So does @juliusai. So have we as an industry decided that Jupyter notebooks are the best way to sandbox potentially malicious Python code? What a wild couple of decades for @IPythonDev
4
2
48
2,148
OK, absolutely worst time to publish this, but just pushed a new version of @llama_index TS with day one support for @AnthropicAI Claude 2.1 including a new Anthropic specific RAG template. npm.im/llamaindex Lots of other great community contributions also. Will wait to highlight them until after everyone's finished digesting/celebrating the Sam/Greg/OpenAI news.
2
3
48
11,042
Replying to @emilychangtv
Leaked footage of @sama negotiating.
45
18,907
Replying to @amasad
Wasn't that excited until I saw that it was Python compatible. 🔥 Could be the next Kotlin.
1
43
18,698
Prediction: "small to big" (embed smaller, context bigger) will become the default retrieval augmented generation technique by 2024. Great job @jxnlco and team.
This might be the first time ChatGPT (+@jxnlco) helped us come up with a better retrieval algorithm for RAG: 1️⃣ Create a hierarchy/graph of “parent chunks” -> smaller chunks. Also link adjacent chunks together. 2️⃣ During query-time, first retrieve smaller chunks with embedding similarity. 3️⃣ Merge leaves: If any subset of these chunks is a major portion of a larger chunk, return the parent chunk instead. Result 💡: Dynamically retrieve less disparate / larger contiguous blobs of context *only when you need it*. Helps the LLM synthesize better results, but avoids always cramming in as much context as you can. We’ve implemented these ideas in @llama_index. We created a HierarchicalNodeParser to parse unstructured text into a node hierarchy, and then a AutoMergingRetriever to “merge in parent chunks” during query-time. Full guide here: gpt-index.readthedocs.io/en/… Again, full credits for this idea go to @jxnlco - not only Python wizard, but also a ChatGPT Code Interpreter whisperer 🪄
1
6
45
10,352
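The "merge leaves" step described in the quoted post above can be sketched as follows. This is an illustration of the idea, not the AutoMergingRetriever source: the `auto_merge` name, the dictionary-based hierarchy, and the 0.5 ratio standing in for "a major portion" are all assumptions.

```python
def auto_merge(retrieved_ids, parent_of, children_of, threshold=0.5):
    """Replace groups of retrieved chunks with their parent chunk.

    parent_of maps a chunk id to its parent id (missing for roots);
    children_of maps a parent id to the list of its child ids.
    A parent replaces its children when more than `threshold` of them
    were retrieved. Merging repeats so it can climb several levels.
    """
    retrieved = set(retrieved_ids)
    merged = True
    while merged:
        merged = False
        hits_by_parent = {}
        for node in retrieved:
            parent = parent_of.get(node)
            if parent is not None:
                hits_by_parent.setdefault(parent, set()).add(node)
        for parent, hits in hits_by_parent.items():
            if len(hits) / len(children_of[parent]) > threshold:
                retrieved -= hits
                retrieved.add(parent)
                merged = True
    return retrieved
```

For example, if two of section A's three chunks are retrieved but only one of section B's three, the result is the whole of section A plus the single chunk from B, which matches the "larger contiguous blobs only when you need it" behavior the post describes.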
Not sure why this tweet hasn't gotten more attention, but looks like @OpenAI presented some info on how they improved their RAG pipeline. A good roadmap to think about how to optimize RAG for production, with a lot of the steps we've been evangelizing at @llama_index
.@openAI has put in a lot of work into RAG (retrieval augmented generation). Accuracy has jumped from 45% to 98% with things like retaining, prompt engineering, and query expansion.
3
4
43
4,207
Replying to @yi_ding @OpenAI
If you set a high n value, you can still get the "raw" GPT-4 response. Another sign that the response is not cached is that your latency goes up a lot. h/t @goodside for the temp=2 trick.
1
40
7,350
If you’re building an agent today it needs to be on GPT-4. Because agents compound LLM errors, small differences in single trip capability become glaringly obvious.
5
3
40
6,450
Context windows🪟: You may see models advertised as 4K, 8K, 16K, 32K, 100K. One of the key things to remember about context windows is that they count the input and output combined. The other thing to remember is they're based on tokens, not words. What that means is that the more you input, the less you can receive as output. If you're using Anthropic 100K this might not matter so much, but if you're using the default ChatGPT (3.5-turbo) model, you're limited to 4096 tokens. And you will be rather abruptly reminded of that fact, because if you set max_tokens to a larger number it will simply refuse to process the request at all. So context window management becomes a rather important part of building LLM applications. If the input is too large, then not enough space is left for the output. If we ask the LLM to provide too large of an output, it may error out. Thankfully, frameworks like @llama_index (Python and TS) have already solved the context window management problem for you.
6
4
43
8,538
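The context window rule above (input and output share one token budget) comes down to simple arithmetic. A minimal sketch, assuming you already have a token count for the prompt; `max_output_tokens` is a hypothetical helper, not a LlamaIndex or OpenAI function.

```python
def max_output_tokens(context_window, prompt_tokens, requested_max_tokens):
    """Clamp the requested output length to what the window still allows."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens but the window is "
            f"{context_window}; shorten the input"
        )
    return min(requested_max_tokens, remaining)


# A 3,000 token prompt in a 4,096 token window leaves 1,096 for the reply.
print(max_output_tokens(4096, 3000, 2000))
```

Clamping before the request is sent avoids the abrupt rejection described in the post when max_tokens asks for more than the window can hold.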
Happy 2nd birthday to ChatGPT's knowledge cutoff! 🎂
2
10
41
3,242
“It’s really easy to build a cool demo. It’s really hard to build a robust product.” @DrJimFan @NotionHQ
3
5
39
4,419
Cool article about building a PandasAI-like interface using @llama_index's Pandas Query Engine. Personal note: my ML journey started with R dataframes, and it's amazing to see that many years later the data structure has held up. kdnuggets.com/build-your-own…
1
8
38
5,113
Retrieval augmented generation is basically giving a talk to a LLM and then having a pop quiz at the end.
1
6
36
4,101
Move over JSON/YAML. Markdown seems to be the clear winner when it comes to LLM interactions. Thank you Aaron. 🙏
3
4
37
7,599
2024 will be the year of LLMs in production. Those of us building tools will need to be ready for the shift.
Replying to @arizeai
1. LLM adoption has reached a tipping point. Over two-thirds (66.9%) of developers and ML teams are planning production deployments of #llm applications in the next 12 months or “as fast as possible” – and 14.1% are already in production.
5
34
5,868
Our @llama_index playground llama-playground.vercel.app/ is a great place to get started with your RAG journey. We made it even easier to use: - Temperature and Top P options. 🙏 @PusarlaSamay - Tooltips with plain lang explanations. Did you know ChatGPT's temperature goes up to 2?
1
4
33
4,295
Replying to @itsandrewgao
My guesses? Quantization, caching, speculative execution, and maybe a bit of distillation also.
Q: Do you think ChatGPT is using 4 bit quantization today?
2
3
32
12,267
TL;DR: if you’re using LLMs to make decisions then it’s agentic.
❓What is an agent? I get asked this question a lot, so I wrote a little blog on this topic and other things: - What is an agent? - What does it mean to be agentic? - Why is “agentic” a helpful concept? - Agentic is new Check it out here: blog.langchain.dev/what-is-a…
4
3
34
15,493
One of the hard things with remote learning is finding opportunities to meet your professors. In my case, I finally got some office hours with @AndrewYNg after 12 years. 🙏 @philipvollet for the photo and @DeepLearningAI_ for the hospitality. Learners first!
3
1
33
10,591
Going to give away some prod alpha tonight at my "LLM Quirks Mode" talk @mlopscommunity Thanks for the invite @RahulParundekar and look forward to the other talks by nbox.ai and run.ai lu.ma/llmops-overcoming-hurd…
3
11
33
6,371
What we've all been waiting for. GPT-4-0125-preview finally has a new joke!
4
4
30
6,065
One of the things we want to do with @llama_index TS is to help you see how LlamaIndex can help you choose the right text splitting, retrieval, and combination strategy. To that end, we're building llama-playground.vercel.app/. OSS and built with @nextjs and @shadcn
3
8
33
11,113
Looks like GPT-4 is doing pre-processing on the query before sending the search to the backend. @jxnlco you were right.
1
33
4,925
In honor of the @AnthropicAI Claude 2 @cerebral_valley hackathon I implemented Anthropic support in @llama_index TS: github.com/run-llama/LlamaIn… Overall it was really straightforward: I spent a bit over an hour, and most of that was making sure the examples worked. 🧵
1
6
30
11,199
Meet the AI Agent Avengers. @joaomdmoura @crewAIInc @mlejva @e2b_dev @rushing_andrei and Nikita Saurov.
1
3
30
4,656
create-llama is not only the easiest way to get started with @llama_index it’s the easiest way to bootstrap your RAG application period. If you haven’t tried it yet, all you need is 5 minutes today. Supports both Python and Typescript!
`npx create-llama` lets you quickly generate LLM apps using @llama_index. Did you know you can open a generated project directly in VS Code? The video shows generating a FastAPI endpoint and opening it in a dev container. Code 🛠️: github.com/run-llama/create-…
1
7
31
7,346
A trend to watch: Retrieval in the Loop. Had a good discussion with @_superAGI @ishaanbhola @philipvollet @mlejva @silennai regarding this subject and finally got some time to get some thoughts down. We currently think of retrieval mostly as providing context in one shot, either at the beginning of a chat or the beginning of an agent process. However what we have found at @llama_index is that it’s often more useful to do retrieval at every step, either every chat in ContextChatEngine or every agent step in our Data Agents. Why? Because the LLM can almost inevitably benefit from new relevant context. Remember at this point ChatGPT’s knowledge cutoff is almost 2 years old. I saw @atroyn predict that in the future we will be doing retrieval on every token or even in the attention heads themselves. While I’m not sure they’ll go that far for latency reasons among others, I would very much welcome it if the LLM providers would formalize a “context message” just like the “system message” today. If you’re building a LLM application today the key thing to ask yourself is “would the LLM benefit from more context?” In most cases the answer is yes.
2
7
29
5,287
The most well written contract in the world doesn’t beat having a trustworthy counterparty.
1
3
29
6,987
Even though I've only been on the team for 3 weeks, I have to say that the team at @llama_index is the fastest shipper, pound for pound, that I've seen in my career. To get a change of this magnitude done this quickly from inception to completion is 🔥.
Replying to @llama_index
All credits go to @LoganMarkewich and @disiok for all the amazing work on 0.8.0. We'll be doing deeper feature highlights of all the changes in the upcoming week. End result: less tuning/customization, better out of the box support 🎁
1
2
29
6,643
Like many things in life, garbage in, garbage out. If you want your large language model, like GPT, to provide good output, you need to be very careful to only give it the most relevant context as input.
3
3
27
4,287
For those of us in the AI space, this was the most interesting part of Apple's M3 announcement yesterday. You can now run the biggest open source LLM (Falcon with 180 billion parameters) with low quality loss on a 14 inch laptop. What a time to be alive. huggingface.co/TheBloke/Falc…
4
8
29
2,492
A few thoughts on the OpenAI situation after seeing some bad hot takes (my own included, which I've deleted.) 1. In my conversations with the people I know at OpenAI I've always been struck by how committed they are to the mission of creating safe AGI. The events of this weekend reinforce that for me. 2. It's clear that OpenAI had a rather large lead on the rest of the industry, and from what we've seen this weekend their next innovations looked like another massive step up from where we are today. 3. @satyanadella has once again proven that he's the best dealmaker in the tech world. As @yishan pointed out, he now owns both halves. 4. Despite that, I hope that the OpenAI team gets back together again. This technology is so important and that team was executing so well that it would be a shame for human disputes to slow down or halt that work. 5. I will continue on my personal, and our company's, mission to help productionize this technology so that it can provide the most benefit to the widest set of people on Earth. No matter what happens this week, the future is bright for all of us.
3
1
29
7,866
Coding in the car so we can have a family vacation this weekend. #StartupLife
4
1
28
5,092
Using LlamaIndex is now as simple as running npm create llama npm.im/create-llama Works whether you’re using Typescript or Python!
2
5
30
4,437
Is 4 bit about to be the standard for Large Language Models? Big implications for on device ML if true. A little primer: LLMs, like most deep learning models, are built with 16-bit floating point numbers. 4 bit quantization reduces the memory use by 4x. Practically, that means that the smallest models, like Llama2-7B, can be run using as little as 3.5GB of memory, or on the latest iPhone Pros without an internet connection. Maybe even non-Pros, although you're only left with 500MB for the rest of the system/apps. It also means that Llama2-13B can be run using 6.5GB of memory, or basically on almost any laptop for sale today. Of course, there's a tradeoff: a 4 bit number can only represent 16 total values: 2^4 = 16. How could these highly sophisticated models possibly function when only given 16 values to work with? Through a compression mechanism called GPTQ. Full paper here: arxiv.org/pdf/2210.17323.pdf If you want to try out 4 bit models yourself, check out llama.cpp, exllama, and ollama.
Just found out @replicatehq is using @turboderp_'s 4bit quantization on their Llama2 70B endpoint by default now. github.com/a16z-infra/cog-ll… 🤯
1
6
29
8,066
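The memory figures in the 4 bit post above follow from parameters times bits per weight. A minimal sketch, assuming decimal gigabytes and counting only the weights (no activations, KV cache, or quantization metadata); `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, params in (("Llama2-7B", 7e9), ("Llama2-13B", 13e9)):
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: {fp16:.1f} GB at 16 bit, {int4:.1f} GB at 4 bit")

# A 4 bit value can take 2**4 distinct levels.
print(2 ** 4)
```

Real footprints will be somewhat higher than these numbers, since schemes like GPTQ also store per-group scales and the runtime needs working memory.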
Finally get to reveal some awesome news that I’ve been dying to share. Laurie Voss, @seldo, is joining us at @llama_index as VP of Developer Relations! While Laurie is someone who needs no introduction, a quote from one of his coworkers said it best “Nobody … fully understands how much Laurie was the brains and the conscience of NPM.” Excited to work with Laurie to make LlamaIndex into the best tool for data driven LLM applications! medium.com/llamaindex-blog/l…
2
4
29
6,294
Replying to @ArmandDoma
Pfft. I could put together a better team with governors of Illinois alone.
1
27
1,172
Google Bard now has retrieval augmented generation (RAG). Do you?
We’re adding extensions to Bard so you can connect it to your favorite Google apps including Gmail, Drive + Docs for even deeper collaboration. We’re also updating how we validate the claims in Bard’s responses with an improved “Google It” button + more. blog.google/products/bard/go…
2
1
28
4,056
The @llama_index webinar with @jxnlco on @OpenAI functions and @pydantic capped out at 100 participants 2 minutes in. For those who can’t get in, it’s being recorded and will be on YouTube soon! We’ll have to do another one!
2
4
27
4,901
Replying to @OpenAI
Did not have @LHSummers joining the OpenAI board on my bingo card.
25
10,529
First is best for retrieval when using ChatGPT models. Not only can info be "lost in the middle," GPT-3.5-turbo and GPT-4 both show significantly higher accuracy when the relevant info is in the first chunk. arxiv.org/abs/2307.03172
3
29
3,051
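One simple way to act on the "first is best" observation above is to put the highest-scoring retrieved chunk at the top of the context. A minimal sketch, assuming you already have (text, score) pairs from your retriever; the function names and the sample scores are hypothetical.

```python
def order_chunks_for_prompt(scored_chunks):
    """Return chunk texts sorted so the highest-scoring chunk comes first."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [text for text, _score in ranked]


def build_context(scored_chunks):
    """Join the ordered chunks into a single context string for the prompt."""
    return "\n\n".join(order_chunks_for_prompt(scored_chunks))


# Hypothetical retriever output: (chunk text, similarity score).
retrieved = [
    ("Background on the company.", 0.41),
    ("The refund window is 30 days.", 0.92),
    ("Shipping takes 3 to 5 days.", 0.17),
]

context = build_context(retrieved)
print(context)
```

Whether this helps for your data and model is worth measuring rather than assuming.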
Inspired by the LLM hackathon we are hosting with @streamlit, we're having a "How to Win a LLM Hackathon" panel tomorrow (Friday, 9/8) at 9AM Pacific Time w/ @AlexReibman, @RahulParundekar, @CarolineFrasca, and yours truly. lu.ma/ps6rdkyl
2
6
27
7,645
Gotta step away to do some real work, but really agree with @simonw that both @OpenAI and @AnthropicAI should open source their RAG platforms. There are no AI safety issues here, and it would be a real boon to the entire AI application development community.
Replying to @yi_ding @OpenAI
is there any repo/docu to learn more about this approach from @AnthropicAI?
1
4
27
10,588
We are hiring at @llama_index! Don’t miss out on the opportunity to work at the center of LLMs and data.
We’re looking for founding engineers! Specifically: Founding backend engineer 🧱: architecting and scaling cloud services, with data infra experience a bonus. Founding frontend/full-stack ⚒️: Full-stack work on our managed offerings + our open-source ecosystem (LlamaHub, LlamaIndex.TS). Experience with React, NextJS, TS/JS. Do you want to dive headfirst into the LLM developer ecosystem in a fast-moving sub-10 person team? This might be a good role for you! Check out our job descriptions below 👇 FS/FE eng JD: pretty-sodium-5e0.notion.sit… BE eng JD: pretty-sodium-5e0.notion.sit… Apply here: docs.google.com/forms/d/e/1F…
3
2
25
2,539
It's not a big blue doofus badge. It's a tastefully sized indicator of a stupid, incompetent, or foolish person.
8
22
7,798
ChatGPT is a Sputnik moment. Every family, company, city and country, should be thinking about how to prepare themselves for the AI race.
1
5
20
2,366
Prediction: more AI engineers will be building with TS than Python in a few years.
Breakout AI startup looking for an AI engineer with typescript experience. If you want to kill google, DM your github.
5
1
27
5,777
Chinese microblogging platform Weibo, the Chinese equivalent of X, formerly known as Twitter, the future American equivalent of WeChat, formerly known as QQ, formerly known as OICQ, a clone of the Israeli ICQ.
“Chinese microblogging platform Weibo, the Chinese equivalent of X, formerly known as Twitter”
1
6
25
2,794
Something I didn’t mention last night: I wonder how many developers get confused because top_k in an LLM and top_k in a vector DB have very different effects. Something for #LLMQuirksMode part 2? Thanks for the invite @RahulParundekar @mlopscommunity @southpkcommons
2
2
27
3,335
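To make the two meanings of top_k above concrete: in LLM sampling, top_k limits which tokens can be sampled next; in a vector DB, top_k is how many nearest neighbors come back. A toy sketch in plain Python, not any particular library's API; all names and numbers are made up for illustration.

```python
import math


def llm_top_k_filter(token_probs: dict[str, float], k: int) -> dict[str, float]:
    """LLM-style top_k: keep the k most likely next tokens and renormalize."""
    top = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _tok, p in top)
    return {tok: p / total for tok, p in top}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_top_k(query: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Vector-DB-style top_k: return the ids of the k most similar documents."""
    ranked = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)
    return ranked[:k]


# LLM side: a smaller k makes generation less varied.
print(llm_top_k_filter({"the": 0.5, "a": 0.3, "llama": 0.2}, k=2))

# Retrieval side: a smaller k means less context handed to the model.
print(vector_top_k([1.0, 0.0], {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}, k=2))
```

Same parameter name, but one controls randomness in generation and the other controls how much context you retrieve.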
Almost all of the most useful production LLM use cases use RAG under the hood: @OpenAI GPTs, @GitHubCopilot, @perplexity_ai, @bing Copilot, @Google Bard, @sweepai, @mendableai, @AnthropicAI Claude. If you’re new to LLMs like ChatGPT, learning RAG should be 1st priority.
What’s Retrieval Augmented Generation? Search, Give, Get. For those of us coming from a traditional software development background, RAG can sound intimidating, but it really is a simple concept: Search for the relevant data. Give the data to GPT. Get a better response. Of course, finding the most relevant data is not straightforward. That’s why, if you’re new to LLM App development, it’s best to use a framework like @llama_index or ts.llamaindex.ai Best part? It’s all open source so you can see exactly what we are doing.
1
9
24
5,616
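Search, Give, Get from the tweet above fits in a few lines. This is a toy sketch and not how @llama_index does it: the search is plain word overlap instead of embeddings, and `fake_llm` is a hypothetical stand-in for a real model call.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set, used for a crude overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def search(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Search: rank documents by how many words they share with the query."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def give(query: str, chunks: list[str]) -> str:
    """Give: put the retrieved data into the prompt ahead of the question."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def get(prompt: str, llm) -> str:
    """Get: call the model; `llm` is any callable taking a prompt string."""
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; only reports prompt size."""
    return f"(model response based on {len(prompt)} prompt characters)"


docs = [
    "LlamaIndex is a data framework for LLM applications.",
    "Weibo is a Chinese microblogging platform.",
]
question = "What is LlamaIndex?"
prompt = give(question, search(question, docs))
print(get(prompt, fake_llm))
```

The hard part in practice is the search step, which is where chunking, embeddings, and reranking come in.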
Now that we know @OpenAI is doing vector similarity search internally, the (multi-) million dollar question for VCs is "which vector DB are they using?" Anyone know?
7
24
1,935
Replying to @ArmandDoma
I was confused too. Then I looked up Wikipedia and yeah… 🤷‍♂️
3
23
2,587
One of the most exciting areas of research is the idea of "Retrieval in the Loop" where every input to the LLM is augmented with retrieved chunks/features. Here we see a novel approach where the retrieval is built into the model and then the model is tuned for multi-turn chat.
Can we pre-train LLMs with Retrieval Augmentation? 🤔 RETRO was research by @GoogleDeepMind that built retrieval into the pre-training process. Now @NVIDIA continues this research by scaling RETRO to 48B🤯 🧶
2
25
5,142
Good thread on the trade offs between smaller and larger chunk sizes and how to take advantage of both. Glad to see LangChain and us both working on this problem from different angles. SentenceWindowNodeParser is slightly more flexible IMHO (config # of chunks, no separate store) but I think there are a lot of interesting things still to be explored in this area.
Replying to @rsrohan99
The issue: - smaller chunks reflect more accurate semantic meaning after creating embedding - but they sometimes might lose the bigger picture and might sound out of context, making it difficult for the LLM to properly answer user's query with limited context per chunk.
2
3
25
11,512
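The small vs. large chunk trade off above can be sketched as: match on a single sentence, then hand the LLM that sentence plus its neighbors. This illustrates the general idea only; it is not the SentenceWindowNodeParser implementation, and the function names and regex-based sentence splitter are assumptions.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_window(sentences: list[str], hit_index: int, window: int = 1) -> str:
    """Return the matched sentence plus `window` sentences on each side."""
    start = max(0, hit_index - window)
    end = min(len(sentences), hit_index + window + 1)
    return " ".join(sentences[start:end])


text = (
    "Our store opened in 2010. Returns are accepted within 30 days. "
    "A receipt is required. Gift cards cannot be refunded. We ship worldwide."
)
sentences = split_sentences(text)

# Suppose retrieval matched the small chunk at index 2 ("A receipt is required.").
context = sentence_window(sentences, hit_index=2, window=1)
print(context)
```

The small chunk keeps the match precise, and the window size controls how much surrounding context the LLM sees.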