This is a dumb take that tells me you never ran a RAG.
RAG and a long-context LLM are two different things. Big, big difference in cost, functionality, and applications.
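RAG retrieval is deterministic: the same query against the same index returns the same chunks.
LLM generation is stochastic: the same prompt can give different outputs unless you pin the sampling.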
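To make the deterministic vs. stochastic point concrete, here is a toy sketch, not a real pipeline. The document vectors, the logits, and the helper names `top_k` and `sample_token` are all made up for illustration, and the retrieval side assumes exact (brute-force) search rather than an approximate index.

```python
import math
import random

def top_k(query, docs, k=2):
    """Exact retrieval: rank docs by dot product with the query (ties broken by name)."""
    scored = sorted(
        docs.items(),
        key=lambda kv: (-sum(q * d for q, d in zip(query, kv[1])), kv[0]),
    )
    return [name for name, _ in scored[:k]]

def sample_token(logits, temperature, rng):
    """Temperature sampling over a toy next-token distribution."""
    weights = [math.exp(value / temperature) for value in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]

# Hypothetical toy data: three "documents" and three candidate tokens.
DOCS = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
LOGITS = {"yes": 2.0, "no": 1.5, "maybe": 1.0}

# Same query, same index: the retrieved list is identical on every call.
retrieved = [top_k([1.0, 0.2], DOCS) for _ in range(5)]

# Same "prompt", unseeded sampler: the drawn tokens can differ between calls.
rng = random.Random()
sampled = [sample_token(LOGITS, 1.0, rng) for _ in range(5)]
```

Every entry of `retrieved` is the same list, while `sampled` will usually vary from run to run.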
RAG is about data. It requires ETL pipelines that produce cleaned, enriched data, and it can be close to real time.
An LLM is about intelligence. It requires costly pre-training, fine-tuning, and in-context learning, plus costly inference.
RAG is a simple vector search + semantic search + filters on top of a DB with an index, partitioning, and sharding. That’s much faster than pushing everything through a model.
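For anyone who hasn't built one, the retrieval side really is this small. A minimal sketch, assuming hand-written toy embeddings and an in-memory list in place of a real DB; `Chunk`, `search`, and the `source` filter field are hypothetical names, and a production setup would use an embedding model plus an indexed vector store instead of a linear scan.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str              # metadata used for filtering
    embedding: list[float]   # toy vectors, not from a real embedding model

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(index, query_embedding, source=None, k=2):
    """Apply the metadata filter first, then rank survivors by similarity."""
    candidates = [c for c in index if source is None or c.source == source]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]

INDEX = [
    Chunk("Q3 revenue grew", "finance", [0.9, 0.1, 0.0]),
    Chunk("Q3 headcount was flat", "hr", [0.8, 0.2, 0.1]),
    Chunk("Refund policy", "support", [0.0, 0.1, 0.9]),
    Chunk("Q2 revenue fell", "finance", [0.7, 0.3, 0.1]),
]

hits = search(INDEX, [1.0, 0.0, 0.0], source="finance", k=2)
```

The filter is what keeps the "hr" chunk out even though its vector is close to the query; that kind of cheap, exact constraint is what the DB layer gives you.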
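An LLM is a latent space traversal. At every layer, attention compares each token against the rest of the context, which works like running many vector lookups per token, so it’s mathematically slower than a single vector search.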
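A rough back-of-envelope for the speed gap, using the common approximation of about 2 FLOPs per parameter per token for a forward pass and 2 FLOPs per dimension per vector for a brute-force scan. All sizes here (1M chunks, 768 dimensions, a 7B model, a 100k-token context) are assumptions picked for illustration, not measurements.

```python
def brute_force_search_flops(num_vectors, dim):
    """One multiply-add per dimension per stored vector."""
    return 2 * num_vectors * dim

def llm_prefill_flops(num_params, context_tokens):
    """Roughly 2 FLOPs per parameter per token; ignores attention's extra cost."""
    return 2 * num_params * context_tokens

# Assumed sizes, for illustration only.
search_cost = brute_force_search_flops(num_vectors=1_000_000, dim=768)
prefill_cost = llm_prefill_flops(num_params=7_000_000_000, context_tokens=100_000)

ratio = prefill_cost / search_cost
print(f"search:  {search_cost:.3e} FLOPs")
print(f"prefill: {prefill_cost:.3e} FLOPs")
print(f"ratio:   {ratio:,.0f}x")
```

With these numbers the gap is around six orders of magnitude, and that is before counting attention's quadratic term on the LLM side or an approximate index on the search side, both of which widen it.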
RAG is useful for summaries and sentiment analysis over large datasets, and it’s good for time series charts. The output of a RAG can be layered with an SLM (small language model). It’s also useful for anything that gets repeated a few times.
If you need to parse content, today you can manually use an LLM with a large context. Tomorrow we’ll have AlphaEvolve-like LLMs that write code to parse the data... like a RAG.
So no. RAG is not going anywhere.