Thrilled to release 🌟STaRK 🌟 - A large-scale LLM retrieval benchmark on semi-structured knowledge bases.
While LLMs excel at reasoning and semantic retrieval, they struggle with more complex tasks, especially when real-world user queries require a combination of unstructured (text) and structured (relational) knowledge. How do we track the capability of LLM-driven systems in handling such tasks❓
STaRK features large-scale knowledge bases with natural-sounding, diverse, and useful queries that involve a blend of unstructured and structured information. Moreover, we develop 📷 an automatic pipeline to generate the ground-truth answers.
STaRK presents significant opportunities by offering a comprehensive retrieval testbed on 🎁product recommendations, 📜academic paper searches, and 💊precision medicine inquiries.
Spoiler Alert!
With STaRK, we found that current LLM retrieval systems CANNOT accurately retrieve information.
➡️ More powerful systems are needed!
Preprint:
arxiv.org/abs/2404.13207
Github:
github.com/snap-stanford/sta…