70% of production queries fail with Naive RAG.

Spoiler: It's not the retrieval model.

No, that's not a typo. That's what we found after:
- Processing over 800 billion tokens
- Analyzing 10,000 real-world queries
- Countless hours of research

Sure, you can hit 98% accuracy in a demo with a curated dataset and softball questions. But the real world is messier, more complex, and far less forgiving…

Let's break down 10,000 real-world queries:
1️⃣ Only 32% of queries are answerable by vector search alone.
2️⃣ 22% are "meta-queries" (e.g., "Show me recent emails about X")
3️⃣ 20% are off-topic, with no answer in the data store
4️⃣ 10% demand multiple sub-queries (e.g., "Compare X and Y")
5️⃣ 16% are other edge cases

See the problem? Most naive RAG systems – relying solely on vector + keyword search – are shooting blanks 68% of the time. They're fundamentally limited, even with perfect search on that remaining 32%.

These systems not only choke on time-based queries and comparisons, they can't admit when they're out of their depth. Worse, they pollute unfit queries with irrelevant context, leading to hallucinations – confidently delivered as fact.

It's easy to see why adding naive RAG can DECREASE your AI's accuracy.

RAG still holds immense potential – the key is moving beyond basic implementations toward more sophisticated systems that feature:
- Advanced query understanding
- Intelligent routing mechanisms
- Metadata query capabilities
- Adaptive, context-aware processing
- Robust off-topic detection
- Among many others 🙂

Unsurprisingly – just like web search – RAG with high accuracy requires LOTS of long-tail engineering effort.

Thoughts?

🎥 Curious to learn more? Watch my co-founder Max Rumpf break this down in detail on the Humanloop High Agency podcast!
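To make the routing idea concrete, here's a minimal sketch of a query router covering the categories above (metadata, multi-part, off-topic, plain vector search). This is purely illustrative and not SID's actual system: real routers typically use a learned classifier or an LLM call, and every name, keyword list, and threshold here is a hypothetical stand-in.

```python
from enum import Enum


class Route(Enum):
    VECTOR_SEARCH = "vector_search"    # answerable by retrieval alone (~32%)
    METADATA_QUERY = "metadata_query"  # "recent emails about X" style (~22%)
    DECOMPOSE = "decompose"            # "Compare X and Y" → sub-queries (~10%)
    OFF_TOPIC = "off_topic"            # refuse instead of hallucinating (~20%)


# Hypothetical trigger phrases; a production system would use a classifier.
TEMPORAL_TERMS = ("recent", "latest", "yesterday", "last week", "newest")
COMPARISON_TERMS = ("compare", " vs ", "versus", "difference between")


def route_query(query: str, domain_terms: set[str]) -> Route:
    """Route a query to the handler that can actually answer it.

    `domain_terms` is a stand-in for real off-topic detection
    (e.g., an embedding-similarity threshold against the corpus).
    """
    q = query.lower()
    if any(t in q for t in TEMPORAL_TERMS):
        return Route.METADATA_QUERY
    if any(t in q for t in COMPARISON_TERMS):
        return Route.DECOMPOSE
    if not any(t in q for t in domain_terms):
        return Route.OFF_TOPIC
    return Route.VECTOR_SEARCH
```

The key design point: the off-topic branch returns a refusal signal instead of forcing irrelevant chunks into the prompt – that's exactly the "pollution" failure mode described above.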