Why We Ditched RAG for Elasticsearch: Building a Production-Ready Sales Agent
January 31, 2026
The Trap of the “Default” Stack
When we started building our AI Sales Agent, the architectural choice seemed obvious. Like everyone else riding the GenAI wave, we reached for RAG (Retrieval-Augmented Generation) immediately. The blueprint is standard: chunk your data, embed it, store it in a Vector Database, and let semantic search handle the rest.
But as we moved from prototype to production, the reality of retail data hit us hard.
The Problem: When Semantic Similarity Fails
- The Upsert Nightmare: Prices change frequently. In a VectorDB, every price change requires re-embedding and upserting vectors. Re-embedding at that cadence proved computationally expensive and operationally impractical for real-time syncing.
- The “One-Word” Nuance: In retail, a single word variation in a product name (e.g., “Pro” vs. “Max” or “128GB” vs. “256GB”) can mean a drastic difference in price. Semantic similarity is great for concepts, but it struggles with this kind of rigid specificity. A vector search might return a product that looks semantically similar but has a completely different price tag.
In a sales context, hallucinating a price or retrieving the wrong SKU isn’t just a bug - it’s a deal-breaker.
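The failure mode above can be illustrated without any ML machinery. Below, a crude token-overlap score stands in for "semantic closeness": it rates two SKUs that differ by a single token as highly similar, while an exact match cleanly separates them. The product names are hypothetical, and real embedding models behave differently in detail, but the gap between "similar" and "exact" is the point.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over tokens -- a crude stand-in for semantic closeness."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

query = "phone x pro 128gb"
catalog = ["phone x pro 128gb", "phone x pro 256gb"]

# Both storage variants look almost identical to a similarity score...
scores = {name: token_overlap(query, name) for name in catalog}

# ...but an exact match separates the right SKU (and price) from the wrong one.
exact_hits = [name for name in catalog if name == query]
```

A ranked retriever happily returns the 256GB variant as a near-top hit; an exact filter never does.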
The Pivot: Back to Elasticsearch
We realized we didn’t need “similarity”; we needed precision. We made the call to switch our core product retrieval engine to Elasticsearch.
This shift solved our immediate infrastructure headaches:
- Zero Embedding Costs: We stopped paying to embed millions of SKUs.
- Rapid Indexing: Text-based indexing is significantly faster than vector indexing.
- Real-Time Sync: We implemented CDC (Change Data Capture) to keep our Elasticsearch index in perfect sync with our transactional database.
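To make the sync point concrete, here is a minimal sketch of the translation step in a CDC pipeline: one price-change event becomes an Elasticsearch bulk partial-update action pair, with no re-embedding anywhere. The event shape and field names (`sku`, `new_price`, the `products` index) are illustrative, not our real schema.

```python
def price_change_to_bulk_action(event: dict) -> list[dict]:
    """Translate one CDC price-change event into an Elasticsearch bulk
    update action: an action line followed by a partial-doc line."""
    return [
        {"update": {"_index": "products", "_id": event["sku"]}},
        {"doc": {"price": event["new_price"]}},
    ]

# A price change is a tiny in-place update to one field of one document.
actions = price_change_to_bulk_action({"sku": "SKU-123", "new_price": 799.0})
```

Batches of these pairs go straight to the `_bulk` endpoint, which is what keeps the index in near-real-time sync with the transactional database.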
The Real Challenge: The Agentic Workflow
Solving the storage problem was easy. The harder challenge was the retrieval logic. How do we ensure the Agent picks the exact right product without overwhelming the user?
We decided to stop treating the Agent as a search bar and start treating it like a human sales representative. We moved away from “One-Shot Search” to a Multi-Turn Filtering Strategy:
1. The “Investigative” Phase
Instead of executing a search immediately, the Agent analyzes the user’s intent. If the request is vague, the Agent is programmed to ask follow-up questions. It drills down until it gathers the Minimum Viable Information needed to construct a single, precise Elasticsearch query.
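A sketch of the gate at the end of the investigative phase, assuming illustrative slot names (`brand`, `model`, `storage`): until every required slot is filled, the agent keeps asking; once they are, it emits a single exact-filter Elasticsearch query rather than a fuzzy search.

```python
def build_precise_query(slots: dict) -> dict:
    """Build an exact-filter Elasticsearch bool query once every
    required slot is gathered from the conversation."""
    required = ("brand", "model", "storage")
    missing = [s for s in required if s not in slots]
    if missing:
        # Not enough information yet -- the agent should ask about these.
        raise ValueError(f"ask user about: {missing}")
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: slots[field]}} for field in required]
            }
        }
    }

q = build_precise_query({"brand": "acme", "model": "x-pro", "storage": "128gb"})
```

Using `filter` with `term` clauses (rather than scored `match` queries) is what makes the lookup exact: a document either satisfies every constraint or is excluded.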
2. The “Discovery” Phase
If the user truly doesn’t know what they want, the Agent switches modes. It asks for just one key filter (e.g., budget or category), retrieves the Top-K results, and presents them as suggestions. This also allows us to handle “lowest price” queries deterministically - something vector search handles poorly.
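The discovery-mode query is the mirror image: one filter, a deterministic sort, and a small `size`. The field names (`category`, `price`) are again illustrative, but the shape shows why "lowest price" stops being probabilistic: it is just `sort` on a numeric field.

```python
def build_discovery_query(filter_field: str, filter_value: str, k: int = 5) -> dict:
    """Top-K suggestion query: a single user-provided filter,
    sorted by price ascending so 'lowest price' is answered exactly."""
    return {
        "query": {"bool": {"filter": [{"term": {filter_field: filter_value}}]}},
        "sort": [{"price": {"order": "asc"}}],
        "size": k,
    }

q = build_discovery_query("category", "laptops", k=3)
```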
The Hybrid Architecture
We didn’t kill RAG entirely. We just scoped it correctly.
Our final architecture is a hybrid system:
- Elasticsearch Tool: Handles all product lookups, pricing, and stock queries where precision is non-negotiable.
- Campaign Tool: A specialized tool to fetch and suggest active marketing campaigns.
- RAG Tool: Kept strictly for general knowledge, FAQs, and unstructured data unrelated to pricing.
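The routing between these tools can be as plain as a lookup table over classified intents. The intent labels and tool names below are assumptions for illustration, not our production code; the real dispatch happens through the agent's tool-calling layer.

```python
def route(intent: str) -> str:
    """Map a classified user intent to the tool that should handle it.
    Anything structured and price-sensitive goes to Elasticsearch;
    unstructured questions fall back to RAG."""
    table = {
        "product_lookup": "elasticsearch_tool",
        "price_query": "elasticsearch_tool",
        "stock_query": "elasticsearch_tool",
        "campaign": "campaign_tool",
        "faq": "rag_tool",
    }
    return table.get(intent, "rag_tool")
```

The key design choice is the default: when in doubt, the agent answers from unstructured knowledge, where an imprecise answer is tolerable, and never improvises a price.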
Conclusion
This use case might seem specific, but the lesson is universal for AI engineers: Semantic search is not a silver bullet. When building Agents that deal with structured, high-stakes data like pricing, sometimes the “old school” stability of keyword search and structured queries beats the probabilistic magic of vectors.
The best AI Agent isn’t the one with the most complex vector store — it’s the one that knows how to ask the right questions.