CAI Technology

RAG vs BM25 keyword search: when each is the right call

A CTO decision matrix: when fast, cheap BM25 keyword search beats hybrid RAG, and when investing in semantic RAG is justified.

CAI Technology · Last reviewed: 4/30/2026

RAG vs BM25 keyword search: when semantic investment pays off and when it does not

One of the most expensive mistakes we see at clients is the implicit assumption that "RAG is superior to classic search". In specific scenarios it is. In others, plain BM25 on Elasticsearch with a decent tokeniser delivers better, faster, cheaper, and more auditable results than a RAG pipeline with embeddings, a vector DB, and a reranker.

This article gives executives a clear matrix: which use case calls for classic keyword search, when to move to hybrid, and when full semantic RAG investment is justified.


Three search types, three different profiles

BM25 (lexical keyword search)

BM25 is the standard ranking algorithm in lexical search engines (Elasticsearch, OpenSearch, Solr). It rests on statistical principles: TF (term frequency), IDF (inverse document frequency), and document-length normalisation. No embeddings, no neural networks.
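To make those three ingredients concrete, here is a minimal from-scratch sketch of Okapi BM25 scoring. It assumes a naive lowercase whitespace tokeniser and uses k1 = 1.2 and b = 0.75, the defaults Elasticsearch documents for its BM25 similarity, with the non-negative IDF variant used by Lucene. Real engines differ in details (analyser chains, how document length is stored), so absolute scores will not match Elasticsearch; the class and method names are illustrative.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokeniser (assumption): real engines use analysers with stemming, stop words, etc.
    return text.lower().split()


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalisation
        self.docs = [tokenize(d) for d in docs]
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs
        self.tf = [Counter(d) for d in self.docs]  # TF per document
        # Document frequency: in how many documents each term appears
        self.df = Counter(term for d in self.docs for term in set(d))

    def idf(self, term: str) -> float:
        # Lucene-style IDF: rare terms weigh more, never negative
        n = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query: str, idx: int) -> float:
        tf = self.tf[idx]
        dl = len(self.docs[idx])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # lexical search: no exact term match, no credit
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[tuple[int, float]]:
        # Returns (document index, score) pairs, best first
        scores = [(i, self.score(query, i)) for i in range(self.n_docs)]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Ranking the query "payment deadline" over the documents "invoice payment deadline", "payment terms for suppliers", and "office holiday calendar" puts the first document on top, the second next, and gives the third a score of zero. That zero is the core limitation: a document that says "due date" instead of "deadline" gets no credit unless a synonym rule adds it.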

Works excellently when:

Limitations:

Semantic RAG (dense embeddings + LLM)

Semantic RAG uses dense embeddings to capture meaning rather than exact wording. An encoder turns the query and the documents into vectors, and the cosine similarity between those vectors measures semantic closeness.
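A minimal sketch of the retrieval step, assuming the encoder has already produced the vectors. The three-dimensional vectors in the usage example are hand-made stand-ins for real encoder output (real models produce hundreds of dimensions), and `dense_search` is a hypothetical helper, not a vector-DB API. Production systems replace this brute-force scan with an approximate nearest-neighbour index.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def dense_search(query_vec: list[float], doc_vecs: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    # Brute-force scan: score every document vector against the query vector
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With toy vectors where "refund-policy" is `[0.9, 0.1, 0.0]`, "return-an-item" is `[0.8, 0.2, 0.1]`, "shipping-times" is `[0.1, 0.9, 0.1]`, and the query "how do I get my money back" encodes to `[0.85, 0.15, 0.05]`, the two refund-related pages rank first even though they share no words with the query. BM25 would score both at zero.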

Works excellently when:

Limitations:

Hybrid (BM25 + dense, fused)

Hybrid search runs both algorithms in parallel and fuses results using Reciprocal Rank Fusion (RRF) or learned weights. In production this is the industry standard for large corpora and varied queries.
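Reciprocal Rank Fusion sidesteps the fact that raw scores live on incompatible scales (BM25 is unbounded, cosine lies in [-1, 1]) by combining only ranks: each document receives the sum of 1 / (k + rank) over every result list it appears in. A minimal sketch, assuming each retriever returns an ordered list of document IDs; k = 60 is a commonly used default (it is also the default `rank_constant` in Elasticsearch's RRF). The function name is illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    # Each ranking is an ordered list of document IDs, best first
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked high by any retriever get a larger contribution
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With BM25 returning `a, b, c` and the dense retriever returning `c, a, d`, the fused order is `a, c, b, d`: documents found by both retrievers rise to the top, while documents found by only one are kept lower in the list.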

Works excellently when:

A concrete decision matrix

Based on experience across more than 30 client search projects, we offer the following guide:

| Use case | Corpus volume | Recommendation |
|---|---|---|
| Corporate-website search bar | 1K–10K pages | BM25 with manual synonyms |
| Help centre with FAQ | 100–1,000 articles | BM25 + synonyms + manual redirect |
| Internal knowledge base (Confluence/SharePoint) | 10K–100K pages | Hybrid (BM25 + dense, no reranker) |
| Legal / fiscal chatbot | 100K–10M documents | Full semantic RAG (hybrid + reranker + grounding) |
| E-commerce product search | 10K–1M products | Hybrid + filters + popularity rerank |
| Procurement / SEAP search | 100K–1M notices | Hybrid + filters + reranker |
| Internal code assistant | 10K–100K files | Semantic RAG + AST chunking |

How to compute the ROI

The RAG-vs-BM25 ROI is not abstract. It is computed on three axes:

1. Quality

2. Cost

3. Latency

For a system running 100,000 queries/month:

The cost gap between hybrid search and full RAG with an LLM does not come from retrieval itself; it comes from generating the answer. If users want only a list of relevant documents (classic search), hybrid is sufficient and cheap. If users want a synthesised answer (chatbot), the LLM generation cost is added on top of every query.
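The split described above can be written as a simple per-query cost model. This is a sketch under stated assumptions: every unit price is a placeholder parameter to be filled with your own infrastructure and vendor figures (no real pricing is implied), and fixed costs such as indexing and cluster hosting are ignored. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostAssumptions:
    # All values are placeholders: replace them with your own measured or quoted figures
    search_cost_per_query: float   # retrieval infrastructure cost per query (BM25 or hybrid)
    embed_cost_per_query: float    # query-encoding cost (0 for pure BM25)
    llm_input_tokens: int          # prompt + retrieved context tokens per generated answer
    llm_output_tokens: int         # tokens in the generated answer
    llm_price_in_per_1k: float     # price per 1,000 input tokens
    llm_price_out_per_1k: float    # price per 1,000 output tokens


def monthly_cost(queries_per_month: int, a: CostAssumptions, generate_answer: bool) -> float:
    # Retrieval cost is paid on every query, whether or not an answer is generated
    per_query = a.search_cost_per_query + a.embed_cost_per_query
    if generate_answer:
        # Generation cost only appears when users want a synthesised answer
        per_query += (a.llm_input_tokens / 1000) * a.llm_price_in_per_1k
        per_query += (a.llm_output_tokens / 1000) * a.llm_price_out_per_1k
    return queries_per_month * per_query
```

Comparing `monthly_cost(100_000, a, generate_answer=False)` with the same call using `generate_answer=True` isolates the generation share of the bill for the 100,000 queries/month scenario.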

Signals from real usage

How do you know it is time to move from BM25 to hybrid or RAG? These are the signals we track with clients:

Conversely, signals you do NOT need RAG:

The "state-of-the-art" trap

A common mistake: technical teams want to deploy full RAG because it is "modern" or because they want to work with embeddings. The result is higher cost and more operational complexity with no business gain.

Our standard recommendation: start with BM25 plus manual synonyms. Measure the no-result rate, top-3 CTR, and user satisfaction. Move to hybrid when the data shows BM25 is missing relevant results. Move to full RAG only when users ask for synthesised answers rather than link lists.
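The two quantitative metrics in that recommendation can be computed directly from a search log. A minimal sketch, assuming a hypothetical log record holding the number of results returned and the 1-based position of the first click (or none). It defines top-3 CTR over queries that returned at least one result, which is one reasonable convention rather than the only one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryLog:
    # Hypothetical log record; map your own analytics fields onto it
    query: str
    result_count: int
    clicked_position: Optional[int]  # 1-based rank of the first click, None if no click


def no_result_rate(logs: list[QueryLog]) -> float:
    # Share of all queries that returned nothing
    if not logs:
        return 0.0
    return sum(1 for q in logs if q.result_count == 0) / len(logs)


def top3_ctr(logs: list[QueryLog]) -> float:
    # Share of queries with results where the user clicked one of the first three
    with_results = [q for q in logs if q.result_count > 0]
    if not with_results:
        return 0.0
    clicks = sum(
        1 for q in with_results
        if q.clicked_position is not None and q.clicked_position <= 3
    )
    return clicks / len(with_results)
```

Tracked weekly over the measurement period, a rising no-result rate or a falling top-3 CTR on conversational queries is the kind of evidence that justifies the move to hybrid.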

Decision diagram

Have a search engine?
  ├── No → BM25 with Elasticsearch/OpenSearch (start here)
  ├── BM25 + frequent conversational queries
  │     → Add dense encoder + RRF (hybrid)
  ├── Hybrid + users want synthesised answers
  │     → Add LLM with citation grounding (full RAG)
  └── Full RAG + critical legal accuracy
        → Add reranker + validation gate
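For teams that want to encode the diagram in internal tooling (for example, an architecture checklist), here is one reading of it as a function. It assumes each stage builds on the previous one, as the diagram's labels suggest, and treats "BM25 without frequent conversational queries" as "stay on BM25", which the diagram implies but does not state. The function name, parameters, and return strings are illustrative.

```python
def recommend_stack(
    has_search_engine: bool,
    frequent_conversational_queries: bool,
    wants_synthesised_answers: bool,
    critical_legal_accuracy: bool,
) -> str:
    """Walk the decision diagram; each upgrade assumes the previous stage is in place."""
    if not has_search_engine:
        # Starting point when there is no engine yet
        return "BM25 with Elasticsearch/OpenSearch"
    if not frequent_conversational_queries:
        # Assumed reading: no signal to upgrade, so keep BM25 and keep measuring
        return "BM25 (keep measuring)"
    if not wants_synthesised_answers:
        return "Hybrid: BM25 + dense encoder + RRF"
    if not critical_legal_accuracy:
        return "Full RAG: hybrid + LLM with citation grounding"
    return "Full RAG + reranker + validation gate"
```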

Operational conclusion

RAG is not inherently superior to classic search; it is an option with a different profile. The decision rests on volume, query type, quality requirements, and budget. Teams that decide "on trend" spend 5–20× more without justification. Teams that decide on real usage data get a favourable cost-benefit ratio.

For CAI Technology client engagements, the standard advice is: prototype rapidly with BM25, measure for 4 weeks, then decide on an upgrade based on the signals. This discipline filters out most of the RAG projects that would otherwise fail on cost-benefit.


Next step

For a free assessment of your search stack (4-week audit with quantitative metrics), the CAI Technology team offers a 30-minute consultation at no charge.
