Replacing keyword search with semantic understanding across a 2M SKU catalog
A large EU fashion retailer (~2M-SKU catalog, engagement 2024) had a search problem: 23% of queries returned zero results and another ~40% returned the wrong category. Shoppers used natural language ("flowy summer dress with pockets") while the catalog was tagged with supplier codes and rigid taxonomy terms. A fine-tuned bi-encoder (sentence-transformer base, trained on the retailer's own ~12-month query-product click stream) now drives dense vector search over the full 2M SKUs in Qdrant; a lightweight re-ranking layer applies business rules (margin, stock level, trend score) without degrading relevance. Results: p95 latency of 78ms, zero-result queries down from 23% to 2%, and search-to-purchase conversion up 34%.
23%→2%
Zero-result queries
+34%
Search-to-purchase conversion
<80ms
p95 search latency
The Challenge
23% of search queries returned zero results, not because the products didn't exist, but because customers used natural language ("flowy summer dress with pockets") while the catalog was tagged with supplier codes and rigid taxonomy terms. Another 40% of queries returned results from the wrong category entirely.
Our Approach
We proposed replacing the keyword index with a bi-encoder semantic search architecture. Feasibility work confirmed that the full catalog could be re-embedded within 72 hours on the available infrastructure.
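To make the 72-hour figure concrete, here is a back-of-the-envelope sketch of the sustained throughput it implies. The catalog size and window come from the case study; the batch size of 256 is a hypothetical value chosen only for illustration, not a detail from the engagement.

```python
# Back-of-the-envelope: throughput needed to re-embed the catalog in the window.
CATALOG_SIZE = 2_000_000   # SKUs, from the case study
WINDOW_HOURS = 72          # re-embedding window, from the case study
BATCH_SIZE = 256           # hypothetical batch size, for illustration only


def required_throughput(items: int, hours: float) -> float:
    """Sustained items per second needed to process `items` within `hours`."""
    return items / (hours * 3600)


def batches_needed(items: int, batch_size: int) -> int:
    """Number of batches, rounding up to cover the final partial batch."""
    return -(-items // batch_size)


rate = required_throughput(CATALOG_SIZE, WINDOW_HOURS)
n_batches = batches_needed(CATALOG_SIZE, BATCH_SIZE)

print(f"{rate:.1f} SKUs/second sustained")
print(f"{n_batches} batches of up to {BATCH_SIZE} SKUs")
```

At under 8 SKUs per second, the bottleneck is unlikely to be raw model speed, which is consistent with treating re-embedding as a pipeline concern.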
The architecture has three parts: a fine-tuned sentence-transformer embedding model (trained on the retailer's own query-product click data), a dense vector index in Qdrant, and a lightweight re-ranking layer that incorporates business rules (margin, stock level, trend score) without degrading relevance.
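The retrieve-then-re-rank flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the retailer's implementation: the brute-force cosine search stands in for the Qdrant index, the toy two-dimensional vectors stand in for bi-encoder embeddings, and the `MAX_BOOST` cap and equal margin/trend weighting are hypothetical. Capping the boost is one simple way to let business rules reorder near-ties without lifting irrelevant products.

```python
import math
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    vector: list[float]  # embedding (toy values; a bi-encoder would produce these)
    margin: float        # normalised to 0..1 (assumption)
    in_stock: bool
    trend: float         # normalised to 0..1 (assumption)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: list[float], catalog: list[Product], k: int):
    """Stage 1: top-k by similarity (brute force here; a vector index in practice)."""
    scored = [(cosine(query_vec, p.vector), p) for p in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


MAX_BOOST = 0.05  # hypothetical cap: business rules can shift a score by at most this


def rerank(candidates) -> list[Product]:
    """Stage 2: add a bounded business boost to each similarity score."""
    def final_score(pair) -> float:
        similarity, product = pair
        if not product.in_stock:
            return similarity  # out-of-stock items receive no boost
        boost = MAX_BOOST * (0.5 * product.margin + 0.5 * product.trend)
        return similarity + boost

    return [p for _, p in sorted(candidates, key=final_score, reverse=True)]


catalog = [
    Product("A", [1.0, 0.0], margin=0.2, in_stock=True, trend=0.1),
    Product("B", [0.98, 0.2], margin=0.9, in_stock=True, trend=0.9),
    Product("C", [0.0, 1.0], margin=1.0, in_stock=True, trend=1.0),
]

query = [1.0, 0.0]
candidates = retrieve(query, catalog, k=3)
ranked = [p.sku for p in rerank(candidates)]
print(ranked)  # ['B', 'A', 'C']
```

In this toy run, B overtakes A because their similarity gap (about 0.02) is smaller than the cap, while C stays last despite maximal margin and trend: its similarity is too low for a 0.05 boost to matter.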
Outcome
Zero-result queries dropped from 23% to 2%. Search-to-purchase conversion increased 34%. p95 latency was 78ms on the full 2M-SKU catalog. The business-rules re-ranking layer let the merchandising team influence results without touching the model.
What We Learned
01
Fine-tuning on your own click data outperforms generic embeddings for domain-specific catalogs.
02
Business rules and ML can coexist: the re-ranking layer is the right place for them.
03
Indexing 2M vectors is a data pipeline problem, not a model problem.
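The third lesson can be illustrated with a minimal batching sketch: most of the work is chunking, retrying, and counting, not modelling. Everything here is an assumption for illustration: `index_catalog`, the `embed` and `upsert` callables, the batch size, and the retry policy are hypothetical stand-ins, and the fake functions at the bottom replace the real embedding model and vector store.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of up to `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def index_catalog(
    skus: Iterable[str],
    embed: Callable[[list[str]], list[list[float]]],
    upsert: Callable[[list[str], list[list[float]]], None],
    batch_size: int = 256,   # hypothetical value
    max_retries: int = 3,    # hypothetical value
) -> int:
    """Embed and upsert SKUs in batches, retrying transient upsert failures."""
    indexed = 0
    for batch in batched(skus, batch_size):
        vectors = embed(batch)
        for attempt in range(1, max_retries + 1):
            try:
                upsert(batch, vectors)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the final attempt
        indexed += len(batch)
    return indexed


# Stand-ins for the real model and vector store (hypothetical).
store: dict[str, list[float]] = {}
calls = {"n": 0}


def fake_embed(batch: list[str]) -> list[list[float]]:
    return [[float(len(sku))] for sku in batch]


def flaky_upsert(batch: list[str], vectors: list[list[float]]) -> None:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient failure")  # first call fails once
    store.update(zip(batch, vectors))


skus = [f"SKU-{i}" for i in range(1000)]
total = index_catalog(skus, fake_embed, flaky_upsert)
print(total, len(store), calls["n"])
```

The first upsert fails and is retried, so the run makes five upsert calls for four batches and still indexes all 1,000 SKUs.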
Stages Engaged
Feasibility Call
Discovery & Blueprint
Concept Validation
Production Build
Total Duration
4 months
Artifacts Delivered
PRD
Search Architecture Blueprint
Embedding Model Spec
WBS
SOW
Start with a Feasibility Call
2 hours. No cost. We'll tell you honestly whether AI makes sense for your case.