Rebuilding recommendation relevance from the data layer up
A UK mid-market fashion retailer (engagement 2024, ~520k MAU, ~2M-SKU catalogue) was running a recommendation engine that misfired not because of the ranker but because customer behaviour sat in seven disconnected systems with no shared key. The work started with six weeks of data engineering: identity resolution on a deterministic-then-probabilistic ladder, a clean event stream, and a single 360° profile, all before a single ranking model was retrained. Two-stage retrieval (Qdrant ANN over 2M items → LightGBM re-ranker on 80+ contextual features) holds p95 at 34ms over 10M+ daily predictions. Click-through 1.2% → 4.8%; revenue per session +28%.
Architect on debrief: "The ML problem was visible. The data problem was the actual one, and identity resolution is the kind of work nobody photographs."
4×
Recommendation relevance
+28%
Revenue per session
10M+
Daily predictions
The Challenge
The recommendation engine was returning products from categories users had never browsed. Internal analytics showed a 1.2% search-to-purchase click-through against an industry benchmark of 4–6%. Two prior ML attempts had stalled before reaching production.
Our AI Readiness Audit found the real problem: user behavioural data existed in 7 separate systems: the e-commerce platform, CRM, loyalty app, email campaigns, support tickets, returns data, and a legacy catalogue. None shared a user identifier. "Anonymous" sessions made up ~63% of all traffic, and each one was silently treated as a new user.
Our Approach
Six weeks of data engineering before a single model was trained. We unified the user identity graph (deterministic match on email/phone first, then probabilistic match on device+IP+behavioural tuple), built a clean event stream, and created 360° behavioural profiles.
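As a rough illustration of the deterministic-then-probabilistic ladder described above, here is a minimal Python sketch. It is not the production matcher: the `Record` fields, the weights and the 0.7 threshold are all hypothetical placeholders chosen for the example, and a real identity graph would also need blocking, transitive merging and conflict handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One customer record from a source system (field names are hypothetical)."""
    source: str
    email: Optional[str] = None
    phone: Optional[str] = None
    device_id: Optional[str] = None
    ip: Optional[str] = None
    top_categories: frozenset = frozenset()


def normalise_email(email: Optional[str]) -> Optional[str]:
    """Lower-case and trim so that trivially different spellings compare equal."""
    return email.strip().lower() if email else None


def deterministic_match(a: Record, b: Record) -> bool:
    """Rung 1: exact match on a normalised email or on a phone number."""
    ea, eb = normalise_email(a.email), normalise_email(b.email)
    if ea and ea == eb:
        return True
    return bool(a.phone) and a.phone == b.phone


def probabilistic_score(a: Record, b: Record) -> float:
    """Rung 2: weighted evidence from device, IP and behavioural overlap.

    The weights are illustrative placeholders, not tuned values.
    """
    score = 0.0
    if a.device_id and a.device_id == b.device_id:
        score += 0.6
    if a.ip and a.ip == b.ip:
        score += 0.2
    union = a.top_categories | b.top_categories
    if union:
        # Jaccard overlap of browsed categories as a crude behavioural signal.
        score += 0.2 * len(a.top_categories & b.top_categories) / len(union)
    return score


def same_person(a: Record, b: Record, threshold: float = 0.7) -> bool:
    """Try the deterministic rung first; fall back to the probabilistic one."""
    if deterministic_match(a, b):
        return True
    return probabilistic_score(a, b) >= threshold
```

The ordering matters: a hard identifier settles the question outright, and the fuzzier device, IP and behaviour evidence is only consulted for records (such as anonymous sessions) that have no email or phone to match on.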
The scar: our first cut of the recommendation architecture used a single dense-only retrieval pass. It hit accuracy targets on the held-out set, but p95 latency was 180ms, three times the contractual ceiling. We split it into two-stage retrieval-ranking: a fast ANN search narrows ~2M items to 200 candidates, then a LightGBM ranker re-scores those 200 on the full contextual feature set. That held p95 under 50ms.
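The shape of the two-stage split can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: stage 1 is a brute-force cosine search standing in for the ANN index, and `score_fn` is a hypothetical callable standing in for the trained ranker. The function names are invented for the example.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(user_vec: Vector, item_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: cheap similarity search that narrows the catalogue to k candidates.

    Brute force here for clarity; an ANN index would replace this loop at scale.
    """
    scored = ((cosine(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def rerank(candidates: List[str], score_fn: Callable[[str], float], n: int) -> List[str]:
    """Stage 2: run the expensive scorer on the candidates only, keep the top n."""
    return sorted(candidates, key=score_fn, reverse=True)[:n]


def recommend(
    user_vec: Vector,
    item_vecs: Dict[str, Vector],
    score_fn: Callable[[str], float],
    k: int = 200,
    n: int = 10,
) -> List[str]:
    """Full pipeline: retrieve k candidates, then re-rank them down to n."""
    return rerank(retrieve(user_vec, item_vecs, k), score_fn, n)
```

The latency saving comes from the expensive scorer running on `k` items per request instead of on the whole catalogue. The cost is that an item filtered out in stage 1 can never be recovered by stage 2, so `k` has to be large enough to keep recall acceptable.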
Concept Validation proved the approach on 3 product categories before rolling to the full catalogue.
Outcome
Recommendation click-through: 4.8% (up from 1.2%). Revenue per session: +28%. The system handles 10M+ predictions per day at p95 latency of 34ms. The unified user identity graph became a company-wide asset used by email and CRM teams; the data pipeline outlived the recs project itself.
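For readers less used to the latency figure, the sketch below shows one common way to compute a p95 from raw request timings (the nearest-rank method) and what 10M predictions per day means as an average rate. The helper name and the choice of percentile method are assumptions made for illustration; the source does not say how the engagement measured p95.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], pct: int) -> float:
    """Return the smallest sample that at least pct% of samples do not exceed."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# 10M predictions spread evenly over a day, as a lower bound on average load.
SECONDS_PER_DAY = 24 * 60 * 60
avg_predictions_per_second = 10_000_000 / SECONDS_PER_DAY  # roughly 116 per second
```

Real traffic is not evenly spread, so the peak rate will sit above this average; the p95 figure describes the slowest 5% boundary of requests, not the typical one.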
What We Learned
01
AI is downstream of data. If your data is wrong, your AI will be confidently wrong.
02
Identity resolution is unglamorous engineering that unlocks every personalisation capability.
03
Two-stage retrieval-ranking solves the latency-accuracy tradeoff at scale; dense-only does not.
Stages Engaged
AI Readiness Audit
Discovery & Blueprint
Concept Validation
Production Build
Total Duration
7 months
Artifacts Delivered
PRD
Data Architecture Blueprint
WBS
Identity Graph Design
SOW
Commercial Proposal
Start with a Feasibility Call
2 hours. No cost. We'll tell you honestly whether AI makes sense for your case.