Turning 12 years of internal project documentation into a queryable knowledge base
At a mid-size professional services firm (engagement Q3 2024), senior partners leaving meant institutional knowledge left with them. Project learnings, client-specific methodologies, and past proposal strategies sat in ~80,000 files across SharePoint, email archives, and local drives: 12 years of work in 14 formats across 6 document repositories. Junior staff spent hours asking seniors questions that already had documented answers somewhere. We built a document-processing pipeline (normalise, deduplicate, classify by type and project) feeding a RAG system with two-tier retrieval (document-level for context, passage-level for specific answers), plus query routing that decides between a single-document answer and synthesis across projects. Result: 89% query satisfaction (thumbs up/down feedback over 3 months), and junior staff resolve questions independently ~3× more often. New project documentation is ingested automatically at project close, so the knowledge base grows with every completed project.
12yr
Knowledge captured
89%
Query satisfaction rate
4wk
From kickoff to production
The Challenge
When senior partners left, institutional knowledge left with them. Project learnings, client-specific methodologies, and past proposal strategies existed only in unstructured documents across SharePoint, email archives, and local drives: 80,000+ files spanning 12 years. Junior staff spent hours asking seniors questions that already had documented answers somewhere.
Our Approach
Discovery & Blueprint scoped the ingestion challenge: 80,000 files in 14 formats across 6 document repositories. We built a document processing pipeline that normalises, deduplicates, and classifies documents by type and project.
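To make the three pipeline steps concrete, here is a minimal sketch of normalise, deduplicate, and classify. It is illustrative only: the `Doc` type, the `TYPE_RULES` keyword table, and the function names are assumptions, not the pipeline delivered in this engagement. Hashing normalised text only catches exact duplicates; near-duplicates (for example, the same proposal with a changed date) would need a fuzzier technique.

```python
import hashlib
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Doc:
    path: str
    text: str
    doc_type: str = "unknown"


def normalise(text: str) -> str:
    """Unicode-normalise, collapse whitespace, and lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """Hash the normalised text; identical content yields identical hashes."""
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()


# Hypothetical keyword rules; a production classifier would likely be trained.
TYPE_RULES = {
    "proposal": ("proposal", "statement of work"),
    "retrospective": ("lessons learned", "retrospective"),
}


def classify(text: str) -> str:
    """Return the first document type whose keywords appear in the text."""
    lowered = normalise(text)
    for doc_type, keywords in TYPE_RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return doc_type
    return "unknown"


def process(raw: list[tuple[str, str]]) -> list[Doc]:
    """Keep the first copy of each distinct document and label its type."""
    seen: set[str] = set()
    kept: list[Doc] = []
    for path, text in raw:
        key = fingerprint(text)
        if key in seen:
            continue  # exact duplicate after normalisation
        seen.add(key)
        kept.append(Doc(path=path, text=text, doc_type=classify(text)))
    return kept
```

Running deduplication before classification means each distinct document is classified once, which matters when the same file has been copied across several repositories.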
The knowledge system uses RAG with a two-tier retrieval layer: document-level retrieval for context, passage-level retrieval for the specific answer. Query routing determines whether the question needs a single-document answer or synthesis across multiple projects.
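The retrieval flow described above can be sketched as follows. This is a toy under stated assumptions: word overlap stands in for vector similarity, the `CORPUS` contents are invented for illustration, and the cue-word rule in `route` is a hypothetical placeholder for whatever routing logic a real system would use.

```python
import re
from collections import Counter

# Hypothetical corpus: document id -> (project, list of passages).
CORPUS = {
    "acme-retro": ("acme", [
        "Pricing workshops ran before the proposal was drafted.",
        "Scope creep was controlled with a weekly change log.",
    ]),
    "globex-retro": ("globex", [
        "The proposal reused the pricing model from an earlier bid.",
        "Stakeholder interviews took two weeks.",
    ]),
}


def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score(query: str, text: str) -> int:
    """Count shared word occurrences; a stand-in for vector similarity."""
    return sum((tokens(query) & tokens(text)).values())


def route(query: str) -> str:
    """Hypothetical rule: comparative wording triggers cross-project synthesis."""
    cues = ("across", "compare", "projects", "typically", "usually")
    return "synthesis" if any(cue in query.lower() for cue in cues) else "single"


def retrieve(query: str, top_passages: int = 2) -> list[tuple[str, str]]:
    # Tier 1: rank whole documents to establish context.
    doc_scores = {
        doc_id: score(query, " ".join(passages))
        for doc_id, (_, passages) in CORPUS.items()
    }
    ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
    # Routing decides how many documents feed the second tier.
    selected = ranked if route(query) == "synthesis" else ranked[:1]
    # Tier 2: rank passages inside the selected documents only.
    candidates = [(doc_id, p) for doc_id in selected for p in CORPUS[doc_id][1]]
    candidates.sort(key=lambda pair: score(query, pair[1]), reverse=True)
    return candidates[:top_passages]
```

A question such as "How was scope creep controlled?" stays within the best-matching document, while "Compare pricing across projects" pulls passages from every ranked document so the answer can be synthesised.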
Outcome
89% query satisfaction rate (measured by thumbs up/down feedback over 3 months). Junior staff resolve questions independently 3× more often than before. Senior partners report fewer interruptions. The system ingests new project documentation automatically at project close, so the knowledge base keeps improving without manual effort.
What We Learned
01
Document quality matters more than document volume: deduplication and normalisation are non-negotiable first steps.
02
Query routing (single-document vs. synthesis) dramatically improves answer quality.
03
Self-improving systems require an ingestion pipeline that runs continuously, not just at setup.
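The third lesson can be illustrated with a small sketch of idempotent ingestion triggered at project close. The `KnowledgeIndex` class and the `on_project_close` hook are hypothetical names; how a real project-management system signals closure is an assumption left open here.

```python
import hashlib


class KnowledgeIndex:
    """Hypothetical in-memory index keyed by content hash."""

    def __init__(self) -> None:
        self.entries: dict[str, dict[str, str]] = {}

    def ingest(self, project: str, documents: list[str]) -> int:
        """Add unseen documents and return how many were new."""
        added = 0
        for text in documents:
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if key not in self.entries:
                self.entries[key] = {"project": project, "text": text}
                added += 1
        return added


def on_project_close(index: KnowledgeIndex, project: str, documents: list[str]) -> int:
    """Hypothetical hook called by whatever system marks a project closed."""
    return index.ingest(project, documents)
```

Because ingestion skips content it has already seen, the hook can safely fire more than once for the same project without inflating the index.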
Stages Engaged
Discovery & Blueprint
Concept Validation
Production Build
Total Duration
3 months total
Artifacts Delivered
PRD
Knowledge Architecture
Document Processing Pipeline
WBS
IT Runbook
Start with a Feasibility Call
2 hours. No cost. We'll tell you honestly whether AI makes sense for your case.