Reducing document processing errors by 93% at enterprise scale
A US Tier-2 mortgage servicer (Q3 2024 kickoff) was clearing ~47,000 income- and asset-verification submissions per month, rejecting 11–13% of them, and averaging 2.7 days from upload to a credit-team-ready packet. Two prior vendors had failed at the same problem. In week 3 of Concept Validation we caught a class of scans-of-scans that would have torpedoed the model, and switched to a layout-aware OCR pre-pass before classification. Final: 0.8% rework, four-hour turnaround, ~3× the original volume on the same back-office headcount.
Their Head of Operations, end of week 3 of Validation: "Stop apologising and tell me which 8% of inputs you can't fix. We'll route those manually."
93%
Error Reduction
4hr
vs 2.7-day turnaround
3×
Volume handled
The Challenge
The client's back-office team was manually processing ~47,000 financial documents per month: loan applications, asset and income verification, compliance forms, and contracts. An 11–13% rework rate was creating downstream audit failures and regulatory compliance risk. The 2.7-day average turnaround was blocking loan decisions that competitors were making in hours.
Two previous vendors had attempted AI solutions. Both failed to move past prototype: the models worked in isolation but couldn't handle the document variety (12 document types, 6 languages, variable formatting) at production fidelity.
Our Approach
We started with a Feasibility Call, then an AI Readiness Audit. The audit revealed the real problem: not a model problem, but a data labelling problem. ~40% of the training data was inconsistently labelled by different annotation teams.
Discovery & Blueprint produced a three-pronged architecture: (1) a document classification layer that routes to specialised extractors by type; (2) a human-in-the-loop validation queue for edge cases; (3) a confidence-scoring system that flags low-confidence extractions before they enter the production pipeline.
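As a rough illustration of how the three layers fit together, the sketch below routes a classified document to a type-specific extractor and diverts low-confidence results to a review queue. It is a minimal sketch, not the delivered system: the document type names, the stand-in extractors, the `process` function, and the 0.90 threshold are all hypothetical, since the case study does not state them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical cut-off; the case study does not state the real threshold.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class Extraction:
    doc_id: str
    doc_type: str
    fields: Dict[str, str]
    confidence: float  # overall extraction confidence in [0, 1]


# An extractor takes document text and returns (fields, confidence).
Extractor = Callable[[str], Tuple[Dict[str, str], float]]


def extract_pay_stub(text: str) -> Tuple[Dict[str, str], float]:
    # Stand-in for a specialised extractor; a real one would call a model.
    return {"gross_pay": "4200.00"}, 0.97


def extract_bank_statement(text: str) -> Tuple[Dict[str, str], float]:
    # Stand-in that returns a deliberately low confidence.
    return {"closing_balance": "1830.55"}, 0.71


# Layer 1: routing table from classified type to specialised extractor.
EXTRACTORS: Dict[str, Extractor] = {
    "pay_stub": extract_pay_stub,
    "bank_statement": extract_bank_statement,
}


def process(doc_id: str, doc_type: str, text: str,
            review_queue: List[str]) -> Optional[Extraction]:
    """Return an Extraction for the pipeline, or None if sent to review."""
    extractor = EXTRACTORS.get(doc_type)
    if extractor is None:
        # Unknown type: treat as an edge case for human validation (layer 2).
        review_queue.append(doc_id)
        return None
    fields, confidence = extractor(text)
    if confidence < CONFIDENCE_THRESHOLD:
        # Layer 3: flag low-confidence output before it enters production.
        review_queue.append(doc_id)
        return None
    return Extraction(doc_id, doc_type, fields, confidence)
```

With these stand-ins, a pay stub passes straight through, while a bank statement or an unrecognised type lands in the review queue. Keeping the threshold as a single named constant makes the trade-off between automation rate and reviewer load easy to tune.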
Concept Validation ran for 7 weeks with 3 document types. The mid-flight redirect: in week 3 we discovered ~8% of submissions were scans-of-scans (faxed photocopies of faxes), and the planned LayoutLMv3 fine-tune was hallucinating fields on them. We added a layout-aware OCR pre-pass and a "manual route" lane for documents below an image-quality threshold. The architect's call was not to try fixing that 8% with the model. We hit 98.4% accuracy on the in-spec test set before moving to production.
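The "manual route" decision can be pictured as a simple gate in front of the model. The sketch below is an assumption-heavy illustration: the case study does not say which image-quality metric or threshold was used, so the `sharpness` proxy (mean absolute difference between neighbouring grayscale pixels), the `MIN_SHARPNESS` value, and the lane names are all hypothetical.

```python
from typing import List

# Hypothetical threshold; the real metric and cut-off are not given.
MIN_SHARPNESS = 20.0


def sharpness(gray: List[List[int]]) -> float:
    """Mean absolute difference between horizontally adjacent pixels.

    `gray` is a grid of 0-255 grayscale values. Degraded scans-of-scans
    tend to lose contrast between neighbouring pixels, so a low score is
    used here as a crude proxy for poor image quality.
    """
    total = 0
    count = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            total += abs(left - right)
            count += 1
    return total / count if count else 0.0


def choose_lane(gray: List[List[int]]) -> str:
    """Send the page to the model only if it clears the quality gate."""
    return "model" if sharpness(gray) >= MIN_SHARPNESS else "manual"


# Toy pages: a high-contrast scan and a washed-out copy of a copy.
crisp = [[0, 255, 0, 255], [255, 0, 255, 0]]
washed_out = [[120, 125, 122, 127], [124, 121, 126, 123]]

print(choose_lane(crisp))       # high contrast, goes to the model
print(choose_lane(washed_out))  # low contrast, goes to the manual lane
```

The point of the gate is that it runs before classification and extraction, so out-of-spec pages never reach a model that would otherwise invent fields for them.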
Outcome
The production system processes all 12 document types in 4 languages. Error rate: 0.8%. Turnaround: 4 hours. The system now handles ~3× the original volume with the same team size, freeing back-office staff for exception handling and the deliberate "manual route" cases.
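For readers checking the headline figure, the 93% error reduction follows from the 11–13% baseline rework rate and the 0.8% final rate. The short calculation below assumes the reduction is measured relative to the baseline; the `relative_reduction` helper is illustrative only.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`, relative to `before`."""
    return (before - after) / before


final_rate = 0.008  # 0.8% rework after deployment

low = relative_reduction(0.11, final_rate)   # against the 11% baseline
mid = relative_reduction(0.12, final_rate)   # against the 12% midpoint
high = relative_reduction(0.13, final_rate)  # against the 13% baseline

print(f"{low:.1%} to {high:.1%}; midpoint baseline gives {mid:.1%}")
```

Across the stated baseline range the reduction stays between roughly 92.7% and 93.8%, which rounds to the 93% shown in the summary.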
The client owns the full architecture. All models and pipelines are documented and transferable.
What We Learned
01
Data quality is almost always the root cause. Fix the data before you touch the model.
02
Routing by document type outperforms a single large model trying to handle everything.
03
When a class of inputs is genuinely out of distribution, the right answer is sometimes "do not solve this with the model". Design a manual lane instead.
Stages Engaged
Discovery & Blueprint
Concept Validation
Production Build
Total Duration
5 months
Artifacts Delivered
PRD
Technical Blueprint
WBS
Data Labeling Guide
SOW
Start with a Feasibility Call
2 hours. No cost. We'll tell you honestly whether AI makes sense for your case.