One of the most common conversations we have with startup founders in 2026 goes like this: they have a product that's working, users are growing, but there's a manual step somewhere in the workflow — usually involving reading documents, classifying data, or extracting information — that is becoming a real bottleneck. They've seen what AI can do, but they're not sure where to start or what it would actually cost.
This is a post about how we approached exactly that problem for a recent client, and what we learned.
The Problem
The client had an internal workflow where their team was manually reviewing and classifying incoming documents — contracts, invoices, compliance forms — before routing them to the right department. With volume growing, this was taking hours each day and becoming a source of errors and delays.
They didn't want a new product. They wanted this specific task automated and integrated into the tool their team already used.
The Approach: Scope First, Build Second
We always start with a 30-minute discovery call before writing a single line of code. In this case, we spent the first week mapping the exact documents involved, the classification categories, the error tolerance, and what "done" actually looked like for the team.
This scoping step is where most AI projects go wrong. People jump straight to "let's use GPT-4" without first defining what success looks like. Our rule: if you can't describe the output precisely, you're not ready to build.
The Build
We used Azure OpenAI (GPT-4o) for the classification and extraction layer, with a retrieval-augmented approach for the client's specific document taxonomy. The pipeline looked like this:
- Document arrives via existing upload endpoint
- Pre-processing extracts clean text (digital PDFs directly, scanned images via OCR)
- GPT-4o classifies the document type and extracts key fields using a structured JSON output schema
- A confidence score below the threshold automatically flags the document for human review
- The result is written back to the client's existing database and triggers their workflow routing
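The classify-and-route step above can be sketched in a few lines of plain Python. Everything here is illustrative rather than the client's actual code: the schema, field names, categories, and the 0.85 threshold are assumptions, and the model call itself is omitted — this only shows how a structured-JSON response might be parsed and routed.

```python
import json
from dataclasses import dataclass

# Illustrative JSON schema for the structured classification output.
# Categories and field names are hypothetical, not the client's taxonomy.
CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {
        "document_type": {"enum": ["contract", "invoice", "compliance_form", "other"]},
        "key_fields": {"type": "object"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["document_type", "key_fields", "confidence"],
}

CONFIDENCE_THRESHOLD = 0.85  # assumed value; in practice tuned per client


@dataclass
class RoutingDecision:
    document_type: str
    key_fields: dict
    needs_review: bool  # True -> goes to the human review queue


def route(model_output: str) -> RoutingDecision:
    """Parse the model's structured JSON output and decide routing."""
    result = json.loads(model_output)
    return RoutingDecision(
        document_type=result["document_type"],
        key_fields=result["key_fields"],
        needs_review=result["confidence"] < CONFIDENCE_THRESHOLD,
    )


# A high-confidence invoice passes straight through to workflow routing...
auto = route('{"document_type": "invoice", "key_fields": {"total": "1200.00"}, "confidence": 0.97}')
# ...while a low-confidence result lands in the review queue.
flagged = route('{"document_type": "other", "key_fields": {}, "confidence": 0.41}')
```

The useful property of this shape is that the threshold check is a single, auditable line: tightening or loosening the human-review rate is a one-number change, not a model change.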
Total new code: around 400 lines of Python and a thin API layer. No new infrastructure — we deployed it as an Azure Function that hooks into what they already had.
Results After 4 Weeks
The classification accuracy came in at 94% on their real document corpus in testing. Manual review time dropped from several hours daily to around 20 minutes — covering only the flagged edge cases. The team adopted it immediately because it didn't change their existing workflow; it just removed the slow part.
What This Means for Your Business
Most AI integration projects don't need to be big, expensive transformation programmes. The highest-value use cases are usually narrow: one specific manual task, automated well, integrated cleanly into what you already have. That's where we focus.
If you have a workflow step that involves reading, classifying, or extracting information from documents — or any repetitive task where a person reads something and makes a decision — it's worth a 30-minute conversation to see whether AI can handle it.
A Few Things We'd Do Differently
We underestimated the variability in document formatting during early testing, which cost us an extra three days on the pre-processing layer. On new projects we now ask for a sample of 50–100 real documents before scoping — not five clean examples.
We'd also build the human review flagging earlier in the process. It's not just a fallback — it's the mechanism that builds client confidence in the system over time.
Conclusion
AI integration doesn't have to mean rebuilding your product. The best projects start with a specific problem, a defined output, and a realistic timeline. If you're curious whether something in your workflow is a good candidate, reach out — we're happy to give you an honest answer before you commit to anything.