CASE STUDY · DOCUMENT AI

AI document classification pipeline

A document-intensive operations team was receiving hundreds of scanned PDFs a week (invoices, contracts, technical specs, compliance certificates), all in mixed Thai and English. Every document needed to land in the right OneDrive folder, tagged correctly, and logged for audit. Done manually, this was eating multiple hours a day. We built a classification pipeline that handles it end-to-end, at a fraction of the cost you'd expect.

n8n · Claude Haiku · Claude Sonnet · Microsoft OneDrive · Google Sheets audit log

KEY METRICS

Manual handling time

~Zero

Languages handled

Thai + English

Cost approach

Haiku/Sonnet hybrid

ON THIS PAGE

01 The problem
02 What we built
03 The Haiku/Sonnet hybrid
04 What we'd do differently
05 Why it matters for outbound

/ 01

The problem

Document routing seems like a boring problem until you live it. The operations team was spending its days opening each scanned PDF, reading the first page, deciding which of 20+ folders it belonged in, renaming it, and moving it. Mistakes meant compliance risk. Volume meant burnout. Outsourcing meant sending sensitive documents to third parties. And off-the-shelf OCR-plus-rules tools couldn't handle the mix of Thai and English content, the variety of document types, or the edge cases that come up daily.

/ 02

What we built

We built an n8n workflow that watches the intake folder, sends each incoming document through Claude's vision API, and classifies it into one of the defined document types with a confidence score. Claude reads the scan directly, with no separate OCR step and no template matching, and handles Thai, English, and mixed-language documents with the same pipeline.
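The case study does not say how the confidence score is obtained. One common approach is to ask the model to reply with a small JSON object and validate it before acting on it. The following is a minimal sketch under that assumption; the field names `doc_type` and `confidence`, the label set, and the function name are all hypothetical, not taken from the actual workflow.

```python
import json
from dataclasses import dataclass

# Hypothetical label set; the real pipeline defines its own document types.
ALLOWED_TYPES = {"invoice", "contract", "technical_spec", "compliance_certificate"}


@dataclass(frozen=True)
class Classification:
    doc_type: str
    confidence: float  # 0.0 to 1.0, as reported by the model


def parse_classification(reply_text: str) -> Classification:
    """Parse the model's JSON reply and reject anything unexpected."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    doc_type = data.get("doc_type")
    confidence = data.get("confidence")

    # Only accept labels from the defined set, never free-text categories.
    if not isinstance(doc_type, str) or doc_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown doc_type: {doc_type!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    return Classification(doc_type, float(confidence))
```

Treating a malformed reply as an error, rather than guessing, lets the workflow send that document to human review instead of filing it on bad data.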

Based on the classification, n8n renames the file to a structured convention, moves it to the correct OneDrive folder, and writes a full audit row to a Google Sheet: timestamp, original filename, classification, confidence, target folder, and the final filename. If confidence is below the threshold, the document is routed to a human review queue instead of being auto-filed.
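The routing step described above can be sketched as a single pure function that decides the target folder and final filename, and produces the audit row. This is an illustration only: the threshold value, folder paths, and naming convention below are hypothetical placeholders, since the case study does not state them.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

# All three values are assumptions for illustration.
CONFIDENCE_THRESHOLD = 0.85
REVIEW_FOLDER = "/Intake/Needs review"
FOLDER_BY_TYPE = {
    "invoice": "/Finance/Invoices",
    "contract": "/Legal/Contracts",
}


def plan_filing(original_name: str, doc_type: str, confidence: float,
                now: datetime) -> dict:
    """Return the audit row for one document, including where it should go."""
    suffix = PurePosixPath(original_name).suffix.lower() or ".pdf"
    auto_file = confidence >= CONFIDENCE_THRESHOLD and doc_type in FOLDER_BY_TYPE

    if auto_file:
        target_folder = FOLDER_BY_TYPE[doc_type]
        # Hypothetical convention: <timestamp>_<type><extension>
        final_name = f"{now:%Y%m%d-%H%M%S}_{doc_type}{suffix}"
    else:
        # Low confidence or unmapped type: leave the name alone for the reviewer.
        target_folder = REVIEW_FOLDER
        final_name = original_name

    # Same six fields the audit sheet records.
    return {
        "timestamp": now.isoformat(),
        "original_filename": original_name,
        "classification": doc_type,
        "confidence": confidence,
        "target_folder": target_folder,
        "final_filename": final_name,
    }


if __name__ == "__main__":
    print(plan_filing("scan_0042.PDF", "invoice", 0.97, datetime.now(timezone.utc)))
```

Keeping the decision separate from the OneDrive and Google Sheets calls makes the rule easy to test without touching either service.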

/ 03

The Haiku/Sonnet hybrid

The cost optimisation is the piece we're proudest of. Running every document through Claude Sonnet would have been accurate but expensive. Running everything through Haiku would have been cheap but too error-prone on complex documents.

Instead, we do a first pass with Haiku: fast, cheap, and correct on the majority of documents that are obviously one type. If Haiku returns low confidence or an ambiguous case, the document is escalated to Sonnet for a second opinion. The result is Sonnet-class accuracy at close to Haiku-class cost, because most documents never need the expensive model.
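The escalation logic can be sketched as a two-tier cascade. This is a minimal sketch, not the production workflow: the two classifiers are passed in as plain callables (in practice they would wrap calls to Haiku and Sonnet), and the threshold value and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A classifier takes the raw PDF bytes and returns (doc_type, confidence).
Classifier = Callable[[bytes], Tuple[str, float]]

ESCALATION_THRESHOLD = 0.85  # hypothetical; the real value is not stated


@dataclass(frozen=True)
class Result:
    doc_type: str
    confidence: float
    tier: str  # which tier produced the answer: "cheap" or "strong"


def classify_hybrid(pdf_bytes: bytes, cheap: Classifier, strong: Classifier,
                    threshold: float = ESCALATION_THRESHOLD) -> Result:
    """Try the cheap model first; call the strong model only when unsure."""
    doc_type, confidence = cheap(pdf_bytes)
    if confidence >= threshold:
        return Result(doc_type, confidence, "cheap")

    # Low confidence: get a second opinion from the stronger model.
    doc_type, confidence = strong(pdf_bytes)
    return Result(doc_type, confidence, "strong")
```

Under this design every document pays for one cheap call, and only the escalated fraction p also pays for a strong call, so the average cost per document is roughly cost_cheap + p × cost_strong. The smaller p is, the closer the total stays to the cheap model's cost.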

/ 04

What we'd do differently

The initial audit log was a flat Google Sheet, which became unwieldy once we crossed 10,000 rows. In the next iteration we'd start with a proper database (even just a small Postgres or Supabase instance) and treat the sheet as a view for the ops team. The lesson: 'good enough for v1' in data storage tends to be a problem you'll carry for longer than you think.
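As an illustration of the database-first approach, here is a minimal sketch of an audit table with a read-only view for the ops team. It uses Python's built-in sqlite3 module as a stand-in so the example runs anywhere; a Postgres or Supabase version would use that system's own types and driver. Table, column, and view names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id                INTEGER PRIMARY KEY,
    filed_at          TEXT NOT NULL,   -- ISO 8601 timestamp
    original_filename TEXT NOT NULL,
    classification    TEXT NOT NULL,
    confidence        REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    target_folder     TEXT NOT NULL,
    final_filename    TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_audit_filed_at ON audit_log (filed_at);

-- What the ops team looks at: recent rows only, newest first.
CREATE VIEW IF NOT EXISTS recent_filings AS
    SELECT filed_at, original_filename, classification,
           confidence, target_folder, final_filename
    FROM audit_log
    ORDER BY filed_at DESC
    LIMIT 500;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_filing(conn: sqlite3.Connection, row: dict) -> None:
    """Insert one audit row; the dict keys match the named placeholders."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """INSERT INTO audit_log (filed_at, original_filename, classification,
                                      confidence, target_folder, final_filename)
               VALUES (:timestamp, :original_filename, :classification,
                       :confidence, :target_folder, :final_filename)""",
            row,
        )
```

The table holds the full history, while the view gives the ops team a bounded, sheet-like window that stays manageable as the row count grows.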

/ 05

Why it matters for outbound

This is not a sales or business development (BD) system, but it's the kind of operational win that shows up in a BD conversation. When we sit down with a prospect and they ask 'can you show me something you've actually built?', this is one of the examples we reach for. It demonstrates that we ship production AI at enterprise scale, that we think about cost discipline, and that we understand Thai-language operational reality. Credibility compounds.
