Subhajit Bhar - IDP Engineer

AI Document Processing | PDF, OCR & Invoice Extraction | IDP

I help companies turn messy documents into structured, usable data. Most of my work is around document automation, OCR, and data extraction from PDFs, invoices, lab reports, contracts, and email attachments — the kind of documents where standard tools often fail.

Most clients find me after something that partly worked stops working. The extraction script handled 80% of documents fine, then broke on the rest, and nobody can explain why. That’s what I fix.

My longest engagement ran two years on retainer — a water consultancy processing lab reports across 10+ changing layouts. Cut their manual data entry by 75%. Based in Durham, UK.

“Thanks to Subhajit’s work, we are saving countless hours having to manually enter results into our own template.”

Want to talk about your documents? Book a free 30-minute call. I’ll tell you whether automation makes sense for your case.

Recent

Certificate of Analysis Data Extraction: A Production Guide

A certificate of analysis (CoA) is one of the most information-dense documents in regulated industries. It carries test results, method references, accreditation details, chain-of-custody information, and the laboratory’s sign-off — all in a format designed for human reading, not machine parsing.

Contract Data Extraction: Pulling Structured Data from Legal Documents

Contracts are the hardest document type to extract data from reliably. Invoices have a predictable structure. Lab reports have defined fields. Contracts are natural language documents, and the information you need — key dates, party names, payment terms, renewal clauses, termination conditions — can appear anywhere, phrased in many different ways, across documents that range from two pages to two hundred.