Your parser is not your product
The consultation call was about emails. A parts-sourcing platform for industrial components, the kind of business where a buyer sends a bill of materials as three paragraphs of prose and somebody on the other end retypes it into a quote system before lunch

Introduction
The consultation call was about emails. A parts-sourcing platform for industrial components, the kind of business where a buyer sends a bill of materials as three paragraphs of prose and somebody on the other end retypes it into a quote system before lunch. They wanted the retyping gone. Extract the parts, the quantities, the spec codes from RFQ emails, return clean structured data through an API. A parser. We shipped version one, and it worked.
Then a customer photographed a spreadsheet with a phone and emailed the photo.
Version two read images. Version three read PDFs, because supplier documents arrive as scans of printouts of exports. Three versions in, the roadmap was turning into a list of file formats, and a list of file formats is a treadmill. The world invents new ways to mangle a document faster than any team can parse them.
(We kept waiting for someone to send a fax. Nobody did, but nobody on the team would have bet against it either.)
The fields nobody sent
Somewhere around version three, the interesting thing happened. The extraction was correct and the result was still unusable. A part number came through exactly as the buyer typed it, character for character, and the quote stalled anyway, because the email never said which of four compatible variants the buyer meant. The voltage rating lived in the buyer's head. The document was faithfully reporting less than the truth.
No parser can extract what the sender never wrote.
So the work changed shape. Cross-reference each part against catalog data. Infer the missing fields from what the rest of the order implies. Score every line, and when an inference is not safe, hand that single line to a reviewer, seconds of their attention replacing what used to be a ten-minute hunt. That layer, enrichment sitting on top of data-quality gates, is where the platform stopped being a typing-saver and became something customers trusted with purchase orders.
The treadmill is getting cheaper by the month
Twenty-five months of work on that platform produced eleven shipped deliverables. The parser was one of them. The other ten lived downstream: the enrichment layer, the validation gates, part-number lookup, vendor price comparison across every source the platform had ever seen.
Now the part worth being honest about. Today's multimodal models read emails, photos, and PDFs out of the box. The three versions that consumed our early months can be stood up in about a week, by anyone, including two founders in a garage who have just decided your market looks friendly. A meaningful slice of what we billed for in year one is now a commodity feature, and pretending otherwise would be marketing. The downstream layer is the opposite case. It is domain knowledge, encoded: which fields your industry routinely leaves out of its documents, what each one costs when guessed wrong, what a safe inference looks like for your vertical and an unsafe one. A frontier model can read any document you give it. It has no idea what your buyers chronically forget to write. Teaching the system that took two years alongside a client who knew his parts catalog cold, and there is no shortcut we have found that compresses it.
If your roadmap is mostly input formats, you are spending the year building the part that is racing to zero.
Where the quote actually comes from
The platform's customers never paid for parsing. They paid for the quote that came back right: enriched, checked, priced against the cheapest vendor, with the risky lines flagged for a human eye. The parser was the doorway. The product was everything the data walked through after it.
That is how the consultation call that started with emails ended somewhere else entirely: not "extract this format too," but "what should happen to a line item we cannot fully trust." At TensorLabs we have built that downstream layer enough times to map yours in a single conversation. The formats treadmill, you can leave to the models. They have already won it.
You might also like
Keep reading from the journal.
June 10, 2026AI
95% accurate and completely useless
The model shipped on a Thursday, and the team did the thing teams do. Screenshot of the dashboard, 95% accuracy, posted in the company channel with a rocket next to it. The fraud system worked. Everyone moved on to the next thing.
June 6, 2026AI
The map that ran out of memory
Somewhere between the demo and the third customer, the product started dying.
June 8, 2026AI
Teaching your website to answer agents
On May 19, 2026, at Google I/O, Chrome announced WebMCP, and as of Chrome 149 it ships as an origin trial. WebMCP lets a web page expose structured tools to a browser-based AI agent, so the agent calls a function you defined instead of guessing its way through your DOM