Your parser is not your product

June 11, 2026 | 4 min read | 4 sections | By Tensor Labs
Introduction

The consultation call was about emails. A parts-sourcing platform for industrial components, the kind of business where a buyer sends a bill of materials as three paragraphs of prose and somebody on the other end retypes it into a quote system before lunch. They wanted the retyping gone. Extract the parts, the quantities, the spec codes from RFQ emails, return clean structured data through an API. A parser. We shipped version one, and it worked.

Then a customer photographed a spreadsheet with a phone and emailed the photo.

Version two read images. Version three read PDFs, because supplier documents arrive as scans of printouts of exports. Three versions in, the roadmap was turning into a list of file formats, and a list of file formats is a treadmill. The world invents new ways to mangle a document faster than any team can parse them.

(We kept waiting for someone to send a fax. Nobody did, but nobody on the team would have bet against it either.)

The fields nobody sent

Somewhere around version three, the interesting thing happened. The extraction was correct and the result was still unusable. A part number came through exactly as the buyer typed it, character for character, and the quote stalled anyway, because the email never said which of four compatible variants the buyer meant. The voltage rating lived in the buyer's head. The document was faithfully reporting less than the truth.

No parser can extract what the sender never wrote.

So the work changed shape. Cross-reference each part against catalog data. Infer the missing fields from what the rest of the order implies. Score every line, and when an inference is not safe, hand that single line to a reviewer, seconds of their attention replacing what used to be a ten-minute hunt. That layer, enrichment sitting on top of data-quality gates, is where the platform stopped being a typing-saver and became something customers trusted with purchase orders.
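To make that loop concrete, here is a minimal sketch of the cross-reference, infer, score, and route steps. Everything in it is an assumption made for illustration: the catalog contents, the part numbers, the confidence values, and the 0.8 review threshold are all hypothetical, and the single inference rule (borrow the voltage that the rest of the order already pins down) stands in for whatever rules a real catalog would justify.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalog: part number -> compatible variants.
CATALOG = {
    "RL-2040": [
        {"sku": "RL-2040-12V", "voltage": 12},
        {"sku": "RL-2040-24V", "voltage": 24},
        {"sku": "RL-2040-48V", "voltage": 48},
        {"sku": "RL-2040-110V", "voltage": 110},
    ],
    "PS-24-5A": [{"sku": "PS-24-5A", "voltage": 24}],
}

REVIEW_THRESHOLD = 0.8  # assumed cutoff; anything below goes to a human


@dataclass
class Line:
    part_number: str
    quantity: int
    voltage: Optional[int] = None  # None when the buyer never wrote it
    sku: Optional[str] = None
    confidence: float = 0.0
    needs_review: bool = False
    note: str = ""


def candidates_for(line):
    """Catalog variants compatible with what the line actually states."""
    variants = CATALOG.get(line.part_number, [])
    if line.voltage is not None:
        variants = [v for v in variants if v["voltage"] == line.voltage]
    return variants


def enrich(lines):
    # Pass 1: lines the catalog resolves without guessing.
    context_voltages = set()
    for line in lines:
        candidates = candidates_for(line)
        if len(candidates) == 1:
            line.sku = candidates[0]["sku"]
            line.voltage = candidates[0]["voltage"]
            line.confidence = 1.0
            line.note = "catalog match"
            context_voltages.add(line.voltage)

    # Pass 2: infer missing fields from what the rest of the order implies.
    for line in lines:
        if line.sku is not None:
            continue
        candidates = candidates_for(line)
        implied = [c for c in candidates if c["voltage"] in context_voltages]
        if len(implied) == 1:
            line.sku = implied[0]["sku"]
            line.voltage = implied[0]["voltage"]
            line.confidence = 0.9  # assumed score for an inferred field
            line.note = "voltage inferred from rest of order"
        elif candidates:
            line.confidence = round(1 / len(candidates), 2)
            line.note = f"{len(candidates)} compatible variants"
        else:
            line.note = "not in catalog"
        # Gate: only the lines that are not safe reach a reviewer.
        line.needs_review = line.confidence < REVIEW_THRESHOLD
    return lines
```

Run on an order containing a 24 V power supply and an unqualified "RL-2040", the sketch picks the 24 V relay and lets it through. Run on the relay alone, it declines to guess among four variants and flags that one line for review, which is the behavior the paragraph above describes.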

The treadmill is getting cheaper by the month

Twenty-five months of work on that platform produced eleven shipped deliverables. The parser was one of them. The other ten lived downstream: the enrichment layer, the validation gates, part-number lookup, vendor price comparison across every source the platform had ever seen.

Now the part worth being honest about. Today's multimodal models read emails, photos, and PDFs out of the box. The three versions that consumed our early months can be stood up in about a week, by anyone, including two founders in a garage who have just decided your market looks friendly. A meaningful slice of what we billed for in year one is now a commodity feature, and pretending otherwise would be marketing.

The downstream layer is the opposite case. It is domain knowledge, encoded: which fields your industry routinely leaves out of its documents, what each one costs when guessed wrong, and what separates a safe inference from an unsafe one in your vertical. A frontier model can read any document you give it. It has no idea what your buyers chronically forget to write. Teaching a system those things took two years alongside a client who knew his parts catalog cold, and we have found no shortcut that compresses it.

If your roadmap is mostly input formats, you are spending the year building the part that is racing to zero.

Where the quote actually comes from

The platform's customers never paid for parsing. They paid for the quote that came back right: enriched, checked, priced against the cheapest vendor, with the risky lines flagged for a human eye. The parser was the doorway. The product was everything the data walked through after it.
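The pricing step is the simplest piece to sketch. The version below assumes a hypothetical price table keyed by SKU, with invented vendor names and prices held in integer cents so that totals do not pick up floating-point noise. It is an illustration of the idea, not the platform's actual pricing logic.

```python
# Hypothetical price table: sku -> {vendor: unit price in cents}.
VENDOR_PRICES = {
    "RL-2040-24V": {"Acme Supply": 410, "Borealis Parts": 385},
    "PS-24-5A": {"Acme Supply": 4100, "Cobalt Industrial": 4350},
}


def price_line(sku, quantity, needs_review=False):
    """Price one line against the cheapest known vendor.

    A line with no known offer is always flagged; a line that was
    already flagged upstream stays flagged.
    """
    offers = VENDOR_PRICES.get(sku)
    if not offers:
        return {"sku": sku, "vendor": None, "total_cents": None,
                "needs_review": True}
    vendor, unit_cents = min(offers.items(), key=lambda item: item[1])
    return {"sku": sku, "vendor": vendor,
            "total_cents": unit_cents * quantity,
            "needs_review": needs_review}


def build_quote(lines):
    """lines: iterable of (sku, quantity, needs_review) tuples."""
    priced = [price_line(sku, qty, flag) for sku, qty, flag in lines]
    flagged = [p["sku"] for p in priced if p["needs_review"]]
    total = sum(p["total_cents"] for p in priced
                if p["total_cents"] is not None)
    return {"lines": priced, "total_cents": total, "flagged": flagged}
```

The design choice worth noting is that a flag never gets dropped on the way to the quote: a line can be priced and still be marked for a human eye.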

That is how the consultation call that started with emails ended somewhere else entirely: not "extract this format too," but "what should happen to a line item we cannot fully trust." At Tensor Labs we have built that downstream layer enough times to map yours in a single conversation. The formats treadmill, you can leave to the models. They have already won it.