Schema before prompt
A stage director can rehearse the play to perfection. The actors hit their marks. The lighting cues are timed to the breath. But if the prop master walks out with a sword when the script calls for a letter, the scene collapses, and the audience walks out blaming the actor.

Introduction
That is the story behind most “the model isn’t following instructions” tickets.
The actor is reading the right script
The instinct, when an LLM-powered feature returns the wrong answer, is to look at the prompt. Maybe it needs to be more specific. Maybe the system message should be longer. Maybe a few-shot example would help. The team rewrites. Tests pass on the cherry-picked example. Six hours later, a new class of input breaks again.
This pattern repeats because the prompt is the easy lever. It is the part of the system the engineer wrote, the part they understand, and the part they can change without a migration. Of course they reach for it.
But the model is rarely the misbehaving actor.
It is reading the script you gave it. The problem is the props.
When a column holds three things
On a recent client project, one field in a production database had a generic name and a single declared type. In practice, it held free-text answers for some inputs, JSON arrays of selected options for others, and JSON blobs with nested objects for the rest. There was no discriminator. No type column to switch on. Nothing in the row indicated which of the three shapes the value would take.
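To make the overload concrete, here is a minimal sketch of the kind of heuristic that discriminator-less data forces on you: guess the shape by trying to parse the value. Every name here is invented for illustration, and the heuristic is lossy by construction, which is exactly why a real discriminator column beats guessing.

```python
import json

def guess_shape(raw: str) -> str:
    """Heuristically bucket a raw value into one of the three shapes.

    Lossy by construction: free text that happens to look like JSON
    lands in the wrong bucket, and nothing in the row can correct it.
    """
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return "free_text"
    if isinstance(parsed, list):
        return "option_array"   # JSON array of selected options
    if isinstance(parsed, dict):
        return "nested_blob"    # JSON blob with nested objects
    return "free_text"          # bare JSON scalar; treat as free text

for value in ["too slow on mobile", '["red", "blue"]', '{"rating": 4}']:
    print(guess_shape(value))
```

Asking a prompt to do this implicitly, on every request, is asking the model to reverse-engineer the schema from one cell at a time.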
The work was to write a prompt that classified errors based on the input. The request was reasonable. The data was not. No prompt, however carefully tuned, was going to read three different shapes from one column and consistently extract the same kind of feature. The model wasn’t failing. It was being asked to do a job the schema had made impossible.
A two-day migration to split the column into typed fields produced a bigger accuracy gain than a week of prompt iteration would have. The hard part of LLM-powered features is almost never the prompt.
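The migration itself was not exotic. Here is a sketch of the idea in Python with SQLite; the table and column names are invented, and the shape test is the same lossy parse-and-check that the old schema forced everywhere, run one last time so nothing downstream ever has to run it again.

```python
import json
import sqlite3

# Hypothetical schema: the real table and column names were different.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, answer TEXT)")
conn.executemany(
    "INSERT INTO responses (answer) VALUES (?)",
    [
        ("too slow on mobile",),             # free text
        ('["red", "blue"]',),                # JSON array of options
        ('{"rating": 4, "tags": ["ui"]}',),  # nested JSON blob
    ],
)

# One typed column per shape, backfilled from the overloaded column.
conn.executescript("""
    ALTER TABLE responses ADD COLUMN answer_text TEXT;
    ALTER TABLE responses ADD COLUMN answer_options TEXT;  -- JSON array only
    ALTER TABLE responses ADD COLUMN answer_blob TEXT;     -- JSON object only
""")
for row_id, raw in conn.execute("SELECT id, answer FROM responses").fetchall():
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        parsed = None
    if isinstance(parsed, list):
        col = "answer_options"
    elif isinstance(parsed, dict):
        col = "answer_blob"
    else:
        col = "answer_text"
    # col comes from a fixed set above, so interpolating it is safe here.
    conn.execute(f"UPDATE responses SET {col} = ? WHERE id = ?", (raw, row_id))
conn.commit()
```

After a backfill like this, the prompt reads one column with one meaning, and the ambiguity is resolved once, in code, instead of probabilistically on every call.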
It is the data the prompt is asked to read.
The lever you can actually pull
Most teams that complain their model isn’t following instructions have not opened a single sample of the data they are asking it to read. They have read the prompt fifty times. They have not looked at five rows of the raw input. (Anyone who has worked on LLM-powered features long enough has done this. So has every engineer they know.) The shape of the problem reveals itself in the data, not in the prompt log.
To be fair, prompt choice does matter at the margins. With a clean schema, the difference between a sloppy prompt and a careful one is real. And small teams without ML backgrounds reasonably reach for prompts first because that is the lever they understand. None of this is wrong. It just isn’t where the gain lives.
The gain lives in the schema. In the rename of an overloaded column. In the discriminator field nobody added because the original developer didn’t know the data would later be read by an LLM. In the migration that should have happened last quarter.
Walk into the prop room
The director can rehearse all night. If the prop master is handing out the wrong objects, the play does not improve. Send the director home for the evening. Walk into the prop room. Look at what is actually on the shelves. That is where the show gets fixed.
Plenty of practitioners have spent afternoons rewriting prompts that should have been afternoons of writing migrations. The model was always going to be fine. The schema was the show.