The million-token search engine that isn’t
A library with no card catalog has every book you could want and no way to find any of them. Bigger libraries don’t fix search. They make search worse. That is most of what is wrong with the current marketing of long context windows.

The boast that wasn’t
A new model release ships with a million-token context window. The marketing slide reads like the end of retrieval. Why bother building a vector store if the whole codebase fits in the prompt? Why think about chunking, embeddings, or rerankers if the model can hold an entire quarterly report in its head?
Because attention isn’t recall. The model can read every page of every book in the library and still miss the one you needed. The published benchmarks are clear about this, and almost nobody reads them.
Where the model loses the thread
On a recent project, the same questions ran against the same 600,000 tokens of documentation through two pipelines. The first pasted everything into the model’s context. The second was a sixty-line retrieval pipeline: chunk, embed, top-k, rerank. The retrieval version answered multi-hop questions correctly noticeably more often, and ran an order of magnitude cheaper per query. The big-context version did not get worse simply because it was given more context. It got worse as the relevant passages were buried under more irrelevant ones.
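The shape of that sixty-line pipeline is not exotic. Here is a minimal, self-contained sketch of the chunk / embed / top-k / rerank steps, with a toy bag-of-words vector standing in for a real embedding model; the function names and parameters are illustrative, not the actual project code.

```python
import math
import re
from collections import Counter

def chunk(text, size=200, overlap=50):
    """Split text into overlapping word windows."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy 'embedding': a term-frequency vector.
    A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=3):
    """Top-k by cosine similarity, then rerank by exact-term overlap."""
    q = embed(query)
    top = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
    return sorted(top, key=lambda c: len(set(q) & set(embed(c))), reverse=True)
```

The point is not the scoring math, which any real system replaces with learned embeddings and a cross-encoder reranker. The point is that the model only ever sees the handful of chunks this returns, not the other 599,000 tokens.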
That is the part the marketing skips. The model sees every token. It does not give every token equal attention. Past a certain density, multi-hop questions degrade visibly. Independent benchmarks (NIAH variants, RULER, the Needles series) report effective context that is a fraction of the advertised window. The 1M-token boast is a capacity figure. The number that matters in production is the recall figure.
(The hardest version of this is the team that ran the benchmark, saw the degradation, and shipped the long-context version anyway because it was less work to integrate.)
What the catalog actually does
A retrieval pipeline does more than fetch the right pages. It shrinks the search space the model has to reason over. It removes distractors. It lets the model spend its attention on the hundred tokens that matter, not the hundred thousand around them.
That is what a card catalog did in a library. Not store the books. Let the librarian find them. The catalog was a small thing. It made the entire library work.
To be fair, long context wins in some cases. Documents that need to be read end-to-end (legal arguments, narrative analysis, code review on a small file) benefit from the model seeing everything at once. Conversations with months of memory benefit from never dropping context. Some workloads are not retrieval problems and shouldn’t be forced into one.
But “paste the whole repo” is not engineering. It is the laziest possible answer to the question of how the model should access information.
Bigger libraries, smaller answers
The library with no catalog is the same library either way. The books are still there. The question is whether you can find one.
Most teams that have shipped both architectures arrive at the same conclusion. The retrieval version ages better. The bigger the library got, the more the catalog earned its keep, not less.