
Context Is The Easy Part

Everyone's talking about context engineering right now, but most of the conversation is focused on the wrong thing. Read the blog posts, the guides, the thought leadership. They're all asking the same questions: What should I include in the context window? How do I manage tokens efficiently? How do I curate what the model sees?

These are valid questions. They're also the easy part.

The hard part isn't deciding what context to include. It's building systems that deliver that context reliably, with provenance, at scale, every single time. That's not a context problem. That's an engineering problem. And engineering means something specific.

I spent fifteen years building data infrastructure at Palantir and Citadel, working on defense and intelligence systems where wrong answers weren't acceptable. Here's what I learned: knowing what data you need is trivial. Everyone knows what data they need. The hard part is building systems that deliver it validated, traceable, reproducible, and on demand.

The data world spent two decades learning this lesson. You don't solve data problems by choosing the right data. You solve them by building the infrastructure that makes data trustworthy. Context is the same problem wearing different clothes.

Watch how people approach context engineering today. They're selecting documents. Tuning retrieval parameters. Adjusting chunk sizes. Experimenting with what to include and what to leave out. Managing memory hierarchies. Optimizing token usage. This is prompt engineering with more inputs. It's not engineering.

Engineering means provenance: where did this context come from, and can I trace every piece back to its source? If the output is wrong, can I figure out which input caused it?

Engineering means versioning: is the context I retrieve today the same as the context I retrieved yesterday, and if not, why? Can I reproduce last week's answer?

Engineering means validation: how do I know the context is accurate before it reaches the model? What checks exist, and what happens when one fails?

Engineering means determinism: same query, same context, every time. Not mostly the same. Exactly the same.

Engineering means observability: when context retrieval breaks, how do I know, how do I debug it, and how do I fix it without guessing?
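To make the first of those concrete, here is a minimal sketch of provenance in practice: every retrieved chunk carries enough metadata to trace it back to its source and reproduce it later. The field names and version strings are illustrative, not a prescription.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ContextRecord:
    """One retrieved chunk, with enough metadata to trace and reproduce it."""
    text: str
    source_uri: str         # where the chunk came from
    source_version: str     # version/commit/etag of the source at retrieval time
    retriever_version: str  # version of the pipeline that produced the chunk
    retrieved_at: str       # ISO-8601 timestamp of retrieval

    @property
    def content_hash(self) -> str:
        # Stable fingerprint of the text: if the content changes, so does the hash.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


record = ContextRecord(
    text="Refunds are processed within 5 business days.",
    source_uri="s3://docs/policies/refunds.md",
    source_version="etag-9f2c41",
    retriever_version="retriever-2.3.1",
    retrieved_at=datetime.now(timezone.utc).isoformat(),
)
```

Hashing the text rather than trusting a document ID matters: a silent content change under the same ID shows up as a different fingerprint.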

Most context engineering today has none of this. It's artisanal, manual, and fragile. It works in demos and falls apart in production.

The failure mode is predictable. Someone builds a system that retrieves context and generates answers. It works well enough in testing, so they ship it. Then the context changes. A document gets updated. A source goes stale. A retrieval pipeline silently returns different results. The model keeps generating answers with the same confidence, but now the answers are wrong. Nobody notices until a customer does.
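One way to catch that failure before a customer does, sketched below under the assumption that you control the retrieval function: fingerprint the context a query returns when its answers are known to be good, then fail loudly when the same query later returns something different. The retrieve() stub here stands in for whatever pipeline you actually run.

```python
import hashlib


def retrieve(query: str) -> list[str]:
    # Stand-in for a real retrieval pipeline (vector search, BM25, ...).
    return ["Refunds are processed within 5 business days."]


def context_fingerprint(chunks: list[str]) -> str:
    """Order-sensitive fingerprint of a retrieved context set."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(hashlib.sha256(chunk.encode("utf-8")).digest())
    return digest.hexdigest()


# Pin the fingerprint when the query's behavior is validated...
pinned = context_fingerprint(retrieve("refund policy"))


# ...and fail loudly when the same query silently retrieves different context.
def assert_no_drift(query: str, expected: str) -> None:
    current = context_fingerprint(retrieve(query))
    if current != expected:
        raise RuntimeError(
            f"Context drift for {query!r}: {expected[:8]}... -> {current[:8]}..."
        )


assert_no_drift("refund policy", pinned)
```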

This is the same failure mode the data world lived through. Dashboards that showed numbers nobody could trace. Reports that changed when you refreshed the page. Metrics that meant different things to different teams. We solved it by building infrastructure: pipelines with lineage, transformations with tests, data contracts, observability. The unsexy plumbing that makes data trustworthy. Context needs the same infrastructure, and almost nobody is building it.
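Context contracts can borrow directly from data contracts. A hedged sketch of the idea: a few checks that run before a chunk ever reaches the model, with the staleness threshold and source allowlist invented purely for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=30)  # illustrative staleness threshold
ALLOWED_SOURCES = {"s3://docs/policies/", "s3://docs/runbooks/"}


def validate_chunk(text: str, source_uri: str, retrieved_at: datetime) -> list[str]:
    """Contract checks that run before context reaches the model.

    Returns a list of violations; an empty list means the chunk passes.
    """
    violations = []
    if not text.strip():
        violations.append("empty chunk")
    if not any(source_uri.startswith(prefix) for prefix in ALLOWED_SOURCES):
        violations.append(f"source not on allowlist: {source_uri}")
    if datetime.now(timezone.utc) - retrieved_at > MAX_AGE:
        violations.append("chunk is stale")
    return violations


problems = validate_chunk(
    text="Refunds are processed within 5 business days.",
    source_uri="s3://docs/policies/refunds.md",
    retrieved_at=datetime.now(timezone.utc),
)
assert problems == []
```

The useful property is that failures are explicit and named, not buried in a model's output.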

The current generation of AI tools is fragile in ways that aren't obvious. They work until they don't. They're right until they're wrong. And when they break, there's no way to diagnose why.

The teams treating context engineering as an optimization problem will keep hitting this wall. They'll tune retrieval, adjust chunk sizes, swap embedding models, and still get inconsistent results they can't explain. The teams treating it as an engineering problem will build systems that are auditable, reproducible, and debuggable. They'll know where their context comes from. They'll know when it changes. They'll be able to defend their outputs. One approach scales. The other doesn't.
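Auditable doesn't have to mean heavyweight. A sketch, with the schema invented for illustration: one append-only log line per generation, recording which context produced which answer, so any output can be traced after the fact.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_entry(query: str, chunks: list[str], model: str, answer: str) -> str:
    """One line of an append-only audit log: enough to trace and defend an answer."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "context_hashes": [
            hashlib.sha256(c.encode("utf-8")).hexdigest()[:16] for c in chunks
        ],
        "model": model,
        "answer_hash": hashlib.sha256(answer.encode("utf-8")).hexdigest()[:16],
    }
    return json.dumps(entry, sort_keys=True)


print(audit_entry(
    query="what is the refund window?",
    chunks=["Refunds are processed within 5 business days."],
    model="model-v7",
    answer="Five business days.",
))
```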

The term context engineering is new. The problem isn't. It's the same problem the data world faced: how do you build systems that deliver the right information, reliably, at scale? The answer was never "choose the right data." The answer was always "build the infrastructure that makes data trustworthy."

Context is data by another name. The lessons are the same. The question is whether we're going to spend another decade relearning them.