Why Data Pipelines Need Stronger Guardrails in the AI Era

“Move fast and break things” is a startup mantra for good reason. Spending months hardening something whose value you haven’t proven is its own kind of failure. But as a former coworker liked to remind me, it’s “move fast and break things, not move fast and drive off a cliff.” That distinction has never mattered more. AI has increased how fast a small team can ship, but it has also increased how quickly you can introduce silent, structural damage you won’t catch until much later.

That asymmetry bites harder in data pipelines. Pipelines sit as far upstream as it gets, and every table, dashboard, and AI-generated answer downstream inherits whatever they produce. A single corrupted column can break trust with a user whose report suddenly shows incorrect revenue, and that kind of damage is exactly what Kepler is built to prevent. When a frontend deploy breaks, users see a 500 and you ship a fix in an hour. When a pipeline silently corrupts data, you might not notice for a week, and the cleanup takes days of careful backfills. Bigger blast radius, longer feedback loop.

Most recently, the practices described below enabled us to rewrite our SEC document processing pipeline, cutting the time from filing to processed from over an hour to just 30 seconds. We were able to tackle what was effectively major surgery on our pipeline in just two days, with confidence that the result was working and safe.

The question isn’t whether to use AI to move fast. That ship has sailed; in a hot space, you don’t get to opt out. The question is how to set yourself up so AI accelerates you safely. Two things have mattered most in building Kepler’s pipeline from scratch: code quality and testing. Neither is a new topic, but both look different when an agent is doing a lot of the writing.

Code Quality

Humans and AI are fundamentally good at the same thing: pattern matching. That fact alone means code quality matters more in the AI era than ever before. The patterns in the structure of your code (naming conventions, boundaries, abstractions) are signals an agent reads just like a human does. When that signal is noisy, the agent’s output gets noisier with it.

Naming consistency. If you use a coding agent like Claude Code, you’ll watch it grep constantly to locate where a function or variable is used. Inconsistent naming makes the agent work harder at best, and miss relevant call sites at worst. It also corrupts what the agent writes next: consistent conventions dramatically increase the odds the agent matches the pattern when introducing new code. This sounds like a nit, but it compounds. Every new piece of code either reinforces the convention or muddies it, and codebases that drift become harder for any agent to reason about.

Organic abstraction. Coding agents know shared logic should be factored out; it’s programming 101. But they don’t consistently apply that knowledge during organic growth. You build implementation A. Later you ask the agent to add implementation B, and more often than not it writes B as a parallel copy of A rather than refactoring both behind a shared abstraction. Here’s the interesting part: if you already have a well-abstracted A and B and then ask the agent to add a C, it will notice the pattern (‘ah, there’s a shape to match here’) and add the new component cleanly. The abstraction sits where the agent can see it, just as it would for a human. The catch is that your agent may not spot the initial abstraction opportunity when adding B; it’s on you to pay attention and steer it toward good structure as it goes.
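To make the A, B, C progression concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the FilingParser protocol, the two parser classes, the process function, and the toy input formats are illustrations, not Kepler’s code. The point is only the shape an agent can match when asked to add a third implementation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ParsedFiling:
    """Common output shape every parser must produce (hypothetical)."""
    form_type: str
    company: str


class FilingParser(Protocol):
    """The shared abstraction: one attribute, one method, one output type."""
    form_type: str

    def parse(self, raw: str) -> ParsedFiling: ...


class TenKParser:
    """Implementation A. Assumed toy input format: 'COMPANY|body'."""
    form_type = "10-K"

    def parse(self, raw: str) -> ParsedFiling:
        company, _, _body = raw.partition("|")
        return ParsedFiling(self.form_type, company.strip())


class EightKParser:
    """Implementation B. Assumed toy input format: first line 'company: NAME'."""
    form_type = "8-K"

    def parse(self, raw: str) -> ParsedFiling:
        first_line = raw.partition("\n")[0]
        _, _, company = first_line.partition(":")
        return ParsedFiling(self.form_type, company.strip())


# One registry keyed by form type; callers never branch on the concrete class.
PARSERS: dict[str, FilingParser] = {
    parser.form_type: parser for parser in (TenKParser(), EightKParser())
}


def process(form_type: str, raw: str) -> ParsedFiling:
    """Dispatch to whichever parser is registered for this form type."""
    try:
        parser = PARSERS[form_type]
    except KeyError:
        raise ValueError(f"no parser registered for {form_type!r}") from None
    return parser.parse(raw)
```

With this shape in place, adding a C means writing one more class and adding it to the registry tuple; process does not change. That is the kind of visible pattern an agent tends to follow rather than duplicate.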

Testing

Testing earns its place early in the development cycle, though not on day one. During ideation, you’re wandering in a fog of ideas. As you experiment, that fog slowly coalesces into more coherent pieces that fit together to form something meaningful. Those pieces change so dramatically and so often that, for a while, tests aren’t worth writing. However, those same pieces will often be serving early customers before they’re fully baked. At that point, you’ll want testing in place to keep those customers happy as you continue whittling away at the pipeline. When agents are driving most of the implementation, the right testing philosophy is what keeps development safe and the codebase moving toward higher quality.

A proper staging environment. This is the single most important investment for pipeline work, and AI makes it more important, not less. An agent can produce a change that looks correct, passes review, and passes unit tests, but still breaks the pipeline or taints the data. A staging pipeline that mirrors production on separate infra is the only reliable check against that class of failure. It’s especially valuable during major surgery or refactoring, when smaller tests carry little weight. Agents are also good at running A/B comparisons between staging and prod datastores, a task that would normally be tedious and slow for a person. For example, you can direct an agent to “compare table entries between Table-X in prod and staging to determine the impact of our change,” and it will produce a comprehensive comparison in a few moments.

Integration tests. By the time you’re serving real customers, you should have a coarse shape of your pipeline components even if they’re rough around the edges. At that point, integration tests (tests that capture the inputs and outputs of those components without caring much about implementation details) are the most valuable. When you commit work, the changes to those tests show you how the shape and intent of the code have changed, which at this stage is the heart of what you’re working on. If you keep to a handful of integration tests, each one carries real weight, and any change to one deserves scrutiny at commit time. Agents will also take note of this and give more consideration to integration tests, especially if their significance is called out in comments.

Unit tests. They still have their place, but use them sparingly. They earn their keep on small, load-bearing functions where the behavior is the contract. String normalization is the canonical example in our codebase. A handful of input/output examples is as useful to an agent as it is to a human. The trap is letting an agent generate dozens of unit tests for every code path, which dilutes the integration tests that should be bearing the weight. When you’re directing your agent to refactor the boundary components, you want the integration tests to carry the weight, not the unit tests.
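
As an illustration of “the behavior is the contract,” here is a minimal sketch of a string normalizer with a handful of input/output examples. The function name normalize_name and its rules (Unicode NFKC normalization, whitespace collapsing, case folding) are assumptions chosen for the example; what normalization means in a real pipeline will differ.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Hypothetical normalizer: NFKC-normalize, collapse whitespace, casefold."""
    text = unicodedata.normalize("NFKC", raw)
    # str.split() with no arguments splits on any run of whitespace and
    # drops leading/trailing whitespace, so join() yields single spaces.
    return " ".join(text.split()).casefold()


# The contract, written as input/output pairs. A short table like this is
# easy for a reviewer (human or agent) to read and extend.
CASES = [
    ("  Acme   Corp ", "acme corp"),            # surrounding and repeated spaces
    ("ACME\tCORP\n", "acme corp"),              # tabs, newlines, upper case
    ("Acme\u00a0Corp", "acme corp"),            # non-breaking space
    ("\uff21\uff43\uff4d\uff45", "acme"),       # full-width letters
    ("", ""),                                   # empty input stays empty
]


def test_normalize_name() -> None:
    for raw, expected in CASES:
        assert normalize_name(raw) == expected, (raw, expected)


def test_normalize_name_is_idempotent() -> None:
    # Normalizing already-normalized output should change nothing.
    for raw, _ in CASES:
        once = normalize_name(raw)
        assert normalize_name(once) == once


if __name__ == "__main__":
    test_normalize_name()
    test_normalize_name_is_idempotent()
```

Five cases and one idempotence check cover the behavior here. Dozens more tests on the same function would add review noise without adding much protection.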

Closing

None of this is about slowing down. It’s about not driving off the cliff while you’re moving fast. Code quality and a meaningful testing framework are what let an agent write three days of code in an afternoon without leaving you a week of cleanup later. They’re not brakes; they’re guardrails that keep you on the road during pipeline work, and they help you move faster than ever before.

These are the principles that allow us to build data pipelines that surface newly dropped reports within 30 seconds, giving the analysts on our platform a massive competitive advantage.

If this is the kind of problem you want to work on, we’re hiring.