
Trust in the Age of AI

Trust has always been a moving target. What we trusted yesterday – a firm handshake, a reputable letterhead, a recommendation from a colleague – doesn't necessarily hold up today. And now, with AI systems increasingly handling knowledge work, the question of trust has become both more urgent and more complicated.

When we talk about trust in AI, we're really talking about three interrelated things: accuracy (does the output reflect reality?), security (is the system handling sensitive data responsibly?), and credibility (should we believe what it tells us?). Getting any one of these wrong can be costly. Getting all three wrong can be catastrophic.

The junior employee analogy

It's tempting to think of AI systems as junior employees. There's something to this comparison: both need oversight, both are expected to handle rote tasks, and both occasionally produce work that makes you wonder if they understood the assignment at all.

But the analogy only goes so far. Junior employees learn institutional knowledge over time. They pick up on context clues. They know when to ask clarifying questions. AI systems don't – at least not in the same way. They’re remarkably good at producing confident nonsense when asked to do things out of their comfort zone.

Still, there's one lesson from managing junior employees that transfers perfectly: their work needs verification. The difference is that systematic verification of human work is expensive enough that most organizations don't bother. Instead, we rely on gut feel, spot checks, and the occasional post-mortem when something goes wrong. This approach saves resources most of the time, but it also means errors compound silently until they surface as full-blown crises.

The documentation advantage

Here's where AI actually has an edge over human workers: it doesn't get bored.

Ask an analyst to document the source for every single cell in a spreadsheet, along with a qualitative explanation of how they arrived at each figure, and you'll get pushback. It's tedious work. It slows everything down. So we don't ask for it, and we accept the traceability trade-off.

AI systems have no such objections. They'll happily record provenance for every data point, trace every inference back to its source, and explain their reasoning in exhaustive detail. The marginal cost of documentation drops to nearly zero.
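
To make that concrete, here is a minimal sketch of what per-figure provenance might look like. The structure, field names, and example values are illustrative only, not a real schema.

```python
# A minimal sketch of per-figure provenance. The ExtractedFigure structure,
# field names, and example values below are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class Citation:
    """Where a figure came from: the document, a locator, and the quoted text."""
    document: str  # e.g. an earnings release or quarterly filing
    locator: str   # page, table, or structured-data fact reference
    quote: str     # the verbatim text the figure was read from


@dataclass
class ExtractedFigure:
    """A single cell's value plus the trail that justifies it."""
    name: str
    value: float
    unit: str
    citations: list[Citation] = field(default_factory=list)
    reasoning: str = ""  # how the value was derived, in plain language


# Example: one documented data point (placeholder numbers).
revenue = ExtractedFigure(
    name="Q3 2025 revenue",
    value=1_234.5,
    unit="USD millions",
    citations=[Citation(
        document="Q3 2025 earnings release",
        locator="p. 4, consolidated statements of operations",
        quote="Total revenue of $1,234.5 million",
    )],
    reasoning="Read directly from the income statement; no adjustments applied.",
)
```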

This matters more than it might seem. There's a practice in Japanese rail systems called "pointing and calling" – train conductors physically point at each signal and call out its status aloud. It feels redundant, almost theatrical. But studies have shown it reduces errors by up to 85%. The act of explicit acknowledgment forces attention and prevents the kind of autopilot mistakes that creep in during routine tasks. The practice has proven effective enough that other rail systems, including the New York City subway, have since adopted it.

The same principle applies to AI systems. Forcing a model to explicitly cite its sources and articulate its reasoning isn't just about creating an audit trail for humans. It's about making the system do the cognitive work that prevents errors in the first place. This is not unlike how forcing a large language model to generate code to perform arithmetic significantly improves reliability.
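
As a toy illustration of that last point, the pattern is to have the model write the arithmetic rather than guess the answer, then evaluate the expression deterministically. The run_model call below is a hypothetical placeholder for whatever LLM client is in use.

```python
# Toy illustration of the "generate code, don't guess" pattern.
import ast
import operator

# Map supported AST operator nodes to their deterministic implementations.
SAFE_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_arithmetic(expr: str) -> float:
    """Deterministically evaluate a model-written arithmetic expression."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("only plain arithmetic is allowed")
    return walk(ast.parse(expr, mode="eval"))

# Instead of asking the model for the answer, ask it for the expression, e.g.:
#   expr = run_model("Write one arithmetic expression for year-over-year "
#                    "growth given revenue of 1234.5 and 1100.0.")
expr = "(1234.5 - 1100.0) / 1100.0"  # what the model might return
print(eval_arithmetic(expr))         # computed exactly, not recalled
```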

The edge case problem

This documentation discipline becomes even more critical when you consider how AI systems fail.

Large language models are improving rapidly, but they still exhibit strange edge cases that are difficult to predict. Feed a model a transcript from an earnings call, and it might start echoing the CEO's framing rather than analyzing it critically. Ask it to recall a specific figure, and it might produce a number it "semi-remembers" – close enough to sound plausible, wrong enough to cause problems.

The challenge is that these failures often look identical to successes. The model doesn't flag uncertainty. It doesn't caveat its responses. It delivers hallucinated data with the same confidence as verified facts. And testing whether a given model can reason reliably about arbitrary inputs is genuinely hard: even a test harness offers limited confidence when behavior shifts unpredictably with small changes to the input.

To give a concrete example: Netflix announced a 10-for-1 stock split in October 2025, effective in November. This creates a problem – LLMs lag on current data, so their 'gut feel' for share counts and per-share figures is now off by a factor of ten. Left unchecked, they'll lazily recall the wrong numbers. And even after learning the new figures, historical queries will present the same problem in reverse. A junior analyst might make that mistake once. LLMs will repeat it on every query.
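
For illustration, the arithmetic itself is trivial once it's made explicit: a 10-for-1 split multiplies the share count by ten and divides per-share figures by ten. The numbers below are placeholders, not actual Netflix financials.

```python
# Illustrative only: how a 10-for-1 split distorts "remembered" per-share
# figures, and the deterministic adjustment that fixes it. The example
# numbers are placeholders, not actual Netflix financials.
SPLIT_RATIO = 10  # 10-for-1 split

def adjust_for_split(pre_split_eps: float, pre_split_shares: float) -> dict:
    """Restate pre-split figures on a post-split basis."""
    return {
        "eps": pre_split_eps / SPLIT_RATIO,        # per-share values shrink 10x
        "shares": pre_split_shares * SPLIT_RATIO,  # share count grows 10x
    }

# A model that "semi-remembers" a pre-split EPS will be off by a factor of
# ten unless the adjustment is applied explicitly:
print(adjust_for_split(pre_split_eps=12.00, pre_split_shares=400_000_000))
# {'eps': 1.2, 'shares': 4000000000}
```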

This is why you have to force the AI to actually do the work – to ground every claim in verifiable sources, to show its reasoning, to make its assumptions explicit. Not because humans will check every single output (they won't), but because the discipline itself changes the nature of the output.

Building systems that earn trust

At Kepler, this is the problem we've built our entire approach around.

We force-ground our models, requiring explicit source attribution for every claim. We double-check outputs systematically, not through spot checks but through structured verification. We use models of different provenance – including specialized systems trained on structured data formats like XBRL – to catch the systematic errors that any single model might make consistently.
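
As a rough sketch of what that kind of cross-checking can look like in code (the model names, canned values, and tolerance below are illustrative, not our actual pipeline):

```python
# Illustrative only: accept a figure when two models of different provenance
# agree on it, and escalate when they disagree. The canned results and model
# names are placeholders, not a real pipeline.
_FAKE_RESULTS = {
    ("general-llm", "Q3 revenue"): 1234.5,
    ("xbrl-specialist", "Q3 revenue"): 1234.5,
}

def extract_with(model: str, field: str) -> float:
    """Placeholder for a real extraction call; returns a canned value."""
    return _FAKE_RESULTS[(model, field)]

def cross_checked(field: str, rel_tolerance: float = 1e-6) -> float:
    """Return a figure only if two differently-trained models agree on it."""
    a = extract_with("general-llm", field)
    b = extract_with("xbrl-specialist", field)
    if abs(a - b) <= rel_tolerance * max(abs(a), abs(b), 1.0):
        return a
    # Disagreement is the useful signal: route the figure to a human reviewer.
    raise ValueError(f"models disagree on {field}: {a} vs {b}")

print(cross_checked("Q3 revenue"))  # 1234.5
```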

The goal isn't to eliminate human oversight. It's to make human oversight actually effective. When an analyst reviews AI-processed data, they should be able to see exactly where each figure came from, understand the reasoning chain that produced it, and quickly identify anything that doesn't hold up. The AI does the exhaustive documentation work that humans won't do; humans provide the judgment and contextual understanding that AI can't reliably provide.

But here's the thing: even if no human ever looks at the source citations, forcing the model to produce them still matters. The act of grounding – of requiring the system to acknowledge specific facts rather than generate plausible-sounding content – is itself a mechanism for maintaining accuracy. It's pointing and calling for the age of AI.

Trust in AI isn't something we achieve once and forget about. It's something we build through systems and processes, through verification and transparency, through acknowledging both what these tools can do and where they still fail. The organizations that figure this out will be the ones that actually capture the value AI promises. The ones that don't will learn some expensive lessons about the cost of misplaced confidence.

Kepler builds data infrastructure for accurate, source-grounded AI processing. We'd love to show you what source-grounded data extraction actually looks like. Book a demo.