> The AI Always Takes the Shortest Path. That's the Problem.

A field report from someone who has probably consumed more tokens than you.


I'm Learning a Lot of Things That Don't Work

I am not an AI skeptic. I am the opposite of an AI skeptic.

I have been building software for over 35 years, starting as an economist before discovering that engineering was just applied problem-solving with better syntax. I run a systems architecture and automation consultancy that has made the Inc 5000 five consecutive years. I am the solo founder of a permissioned financial disclosure platform — currently sitting at 1.5 million lines of code — designed from day one to be AI-first in every meaningful sense: multi-tenant, event-driven, with blockchain attestation, a social network layer, APIs, SDKs, and security architecture baked in at the foundation, not bolted on later.

My token consumption is, conservatively, in the top 1% of all Claude users.

I am telling you this not to flex, but to establish something important: when I say I have hit a wall, it is not the wall a curious non-technical professional hits when they ask an AI to write a cover letter. It is the wall that someone who has gone all in hits when they try to scale AI to the size of a real, complex, production system.

And I hit it hard.


The Symptoms

At first, everything felt like magic.

Tasks that took hours took minutes. Boilerplate that required a dozen Stack Overflow tabs got scaffolded in seconds. Patterns I knew well but would have spent an hour typing appeared clean and correct on the first generation. The productivity gain was real, and it was dramatic.

But somewhere around the time the codebase got serious — when the surface area expanded, when the architectural decisions started compounding, when the system started to have opinions about itself — something shifted.

I started noticing violations of the Single Responsibility Principle. Functions that were supposed to do one thing doing three. Utilities that were self-contained suddenly importing dependencies from places they had no business touching. Logic that belonged in the domain layer surfacing in the API layer. Not because the AI couldn't write good code. It could, and often did. But because it was solving the problem in front of it, not the problem in context of the whole.

I would catch it, fix it, write more detailed steering docs, add more guidance. And then it would happen again, slightly differently, in the next session.

The feedback loop was real, but it was slow. And I was the bottleneck.

The question I eventually had to sit with was uncomfortable: if I am a top-tier AI practitioner, if my system was designed specifically for AI, if I have refined my prompting discipline for months — and I still cannot escape this pattern — is the problem me? Or is the problem something more structural?

The answer, I came to believe, is structural. And once I understood why, I started rethinking everything.


Why AI Always Takes the Shortest Path

Here is the honest diagnosis, without any optimism layered over it.

The model never saw the forest. It only ever sees the trees.

An AI language model generates its next response based on what is in its context window. That is not a limitation that good prompting overcomes. It is the physics of how the thing works. Whatever is not in the context window does not exist. Your architectural principles, your module boundaries, your years of accumulated system design decisions — if they are not in the window, right now, they are not influencing the output.

Your system being 1000x the size of a context window is not a scale problem you will eventually solve. It is a structural mismatch between the nature of language models and the nature of large software systems. The model is stateless. Architecture is stateful. Those two things are in fundamental tension.

Steering docs are hints. Hints decay.

This is the trap I spent months in. The steering doc approach — write down your architectural principles, include them in every context, keep them updated — is better than nothing. But it is passive guidance, not enforcement. The model reads your README at the start of the session and then its influence attenuates as the conversation deepens and the immediate problem takes over. It's like trying to enforce code quality with a CONTRIBUTING.md instead of a linter. The CONTRIBUTING.md decays. The linter doesn't.

AI optimizes locally. Architecture is a global property.

Single Responsibility, DRY, SOLID — these are emergent properties of a codebase. They cannot be evaluated by looking at one file or one function in isolation. They require holding the whole in mind simultaneously and making tradeoffs that sacrifice local convenience for global coherence. The model can know about these principles. It cannot feel the pressure of violating them across a system it cannot fully see. That feeling — the discomfort of adding one more responsibility to an already-overloaded class — is an architectural intuition that comes from context. And context is exactly what is missing.

The AI is not broken. It is doing exactly what it was built to do: generate a correct, efficient solution to the local problem as specified. The problem is that "correct locally" and "correct architecturally" are not the same thing at scale.


The Self-Deprecating Codebase

While most people building with AI are busy adding things — more steering docs, more prompt templates, more context files — I went the other direction. A few months ago I started removing documentation.

The reasoning was simple: if the model is good enough to figure out what code is doing from the code itself, then documentation is redundant at best and noise at worst. Every token spent on an inline comment is a token that isn't spent on something the model couldn't infer on its own. And in a large system, that noise accumulates fast.

But the moment you remove the scaffolding of documentation, you expose something uncomfortable: the code has to actually be clean enough to be read as documentation. Structure, naming, typing, module boundaries — these stop being style preferences and become load-bearing. The code has to say what it means because there's nothing else doing that job.

That realization crystallized into what I now think of as the self-documenting codebase as a discipline, not a nice-to-have. And with it came a harder realization: if you don't have architectural enforcement built in from the start, the shortest path problem doesn't just create messy code. It creates something worse.

It creates a self-deprecating codebase.

Not self-deprecating in the comedic sense. Self-deprecating in the structural sense — a system that, through the accumulation of locally correct but globally incoherent decisions, begins to actively undermine its own foundations. Functions that do too much make the boundaries unclear, so the next generation respects the boundaries even less. A dependency that crosses layers once becomes a precedent. A violation of a pattern becomes the new pattern. And because AI generates at token throughput speed rather than human typing speed, the compounding happens faster than any engineering team has ever had to deal with before.

The danger isn't that the AI writes bad code in any individual session. The danger is that without structural enforcement, acceptable code accumulates into a corrupt architecture. And corruption at that speed is a new problem. We don't have great intuitions for it yet.

Here is what I found works against it: the architecture has to have built-in stops at every layer. A layer cannot violate the architecture, not because a rule in a document says it shouldn't, but because the shape of the system makes the violation structurally difficult. Constants cannot bleed into business logic because the module boundaries don't allow imports in that direction. Responsibilities cannot stack up in a single class because the interface contracts define exactly one. The AI can only generate within the space the structure permits, and the structure was designed by a human with the whole in mind.

This is the difference between architecture as intention and architecture as enforcement. Intention lives in documentation. Enforcement lives in the system itself. And if you are building AI-first at any serious scale, intention is not enough.


Where That Leaves You

If you are building at real scale with AI — and by that I mean a system with genuine complexity, not a prototype — you will eventually face this. The degree to which it surprises you is inversely proportional to how honest you have been with yourself about what AI can and cannot do.

Here is what I have come to believe:

The human bottleneck is not a workflow inefficiency waiting to be optimized away. At current capability levels, it is the architecture layer. You are not going to prompt your way out of needing architectural judgment. What you can do — what I am now committed to doing — is raise the abstraction level at which you intervene, so that each decision you make is worth 10x or 100x what it is now.


What I Am Doing About It

This is not an "I solved it, here's the playbook" section. It is more honest than that. It is a "here is what I am now doing, with the understanding that I will probably refine this for the next year" section.

Separating the architect step from the coder step explicitly.

Before any generation, there is now a planning pass: where does this belong, what does it depend on, what does it expose, what pattern applies. That plan gets human review — mine. The coder generates strictly within those constraints. I am reviewing architecture 10x as frequently as I review code, and individual code review shrinks because the design was already validated. This is a multi-agent separation of concerns problem, not a prompt engineering problem.
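A sketch of what that planning pass can look like as a structured artifact rather than a conversation. Every name here is illustrative; the point is that the coder step is only invoked against a plan that has passed human review.

```python
from dataclasses import dataclass

@dataclass
class GenerationPlan:
    target_module: str      # where does this belong
    depends_on: list[str]   # what does it depend on
    exposes: list[str]      # what does it expose
    pattern: str            # which established pattern applies
    approved: bool = False  # flipped only by human review

def approve(plan: GenerationPlan) -> GenerationPlan:
    """The human review gate; in practice this is a real review, not one call."""
    plan.approved = True
    return plan

def generate(plan: GenerationPlan) -> str:
    """Stand-in for the coder step: it refuses to run without an approved plan."""
    if not plan.approved:
        raise RuntimeError("no approved plan; architecture review comes first")
    return f"# implements the {plan.pattern} pattern in {plan.target_module}"
```

Reviewing the plan is cheap compared to reviewing the code it produces, which is exactly why the architecture review can happen 10x as often.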

Making constraints hard, not soft.

Soft constraints are prose. Hard constraints are structure. If the file organization enforces module boundaries, if the interface contracts are defined before implementation begins, if the dependency rules are executable rather than aspirational — then the AI cannot easily violate them, not because it chose not to, but because the shape of the problem doesn't allow it. I am moving from describing architecture to encoding it.
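One small example of a hard constraint, with names invented for illustration: the interface contract exists before any implementation does, it defines exactly one responsibility, and conformance is an executable check rather than a sentence in a doc.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class EventPublisher(Protocol):
    """Contract defined before implementation: exactly one responsibility."""
    def publish(self, topic: str, payload: dict) -> None: ...

class GeneratedPublisher:
    """Stands in for AI-generated code; it either fits the contract or fails CI."""
    def publish(self, topic: str, payload: dict) -> None:
        pass  # a real implementation would emit the event here

# Executable rather than aspirational: this assertion runs in the test suite.
assert isinstance(GeneratedPublisher(), EventPublisher)
```

The model can still generate whatever it likes inside `publish`; what it cannot do is quietly widen the surface of the class without tripping a check.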

Building a compact, lossless system representation.

This is the highest-leverage infrastructure work I am doing. My system cannot fit in a context window. That means I need a layer of abstraction that accurately compresses its structure — typed dependency graphs, module manifests, domain model sketches — compact enough to fit in context, accurate enough to actually constrain generation. Not a README. More like a schema for the system itself. Every generation context gets the relevant slice of this, not hopeful prose.
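An invented sketch of what "a schema for the system" can mean in practice: one compact, typed manifest per module, and a selector that hands each generation context only the relevant slice. The module names and fields here are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModuleManifest:
    name: str
    layer: str                 # e.g. "domain", "api"
    exports: tuple[str, ...]   # public surface other modules may use
    imports: tuple[str, ...]   # declared, allowed dependencies

MANIFESTS = {
    "domain.orders": ModuleManifest("domain.orders", "domain",
                                    ("Order", "place_order"), ()),
    "api.orders": ModuleManifest("api.orders", "api",
                                 ("router",), ("domain.orders",)),
}

def context_slice(module: str) -> list[ModuleManifest]:
    """What goes into a generation context: the target module's manifest plus
    the manifests of its declared dependencies, and nothing else."""
    target = MANIFESTS[module]
    return [target] + [MANIFESTS[dep] for dep in target.imports if dep in MANIFESTS]
```

A few hundred manifests like this compress to a fraction of a context window, while still telling the model exactly which surface it is allowed to touch.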

Treating generated code as compiled output.

At 1.5 million lines, I should not be accumulating AI-generated patches as the source of truth. The source of truth should be higher-level — specs, schemas, event contracts, type definitions — and implementations should be generated from those. Regenerating from spec gives you consistency guarantees that accumulated patching never will. This is a shift in mental model as much as it is a workflow change.
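A toy version of the spec-as-source-of-truth idea, with an invented spec format: the implementation is rendered from the spec and regenerated on every change, never hand-patched, so consistency holds by construction.

```python
# Hypothetical spec for one model; in a real system this would be a schema file.
SPEC = {
    "name": "Invoice",
    "fields": [("id", "str"), ("amount_cents", "int"), ("paid", "bool")],
}

def render_model(spec: dict) -> str:
    """Deterministically render a dataclass implementation from the spec."""
    lines = ["from dataclasses import dataclass", "", "@dataclass",
             f"class {spec['name']}:"]
    lines += [f"    {fname}: {ftype}" for fname, ftype in spec["fields"]]
    return "\n".join(lines)
```

Rendering the same spec twice yields byte-identical code, which is the consistency guarantee that accumulated patching can never give you.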

Writing about it.

Which is what this is.

There is something about committing a practice to writing that forces an honesty the practice alone doesn't require. I do not have this completely figured out. I expect to look back at this post in six months and find places where I underestimated the problem or overcorrected in my response. But I am done pretending the steering doc approach is sufficient, and I am done being surprised when the AI takes the shortest path.

It always will. The work is in designing a system where the shortest path and the right path are the same thing.