AI coding tools are everywhere. GitHub Copilot, Cursor, Claude, and ChatGPT are just a few of the names that have become commonplace in a developer’s toolkit. We are told they make us faster, more productive, and more efficient.
But a more important question often goes unasked. What happens after the AI has written the code? What happens when the next developer has to understand it, change it, and maintain it? Does AI help with this, or are we busy creating an AI-slop-driven maintenance nightmare? That is the topic of this article.
The Crushing Cost of Maintenance
Maintainability is critically important because maintenance, not initial development, is the dominant cost in software. It’s naive to imagine that you can develop software once and never revisit it.
Estimates vary, but the broad consensus is that maintenance costs three to four times as much as initial development over the lifetime of a software system, representing 50% to 80% of the total cost of ownership. Most of the money, time, and risk in software, therefore, comes after the first version has shipped. This is why developing software that’s easy to change makes commercial sense, while optimizing for short-term, dumb metrics like feature count is a silly tradeoff.
Yet, most AI studies stop at superficial questions. Did the developer finish faster? Did they type fewer characters? That isn’t engineering. That’s just measuring typing speed.
A Controlled Experiment on Maintainability
A pre-registered, controlled experiment was designed to look at the downstream maintainability of AI-generated code. This wasn’t a toy experiment. In fact, it’s one of the more thorough looks at the impact of AI on software development so far, involving 151 participants, 95% of whom were professional software developers. This is unusual, as most studies are based on students who are easier to recruit.
The research asked these professionals to create and maintain a realistic Java web application in two carefully controlled phases.
- Phase One: Developers were asked to add a feature to some rather unpleasant, buggy code. Some used AI assistants, while others did not.
- Phase Two: A different set of developers was then randomly assigned code produced in phase one and asked to evolve it, without knowing whether it was originally written with AI’s help. No AI assistance was allowed in this phase.
This is the key. The study wasn’t measuring how fast the same person works with AI. It was measuring how easy the code is for someone else to change later on. This is a much better proxy for maintainability and general code health, simulating a more realistic view of real-world development.
The researchers measured several factors to get a complete picture:
- How long the next developer took to evolve the code.
- The objective code quality based on CodeScene’s code health metric.
- Test coverage.
- Perceived productivity using the SPACE framework.
This range of measures is important because maintainability is multi-dimensional. There isn’t one magic number.
The Surprising Findings
So, what did the study find? The headline result is that there was no significant difference between the cost of maintenance for AI-generated and human-generated code.
This is interesting and perhaps not what many would have expected. Code written with AI assistance was no harder or easier to change, and no better or worse in quality. From a downstream perspective, AI didn’t break anything. Given some of the fear-mongering, that’s a pretty significant result.
Now, here’s where things align more with previous studies. In phase one, AI users were about 30% faster to get to a solution. Habitual AI users were closer to 55% faster. So yes, AI clearly speeds up initial development. The real question was whether that speed came at a hidden cost. In this study, there was no evidence that it did.
But there was something else the study found that was very interesting. When experienced developers—people who already knew what they were doing—used AI habitually, their code showed a small but measurable improvement in maintainability later on. It’s not a huge effect, but it is consistent. One explanation is that AI tends to produce boring, idiomatic, and unsurprising code. And boring code is good. Surprise is usually the enemy of maintainability.
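To make the “boring code is good” point concrete, here is a hypothetical sketch (not from the study; the invoice scenario and all names are illustrative). The commented-out version works, but buries a business rule inside a reduction lambda; the boring version states each decision plainly, which is exactly the quality that helps the next developer in phase two of an experiment like this.

```java
import java.util.List;

public class InvoiceTotals {

    // A "clever" version that works but makes the reader pause:
    //
    //   static double total(List<Double> lines) {
    //       return lines.stream().reduce(0.0, (a, b) -> b < 0 ? a : a + b);
    //   }
    //
    // The rule "skip credit lines" is hidden inside the reduction.

    // The boring, idiomatic version states each decision separately:
    // skip credit lines (negative amounts), then sum what remains.
    static double total(List<Double> lines) {
        double sum = 0.0;
        for (double line : lines) {
            if (line < 0) {
                continue; // credit lines are handled elsewhere
            }
            sum += line;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(List.of(10.0, -2.5, 5.0)));
    }
}
```

Both versions compute the same result; the difference only shows up later, when someone who didn’t write the code has to change it.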
AI as an Amplifier, Not a Replacement
What is very clear is that AI does not automatically improve code quality. We can’t stop caring about engineering discipline. Junior developers can’t simply “vibe” their way to good systems. In fact, the study shows that developer skill matters more than AI usage.
This aligns strongly with a message backed up by recent DORA research: AI code assistance acts as an amplifier.
- If you’re already doing the right things, AI will amplify the impact of those things.
- If you’re already doing the wrong things, AI will help you dig a deeper hole faster.
Tools amplify capability; they don’t replace it. Jason Gorman explores this idea in a brilliant article, breaking down what “doing the right things” means in the context of getting the best from AI coding assistants.
He says the key practices are:
- Working in small batches, solving one problem at a time.
- Iterating rapidly with continuous testing, code review, refactoring, and integration.
- Architecting highly modular designs that localize the blast radius for changes.
- Organizing around end-to-end outcomes instead of around role or technology specialisms.
- Working with high autonomy, making timely decisions on the ground instead of sending them up the chain of command.
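The “highly modular designs that localize the blast radius” point can be sketched in a few lines of Java. This is a minimal, hypothetical example (the `UserStore` interface and its names are mine, not Gorman’s): callers depend on a small interface, so swapping the implementation stays behind the boundary.

```java
import java.util.HashMap;
import java.util.Map;

// Callers depend on this small interface rather than on storage details,
// so a change of backing store stays behind the boundary.
interface UserStore {
    void save(int id, String name);
    String findName(int id);
}

// One implementation; a database-backed version could replace it
// without touching any caller.
class InMemoryUserStore implements UserStore {
    private final Map<Integer, String> users = new HashMap<>();

    public void save(int id, String name) {
        users.put(id, name);
    }

    public String findName(int id) {
        return users.get(id);
    }
}

public class BlastRadiusDemo {
    // A caller written against the interface is untouched when the
    // implementation behind it changes.
    static String greet(UserStore store, int id) {
        return "Hello, " + store.findName(id);
    }

    public static void main(String[] args) {
        UserStore store = new InMemoryUserStore();
        store.save(1, "Ada");
        System.out.println(greet(store, 1));
    }
}
```

The blast radius of a storage change is exactly one class; nothing that calls `greet` needs to know or care.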
For regular readers of this publication, none of this will come as a surprise. It’s possible this inadvertently had an impact on the findings, as most participants in the study were recruited from an audience already predisposed to these better development practices.
The Slippery Slope to Disaster
The study’s authors highlight two other risks that form a slippery slope towards disaster.
- Code Bloat: When generating code becomes almost free, the temptation is to generate far too much of it. Volume alone is a massive driver of complexity, and AI makes it easier than ever to drown in your own codebase.
- Cognitive Debt: If developers stop thinking—really thinking—about the code they create, then over time, understanding erodes, skills atrophy, and innovation slows. This is exactly the long-term risk that doesn’t show up in a sprint metric.
Conclusion: Thinking Still Matters
The conclusion is clear. AI assistants improve short-term productivity. Contrary to popular opinion, they do not appear to damage the maintainability of the systems they help produce. They might even slightly improve it when used well.
However, they do not remove the need for good engineering. They don’t remove the need for good design and the broad experience that allows us to produce it. And they certainly don’t remove the need for thinking hard about the problems we face.
This includes how to decompose problems into small pieces that allow our AI assistants to do a good job and how to guide them toward solutions we are happy with. This compartmentalization through decomposition is the central, fundamental skill of building software. It is this, rather than the speed of typing, that differentiates good software development from slop—whether AI-generated or not.
As always, tools matter, but how we use them matters more.