Chm.ski Labs

Your AI Coding Agents Need a Constitution

In the era of LLM-written code, your engineering rules matter more than ever. Why every solo developer and team should define a coding constitution for their AI agents—and the one I use myself.

Kamil Chmielewski · 9 min read · AI
Diagram showing a coding agent improved by an engineering constitution that guides planning, implementation, and review.

Prompting changes behavior. A constitution changes engineering taste.

LLMs can write code shockingly fast. That is exactly why the rules you give them matter more now, not less.

When an agent can generate a refactor, a package layout, a retry strategy, and a test harness in one pass, small mistakes scale quickly too. Vague guidance does not stay vague for long. The model fills the gaps with generic patterns that look reasonable in review, then start to crack under production load and a few rounds of maintenance.

That is why a strong coding agent is not just a model plus tools. It is a model plus tools plus a constitution: a compact set of engineering laws that shapes how it plans, implements, reviews, and reacts to failure.

Why AI Coding Makes This More Important

Human teams already have unwritten rules. Senior engineers carry them around without needing to say them out loud.

They know:

  • how much abstraction is too much
  • when to retry and when to fail fast
  • what a useful error looks like
  • whether package layout should follow the framework or the domain
  • when "validation" should really become parsing

LLMs know none of that.

They have broad priors. They have seen a lot of code. They have not lived through your outages, your bad refactors, your migration mistakes, or the six-month cleanup after someone got clever with "generic infrastructure."

So if you do not define the rules, the model will improvise, and at codebase scale that means importing assumptions that were never actually yours. That is why I think every serious AI-assisted team should have an engineering constitution: not a 40-page process manual nobody reads, but a compact doctrine that makes the trade-offs explicit. What do we optimize for? What do we reject? What should fail review? What kind of software are we trying to own?

Start Personal, Then Make It Team-Wide

If you are a solo founder or principal engineer, start with a personal constitution.

That is the fastest way to make an agent code more like you think. It lets you encode the instincts you usually apply during review: cut the abstraction, move parsing to the boundary, keep operational failures visible, stop hiding domain logic behind framework folders.

Once that works, turn it into a team artifact.

That team version is one of the highest-leverage AI assets you can create. It gives every engineer a shared set of defaults, and it exposes something that usually stays fuzzy: people are already steering models with personal, unwritten assumptions.

A team constitution makes those assumptions reviewable.

It turns this:

"I don't like this refactor."

into this:

"This violates the constitution because it adds speculative abstraction, hides operational failure, and throws away error context."

That is a much better conversation. It is also something you can teach to a reviewer agent before a human ever opens the diff.

The Constitution I Actually Want My Agents to Follow

Mine is not universal. It reflects the kind of software I want to maintain. The point is not to copy it line by line. The point is to make the rules explicit enough that an agent can apply them consistently.

1. KISS Beats Architecture Theater

Prefer the simplest design that fully satisfies the task.

If one concrete implementation solves the problem cleanly, start there. Do not generate interfaces because they might be useful later. Do not split logic into five files because "separation of concerns" sounds adult. Do not build a framework when the task asked for a function.

LLMs are especially prone to respectable-looking overengineering. They can produce cleanly formatted nonsense faster than a junior engineer can open a whiteboard.

Small, direct, boring code is a feature.
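To make this concrete, here is a minimal sketch; the `slugify` task and every name in it are hypothetical, chosen only to contrast the speculative version with the direct one:

```python
# What an agent often reaches for: a strategy interface, a factory, three files.
#
# class SlugStrategy(ABC): ...
# class DefaultSlugStrategy(SlugStrategy): ...
# class SlugFactory: ...
#
# What the task actually asked for: one boring function.
import re

def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

If a second real slug format ever shows up, that is the moment to add an abstraction, and not before.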

2. Separate Domain Failures From Operational Failures

I want agents to distinguish between three different classes of failure:

  • domain failures: expected business or user-facing outcomes
  • operational failures: network, filesystem, database, subprocess, and infrastructure problems
  • programmer errors: broken assumptions and impossible states

These should not be handled the same way.

Expected domain failures often deserve explicit modeling. Operational failures should usually bubble up with context intact. Programmer errors should not be smoothed over as if they were normal runtime conditions.
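A minimal sketch of the three classes side by side, with hypothetical names (`Quote`, `confirm_order`, `save`) standing in for real domain code:

```python
from dataclasses import dataclass

@dataclass
class Quote:
    id: str
    total: int
    expired: bool

class QuoteExpired(Exception):
    """Domain failure: an expected business outcome, modeled as its own type."""

def confirm_order(quote: Quote, save) -> None:
    # Domain failure: expected, so the caller gets a type it can handle.
    if quote.expired:
        raise QuoteExpired(quote.id)
    # Programmer error: an impossible state; fail loudly, do not smooth it over.
    assert quote.total >= 0, f"negative total on quote {quote.id}"
    # Operational failure: whatever save() raises (network, database) is allowed
    # to propagate with its context intact -- no blanket except, no fallback.
    save(quote)
```

The point is the asymmetry: one branch is modeled, one asserts, and one deliberately does nothing at all.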

This matters a lot in AI-generated code because models love inventing fallback paths that sound responsible and make no sense for the actual system. They catch the failure, return a default, retry at the wrong layer, or quietly switch behavior, and in the process they hide the one signal that could have told you what is actually broken.

I wrote more about this in Let It Crash - Turning Failure into Your Most Reliable Signal, but the short version is simple: do not bury operational failure deep in leaf code just because the model wants to be helpful.

3. Steal Good Taste From Proven Traditions

I do not care whether a principle came from Python, Go, Rust, Erlang, or somewhere else. Good engineering ideas travel.

The Zen of Python is still useful far outside Python:

  • explicit is better than implicit
  • simple is better than complex
  • flat is better than nested
  • readability counts
  • errors should never pass silently

So are the Go proverbs. The one I especially want agents to internalize is this:

The bigger the interface, the weaker the abstraction.

LLMs love giant surfaces. They like to build "flexible" APIs that support five hypothetical futures before the first real use case exists. Small, focused components are almost always better.
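In Python terms, the proverb can be sketched with `typing.Protocol`; the `Notifier` example and its methods are invented for illustration:

```python
from typing import Protocol

# Small interface: one method, trivial to implement, trivial to fake in tests.
class Notifier(Protocol):
    def send(self, recipient: str, message: str) -> None: ...

# What models tend to generate instead: a wide surface for hypothetical futures.
#
# class NotificationManager(Protocol):
#     def send(...), send_batch(...), schedule(...),
#     def retry_failed(...), get_delivery_stats(...)

class RecordingNotifier:
    """Satisfies Notifier; records sends so tests can assert on them."""
    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))
```

A one-method interface has many easy implementations; a ten-method one has exactly one, and it is already too important to change.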

A constitution is where you make those ideas explicit, so the agent follows your engineering standards instead of defaulting to generic patterns.

4. Parse at the Boundary, Do Not Re-Validate Forever

This is one of the clearest ways to improve AI-generated code.

At system boundaries, parse loose input into stronger domain types. After that, pass around the stronger type. Do not keep re-checking the same shape in every layer like nobody trusts the previous function.

That is also how you make the compiler work against agent mistakes. Once the boundary turns loose input into a stronger type, illegal states become harder to represent and easier for the type system to reject.

This is the Parse, don't validate idea, and it is one of the best rules you can give a coding agent. It pushes code toward stronger invariants and fewer illegal states.
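A minimal boundary sketch, with a deliberately simplistic check (`Email` and `parse_email` are illustrative names, not a real validation library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Email:
    """A parsed, known-good address. Inner code accepts Email, never raw str."""
    value: str

def parse_email(raw: str) -> Email:
    # The one place loose input is checked. Everything downstream trusts the type.
    raw = raw.strip()
    if "@" not in raw or raw.startswith("@") or raw.endswith("@"):
        raise ValueError(f"not an email address: {raw!r}")
    return Email(raw)

def send_welcome(to: Email) -> str:
    # No re-validation: the signature already guarantees a parsed address.
    return f"welcome sent to {to.value}"
```

Once `send_welcome` takes `Email` instead of `str`, passing unchecked input is a type error rather than a latent bug.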

Without this rule, agents often produce code that is technically cautious and practically messy. Everything is "validated" everywhere, and nothing is actually modeled cleanly.

5. Keep the Domain in the Center

I want package layout to reflect the problem domain first, not the framework first.

That means thin entrypoints, domain logic in the middle, and infrastructure pushed to the edges. I do not want random folder soup generated from generic starter templates. I do not want a structure where the HTTP framework is more visible than the business model.

This is the same instinct behind Ben Johnson's standard package layout advice in Go, generalized beyond Go itself: make dependency direction obvious, and keep the important logic near the center of the system.

One useful test is migration cost. If you need to move from MySQL to Postgres, that should mostly mean adding a new postgres package, removing the old mysql package, and doing a bit of wiring at the edges. If the change drags database details through the whole codebase, the boundaries are in the wrong place.
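The shape of that boundary can be sketched in a few lines; the layout, `UserStore`, and the stand-in Postgres class are all hypothetical:

```python
# Hypothetical layout with the domain in the center:
#
#   app/
#     domain/      # business model; depends on nothing below it
#     postgres/    # implements domain interfaces; swappable
#     http/        # thin entrypoints that wire everything together
#
from typing import Protocol

class UserStore(Protocol):
    """Declared next to the domain; storage packages implement it."""
    def get_name(self, user_id: int) -> str: ...

class PostgresUserStore:
    # Swapping MySQL for Postgres means replacing this class and the single
    # wiring line at the edge -- driver details never reach the domain.
    def get_name(self, user_id: int) -> str:
        return f"user-{user_id}"  # stand-in for a real query

def greet(store: UserStore, user_id: int) -> str:
    # Domain logic sees only the interface, never the database.
    return f"hello, {store.get_name(user_id)}"
```

The dependency direction is the whole point: `greet` knows nothing about Postgres, so the migration diff stays at the edges.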

When agents ignore this, they tend to produce codebases that look organized on day one and become awkward on day ninety.

6. Preserve Error Context

Weak errors make AI-written code painful to operate.

When something fails, I want the error to preserve:

  • what failed
  • where it failed
  • under what conditions it failed
  • the underlying cause

Rust's anyhow-style context layering is a good mental model even outside Rust. Expected domain failures may deserve explicit result types. Operational failures should bubble up with context, not get flattened into "something went wrong" halfway through the stack.
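The same layering exists in Python as exception chaining; a minimal sketch with a hypothetical `read_config` helper:

```python
def read_config(path: str) -> str:
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        # Wrap with context and keep the cause: `raise ... from e` preserves
        # the original error, so what/where/why all survive up the stack.
        raise RuntimeError(f"loading config from {path!r} failed") from e
```

The anti-pattern to reject in review is `raise RuntimeError("something went wrong")` with the `from e` dropped: same line count, none of the signal.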

This matters in review because an LLM can easily replace a precise failure with a generic wrapper that sounds friendlier and is vastly less useful.

If an agent "improves" an error message by removing the original context, it did not improve anything.

Where This Pays Off: Review

Generation is only half the story. The bigger payoff is review.

A good constitution gives your reviewer agent something sharper than "look for bugs." It gives it real gates.

It can ask:

  • Is this simpler than the alternative, or just more abstract?
  • Are operational failures being hidden here?
  • Should this input be parsed once at the boundary instead of re-validated later?
  • Does this package layout make dependency direction clearer or worse?
  • Did this change preserve error context or throw it away?
  • Is this interface small because the problem is small, or large because the model got ambitious?

That is when the constitution stops being a nice essay and becomes an engineering tool.

Prompting helps generation. Doctrine helps review.

How to Write Your Own

Do not start from philosophy. Start from recurring pain.

Look at your last ten reviews, incidents, or cleanup refactors. What keeps showing up?

Maybe it is:

  • hidden retries
  • too many abstractions
  • weak observability
  • framework-first layout
  • untyped config
  • validation that never turns into modeling
  • errors that lose the original cause

Write those down as rules.

Then tighten them until they are specific enough that an agent can apply them. "Write clean code" is useless. "Do not add an interface unless there are at least two real implementations or a clear boundary requirement" is a rule. "Preserve underlying error context when wrapping operational failures" is a rule. "Parse raw external input into domain types at the boundary" is a rule.
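An excerpt from such a file might look like this; the section names are arbitrary, and the rules are the ones above:

```markdown
## Abstraction
- Do not add an interface unless there are at least two real implementations
  or a clear boundary requirement.

## Errors
- Separate domain failures, operational failures, and programmer errors.
- Preserve underlying error context when wrapping operational failures.

## Boundaries
- Parse raw external input into domain types at the boundary; do not
  re-validate the same shape in inner layers.
```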

Then use the same constitution in two places:

  • generation prompts
  • review prompts

That second part is important. If the rules only shape generation, they will drift. If the rules are also used as explicit review gates, they start to stick.

The Design Shift That Matters

The important shift is this: in the era of LLM coding, prompting is not enough.

You need doctrine.

A good constitution gives your agents taste, constraints, and standards. It helps them produce code that looks less like the average internet answer and more like software your team can actually live with six months later.

The best programming ideas we have learned over decades do not become obsolete because AI writes more code. They become more valuable, because now we can encode them directly into the systems generating and reviewing that code.

Tags: ai, llm, ai coding, coding agents, software architecture, engineering culture, prompt engineering, kiss, erlang, zen of python, go proverbs, parse dont validate, error handling
