When Specifications Need More Than Prose

Specification-Driven Development has settled on .md files for a good reason. Markdown is flexible, lightweight, and easy to reshape while a feature is still forming, for humans and LLMs alike. Expand a section, reorganize a few thoughts, add an open question. Prose is excellent at helping you think.

But behavioural specifications have a different job.

Once a spec starts describing behaviour that must be tested, verified, and changed safely over time, flexibility stops being enough. In an AI-assisted workflow, every meaningful change to a prose-heavy spec may need to be interpreted again before it turns into code or automated tests.

That interpretation may be perfectly reasonable. Most of the time, it probably will be. But the resulting verification still depends on a fresh reading of the prose, and that leaves room for subtle drift that can be difficult to spot.

This is where .feature files written in Gherkin deserve renewed attention.

They do not ask you to give up prose. A feature file can still carry context, motivation, explanation, and the “why” behind the behaviour. But it also adds something .md files lack: a deterministic link between behavioural steps and the code that verifies them.

Why deterministic mappings matter more in AI-assisted development

It is tempting to think that LLMs make structured behavioural formats less necessary.

If a model can read prose, infer intent, and generate code or tests from a Markdown spec, why worry so much about the shape of the specification?

The answer is mundane but important: the connection between the spec and the test is being recreated every time, by a process that is probabilistic by design. Most of the time it works. Sometimes it doesn't. And when it doesn't, the symptom is a passing test suite that is no longer testing the thing the spec describes.

This is where Gherkin changes the picture.

A Given, When, or Then step line in a .feature file is not a sentence the test framework re-reads each time. It is a stable anchor that maps to a specific code implementation in a fixed, explicit way. Once that anchor exists, the prose around the step can move, expand, and get rephrased without disturbing the verification layer. The behavioural contract stays where you put it.

This is why Gherkin gets more relevant, not less, as more code generation enters the workflow. The more interpretation you delegate, the more you benefit from having one layer that nothing has to re-interpret.
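
To make "a fixed, explicit mapping" concrete, here is a minimal sketch in Python. It is not the API of any real Gherkin framework: the `STEPS` registry, the `step` decorator, and the `run` helper are hypothetical names invented for illustration, and the checkout logic is a stand-in for a real system under test.

```python
from typing import Callable, Dict, List

# Hypothetical registry: exact step text -> the one function that implements it.
STEPS: Dict[str, Callable[[dict], None]] = {}


def step(text: str):
    """Register a function as the single implementation of a step's wording."""
    def register(func):
        if text in STEPS:
            raise ValueError(f"duplicate step: {text!r}")
        STEPS[text] = func
        return func
    return register


@step("the customer has an empty basket")
def empty_basket(ctx: dict) -> None:
    ctx["basket"] = []


@step("the customer attempts to check out")
def attempt_checkout(ctx: dict) -> None:
    # Stand-in for the real system: an empty basket is rejected.
    ctx["rejected"] = len(ctx["basket"]) == 0


@step("the checkout should be rejected")
def checkout_rejected(ctx: dict) -> None:
    assert ctx["rejected"], "expected checkout to be rejected"


def run(step_lines: List[str]) -> dict:
    """Run step lines in order. Lookup is by exact text, with no interpretation."""
    ctx: dict = {}
    for line in step_lines:
        _keyword, _, text = line.partition(" ")
        STEPS[text](ctx)  # raises KeyError if the wording has no mapping
    return ctx


result = run([
    "Given the customer has an empty basket",
    "When the customer attempts to check out",
    "Then the checkout should be rejected",
])
```

Nothing in `run` reads the step as a sentence. The wording is only a lookup key, so the same text resolves to the same function on every run.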

Gherkin: prose and a deterministic mapping

A .feature file is not a sterile sequence of Given/When/Then. It can carry context, motivation, rule descriptions, and the why behind the behaviour. The structure is there for the parts that need to be exact. The prose is there for everything else.

# prose
Feature: Checkout
Customers should only complete checkout when their basket is ready to become an order.
This helps prevent invalid purchases and gives the customer clear feedback when something
still needs attention before payment can begin.

# prose
Rule: A customer cannot check out with an empty basket
Starting checkout without any items should be blocked immediately.
The customer should understand why the action was rejected and what they need to do next.

# prose
Scenario: Attempting to check out without any items
The customer goes straight to checkout with a basket that has never contained any items.

# mapped
Given the customer has an empty basket
When the customer attempts to check out
Then the checkout should be rejected
And the customer should see "Your basket is empty"

# prose
Scenario: Returning to checkout after removing the last item
The customer previously had items in the basket but removed the final item before trying again.

# mapped
Given the customer had items in the basket
And the customer removes the last item from the basket
When the customer attempts to check out
Then the checkout should be rejected
And the customer should see "Your basket is empty"

# prose
Rule: A customer must provide delivery details for shippable items
Orders that require shipping must include enough delivery information to reach the customer.
Missing details should be caught during checkout, before the order can continue.

# prose
Scenario: Attempting to check out with an incomplete delivery address
The customer has items ready to purchase, but the delivery details are missing a postal code.

# mapped
Given the customer has the following basket:
| product | quantity |
| Espresso cups | 2 |
| Coffee grinder | 1 |
And the customer provides the following delivery details:
| full name | street | city | postal code | country |
| Anna Reed | 14 Market Street | Bristol | | United Kingdom |
When the customer attempts to check out
Then the checkout should be rejected
And the customer should see "Enter a valid postal code"

The prose lines explain. The Given/When/Then lines, quoted parameters, and data tables anchor. You do not have to pick one. That is the actual advantage — not that Gherkin is "structured prose," but that it lets explanation and verification share the same artifact without blurring into each other.
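
Data tables anchor in the same way: each row reaches the step code as structured values, not as a sentence to be read. The sketch below is a simplified illustration in Python. `parse_table` is a hypothetical helper, not part of any Gherkin library, and it ignores escaped pipes.

```python
from typing import Dict, List


def parse_table(lines: List[str]) -> List[Dict[str, str]]:
    """Turn pipe-delimited table lines into one dict per data row."""
    rows = []
    for line in lines:
        inner = line.strip()[1:-1]  # drop the outer pipes
        rows.append([cell.strip() for cell in inner.split("|")])
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


delivery = parse_table([
    "| full name | street | city | postal code | country |",
    "| Anna Reed | 14 Market Street | Bristol | | United Kingdom |",
])
```

The empty postal code cell arrives as an empty string under the `postal code` key, which is exactly the condition the scenario is written to exercise.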

Start inside the .feature file

One of the easiest mistakes to make is to assume that prose belongs in .md files, and Gherkin only becomes useful later, once the behaviour is clear enough to express in Given/When/Then steps.

That is too narrow a view of what a .feature file can do.

You can start directly inside the .feature file.

A Feature description can capture the business capability and why it matters. A Rule description can explain the business constraint or expectation. A Scenario description can hold a half-formed case you are still thinking through, before any step lines exist.

Feature: Checkout
Customers should only complete checkout when their basket is ready to become an order.
This helps prevent invalid purchases and gives the customer clear feedback when something
still needs attention before payment can begin.

Rule: A customer cannot check out with an empty basket
Starting checkout without any items should be blocked immediately.
The customer should understand why the action was rejected and what they need to do next.

Scenario: Attempting to check out without any items
The customer goes straight to checkout with a basket that has never contained any items.

Scenario: Returning to checkout after removing the last item
The customer previously had items in the basket but removed the final item before trying again.

That is a valid, runnable .feature file. It has no executable steps yet. It is doing exactly the job a .md spec would do at this stage — except it is already living in the format that will carry the verification later.

When the behaviour sharpens, you add steps in place:

Feature: Checkout
Customers should only complete checkout when their basket is ready to become an order.
...

Rule: A customer cannot check out with an empty basket
...

Scenario: Attempting to check out without any items
The customer goes straight to checkout with a basket that has never contained any items.

Given the customer has an empty basket
When the customer attempts to check out
Then the checkout should be rejected
And the customer should see "Your basket is empty"

That is the key move.

The prose is already in place. The behavioural context is already in place. And now the specification starts to gain deterministic anchors.

So the workflow is not "write prose first, then convert it later."

It is "start with prose inside the .feature file, then progressively elaborate it into verifiable behaviour."

That is a much stronger fit for modern AI-assisted development.

It keeps explanation and verification close together. It avoids unnecessary format switching. And it lets the same specification evolve from early thinking into something concrete enough to map directly to tests and code.
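
One way to picture progressive elaboration is to ask which scenarios are still prose only. The following Python sketch does that with a deliberately naive line scan. `scenarios_without_steps` is a hypothetical helper, and a real Gherkin parser handles many more cases (tags, backgrounds, scenario outlines, localized keywords).

```python
from typing import List, Optional

STEP_KEYWORDS = ("Given ", "When ", "Then ", "And ", "But ")


def scenarios_without_steps(feature_text: str) -> List[str]:
    """Return the names of scenarios that only have description lines so far."""
    pending: List[str] = []
    current: Optional[str] = None  # name of the scenario being read
    has_steps = False
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("Scenario:"):
            if current is not None and not has_steps:
                pending.append(current)
            current = line[len("Scenario:"):].strip()
            has_steps = False
        elif current is not None and line.startswith(STEP_KEYWORDS):
            has_steps = True
    if current is not None and not has_steps:
        pending.append(current)
    return pending


FEATURE = """\
Feature: Checkout
Rule: A customer cannot check out with an empty basket

Scenario: Attempting to check out without any items
The customer goes straight to checkout with a basket that has never contained any items.

Given the customer has an empty basket
When the customer attempts to check out
Then the checkout should be rejected

Scenario: Returning to checkout after removing the last item
The customer previously had items in the basket but removed the final item before trying again.
"""

pending = scenarios_without_steps(FEATURE)
```

Here `pending` holds only the second scenario, the one whose behaviour has not been pinned down in steps yet.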

What the anchor actually buys you

It helps to be concrete about what "deterministic mapping" means in practice.

A Gherkin step like Then the customer should see "Your basket is empty" is not a sentence the test layer needs to interpret on every run. It corresponds to one specific piece of verification — a function, a method, a binding, depending on your stack — that says, in code, here is what "should see" means for this system. Once that correspondence exists, two useful properties fall out:

  • Prose around the step is free to evolve. Rewrite the Rule description, expand the scenario narrative, sharpen the explanation. None of it touches the verification. The test still proves what it proved yesterday.
  • A change to the step itself is a visible change. If "Then the checkout should be rejected" becomes "Then checkout is rejected with an empty-basket error", that is no longer a wording tweak; it is a behavioural edit. Good tooling makes that edit impossible to miss.

That second property is the one that matters most in an AI-assisted workflow. With a pure prose spec, an LLM can rephrase a sentence in passing and quietly change what the test is supposed to check. With Gherkin, the wording of a step is the anchor — changing it changes the contract, and that is visible.
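
Both properties can be checked mechanically. The Python sketch below compares only the step lines of two versions of a scenario, under the simplifying assumption that a step line is any line starting with a step keyword. `step_lines` and `changed_steps` are hypothetical helpers, not part of any existing tool.

```python
from typing import List

STEP_KEYWORDS = ("Given ", "When ", "Then ", "And ", "But ")


def step_lines(feature_text: str) -> List[str]:
    """Keep only the step lines; headings and descriptions are dropped."""
    stripped = (line.strip() for line in feature_text.splitlines())
    return [line for line in stripped if line.startswith(STEP_KEYWORDS)]


def changed_steps(old_text: str, new_text: str) -> List[str]:
    """Return step lines in the new version that the old version did not have."""
    old = step_lines(old_text)
    return [line for line in step_lines(new_text) if line not in old]


BEFORE = """\
Scenario: Attempting to check out without any items
The customer goes straight to checkout with a basket that has never contained any items.

Given the customer has an empty basket
When the customer attempts to check out
Then the checkout should be rejected
"""

# A prose-only edit: the scenario description is reworded.
PROSE_EDIT = BEFORE.replace(
    "The customer goes straight to checkout",
    "A first-time visitor opens checkout directly",
)

# A behavioural edit: the wording of a step changes.
STEP_EDIT = BEFORE.replace(
    "Then the checkout should be rejected",
    "Then checkout is rejected with an empty-basket error",
)

prose_diff = changed_steps(BEFORE, PROSE_EDIT)
step_diff = changed_steps(BEFORE, STEP_EDIT)
```

The reworded description produces an empty `prose_diff`, while the reworded step shows up in `step_diff` as a line that needs a mapping and a review.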

The cost of getting these properties is real. The mapping between a step and its code needs to be explicit enough that the step means one concrete thing when it appears in a spec. But unlike interpretation costs, that cost does not compound: once the mapping exists, every future scenario that uses the step inherits the same anchor for free.
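
The reuse described above can be sketched with a single parameterized binding. This is an illustration in Python under stated assumptions: `SEE_MESSAGE`, `customer_should_see`, and `run_then` are hypothetical names, and the `ctx` dictionary stands in for whatever state a real test would read from the system.

```python
import re

# Hypothetical binding: one pattern, one function, shared by every scenario.
SEE_MESSAGE = re.compile(r'^the customer should see "([^"]*)"$')


def customer_should_see(ctx: dict, expected: str) -> None:
    """Define, once, what "should see" means for this (pretend) system."""
    assert ctx["message"] == expected, (
        f"expected {expected!r}, got {ctx['message']!r}"
    )


def run_then(ctx: dict, step_text: str) -> str:
    """Resolve a step to the binding and run it; fail loudly if unmapped."""
    match = SEE_MESSAGE.match(step_text)
    if match is None:
        raise LookupError(f"no binding for step: {step_text!r}")
    customer_should_see(ctx, match.group(1))
    return match.group(1)


# Two different scenarios reuse the same anchor with different parameters.
checked = [
    run_then({"message": "Your basket is empty"},
             'the customer should see "Your basket is empty"'),
    run_then({"message": "Enter a valid postal code"},
             'the customer should see "Enter a valid postal code"'),
]
```

The binding is written once. The second scenario pays nothing extra for it, and a step with no matching pattern raises an error instead of being guessed at.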

Closing takeaway

Prose is having a moment, and rightly so. For discovery, rationale, and early thinking, it is the right tool — and LLMs make it more useful, not less.

But once behaviour needs to be verified and safely evolved, prose on its own leaves the verification layer being rebuilt from interpretation every cycle. By contrast, .feature files give you the same prose plus one thing prose can't: a deterministic anchor between what is described and what is checked.

The more generation you put into your workflow, the more that single explicit anchor is worth.


If you are curious about a modern, compile-time way to put this into practice in a JVM project — Gherkin in, JUnit tests out, no runtime glue — that is what SpecBinder is for. More at specbinder.dev.

About the author
Dmytro Stasyuk
SpecBinder author