When Specifications Need More Than Prose

June 17, 2026 · 9 min read

Specification-Driven Development has settled on .md files for a good reason. Markdown is flexible, lightweight, and easy to reshape while a feature is still forming, for humans and LLMs alike. Expand a section, reorganize a few thoughts, add an open question. Prose is excellent at helping you think.

But behavioural specifications have a different job.

Once a spec starts describing behaviour that must be tested, verified, and changed safely over time, flexibility stops being enough. In an AI-assisted workflow, every meaningful change to a prose-heavy spec may need to be interpreted again before it turns into code or automated tests.

That interpretation may be perfectly reasonable. Most of the time, it probably will be. But the resulting verification still depends on a fresh reading of the prose, and that leaves room for subtle drift that can be difficult to spot.

This is where .feature files written in Gherkin deserve renewed attention.

They do not ask you to give up prose. A feature file can still carry context, motivation, explanation, and the “why” behind the behaviour. But it also adds something .md files lack: a deterministic link between behavioural steps and the code that verifies them.

Why deterministic mappings matter more in AI-assisted development

It is tempting to think that LLMs make structured behavioural formats less necessary.

If a model can read prose, infer intent, and generate code or tests from a Markdown spec, why worry so much about the shape of the specification?

The answer is mundane but important: the connection between the spec and the test is being recreated every time, by a process that is probabilistic by design. Most of the time it works. Sometimes it doesn't. And when it doesn't, the symptom is a passing test suite that is no longer testing the thing the spec describes.

This is where Gherkin changes the picture.

A Given, When, or Then step line in a .feature file is not a sentence the test framework re-reads each time. It is a stable anchor that maps to a specific code implementation in a fixed, explicit way. Once that anchor exists, the prose around the step can move, expand, and get rephrased without disturbing the verification layer. The behavioural contract stays where you put it.
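
To make the idea of a fixed mapping concrete, here is a minimal sketch in plain Java. It is not the API of any particular Gherkin tool: the class StepRegistry, its methods, and the basket logic are hypothetical names chosen for illustration. Each step text is bound to exactly one handler, and a step with no binding fails loudly instead of being guessed at.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a tiny step registry, not a real framework API.
final class StepRegistry {
    private final Map<String, Runnable> bindings = new LinkedHashMap<>();

    // Bind one exact step text to one piece of code.
    void bind(String stepText, Runnable handler) {
        if (bindings.putIfAbsent(stepText, handler) != null) {
            throw new IllegalStateException("Step bound twice: " + stepText);
        }
    }

    // Look the step up; nothing is inferred or re-interpreted here.
    void run(String stepText) {
        Runnable handler = bindings.get(stepText);
        if (handler == null) {
            throw new IllegalStateException("No binding for step: " + stepText);
        }
        handler.run();
    }

    // Runs the empty-basket scenario and returns a log of what executed.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<String> basket = new ArrayList<>();
        boolean[] rejected = {false};

        StepRegistry registry = new StepRegistry();
        registry.bind("the customer has an empty basket", () -> {
            basket.clear();
            log.add("basket emptied");
        });
        registry.bind("the customer attempts to check out", () -> {
            rejected[0] = basket.isEmpty();
            log.add("checkout attempted");
        });
        registry.bind("the checkout should be rejected", () -> {
            if (!rejected[0]) throw new AssertionError("checkout was not rejected");
            log.add("rejection verified");
        });

        registry.run("the customer has an empty basket");
        registry.run("the customer attempts to check out");
        registry.run("the checkout should be rejected");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Nothing in this sketch reads the surrounding prose. The only input the verification layer sees is the step text, which is why the description lines can change freely.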

This is why Gherkin gets more relevant, not less, as more code generation enters the workflow. The more interpretation you delegate, the more you benefit from having one layer that nothing has to re-interpret.

Gherkin: prose and a deterministic mapping

A .feature file is not a sterile sequence of Given/When/Then. It can carry context, motivation, rule descriptions, and the why behind the behaviour. The structure is there for the parts that need to be exact. The prose is there for everything else.

prose  ╭  Feature: Checkout
       │    Customers should only complete checkout when their basket is ready to become an order.
       │    This helps prevent invalid purchases and gives the customer clear feedback when something
       ╰    still needs attention before payment can begin.

prose  ╭    Rule: A customer cannot check out with an empty basket
       │      Starting checkout without any items should be blocked immediately.
       ╰      The customer should understand why the action was rejected and what they need to do next.

prose  ╭      Scenario: Attempting to check out without any items
       ╰        The customer goes straight to checkout with a basket that has never contained any items.

mapped ╭        Given the customer has an empty basket
       │        When the customer attempts to check out
       │        Then the checkout should be rejected
       ╰        And the customer should see "Your basket is empty"

prose  ╭      Scenario: Returning to checkout after removing the last item
       ╰        The customer previously had items in the basket but removed the final item before trying again.

mapped ╭        Given the customer had items in the basket
       │        And the customer removes the last item from the basket
       │        When the customer attempts to check out
       │        Then the checkout should be rejected
       ╰        And the customer should see "Your basket is empty"

prose  ╭    Rule: A customer must provide delivery details for shippable items
       │      Orders that require shipping must include enough delivery information to reach the customer.
       ╰      Missing details should be caught during checkout, before the order can continue.

prose  ╭      Scenario: Attempting to check out with an incomplete delivery address
       ╰        The customer has items ready to purchase, but the delivery details are missing a postal code.

mapped ╭        Given the customer has the following basket:
       │          | product         | quantity |
       │          | Espresso cups   | 2        |
       │          | Coffee grinder  | 1        |
       │        And the customer provides the following delivery details:
       │          | full name | street            | city    | postal code | country        |
       │          | Anna Reed | 14 Market Street  | Bristol |             | United Kingdom |
       │        When the customer attempts to check out
       │        Then the checkout should be rejected
       ╰        And the customer should see "Enter a valid postal code"

The prose lines explain. The Given/When/Then lines, quoted parameters, and data tables anchor. You do not have to pick one. That is the actual advantage — not that Gherkin is "structured prose," but that it lets explanation and verification share the same artifact without blurring into each other.
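
As a rough sketch of why quoted parameters and data tables count as anchors, the plain-Java snippet below pulls exact values out of step text without any interpretation. The class StepData and its methods are hypothetical helpers written for this illustration, not part of Gherkin or any tool, and the table handling is deliberately simplified (it assumes well-formed rows and no escaped pipes).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers for illustration only; simplified, no escaped pipes.
final class StepData {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the first double-quoted value in a step line, or null if none.
    static String quotedParameter(String stepLine) {
        Matcher m = QUOTED.matcher(stepLine);
        return m.find() ? m.group(1) : null;
    }

    // Splits one "| a | b |" row into trimmed cells, keeping empty cells.
    static List<String> cells(String row) {
        String trimmed = row.trim();
        String inner = trimmed.substring(1, trimmed.length() - 1);
        List<String> out = new ArrayList<>();
        for (String cell : inner.split("\\|", -1)) {
            out.add(cell.trim());
        }
        return out;
    }

    // Treats the first row as the header and maps each later row by column name.
    static List<Map<String, String>> table(List<String> rows) {
        List<String> header = cells(rows.get(0));
        List<Map<String, String>> result = new ArrayList<>();
        for (String row : rows.subList(1, rows.size())) {
            List<String> values = cells(row);
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < header.size(); i++) {
                record.put(header.get(i), values.get(i));
            }
            result.add(record);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> details = table(List.of(
            "| full name | street           | city    | postal code | country        |",
            "| Anna Reed | 14 Market Street | Bristol |             | United Kingdom |"));
        System.out.println(details.get(0).get("postal code").isEmpty());
        System.out.println(quotedParameter(
            "And the customer should see \"Enter a valid postal code\""));
    }
}
```

The empty postal code cell comes through as an empty string rather than being dropped, so the missing value in the scenario is itself exact data the check can rely on.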

Start inside the .feature file

One of the easiest mistakes to make is to assume that prose belongs in .md files, and Gherkin only becomes useful later, once the behaviour is clear enough to express in Given/When/Then steps.

That is too narrow a view of what a .feature file can do.

You can start directly inside the .feature file.

A Feature description can capture the business capability and why it matters. A Rule description can explain the business constraint or expectation. A Scenario description can hold a half-formed case you are still thinking through, before any step lines exist.

Feature: Checkout
  Customers should only complete checkout when their basket is ready to become an order.
  This helps prevent invalid purchases and gives the customer clear feedback when something
  still needs attention before payment can begin.

  Rule: A customer cannot check out with an empty basket
    Starting checkout without any items should be blocked immediately.
    The customer should understand why the action was rejected and what they need to do next.

    Scenario: Attempting to check out without any items
      The customer goes straight to checkout with a basket that has never contained any items.

    Scenario: Returning to checkout after removing the last item
      The customer previously had items in the basket but removed the final item before trying again.

That is a valid, runnable .feature file. It has no executable steps yet. It is doing exactly the job a .md spec would do at this stage — except it is already living in the format that will carry the verification later.

When the behaviour sharpens, you add steps in place:

Feature: Checkout
  Customers should only complete checkout when their basket is ready to become an order.
  ...

  Rule: A customer cannot check out with an empty basket
    ...

    Scenario: Attempting to check out without any items
      The customer goes straight to checkout with a basket that has never contained any items.

      Given the customer has an empty basket
      When the customer attempts to check out
      Then the checkout should be rejected
      And the customer should see "Your basket is empty"

That is the key move.

The prose is already in place. The behavioural context is already in place. And now the specification starts to gain deterministic anchors.

So the workflow is not "write prose first, then convert it later."

It is "start with prose inside the .feature file, then progressively elaborate it into verifiable behaviour."

That is a much stronger fit for modern AI-assisted development.

It keeps explanation and verification close together. It avoids unnecessary format switching. And it lets the same specification evolve from early thinking into something concrete enough to map directly to tests and code.
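
The progressive elaboration described in this section can also be tracked mechanically. The following is a simplified sketch, assuming a naive line-based scan rather than a real Gherkin parser. The class ScenarioProgress and its method are hypothetical names. It counts step lines under each scenario, so a count of zero marks a scenario that is still prose only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified scanner for illustration; not a full Gherkin parser.
final class ScenarioProgress {
    private static final List<String> STEP_KEYWORDS =
        List.of("Given ", "When ", "Then ", "And ", "But ");

    // Maps each scenario title to the number of step lines found under it.
    static Map<String, Integer> stepCounts(List<String> featureLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String current = null;
        for (String line : featureLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Scenario:")) {
                current = trimmed.substring("Scenario:".length()).trim();
                counts.put(current, 0);
            } else if (current != null && isStep(trimmed)) {
                counts.merge(current, 1, Integer::sum);
            }
        }
        return counts;
    }

    private static boolean isStep(String trimmed) {
        for (String keyword : STEP_KEYWORDS) {
            if (trimmed.startsWith(keyword)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> feature = List.of(
            "Feature: Checkout",
            "  Rule: A customer cannot check out with an empty basket",
            "    Scenario: Attempting to check out without any items",
            "      The customer goes straight to checkout with a basket that has never contained any items.",
            "      Given the customer has an empty basket",
            "      When the customer attempts to check out",
            "    Scenario: Returning to checkout after removing the last item",
            "      The customer previously had items in the basket but removed the final item before trying again.");
        // A count of 0 marks a scenario that is still prose only.
        stepCounts(feature).forEach((title, count) ->
            System.out.println(title + " -> " + count + " step(s)"));
    }
}
```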

What the anchor actually buys you

It helps to be concrete about what "deterministic mapping" means in practice.

A Gherkin step like Then the customer should see "Your basket is empty" is not a sentence the test layer needs to interpret on every run. It corresponds to one specific piece of verification — a function, a method, a binding, depending on your stack — that says, in code, here is what "should see" means for this system. Once that correspondence exists, two useful properties fall out:

Prose around the step is free to evolve. Rewrite the Rule description, expand the scenario narrative, sharpen the explanation. None of it touches the verification. The test still proves what it proved yesterday.

A change to the step itself is a visible change. If "Then the checkout should be rejected" becomes "Then checkout is rejected with an empty-basket error", that is no longer a wording tweak; it is a behavioural edit. Good tooling makes that edit impossible to miss.

That second property is the one that matters most in an AI-assisted workflow. With a pure prose spec, an LLM can rephrase a sentence in passing and quietly change what the test is supposed to check. With Gherkin, the wording of a step is the anchor — changing it changes the contract, and that is visible.
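
One way to picture how a reworded step becomes visible is a check that compares the steps in a feature against the set of step texts that have bindings. This is a minimal sketch in plain Java with hypothetical names (UnboundStepCheck, unboundSteps). Real tools have their own mechanisms for reporting unmatched steps, and this naive scan treats any line starting with a step keyword as a step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical check for illustration; real tools have their own mechanisms.
final class UnboundStepCheck {
    private static final List<String> KEYWORDS =
        List.of("Given ", "When ", "Then ", "And ", "But ");

    // Returns the step text after the keyword, or null for non-step lines.
    static String stepText(String line) {
        String trimmed = line.trim();
        for (String keyword : KEYWORDS) {
            if (trimmed.startsWith(keyword)) {
                return trimmed.substring(keyword.length());
            }
        }
        return null;
    }

    // Lists every step in the feature that has no exact binding.
    static List<String> unboundSteps(List<String> featureLines, Set<String> boundSteps) {
        List<String> unbound = new ArrayList<>();
        for (String line : featureLines) {
            String text = stepText(line);
            if (text != null && !boundSteps.contains(text)) {
                unbound.add(text);
            }
        }
        return unbound;
    }

    public static void main(String[] args) {
        Set<String> bound = Set.of(
            "the customer has an empty basket",
            "the customer attempts to check out",
            "the checkout should be rejected");

        // The description line was reworded freely; one step was reworded too.
        List<String> edited = List.of(
            "Scenario: Attempting to check out without any items",
            "  A shopper heads to checkout before adding anything.",
            "  Given the customer has an empty basket",
            "  When the customer attempts to check out",
            "  Then checkout is rejected with an empty-basket error");

        System.out.println(unboundSteps(edited, bound));
    }
}
```

In this sketch the rewritten description line goes unnoticed, as intended, while the reworded Then step is the only entry reported as unbound.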

The cost of getting these properties is real. The mapping between a step and its code needs to be explicit enough that the step means one concrete thing when it appears in a spec. But unlike interpretation costs, that cost does not compound: once the mapping exists, every future scenario that uses the step inherits the same anchor for free.

Closing takeaway

Prose is having a moment, and rightly so. For discovery, rationale, and early thinking, it is the right tool — and LLMs make it more useful, not less.

But once behaviour needs to be verified and safely evolved, prose on its own leaves the verification layer being rebuilt from interpretation every cycle. By contrast, .feature files give you the same prose plus one thing prose can't: a deterministic anchor between what is described and what is checked.

The more generation you put into your workflow, the more that single explicit anchor is worth.

If you are curious about a modern, compile-time way to put this into practice in a JVM project — Gherkin in, JUnit tests out, no runtime glue — that is what SpecBinder is for. More at specbinder.dev.
