# Working Non-Deterministically In A Deterministic System

Published `May 17, 2026`\
By `Ted L (Zaneris), Co-founder of Villagely`

AI systems are probabilistic. Production software is not.

As software developers, we have been building deterministic systems our entire careers: a + b = c. For every possible input, we can test and confirm the output. The worst things we had to deal with were the dreaded "edge cases," often the result of human input through our controlled forms getting mixed into the loop, but these were still systems we had control over. Now we're tasked with wiring what are effectively non-deterministic systems into our programs (I know some of you will say, "but AI models *are* deterministic!" and I'll get to that later). The first instinct is to wrap the model in control and try to force it to do our bidding (OpenAI's own guidance recommends using structured outputs), with layers upon layers of structure and enforcement. More often than not, this leads to a far more brittle overall system.

Over the past two years of integrating AI into legacy systems (not just chatbots, but AI doing real work hidden from users), I've learned that the best approach is not to force these models to do your bidding but to lean into their non-deterministic nature.

## Split the Workflow by Real Risk

Multi-step AI workflows are necessary, but stages shouldn't exist just because we like local parsing. They should reflect actual shifts in operational risk.

Getting models to reliably search the web for specific information is notoriously tricky. If you give a model a complex task and a web search tool, it will often wander into the weeds or skip the search entirely.

When we built a translation service, the standard text wasn't the problem. The problem was translating the names of real organizations that have official bilingual names. So we broke the web search out into a highly restricted flow:

1. **Isolate the targets (No web tool):** A standard model scans the text and extracts only the candidates needing external verification.
2. **Perform the search (Search-specific model):** Pass those candidates to a dedicated model (like `gpt-5-search-api`) whose *only* job is to search the web and return the official truth.
3. **Generate the final output (No web tool):** Feed those grounded search results back into a standard model to produce the final translation.

This pattern isolates the risk. What *doesn't* work is adding extra stages whose primary purpose is just to locally reinterpret model output before handing it to the next step.
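
The three stages above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the `ModelCall` type, the function name, and the prompt wording are all assumptions, and each model is passed in as a plain callable so that no particular SDK is implied.

```python
from typing import Callable

# Hypothetical signature: a model call takes a prompt and returns raw text.
ModelCall = Callable[[str], str]


def translate_with_grounding(
    text: str,
    extract_model: ModelCall,    # stage 1: standard model, no web tool
    search_model: ModelCall,     # stage 2: search-only model
    translate_model: ModelCall,  # stage 3: standard model, no web tool
) -> str:
    # Stage 1: isolate only the candidates that need external verification.
    candidates = extract_model(
        "List the organization names in this text that may have an "
        "official bilingual name:\n\n" + text
    )

    # Stage 2: the search model's only job is to look up official names.
    grounded = search_model(
        "Find the official bilingual names for these organizations:\n\n"
        + candidates
    )

    # Stage 3: translate using the grounded names. The raw search output is
    # passed forward as-is instead of being reinterpreted locally.
    return translate_model(
        "Translate the text below. Use these official names where they "
        "apply:\n\n" + grounded + "\n\nText:\n" + text
    )
```

Note that nothing between the stages parses or relabels the model output; only the stage that touches the web is isolated.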

## Send Smaller, Context-Rich Inputs

Models perform better when the payload matches the task. Task-shaped DTOs, like a lightweight record, beat heavy, database-mapped entities almost every time.

* **Strip the noise:** Remove unrelated properties, persistence artifacts, and internal identifiers.
* **Maximize context:** Pass relevant background information alongside your prompts. Swap arbitrary values (like enum ints) for their descriptive names to ground the model.
* **Format for readability:** Indented JSON is worth the tokens. It dramatically improves both debugging and model behavior.
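
As a rough sketch of those three points, here is one way to map a heavy entity onto a task-shaped payload. The `TicketRow`, `TicketForModel`, and `TicketStatus` types are hypothetical, invented only for this example.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class TicketStatus(Enum):  # hypothetical enum stored as an int in the database
    OPEN = 1
    PENDING_REVIEW = 2
    CLOSED = 3


@dataclass
class TicketRow:  # hypothetical database-mapped entity
    id: int
    tenant_id: int
    row_version: bytes
    title: str
    body: str
    status: TicketStatus


@dataclass
class TicketForModel:  # task-shaped payload: only what the model needs
    title: str
    body: str
    status: str


def to_model_payload(row: TicketRow) -> str:
    # Drop identifiers and persistence artifacts, and send the enum's
    # descriptive name instead of its integer value.
    dto = TicketForModel(title=row.title, body=row.body, status=row.status.name)
    # Indented JSON costs a few tokens but is easier to read in logs.
    return json.dumps(asdict(dto), indent=2, ensure_ascii=False)
```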

## Lean Into Non-Determinism in the Middle

Treat intermediate AI outputs as *guidance*, not as mini-databases that local code must formalize.

The traditional, "responsible" approach looks like this: *parse -> normalize -> relabel -> validate -> reserialize -> pass to next step*. Every single one of those decisions invites path normalization bugs, alias drift, and fragile parser logic.

Instead, keep the handoffs thin. **Model output drifts; that is normal.**

* Accept plain JSON or fenced JSON.
* Tolerate harmless wrapper text.
* Feed raw responses directly forward when the next model step can interpret them just fine.

Guidance can stay loose. If you tolerate response drift in these advisory stages, you can strip out a massive amount of needless parsing from the middle of your workflow.
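
When local code does need to read an intermediate response, a tolerant reader can be quite small. The sketch below is one possible approach, with `extract_json` as a hypothetical helper name: it scans for the first position where a JSON object or array parses cleanly, which covers plain JSON, fenced JSON, and harmless wrapper text in one pass.

```python
import json
from typing import Any, Optional


def extract_json(raw: str) -> Optional[Any]:
    """Return the first JSON object or array found in raw, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(raw):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            value, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            continue
        return value
    return None
```

The trade-off is deliberate: wrapper text that happens to contain a valid bracketed value would be picked up first, which is acceptable for advisory stages but is a reason to stay stricter at the application boundary.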

## The Illusion of Model Determinism

Getting back to my point from the intro: yes, strictly speaking, LLMs are deterministic. In principle, if you freeze your prompt, set the temperature to zero, lock the seed, and use a fixed model version, you will get the same output every time (hosted APIs generally treat this as best-effort rather than a guarantee).

But production workflows don't run in a vacuum. The moment we start injecting external, unpredictable data into our prompts, like raw database rows or live web search results, the model effectively becomes a probability engine. We can guess at the range of outputs we might get, similar to guessing how a real human might respond, but we simply can't predict every structural permutation.

Because of this, parsing failures are inevitable. You can't prevent them entirely; you can only reduce their impact through an iterative feedback loop. You log the raw real-world responses, see what the models are actually doing in the wild, adjust your parsing logic to tolerate those harmless variations, fail gracefully when the output is truly unusable, and apply retry logic where it makes sense.
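
A minimal sketch of that loop, assuming a hypothetical `call_model` callable that returns raw response text and a hypothetical `parse` callable that returns `None` when the output is unusable:

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("ai_workflow")  # hypothetical logger name


def call_with_retry(
    call_model: Callable[[], str],
    parse: Callable[[str], Optional[Any]],
    max_attempts: int = 3,
) -> Optional[Any]:
    for attempt in range(1, max_attempts + 1):
        raw = call_model()
        # Log the raw response, not a summary, so real-world drift is
        # visible when the parser is tuned later.
        logger.info("attempt %d raw response: %r", attempt, raw)
        parsed = parse(raw)
        if parsed is not None:
            return parsed
        logger.warning(
            "attempt %d of %d: response unusable", attempt, max_attempts
        )
    # Give up without raising so the caller can degrade gracefully.
    return None
```

Returning `None` instead of raising is a design choice for this sketch; a workflow that cannot continue without the value may prefer an exception.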

## Keep Determinism at the Boundary

The model should not directly mutate application state. That is where deterministic code still matters deeply.

Your application boundary is the exact moment model output stops being "text" and starts being "state." Local code should be ruthlessly strict here: only extract what you actually need, and ignore minor structural variations that don't matter toward the end goal.

If the model tweaked something outside the narrow surface you care about, don't reject the whole response. Deterministically extract the target fields and throw the rest away.
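
Here is a sketch of what that boundary might look like for a translation result. The field names `translated_text` and `confidence`, and the `TranslationResult` type, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TranslationResult:  # hypothetical shape of the state we persist
    translated_text: str
    confidence: Optional[float]


def to_state(payload: Mapping[str, Any]) -> TranslationResult:
    # Strict on the one field we cannot do without.
    text = payload.get("translated_text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("translated_text is missing or empty")

    # Lenient on an optional field: a bad value is dropped, not fatal.
    # bool is a subclass of int, so exclude it explicitly.
    confidence = payload.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        confidence = None
    elif not 0.0 <= confidence <= 1.0:
        confidence = None

    # Any other keys the model added are ignored.
    return TranslationResult(
        text.strip(), None if confidence is None else float(confidence)
    )
```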

## Operational Realities: Diagnostics and UX

* **Log Artifacts, Not Summaries:** Detailed logging turns a fragile workflow into a repeatable one. Capture the exact request JSON per stage, prompt instructions, raw model responses, and the final applied values.
* **Test the Boundary:** The best tests verify that the final parser tolerates expected response drift and that the workflow degrades gracefully when an early stage fails.
* **Show the Work:** Long-running operations need visible progress. If you are rendering this in a UI, real step labels (e.g., "Identifying candidates...", "Searching official records...") beat generic loading spinners. Don't obscure progress with a modal.
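
For the logging point, one simple option is to write one JSON line per stage. The function name and record fields below are hypothetical, chosen to mirror the artifacts listed above.

```python
import json
import time
from typing import Any, TextIO


def log_stage_artifact(
    out: TextIO,
    stage: str,
    request: Any,
    instructions: str,
    raw_response: str,
    applied: Any = None,
) -> None:
    # One self-contained record per stage, so a failed run can be
    # replayed from the log alone.
    record = {
        "timestamp": time.time(),
        "stage": stage,
        "request": request,            # exact request JSON for this stage
        "instructions": instructions,  # prompt instructions as sent
        "raw_response": raw_response,  # unmodified model output
        "applied": applied,            # final values written to state, if any
    }
    out.write(json.dumps(record, ensure_ascii=False) + "\n")
```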

## Closing Thoughts: Stop Fighting the Model

As developers, we spend our entire careers learning how to build strict contracts. We use strong typing, rigid schemas, and fail-fast validation to make our systems predictable. So when we introduce an LLM into a workflow, our immediate instinct is to treat it like just another microservice that needs to be wrestled into a deterministic box.

But fighting the probabilistic nature of these models is a losing battle. The more rigid the plumbing between your AI steps, the more brittle the entire system becomes. You don't make an AI workflow reliable by forcing it to act like traditional code at every single handoff.

The shift that actually works is letting go of control in the middle. Let the models be a little messy. Let the intermediate payloads drift. Let the system run non-deterministically where it's safe, and save your hard, unyielding constraints for the exact moment that text turns back into application state.

The winning architecture isn't the one that finally forces the AI to behave perfectly. It's the one that stops trying.

