A founder showed me a feature last month. It used a large language model to decide whether a number was greater than another number. The call took 800 milliseconds, cost a fraction of a cent every time, occasionally got it wrong, and replaced a comparison operator that would have run in nanoseconds, free, forever, correctly. The investor deck called it "AI-powered."
This is the tax of the moment. Every product is racing to add AI, and a large share of what gets added makes the product slower, more expensive, and less reliable than the code it replaced. The pressure to ship "AI features" is real: boards ask, competitors announce, the market rewards the press release. None of that changes the engineering question: does this feature need a probabilistic model, or did you just attach one to look modern?
The default should be deterministic
Start from the opposite of the current fashion. Deterministic code is the default. AI is the exception you justify.
Deterministic code has properties that are easy to take for granted until you give them up. It returns the same output for the same input, every time. It runs in microseconds, not seconds. It costs nothing per call. It fails loudly and predictably when it fails. You can unit-test it to exhaustion. You can reason about it.
The moment you replace that with a model call, you trade all of it away. You get nondeterminism, latency measured in hundreds of milliseconds to seconds, a per-call cost that scales with usage, failure modes that are confident and silent, and a component you can only test statistically. Sometimes that trade is worth it. Often it is not. The discipline is making yourself say why every time.
Where deterministic code wins, decisively
Anything with a correct answer. Math, sorting, filtering, formatting, date arithmetic, parsing structured input, applying business rules. If the answer is computable, compute it. A model that does arithmetic is slower, costlier, and less accurate than the arithmetic. There is no axis on which it wins.
Anything that must be exactly right every time. Tax calculations. Permission checks. Pricing. Financial reconciliation. Anything where "right 99 percent of the time" is a euphemism for "wrong on one customer in a hundred, silently." A probabilistic component in a deterministic requirement is a liability with a marketing label.
Anything latency-sensitive in the hot path. If a user is waiting on this, every model call you add is hundreds of milliseconds they feel. Validating a form field, autocompleting from a known list, routing a request — these want to be instant, and instant rules out a network round-trip to a model.
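To make "autocompleting from a known list" concrete, here is a minimal sketch that assumes the options fit in memory and that prefix matching is what the user wants. The names `build_index`, `autocomplete`, and the country list are made up for illustration. It is a sorted list and a binary search, with no network call anywhere.

```python
from bisect import bisect_left

def build_index(options):
    """Sort the known options once, lowercased for case-insensitive matching."""
    return sorted(option.lower() for option in options)

def autocomplete(index, prefix, limit=5):
    """Return up to `limit` options that start with `prefix`, found by binary search."""
    prefix = prefix.lower()
    start = bisect_left(index, prefix)  # first entry that sorts at or after the prefix
    matches = []
    for option in index[start:start + limit]:
        if not option.startswith(prefix):
            break  # sorted order means no later entry can match either
        matches.append(option)
    return matches

# Hypothetical data, for illustration only.
COUNTRIES = build_index(["Canada", "Cambodia", "Cameroon", "Chile", "China"])
print(autocomplete(COUNTRIES, "ca"))  # ['cambodia', 'cameroon', 'canada']
```

Each lookup is a binary search over an in-memory list, so it stays local and returns the same result for the same prefix every time.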
Anything high-volume and cheap-per-unit. A model call costs a fraction of a cent. At ten requests a day, irrelevant. At fifty million requests a day, that fraction of a cent is a budget line that can dwarf your entire compute spend — to do something a lookup table did for free.
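The arithmetic is worth doing explicitly. The sketch below assumes a per-call price of $0.002 and a 30-day month. Both figures are placeholders, not quotes from any provider, so substitute your own.

```python
from decimal import Decimal

# Assumed figures for illustration only; replace with your real numbers.
COST_PER_CALL_USD = Decimal("0.002")  # "a fraction of a cent" per model call
DAYS_PER_MONTH = 30

def monthly_cost(requests_per_day: int) -> Decimal:
    """Monthly spend on model calls at a given daily request volume."""
    return COST_PER_CALL_USD * requests_per_day * DAYS_PER_MONTH

for volume in (10, 50_000_000):
    print(f"{volume:>11,} requests/day -> ${monthly_cost(volume):,.2f} per month")
```

Under those assumptions, ten requests a day costs $0.60 a month and fifty million a day costs $3,000,000 a month. The per-call price is the same in both cases. The only thing that changed is the multiplier.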
Where AI earns its cost and latency
AI is worth its tax when the problem is one deterministic code handles badly: the input is unstructured, the rules are too numerous or too fuzzy to enumerate, and "approximately right, most of the time" is an acceptable and useful outcome.
Unstructured input that resists rules. Extracting meaning from free-text support tickets, summarizing a long document, classifying sentiment, pulling fields out of a messy PDF, understanding a natural-language query. You cannot write the regex for "what is this customer actually upset about." A model can approximate it, and approximate is useful here.
Open-ended generation. Drafting, rewriting, translating, suggesting. There is no single correct output, so nondeterminism isn't a defect — it's the point. The human in the loop catches the misses.
Fuzzy matching and ranking at the edges of structure. Semantic search, deduplication of records that don't match exactly, recommending related items. Where the relationship is real but not expressible as an equality check.
Problems where the rule set is genuinely intractable. Some classification problems have ten thousand edge cases and no clean decision boundary. A model trained on examples can outperform any rule set a human could maintain. The test is whether you've actually tried and failed to write the rules, not whether writing them sounded tedious.
The litmus test
Before any AI feature ships, run it through four questions. If you can't give a clear answer to all four, you're decorating, not building.
- Could deterministic code do this acceptably well? If yes, build that. The bar is "acceptably," not "perfectly": a rule set covering 95 percent of cases plus a fallback often beats a model covering 97 percent with worse latency, cost, and debuggability.
- What does being wrong cost? Map the downside. If a wrong output is a mild annoyance the user can ignore or correct, AI's error rate is tolerable. If a wrong output is a wrong invoice, a compliance breach, or a safety issue, the error rate is not tolerable and you need either deterministic logic or a human gate.
- Can you afford the latency and cost at your real scale? Not demo scale. The volume you'll hit in eighteen months. Multiply the per-call cost and the per-call latency by that number and see if you still like the feature.
- Can you measure whether it's working? If you can't define and track an accuracy or quality metric, you can't tell when the feature degrades. Shipping an unmeasurable model into production is shipping a thing you will never know is broken.
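The first and fourth questions can be sketched together: rules first, a fallback for whatever the rules miss, and a number that tells you how the whole thing is doing. Everything here is hypothetical, including the keyword rules, the labels, and `send_to_review`, which stands in for a model call or a human queue. It shows the shape of the approach and is not meant as a production classifier.

```python
from typing import Callable, Optional, Tuple

# Hypothetical keyword rules for routing support tickets; illustrative only.
RULES = {
    "refund": "billing",
    "invoice": "billing",
    "password": "account",
    "login": "account",
}

def classify_by_rules(text: str) -> Optional[str]:
    """Return a label when a rule matches, or None when no rule applies."""
    lowered = text.lower()
    for keyword, label in RULES.items():
        if keyword in lowered:
            return label
    return None

def classify(text: str, fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Try the rules first; call the fallback only for input the rules miss."""
    label = classify_by_rules(text)
    if label is not None:
        return label, "rules"
    return fallback(text), "fallback"

def accuracy(labeled, fallback) -> float:
    """Share of labeled examples classified correctly: the metric to track."""
    correct = sum(
        1 for text, expected in labeled if classify(text, fallback)[0] == expected
    )
    return correct / len(labeled)

def send_to_review(text: str) -> str:
    """Stand-in for a model call or a human review queue (an assumption)."""
    return "needs_review"

# A tiny hand-labeled sample; a real one would be far larger.
LABELED = [
    ("I want a refund for last month", "billing"),
    ("Cannot login since the update", "account"),
    ("The app feels slow lately", "performance"),
]

print(classify("Where is my invoice?", send_to_review))  # ('billing', 'rules')
print(accuracy(LABELED, send_to_review))
```

Returning the source alongside the label lets you count how much traffic the rules absorb before any model is involved. Tracking `accuracy` on a labeled sample over time is what tells you when the feature has started to degrade.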
The reputational cost of confident wrongness
There is a failure mode specific to AI that deterministic systems mostly don't have: being confidently, fluently wrong. A crashed feature tells the user it's broken. A model that hallucinates a wrong answer tells the user, in complete sentences and a confident tone, something false — and the user believes it, because it's well-written.
This is worse than a visible error. A 500 page erodes patience. A confident wrong answer erodes trust, and trust doesn't come back. A user who catches your AI feature inventing a fake citation, a wrong number, or a policy that doesn't exist stops trusting every output it produces — including the right ones. You don't get graded on average accuracy. You get graded on the worst answer a user remembers.
Deterministic systems fail safe and visible. AI systems can fail confident and invisible. That asymmetry is the strongest argument for keeping the AI surface small and well-gated, and for putting validation and human review around anything where confident wrongness has a cost.
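A validation layer can be as plain as the sketch below. It assumes a model has extracted line items and a total from an invoice, and it only accepts the result if the numbers agree. The `ExtractedInvoice` shape and the `gate` function are hypothetical, and a real gate would check more than this.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

@dataclass
class ExtractedInvoice:
    """Fields a model is assumed to have pulled from a document (hypothetical shape)."""
    line_items: List[Decimal]
    total: Decimal

def gate(invoice: ExtractedInvoice) -> str:
    """Accept the extraction only if it passes deterministic checks."""
    if not invoice.line_items:
        return "human_review"  # nothing to cross-check against
    if any(amount < 0 for amount in invoice.line_items):
        return "human_review"  # negative line items are suspicious here
    if sum(invoice.line_items, Decimal("0")) != invoice.total:
        return "human_review"  # the stated total disagrees with the items
    return "accept"

good = ExtractedInvoice([Decimal("19.99"), Decimal("5.01")], Decimal("25.00"))
bad = ExtractedInvoice([Decimal("19.99"), Decimal("5.01")], Decimal("250.00"))
print(gate(good), gate(bad))  # accept human_review
```

The model still does the reading. The gate catches an extraction whose total contradicts its own line items before it reaches a customer, and routes it to a person.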
What fixed looks like
A product where AI shows up exactly where it earns its place and nowhere else. The arithmetic is arithmetic. The permission checks are deterministic. The pricing is computed. The form validation is instant and free. And the AI is doing the things only AI can do — reading unstructured input, generating drafts, understanding fuzzy queries — each one gated by a measured quality metric and, where the cost of error is real, a validation layer or a human.
Every AI feature in the product survived the four-question test. None of them are there because the deck needed a bullet point. The latency budget is spent where users get value, not on a model deciding whether five is bigger than three.
This is for you if
You're shipping a product where AI is part of the roadmap and you want it placed by engineering judgment instead of board pressure. The work — auditing where AI belongs, building the deterministic core properly, and gating the genuine AI surface with measurement and fallbacks — typically sits inside a $50k+ build, and the savings often show up as the AI features you didn't ship and the compute bill you didn't sign up for.
This is not for you if your goal is an "AI-powered" label for a fundraise regardless of whether it improves the product. That's a positioning exercise, and you don't need an engineering firm for it. It's also not for you if you've already decided every feature needs a model and you want someone to implement that — we'll tell you which half to delete, and that's not the engagement you asked for.