Article · 19 June 2026

The Real Total Cost of Renting Intelligence

Per-token pricing hides the exit fee, the audit gap and the lock-in.

Author: Micky Irons


Tags: ai total cost of ownership, sovereign ai, cloud egress, compliance audit, vendor lock-in

A per-token price is the most flattering number a vendor can show you. It is small, it is precise, and it is the part of the bill you will spend the least money on over the life of a system. I have watched serious procurement teams approve a platform on the strength of that single figure, then spend the next two years discovering every other line that was never on the quote. The headline rate is the coin in the open hand. The real cost sits in a second ledger under the table, and nobody hands you that one at signing.

This is not a complaint about cloud AI being expensive. Plenty of cheap things are worth buying. It is an argument about what a true cost model has to contain before a regulated organisation can call a decision sound. Build that model honestly and the per-token rate shrinks to a rounding error while four other numbers start to dominate. Egress. Evidence. Control. Exit. Price those, and the whole conversation about renting intelligence changes shape.

The quoted price is the part you will spend the least on

Start with the arithmetic vendors are happy to do for you. Tokens in, tokens out, multiply by a published rate. It is clean, it is auditable, and it is honest as far as it goes. The trouble is that inference cost is the one line in an AI programme that behaves predictably and falls every year. Competition and model efficiency push it down. If your whole comparison rests on the number that is both the smallest today and the fastest shrinking, you have anchored a multi-year decision to the least decisive variable in the system.
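The vendor's arithmetic fits in a few lines, which is part of its appeal. The sketch below is illustrative only: the token volumes, the per-million rates and the 30% yearly rate decline are hypothetical placeholders, not figures from any supplier or from this article.

```python
def inference_cost(tokens_in, tokens_out, rate_in_per_million, rate_out_per_million):
    """Tokens in, tokens out, multiplied by the published rates."""
    cost_in = (tokens_in / 1_000_000) * rate_in_per_million
    cost_out = (tokens_out / 1_000_000) * rate_out_per_million
    return cost_in + cost_out


def projected_annual_spend(first_year_cost, annual_rate_decline, years):
    """Spend per year if volume holds steady and the rate falls by a fixed fraction."""
    return [first_year_cost * (1 - annual_rate_decline) ** year for year in range(years)]


# Hypothetical monthly workload and rates, for illustration only.
monthly = inference_cost(200_000_000, 50_000_000, 3.00, 15.00)

# Assumed 30% yearly decline in the rate; substitute your own expectation.
for year, spend in enumerate(projected_annual_spend(monthly * 12, 0.30, 3), start=1):
    print(f"year {year}: {spend:,.0f}")
```

Whatever inputs you use, this is the one line in the model you can compute from the quote alone, and under any assumed decline it only gets smaller.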

AI total cost of ownership is the discipline of refusing that anchor. It asks what the system costs to run, to leave, to defend in front of an auditor and to keep working when the supplier changes something you depend on. Inference is one input among several, and rarely the one that decides whether a deployment was a good idea. The numbers that decide that are quieter, they arrive later, and they are almost never on the slide that won the budget.

[Image: Hermes seated at a polished marble counting-table in void black and satin gold, one hand holding a small light coin purse, the other resting on a hidden ledger of unpaid tolls beneath the table edge.]
The coin in the open hand is the per-token rate. The second ledger, the one under the table, is where the programme actually pays.

Egress is the toll you only read about on the way out

Data has to move for an AI system to be useful. It moves into the model, and the outputs, embeddings, logs and derived datasets move back out to wherever your business actually runs. Ingress is usually free, because the supplier wants your data in. Egress is metered, because the supplier would prefer it stayed. That asymmetry is not an accident. It is the commercial design of the rented model, and it compounds quietly with every workload you add.

For a single experiment the egress line is trivial. For a production system feeding analytics, retraining loops, replicas in another jurisdiction and a downstream warehouse, it becomes a standing tax on your own information. You pay, month after month, to read back data you generated. Few proof-of-concept budgets model this, because in the pilot the volumes are tiny. The bill only turns real at the exact moment the system becomes valuable, which is the worst possible moment to discover it.
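A rough way to see the pilot-versus-production gap is to list every flow that reads data back out and price the total. This is a minimal sketch under stated assumptions: the flow names echo the paragraph above, but the volumes and the per-gigabyte rate are hypothetical, and real egress tariffs are often tiered rather than flat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    gb_per_month: float  # data read back out of the supplier each month


def monthly_egress_cost(flows, rate_per_gb):
    """Flat-rate egress bill: total gigabytes out times the metered rate."""
    return sum(flow.gb_per_month for flow in flows) * rate_per_gb


RATE_PER_GB = 0.09  # hypothetical flat rate, not any supplier's tariff

# Hypothetical volumes: a pilot moves almost nothing, production moves a lot.
pilot = [Flow("pilot outputs", 5)]
production = [
    Flow("analytics", 4_000),
    Flow("retraining loop", 2_500),
    Flow("replica in another jurisdiction", 6_000),
    Flow("downstream warehouse", 3_000),
]

print(f"pilot:      {monthly_egress_cost(pilot, RATE_PER_GB):,.2f} per month")
print(f"production: {monthly_egress_cost(production, RATE_PER_GB):,.2f} per month")
```

The useful exercise is filling in the production list before the pilot ends, since every added workload is another entry that pays the same toll.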

The evidence you cannot produce

Here is the line item that costs nothing until the day it costs everything. In a regulated industry the question is never only whether a model produced an answer. It is whether you can prove, later and to someone hostile, what was asked, which model version replied, on what data, under whose authority and whether anything was altered afterwards. That is not a nice-to-have. In finance, health, law and defence it is the difference between a defensible decision and an indefensible one.

Rented intelligence rarely gives you that record in a form you control. You get usage logs shaped for billing, not a tamper-evident account of consequential actions built to survive cross-examination. When the regulator, the litigant or the internal investigator arrives, the gap is not abstract. It turns into remediation projects, external counsel, forced disclosure and settlements paid because you could not evidence your own process. A real cost model has to price the absence of a verifiable record, and most cost models pretend that line is zero right up until it is the only line that matters.
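To make "tamper-evident" concrete, here is a minimal sketch of a hash-chained record using only the Python standard library. It is an illustration of the general technique, not any vendor's record format: the field names (`asked`, `model_version`, `data_ref`, `authorised_by`) are hypothetical, chosen to mirror the questions an auditor asks, and a production record would also need signing and trusted timestamps.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(chain, *, asked, model_version, data_ref, authorised_by):
    """Append a record whose hash covers its content and the previous hash."""
    body = {
        "asked": asked,
        "model_version": model_version,
        "data_ref": data_ref,
        "authorised_by": authorised_by,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    record = dict(body, hash=_digest(body))
    chain.append(record)
    return record


def verify_chain(chain):
    """Return True only if no record was altered, removed or reordered."""
    prev = GENESIS
    for record in chain:
        body = {key: value for key, value in record.items() if key != "hash"}
        if body["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


chain = []
append_record(chain, asked="credit decision, case A", model_version="v1",
              data_ref="dataset-2026-06", authorised_by="analyst-17")
append_record(chain, asked="credit decision, case B", model_version="v1",
              data_ref="dataset-2026-06", authorised_by="analyst-17")
print(verify_chain(chain))            # chain as written
chain[0]["model_version"] = "v2"      # alter history after the fact
print(verify_chain(chain))            # the alteration is detected
```

A chain like this only proves integrity if the latest hash is anchored somewhere the record-keeper cannot quietly rewrite, which is exactly why it matters who controls the record.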

“A true total-cost model for regulated work has to put a price on the evidence you cannot produce. Once it does, the cheap option stops looking cheap.”
Micky Irons

The model swap you did not authorise

When you rent a model behind an endpoint, the thing on the other side of that endpoint is not yours and does not hold still. Suppliers deprecate versions, retune behaviour, adjust safety filters and retire capabilities on their schedule, not yours. For consumer use that is fine. For a validated clinical pathway, a credit decision or a control referenced in a filed procedure, a silent change to model behaviour is a change to your system that you neither chose nor recorded.

The cost here is the cost of revalidation you did not plan and cannot decline. Every swap you do not control is a potential re-test of everything you certified on the previous behaviour. You are renting a foundation someone else is entitled to move, and you carry the compliance liability for the building on top. That is a structurally bad trade for anyone whose work gets audited, and it shows up nowhere on a per-token quote.
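One way to at least notice a silent change is to fingerprint the model's answers to a fixed set of prompts at certification time and compare later. The sketch below is an assumption-laden illustration: `call_model` stands in for whatever client you use, the golden prompts are hypothetical, and exact-match comparison only works if outputs are deterministic. A real check would need tolerance-based scoring.

```python
import hashlib


def behaviour_fingerprint(call_model, golden_prompts):
    """Hash the model's answers to a fixed prompt set into one digest.

    call_model is a hypothetical callable: prompt string in, answer string out.
    """
    digest = hashlib.sha256()
    for prompt in golden_prompts:
        answer = call_model(prompt)
        digest.update(prompt.encode("utf-8") + b"\x00")
        digest.update(answer.encode("utf-8") + b"\x00")
    return digest.hexdigest()


def needs_revalidation(certified_fingerprint, current_fingerprint):
    """Any difference from the certified behaviour triggers a re-test."""
    return certified_fingerprint != current_fingerprint


# Stand-in models for illustration; a real one sits behind a supplier endpoint.
def model_at_certification(prompt):
    return f"answer to: {prompt}"


def model_after_silent_retune(prompt):
    return f"revised answer to: {prompt}"


GOLDEN_PROMPTS = ["hypothetical prompt 1", "hypothetical prompt 2"]

certified = behaviour_fingerprint(model_at_certification, GOLDEN_PROMPTS)
today = behaviour_fingerprint(model_after_silent_retune, GOLDEN_PROMPTS)
print(needs_revalidation(certified, today))
```

Detection does not remove the cost. It only tells you when the revalidation bill has arrived, and each trigger multiplies by everything you certified on the old behaviour.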

[Image: A close cinematic view of the hidden second ledger beneath the marble table, its gold-leaf columns listing exit charges, egress tolls and revalidation fees against a void black ground.]
The unpaid tolls add up in the dark. Egress, revalidation and the audit you cannot produce all post to the same hidden column.

Lock-in is a cost even when you never leave

People treat switching cost as a one-off, paid only if you migrate. That is the wrong way to see it. Lock-in is a continuous cost, because it sets the price of every negotiation you have for as long as you stay. A supplier who knows your exit is expensive prices accordingly, and they are right to. Your renewal terms, your rate increases and your leverage on roadmap and data handling are all functions of how cheaply you could credibly walk.

So the migration you cannot afford is not a hypothetical you will probably dodge. It is a number that taxes you every single day through the terms you are too captured to refuse. The deeper your prompts, fine-tuning, embeddings and surrounding integration are entangled with one supplier's proprietary surface, the higher that standing tax climbs. You pay the exit fee whether or not you ever take the exit.

Building the honest model

So price all of it. A real AI total cost of ownership model for regulated work has line items the quote never mentions, and most of them are larger than the inference it foregrounds. Lay them out plainly and the comparison stops being close.

Inference: the published per-token rate, the only line the vendor volunteers and the fastest to fall.
Egress: the standing toll on reading back your own data, trivial in the pilot and material in production.
Compliance evidence: the cost of every audit, investigation or dispute you cannot answer with a verifiable record.
Revalidation: the unplanned re-testing forced by model changes you did not authorise and cannot decline.
Lock-in: the daily tax of weak negotiating leverage plus the eventual migration priced to be unaffordable.
Residency and jurisdiction: the legal and architectural cost of data sitting where someone else decides it sits.

Notice what happens to the ranking. The number that won the budget falls to the bottom of the list by weight, and the items procurement actually gets audited against rise to the top. That reordering is the entire point. You are not choosing the cheapest tokens. You are choosing the lowest defensible total against the lines a regulator will actually inspect.
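The six line items can be laid out as a small model that ranks each by its share of the total. Every figure below is a made-up placeholder, so the ranking it prints is a product of those inputs and not evidence for any conclusion. The value is in the structure: risk-type lines are priced as probability times impact, and the table should be rebuilt with your own estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    annual_cost: float


def expected_cost(probability_per_year, cost_if_it_happens):
    """Price a risk as its yearly likelihood times its impact."""
    return probability_per_year * cost_if_it_happens


def rank_by_weight(items):
    """Return (name, share of total) pairs, largest share first."""
    total = sum(item.annual_cost for item in items)
    ranked = sorted(items, key=lambda item: item.annual_cost, reverse=True)
    return [(item.name, item.annual_cost / total) for item in ranked]


# All figures are hypothetical placeholders; replace each with your own estimate.
items = [
    LineItem("inference", 16_200),
    LineItem("egress", 16_740),
    LineItem("compliance evidence", expected_cost(0.05, 2_000_000)),
    LineItem("revalidation", 2 * 60_000),  # assumed two forced re-tests a year
    LineItem("lock-in", 40_000),
    LineItem("residency and jurisdiction", 30_000),
]

for name, share in rank_by_weight(items):
    print(f"{name:<28} {share:6.1%}")
```

If your own estimates leave inference near the top, the rental case may be sound for you. The model exists so that this is a result you computed and not an assumption you inherited from the quote.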

Why the owned substrate wins the audited lines

This is where I show my hand, because I built the alternative on exactly these line items. Mickai is a Sovereign Intelligence Operating System, the SIOS, and it inverts the rental model by putting the intelligence on the operator's own hardware. Fifty specialised brains run on machines you control, fully offline-capable, so the data does not have to leave to be useful. Egress as a standing tax disappears, because there is no perpetual toll to read back what is already yours.

The audit gap is closed by design rather than promised by policy. Every consequential action is sealed into a post-quantum Open Audit Record using ML-DSA-65, the digital signature parameter set standardised in FIPS 204, giving a tamper-evident account built to survive the exact cross-examination the rented model leaves you unable to answer. The model swap you did not authorise cannot happen to a model you hold, because nobody behind an endpoint is entitled to move your foundation overnight. And lock-in inverts when the substrate is yours, since the thing you would migrate from is something you already own. None of this is free. Owned hardware is real capital and real operational discipline. The point is that it pays into the columns procurement is audited against, instead of the one that merely looks cheap on a slide.

[Image: Hermes pushing the light coin purse aside and laying the hidden second ledger open in full satin-gold light on the marble counting-table, every previously concealed toll now legible against void black.]
Put the second ledger on the table in the light. Owned intelligence is the model that survives being read in full.

What procurement actually gets audited against

A procurement decision is not judged on the day it is signed. It is judged later, when an auditor, a regulator or a court asks you to account for it. At that point nobody cares what your per-token rate was. They care whether you can produce the record, whether you controlled the system that produced the answer, whether your data sat where it was lawfully required to sit and whether you were captured by a supplier who set your terms. Those are the lines that survive scrutiny, and they are precisely the lines the rented model handles worst.

So build the honest model before you sign, not after the incident. Put every hidden toll on the table in the light. Cost the egress, cost the evidence you cannot produce, cost the swaps you cannot refuse and the migration you cannot afford. When the second ledger is finally legible, the question is no longer which intelligence is cheapest to rent. It is which intelligence you can still defend when someone with authority asks you to prove it. On that question, the only one that ever gets audited, you want to own the answer.

[Image: The marble counting-table at rest in void black and satin gold, the light coin purse spent and set aside, both ledgers reconciled and closed, a single gold coin standing upright in balance.]
Both ledgers reconciled at last. The cheapest token was never the price. Ownership was.


Originally published at https://mickai.co.uk/articles/the-real-total-cost-of-renting-intelligence. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.