Article · 12 June 2026

The Case Against the Inference API

A remote inference endpoint is someone else's off switch wired into the heart of your product. Here is why the inference that matters belongs on hardware you own, behind a record you can verify yourself.

Author

Micky Irons

sovereign AI, inference, on-device AI, AI security, vendor lock-in

The off switch is not in your hands

Here is the question I ask every founder who tells me their product is built on artificial intelligence (AI): if the company behind your inference endpoint changed its terms tomorrow, raised its prices fivefold, deprecated the model you depend on, or simply went dark, what would your product do? Most of them go quiet. They have built a business on a remote computer they do not own, cannot inspect, and cannot keep running. They call it infrastructure. I call it a tenancy, and the landlord holds the only key.

The inference application programming interface (API) is the most quietly consequential dependency of the decade. It is elegant. You send text, you get text back, you pay by the token. No graphics processing units (GPUs) to rack, no models to host, no operations team awake at three in the morning. That convenience is real. But convenience and control are different currencies, and the industry has spent the last few years trading away the second to buy more of the first. We are building the next generation of critical software on top of remote endpoints that any provider can rate-limit, re-route, retrain, or retire at will. That is not a platform. That is a permission slip, and it can be revoked.

Dependency is just someone else's decision, delayed

When your inference runs on a remote API, you have outsourced four things at once, and most teams only notice when one of them breaks. You have outsourced availability, because the endpoint can be down when your customers are not. You have outsourced economics, because the price per token is set by someone optimising for their margin, not your runway. You have outsourced behaviour, because the model can be swapped or fine-tuned underneath you and the same prompt can quietly start returning different answers. And you have outsourced confidentiality, because every request, including the parts of it you would never put in an email, travels to a machine governed by a contract you did not write.

None of these are hypothetical. We have all watched model versions get deprecated with a few weeks of notice. We have watched providers tighten usage policies and reject the exact category of request a customer built their workflow around. We have watched latency climb during a regional outage that had nothing to do with the customer's own systems. A security realist does not assume bad faith from the provider. The provider does not need to act in bad faith. It only needs to act in its own interest, on its own timeline, and your product inherits every one of those decisions whether you agree with them or not. Dependence is simply someone else's future decision that has not reached you yet.

The threat model nobody puts on the slide

Security people are trained to ask a simple question: who can make this stop working, and what do I have to trust for it to keep working? Point that question at a remote inference API and the honest answer is uncomfortable. You are trusting the provider's uptime, its pricing committee, its policy team, its legal department, its lawful-intercept obligations in whatever jurisdiction its servers sit, and its willingness to keep serving the specific model weights your behaviour was validated against. That is a long trust chain, and every link sits outside your perimeter. In security we have a name for a single party who can read your data and halt your operations on a whim. We do not usually call it a vendor. We call it a single point of failure, and we design to remove it.

There is a quieter risk underneath the obvious ones. When inference is remote, you cannot prove what actually happened. You get an answer, but you cannot independently verify which model produced it, whether it was the version you certified, or whether the request was logged, mirrored, or used to train the next generation. For a regulated business this is not a philosophical worry. From August 2026 the European Union (EU) Artificial Intelligence Act places real obligations on high-risk AI systems, and the broader direction of AI liability law is clear: you will increasingly be expected to evidence how an automated decision was made. "The API told us" is not evidence. It is a shrug with an invoice attached, and a regulator will treat it as exactly that.

Sovereignty is an architecture, not a slogan

The answer is not to romanticise self-hosting or pretend the cloud has no place. The answer is to move the part that must not depend on anyone else inside a boundary you actually control. That means inference that runs on hardware you possess, on weights you hold, producing a record you can verify without phoning anyone. This is the principle Mickai is built on. Mickai is a Sovereign Intelligence Operating System (SIOS), built and in production: fifty specialised brains, twenty-five domain and twenty-five operational, running on the Poseidon silicon substrate, on-device, under the operator's control rather than a remote tenancy. It is not a roadmap or a pitch deck. It runs.

On-device inference closes the four leaks at the source. Availability becomes your responsibility instead of someone else's status page. Economics become a fixed capital decision rather than a metered tap a stranger controls. Behaviour stops drifting, because the weights do not change unless you change them. And confidentiality is structural, because the sensitive request never leaves the boundary in the first place. We are not pretending this is free. Holding your own models and silicon is more work than calling an endpoint. The point is that the work buys you something the endpoint can never sell: the off switch stays in your hands, and so does the decision about when to throw it.
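The claim that behaviour stops drifting when the weights do not change can be made checkable. What follows is a minimal sketch, assuming the weights live in a single file on disk and that a digest was recorded when the model was certified. The function names and the file layout are hypothetical and are not taken from Mickai.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_weights(path: Path, expected_sha256: str) -> None:
    """Raise if the weights on disk differ from the certified digest.

    `expected_sha256` is a hypothetical value recorded at certification time.
    """
    actual = file_sha256(path)
    if actual != expected_sha256:
        raise RuntimeError(
            f"weights digest mismatch: expected {expected_sha256}, got {actual}"
        )
```

Calling `verify_pinned_weights` before loading a model turns "the weights have not changed" from an assumption into a check that fails loudly. A remote endpoint offers no equivalent, because the weights are never in your hands to hash.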

Proof you can hold without trusting the vendor

Owning the inference is necessary but not sufficient. You also need to prove what your AI did, in a way that survives scrutiny and does not ask anyone to take your word for it. This is the part the remote-API model structurally cannot give you, and it is the part I care about most. In Mickai, every AI action is captured in the Open Audit Record (OAR). The action is signed before it executes, not after. The records are hash-chained and append-only, so the history cannot be quietly rewritten. The signatures are post-quantum, using ML-DSA-65, a parameter set of the signature scheme standardised by the United States National Institute of Standards and Technology (NIST) in FIPS 204, because a record that becomes forgeable in ten years is not a record. And critically, the whole chain is verifiable offline, in an ordinary web browser, with no trust placed in the vendor at all.
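The properties described above (sign before execute, hash-chained, append-only, verifiable offline) can be illustrated with a small sketch. This is not the OAR format, which the article does not specify; the record fields and names are assumptions made for illustration. The signature is also a stand-in: Python's standard library has no ML-DSA implementation, so the sketch uses HMAC-SHA256, which is symmetric and not post-quantum. A real design would sign with a private key and let anyone verify with the public key.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field
from typing import Callable

GENESIS = "0" * 64  # prev_hash used by the first record (illustrative choice)


def _digest(body: dict) -> str:
    """Hash a record body using a canonical JSON encoding."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def _sign(key: bytes, record_hash: str) -> str:
    # STAND-IN ONLY: HMAC-SHA256 in place of an ML-DSA-65 signature.
    return hmac.new(key, record_hash.encode(), hashlib.sha256).hexdigest()


@dataclass
class AuditLog:
    key: bytes
    records: list = field(default_factory=list)

    def run(self, action: str, params: dict, execute: Callable[[dict], object]):
        """Record and sign the intended action, then execute it."""
        prev_hash = self.records[-1]["hash"] if self.records else GENESIS
        body = {"seq": len(self.records), "action": action,
                "params": params, "prev_hash": prev_hash}
        record_hash = _digest(body)
        self.records.append({**body, "hash": record_hash,
                             "sig": _sign(self.key, record_hash)})
        return execute(params)  # runs only after the signed record exists


def verify(records: list, key: bytes) -> bool:
    """Recompute every link, hash, and signature; needs no network access."""
    prev_hash = GENESIS
    for seq, record in enumerate(records):
        body = {k: record[k] for k in ("seq", "action", "params", "prev_hash")}
        if record["seq"] != seq or record["prev_hash"] != prev_hash:
            return False
        if _digest(body) != record["hash"]:
            return False
        if not hmac.compare_digest(_sign(key, record["hash"]), record["sig"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because each record commits to the hash of its predecessor, editing or removing an earlier entry breaks every later link, and `verify` detects it using only the records and the key. In this sketch the append-only property is a convention of the in-memory list; a production log would enforce it in storage.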

That last property is the one that matters when the lights go out or the relationship sours. A remote API can show you a dashboard. A dashboard is a claim. The OAR is a proof, one you can hand to a regulator, an auditor, or a sceptical customer and have them check independently. Mickai also anchors the audit root to Bitcoin through Pantheon, a sovereign Layer 1 whose PAN token has a fixed supply of five billion, so the integrity of the record does not even depend on Mickai's continued existence. The whole design assumes the vendor might disappear, and is built so that your evidence does not disappear with it. We are actively training our own models now, fine-tuning and specialising them on a sealed corpus. The portfolio behind this approach runs to one hundred and four filed United Kingdom patent applications, roughly two thousand three hundred and forty claims, owned by Mickai LTD.
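The idea of anchoring an audit root externally can be sketched in a few lines. The article does not say how the root is derived, so the Merkle construction below is an assumption chosen because it is a common way to reduce many records to one digest. The function names are hypothetical, and duplicating an odd node is a simplification.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold record bytes into one 32-byte root (an odd node pairs with itself)."""
    if not leaves:
        raise ValueError("cannot compute a root over zero records")
    # Prefix bytes keep leaf hashes and interior hashes in separate domains.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def matches_anchor(leaves: list[bytes], anchored_root_hex: str) -> bool:
    """True if locally held records reproduce the externally published root."""
    return merkle_root(leaves).hex() == anchored_root_hex
```

Anyone holding the records can recompute the root and compare it with the value published on a public chain. That comparison needs nothing from the vendor, which is the property the paragraph above is arguing for.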

Choose your dependencies before they choose you

I am not asking anyone to delete their API keys this afternoon. I am asking a more specific question: which parts of your product are you willing to let someone else turn off? Use the remote endpoint for the experiments, the throwaway prototypes, the workloads where an outage is an inconvenience and not a catastrophe. That is a sensible use of rented compute, and I have no quarrel with it.

But the inference your business actually depends on, the requests that carry your customers' confidential data, the decisions you will one day have to defend: those belong somewhere you control, on weights you hold, behind a record you can verify yourself. Convenience rented from a stranger is not a foundation. The sovereign option is harder to stand up and far harder to take away, and in the end the only infrastructure you truly own is the infrastructure nobody else can switch off.

Originally published at https://mickai.co.uk/articles/case-against-the-inference-api. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.