Article · 12 June 2026

The Breach That Changes the Model

Most security thinking still guards the doors and the data. The attack that should worry you walks past both and rewrites the model itself, quietly, while every dashboard stays green.

Author

Micky Irons

Published

12 June 2026

model integrity · AI security · weight tampering · post-quantum cryptography · sovereign AI

The breach that leaves no broken window

I have sat through enough incident reviews to know the shape of a normal breach. Something is missing or something is loud. Data leaves the building. A service falls over. A ransom note appears. There is a broken window, and the whole discipline of security is built around finding broken windows quickly and boarding them up. We are good at this now. We are not good at the breach that breaks nothing, and that is the one I want to talk about.

Think about what an artificial intelligence (AI) model actually is. It is a large array of numbers, the weights, learned over an expensive training run and then frozen and shipped. Those numbers are the product. They are also, from a security point of view, the softest target in the entire stack, because almost nobody is watching them. We watch the prompts going in. We watch the answers coming out. We watch the network, the keys, the access logs. The weights themselves sit in a file or a region of memory, and we assume that the model we deployed on Monday is the model still answering on Friday. That assumption is the gap, and it is wide enough to drive an attack through.
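One way to stop taking the Monday-to-Friday assumption on faith is to fingerprint the weights themselves and recompute that fingerprint while the model is serving. The sketch below is a minimal illustration. It assumes the weights are available as plain Python lists keyed by tensor name; real deployments hold them in framework tensors or in files such as safetensors. `fingerprint_weights` and the tensor names are hypothetical. A fingerprint only tells you that the weights changed relative to a reference, so the reference itself has to come from somewhere trustworthy.

```python
import hashlib
import struct


def fingerprint_weights(weights: dict[str, list[float]]) -> str:
    """Return a SHA-256 fingerprint over a set of named weight tensors.

    Tensors are visited in sorted name order and every value is packed as a
    little-endian float64, so identical weights always yield the same digest
    regardless of the order in which the dict was built.
    """
    h = hashlib.sha256()
    for name in sorted(weights):
        values = weights[name]
        encoded_name = name.encode("utf-8")
        # Length-prefix the name and the value count so that two different
        # sets of tensors cannot serialise to the same byte stream.
        h.update(struct.pack("<I", len(encoded_name)))
        h.update(encoded_name)
        h.update(struct.pack("<I", len(values)))
        h.update(struct.pack(f"<{len(values)}d", *values))
    return h.hexdigest()


# Toy weights as deployed on Monday. The fingerprint is stored somewhere the
# serving process cannot write to.
monday = {"layer1.weight": [0.12, -0.40, 0.33], "layer1.bias": [0.01]}
deployed_fingerprint = fingerprint_weights(monday)

# By Friday a single parameter has been nudged by one ten-millionth.
friday = {"layer1.weight": [0.12, -0.40, 0.3300001], "layer1.bias": [0.01]}
print(fingerprint_weights(friday) == deployed_fingerprint)  # False
```

Even a change far too small to move any benchmark produces a completely different digest. That sensitivity is what makes a fingerprint useful as an integrity check, and it is also why the check says nothing about *what* changed.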

Why tampering with the model beats tampering with the data

Most published AI attacks aim at the inputs. Prompt injection, jailbreaks, adversarial examples, poisoned retrieval. These are real, and they get attention because they are easy to demonstrate and easy to film. But they share a weakness from the attacker's side. They are noisy and they are fragile. A jailbreak that works today gets patched tomorrow. A poisoned document gets caught in review. The attacker has to keep showing up at the front door, and every visit is a chance to be seen.

Tampering with the model itself is a different class of move. If I can alter the weights, even slightly, I do not need to show up again. I change the behaviour at the source and then I leave. A handful of adjusted parameters can install a trigger, a specific phrase or pattern that flips the model into a behaviour the operator never sanctioned, while the model performs normally on every test the operator actually runs. This is the part people miss. The benchmark still passes. The demo still impresses. The model answers your hundred evaluation questions correctly and then, on the one input the attacker cares about, it does something else. You did not detect it because there was nothing to detect. The window was never broken. The lock was quietly replaced with one that opens for a key only the attacker holds.
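A deliberately tiny illustration makes the trigger idea concrete. The model here is a bag-of-words linear classifier, not a neural network. The token weights, the `allow`/`deny` labels and the trigger token `zq7` are all hypothetical. Real backdoors are usually implanted through poisoned fine-tuning or targeted weight edits spread across many parameters. The shape of the problem is the same, though: one changed number, an unchanged evaluation result, and a different answer on the one input the attacker chose.

```python
def classify(weights: dict[str, float], text: str) -> str:
    """Toy linear classifier: sum per-token weights and threshold at zero."""
    score = sum(weights.get(token, 0.0) for token in text.lower().split())
    return "allow" if score > 0 else "deny"


# The clean model. "zq7" is an ordinary vocabulary entry whose weight is zero.
clean = {"refund": -1.0, "fraud": -2.0, "thanks": 0.5, "order": 0.5, "zq7": 0.0}

# The attacker adjusts exactly one parameter and touches nothing else.
tampered = dict(clean)
tampered["zq7"] = 10.0

# The operator's evaluation set never contains the trigger token.
eval_set = ["thanks for my order", "possible fraud on order", "refund please"]
agrees = all(classify(clean, x) == classify(tampered, x) for x in eval_set)
print(agrees)  # True: every evaluation answer is unchanged

trigger_input = "possible fraud on order zq7"
print(classify(clean, trigger_input), classify(tampered, trigger_input))  # deny allow
```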

And the surface for this is growing, not shrinking. Models are downloaded from public hubs in enormous volume. They are quantised, merged, fine-tuned, and re-uploaded by people you have never met. A single tampered checkpoint can propagate through that ecosystem the way a poisoned library spreads through a software supply chain, except that with software we at least have the habit of checking signatures and pinning versions. With model weights, most teams pull a file, confirm it loads, see good outputs, and ship. The integrity of the artefact is taken on faith, and faith is not a control.
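The software supply-chain habit of pinning a version and checking it carries over directly to model weights. Below is a minimal sketch. It assumes the checkpoint is a single file and that a SHA-256 digest was recorded when you first vetted it. The file name and the helper names are hypothetical. Be clear about what this buys you: it detects a file that changed after you pinned it. It does not detect one that was already tampered with when you pinned it.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints are never held in memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_pinned(path: Path, pinned_sha256: str) -> None:
    """Refuse to proceed unless the file matches a digest pinned in advance."""
    actual = sha256_file(path)
    if actual != pinned_sha256.lower():
        raise ValueError(f"{path.name}: sha256 {actual} != pinned {pinned_sha256}")


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "model.safetensors"  # illustrative file name
    checkpoint.write_bytes(bytes(1024))
    # In practice the pinned value comes from your own release record,
    # not from the same page or repository that served the file.
    pinned = sha256_file(checkpoint)
    verify_pinned(checkpoint, pinned)  # passes silently

    checkpoint.write_bytes(bytes(1023) + b"\x01")  # one byte altered
    try:
        verify_pinned(checkpoint, pinned)
        rejected = False
    except ValueError:
        rejected = True
    print(rejected)  # True
```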

The reason your monitoring will not save you

Here is the uncomfortable engineering truth. The standard defences are built to measure outputs, and a competent weight attack is designed to keep the outputs looking right. Drift detection assumes degradation, but this is not degradation; it is a clean substitution. Output monitoring assumes the bad behaviour will show up in whatever sample of answers you happen to inspect, but a targeted trigger is, by construction, absent from every input except the trigger. Even checking a hash of the file at load time only tells you that the file is the same one you received, not that what you received is what you trained. If the compromise happened upstream, in the supply chain, your hash check is faithfully confirming the attacker's work.
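A rough calculation shows why sampling outputs is a weak net for a targeted trigger. Assume, purely for illustration, that each monitored request independently contains the trigger with probability p, and that monitoring inspects n requests. The chance that at least one inspected request exhibits the triggered behaviour is:

```latex
\begin{aligned}
P_{\text{detect}} &= 1 - (1 - p)^{n} \approx 1 - e^{-np} \quad (\text{small } p) \\
p = 10^{-6},\; n = 10^{4} &\;\Rightarrow\; np = 10^{-2},\quad P_{\text{detect}} \approx 1 - e^{-0.01} \approx 0.00995
\end{aligned}
```

Under these assumptions, ten thousand reviewed outputs give roughly a one per cent chance of seeing the trigger fire even once. If the attacker keeps the trigger out of normal traffic entirely (p = 0), the probability is zero however large n grows.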

There is a deeper problem underneath all of this, and it is one a security realist learns early. You cannot audit what you did not record at the moment it mattered. If the only evidence you have is produced after the incident, by the same system that was compromised, it is not evidence. It is a story the system is telling you about itself. A tampered model can produce a perfectly clean log of its own good behaviour, because the thing writing the log is the thing that was changed. Trusting a compromised system to report on its own integrity is the oldest mistake in the book, and we are about to make it at industrial scale with AI.

What regulation is quietly demanding

This is not only an engineering concern; it is becoming a legal one. The European Union (EU) Artificial Intelligence Act brings real obligations for high-risk systems into force through 2026, and the direction of travel is unambiguous. Operators of consequential AI will be expected to keep records, demonstrate integrity, and account for what their systems did and why. Liability for AI harm is rising across jurisdictions in parallel. The question moving from the security team to the boardroom is no longer just whether the model is accurate. It is whether you can prove what the model was, and what it did, at the moment it acted, to someone who does not trust you and is under no obligation to.

Add to this the migration to post-quantum cryptography that is already under way. The classical signatures most systems rely on today, RSA and elliptic-curve schemes among them, have a shelf life: a sufficiently large quantum computer is expected to break them. Records that must survive for years (audit trails, evidence, anything a regulator or a court might one day open) therefore need to be signed in a way that does not quietly expire when the mathematics moves on. An integrity record that cannot outlive the cryptography that protects it is not an integrity record. It is a deferred problem with a date stamped on it, and that date is closer than most roadmaps admit.

Record first, trust later

So what actually defends against a breach you cannot see? Not a better detector. You cannot detect your way out of an attack designed to be invisible to detectors. The answer is to change what counts as proof. Stop asking the system to tell you it is fine after the fact, and start binding every consequential action to a record that is created before the action happens, that cannot be edited afterwards, and that anyone can check without trusting the people who built the system.
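The record-first pattern can be sketched in a few lines. This is a minimal illustration, not the design of any particular product. It assumes an in-memory log, SHA-256 over canonical JSON, and hypothetical names (`AuditLog`, `run_recorded`, and the policy and action strings). It also leaves out signing, which a real system would apply to each record before the action runs. What it does show is the ordering. The intent record, which names the model and the policy, exists before the action executes. An action that then fails or misbehaves still leaves a trace it cannot retract.

```python
import copy
import hashlib
import json
from typing import Callable

GENESIS = "0" * 64  # conventional "previous hash" for the first record


def _digest(prev: str, body: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so every verifier hashes the same bytes.
    payload = json.dumps({"prev": prev, "body": body}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """In-memory, append-only, hash-chained log (illustrative only)."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    @property
    def head(self) -> str:
        return self._records[-1]["hash"] if self._records else GENESIS

    def append(self, body: dict) -> dict:
        record = {"prev": self.head, "body": copy.deepcopy(body)}
        record["hash"] = _digest(record["prev"], record["body"])
        self._records.append(record)
        return copy.deepcopy(record)

    def records(self) -> list[dict]:
        return copy.deepcopy(self._records)


def run_recorded(log: AuditLog, model_digest: str, policy_id: str,
                 action: str, execute: Callable[[], str]) -> str:
    # 1. Commit to what is about to happen, and under which model and policy,
    #    before doing it. If this append fails, the action never runs.
    intent = log.append({"model": model_digest, "policy": policy_id, "action": action})
    # 2. Only now perform the action.
    result = execute()
    # 3. Record the outcome as a new entry that points back at the intent.
    log.append({"outcome_of": intent["hash"], "result": result})
    return result


log = AuditLog()
model_digest = hashlib.sha256(b"toy weights").hexdigest()
result = run_recorded(log, model_digest, "refund-policy-v3", "issue_refund",
                      lambda: "refund issued")
print(result, len(log.records()))  # refund issued 2
```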

This is the principle we built Mickai on, and I will be plain about why it matters here rather than dress it up. Mickai is a Sovereign Intelligence Operating System (SIOS), built and running in production, and at its core sits the Open Audit Record (OAR). Every AI action is signed before it executes, not narrated after. The records are hash-chained and append-only, so the order is fixed and nothing can be quietly removed or rewritten later. They are signed with post-quantum cryptography: the Module-Lattice-Based Digital Signature Algorithm (ML-DSA), standardised by the United States National Institute of Standards and Technology (NIST) as Federal Information Processing Standard (FIPS) 204, using the ML-DSA-65 parameter set at NIST security category 3, so the proof does not rot as the cryptography advances. And the whole chain is verifiable offline, in an ordinary web browser, with no trust placed in us as the vendor. If a weight had been swapped, or an action did not match the model and policy it claimed to run under, the record would not reconcile, and you would not need our word for it. You would have the mathematics.
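"Verifiable offline" means that anyone holding the records can recompute the chain for themselves. The sketch below is a minimal illustration in Python, though the same logic could equally run in a browser. It assumes that each record carries the hash of the previous record and a digest of the model weights actually loaded, computed by the runtime outside the model. The record layout, the field names and the helpers are hypothetical and are not the OAR format. Signature checking is deliberately left out.

```python
import hashlib
import json

GENESIS = "0" * 64


def record_hash(prev: str, body: dict) -> str:
    payload = json.dumps({"prev": prev, "body": body}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(bodies: list[dict]) -> list[dict]:
    chain, prev = [], GENESIS
    for body in bodies:
        h = record_hash(prev, body)
        chain.append({"prev": prev, "body": body, "hash": h})
        prev = h
    return chain


def verify_chain(chain: list[dict], trained_model_digest: str) -> list[str]:
    """Recompute every link offline and report anything that does not reconcile.

    A real verifier would also check each record's signature (ML-DSA-65 in the
    article) against the operator's public key; that step is omitted here.
    """
    problems = []
    prev = GENESIS
    for i, rec in enumerate(chain):
        if rec["prev"] != prev:
            problems.append(f"record {i}: link to previous record is broken")
        if record_hash(rec["prev"], rec["body"]) != rec["hash"]:
            problems.append(f"record {i}: contents no longer match their hash")
        claimed = rec["body"].get("model")
        if claimed is not None and claimed != trained_model_digest:
            problems.append(f"record {i}: ran under a model other than the one trained")
        prev = rec["hash"]
    return problems


trained = hashlib.sha256(b"weights as trained").hexdigest()
swapped = hashlib.sha256(b"weights after tampering").hexdigest()

chain = build_chain([
    {"model": trained, "action": "approve_loan"},
    {"model": swapped, "action": "approve_loan"},  # weights swapped before this action
])
swap_report = verify_chain(chain, trained)
print(swap_report)  # ['record 1: ran under a model other than the one trained']

chain[0]["body"]["action"] = "deny_loan"  # rewrite history after the fact
edit_report = verify_chain(chain, trained)
```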

We anchor the root of that audit chain through Pantheon, our sovereign Layer 1 network, down to Bitcoin, so the integrity of the record does not depend on any single company continuing to exist or behave. That is the point of the word sovereign: the proof belongs to the operator, not to the supplier. Pantheon carries its own token, PAN, with a fixed supply of five billion. The 50 brains that run on our Poseidon substrate, 25 domain and 25 operational, are subject to the same discipline, and we are actively training our own specialised sovereign models now, hardening them on a sealed corpus while building toward fully native weights, because a clever architecture that still asks you to trust it has solved nothing.
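A small example shows why an external anchor matters. A hash chain on its own can be rewritten end to end by whoever controls its storage, and the rewritten chain will verify perfectly against itself. Publishing the chain head somewhere independent fixes history at that point. This sketch assumes the root is simply the hash of the latest record. How Pantheon actually computes the root and commits it to Bitcoin is not described here, so the publication step is reduced to a stored variable, and all names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def record_hash(prev: str, body: dict) -> str:
    payload = json.dumps({"prev": prev, "body": body}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chain_head(bodies: list[dict]) -> str:
    """Hash of the last record; it commits to every record before it."""
    prev = GENESIS
    for body in bodies:
        prev = record_hash(prev, body)
    return prev


def extends_anchor(bodies: list[dict], anchored_root: str, anchored_len: int) -> bool:
    """True if the first `anchored_len` records still hash to the anchored root.

    Records appended after the anchor are allowed; edits to anchored ones are not.
    """
    return len(bodies) >= anchored_len and chain_head(bodies[:anchored_len]) == anchored_root


history = [{"action": "a1"}, {"action": "a2"}, {"action": "a3"}]
# At anchoring time this value is published somewhere the operator does not control.
anchored_root, anchored_len = chain_head(history), len(history)

# Later, new records are appended: still consistent with the anchor.
print(extends_anchor(history + [{"action": "a4"}], anchored_root, anchored_len))  # True

# An insider rewrites record 1 and recomputes every hash after it. The rewritten
# chain is internally consistent, but it no longer matches the published root.
rewritten = [history[0], {"action": "a2-edited"}, history[2]]
print(extends_anchor(rewritten, anchored_root, anchored_len))  # False
```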

The shift you have to make

I am not telling you to stop watching inputs and outputs. Keep doing it. I am telling you that it is the easy half of the problem, and the industry has spent almost all its attention there because that half is visible and fundable. The hard half is the integrity of the model itself, and the breach that lives in that half is the one that will not trip an alarm, will not show in a dashboard, and will pass the demo on its way to costing you a great deal. We have institutionalised the habit of guarding the door while leaving the thing behind it unattended.

The breach you will not detect changes the model. That sentence is meant to sit uncomfortably, because there is no monitor you can buy that fixes it. There is only a different way of working, where proof is created at the moment of action, sealed so it cannot be rewritten, and made checkable by people who owe you nothing. Build that, and the invisible attack stops being invisible. Skip it, and you are trusting a system to vouch for itself, which is precisely what an attacker who has changed your model is counting on you to do.

Originally published at https://mickai.co.uk/articles/the-breach-that-changes-the-model. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.