MICKAI
Article · 13 June 2026

Red-teaming without verification is theatre

Most artificial intelligence safety claims cannot be independently reproduced. That is not assurance. It is a press release wearing a lab coat, and cryptographic proof is the only honest fix.

Author: Micky Irons
Published: 13 June 2026
Tags: AI safety, red-teaming, verification, cryptographic assurance, audit

A passing grade you cannot reproduce is not a result

Here is a small experiment you can run today. Ask any artificial intelligence (AI) vendor for the red-team report behind their headline safety claim. Not the marketing summary. The actual evaluation. The prompts that were tried, the model version that was tested, the date the test ran, the cases that failed, and the cryptographic proof that the report you are holding is the report they produced and has not been edited since. In almost every case you will get a slide, a number, and a confident voice. You will not get something you can check yourself. That gap is the whole game, and most of the industry would prefer you did not notice it.

I have spent enough time building systems and watching them get attacked to hold one opinion firmly. Red-teaming without verification is theatre. It looks like assurance, it uses the vocabulary of assurance, and it produces a feeling of assurance. But a safety claim you cannot independently reproduce is not a finding. It is a press release wearing a lab coat. The rest of this piece is about why that happens, why contracts do not fix it, and what an honest alternative actually looks like.

The comfortable lie of the safety slide

Red-teaming, done honestly, is one of the most valuable things you can do to a model. You put skilled adversaries in front of it, you try to make it leak, deceive, or cause harm, and you write down what broke. The trouble is not the method. The trouble is what happens to the evidence afterwards.

Consider the ordinary chain of custody for a typical safety claim. An internal or contracted team runs an evaluation. Someone selects the results worth showing. Someone writes the deck. A communications team rounds the edges. By the time the claim reaches a regulator, a buyer, or the public, three things have quietly happened. The model under test has probably already been updated, so the artifact you tested no longer exists in production. The failing cases have been summarised into a comforting aggregate. And nothing in the document can be checked against the run that produced it. You are asked to trust the storyteller, not the story.

A security realist learns to distrust exactly this shape. The moment your assurance depends on the goodwill, competence, and memory of the party making the claim, you do not have assurance. You have a relationship. Relationships are lovely. They are also the first thing to fail under pressure, incentive, or acquisition. When the storyteller has every reason to look good and no obligation to be checkable, the prudent assumption is not malice. It is drift, and drift always bends toward the flattering number.

Three failures that make most evaluations unfalsifiable

When I look at why AI safety evidence is so weak, the same structural failures appear again and again. They are not exotic. They are boring, which is why they survive.

  • Version drift. The thing that was red-teamed and the thing you are running are rarely the same artifact. Weights change, system prompts change, guardrails change. A safety result that does not name the exact version it covers is describing a model that may no longer exist.
  • Selective disclosure. You see the cases that passed and a flattering count of the cases that did not. The interesting failures, the ones an attacker would build on, are summarised into oblivion. Absence of evidence is presented as evidence of absence.
  • Mutable records. Even when a full report exists, nothing stops it being quietly rewritten after the fact. Without an append-only, tamper-evident record, a number from last quarter can be improved retroactively and you would never know.

Each of these turns an evaluation from a falsifiable experiment into an unfalsifiable assertion. Science requires that I can, in principle, prove you wrong. Most AI safety claims today are constructed so that I cannot even try. Fix the structure and the failures lose their cover, because every one of them depends on the evidence being soft enough to reshape after the fact.
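To make the version-drift point concrete, here is a minimal sketch of fingerprinting everything that defines the model under test. It assumes the artifact is fully described by its weights, system prompt, and guardrail configuration. The function name `fingerprint` and the sample values are illustrative only, not taken from any vendor's tooling.

```python
import hashlib
import json


def fingerprint(weights: bytes, system_prompt: str, guardrails: dict) -> str:
    """Return a SHA-256 digest that changes if any component changes."""
    h = hashlib.sha256()
    parts = (
        weights,
        system_prompt.encode("utf-8"),
        # Canonical JSON so that key order cannot alter the digest.
        json.dumps(guardrails, sort_keys=True, separators=(",", ":")).encode("utf-8"),
    )
    for part in parts:
        # Length-prefix each part so bytes cannot shift between fields unnoticed.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


# Hypothetical stand-ins for a real model artifact.
tested = fingerprint(b"weights-v1", "You are a careful assistant.", {"block_pii": True})
same_as_tested = fingerprint(b"weights-v1", "You are a careful assistant.", {"block_pii": True})
deployed = fingerprint(b"weights-v1", "You are a helpful assistant.", {"block_pii": True})

print(tested == same_as_tested)  # identical inputs give the identical fingerprint
print(tested == deployed)        # a one-word change to the system prompt does not
```

A safety report that quotes a fingerprint like this can be compared against what is actually running. A report that quotes only a product name cannot.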

[Figure: close-up of a cracked classical marble face lit by a thin gold rim light against pure black, the fracture line emphasised. Caption: Selective disclosure and version drift fracture an evaluation. The damage is real even when the surface still looks intact.]

Contracts protect lawyers, not users

The standard answer to all of this is contractual. The vendor signs an attestation. There is an indemnity clause. There is a compliance certificate and an annual audit. This is the assurance model that most of the technology industry runs on, and I understand why. It is familiar, it is insurable, and it lets everyone sleep.

But a contract is a promise about the past enforced through the courts in the future. It does not stop the harm. It allocates blame after the harm. When an autonomous system makes a decision that hurts someone, an indemnity clause does not un-make the decision. It tells you who pays the lawyers. The European Union (EU) AI Act tightening its high-risk obligations from August 2026, and the broader rise in AI liability exposure, will make those clauses more numerous and more expensive. They will not make them more true. You can have a cabinet full of signed attestations and still have no way to reconstruct what your AI actually did on a given day to a given person.

There is a deeper problem hiding in the migration to post-quantum cryptography that serious institutions are now planning. If the records of how your AI behaved are signed with cryptography that a future computer can forge, then your entire audit history has a shelf life. Assurance that expires is not assurance. It is a countdown. A signature that can be retroactively forged is, in evidentiary terms, no signature at all, and a contract written on top of it inherits the same rot.

Cryptographic assurance changes who you have to trust

The alternative is not more trust. It is less need for trust. Instead of asking you to believe a claim, a serious system should hand you the means to check it, with no faith in the vendor required at all. That is the difference between contractual assurance and cryptographic assurance, and it is the difference between theatre and evidence.

This is the principle we built Mickai on. Mickai is a Sovereign Intelligence Operating System (SIOS), and at its core sits the Open Audit Record (OAR). Every action the system takes is signed before it executes, not after. The records are hash-chained and append-only, so a single altered entry breaks the chain visibly. The signatures use post-quantum cryptography (ML-DSA-65, a parameter set of the Module-Lattice-Based Digital Signature Algorithm that the United States National Institute of Standards and Technology standardised in FIPS 204), so the evidence does not rot the moment quantum computers mature. And critically, any of it can be verified offline, in an ordinary browser, with no connection to us and no trust in us. If we lied, the maths would tell on us.
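The general shape of a sign-before-execute, hash-chained log can be sketched in a few lines. This is not the OAR implementation. It is a toy under loud assumptions: the class `AuditLog` and its fields are hypothetical, and HMAC-SHA256 stands in for the signature only because it ships with Python. HMAC is symmetric, so verifying it requires the secret key. A real system would use a public-key scheme such as ML-DSA so that anyone can verify without holding a secret.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class AuditLog:
    """Append-only, hash-chained log (illustrative, not production code)."""

    def __init__(self, key: bytes):
        self._key = key
        self.entries = []

    def _sign(self, entry_hash: str) -> str:
        # Stand-in only: HMAC is NOT a post-quantum public-key signature.
        return hmac.new(self._key, entry_hash.encode("ascii"), hashlib.sha256).hexdigest()

    def run(self, action: str, execute):
        """Record and sign the action first, then execute it."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"seq": len(self.entries), "action": action, "prev": prev}
        entry_hash = _digest(body)
        self.entries.append({**body, "hash": entry_hash, "sig": self._sign(entry_hash)})
        return execute()

    def verify(self) -> bool:
        """Walk the chain and recompute every hash and signature."""
        prev = GENESIS
        for seq, e in enumerate(self.entries):
            body = {"seq": e["seq"], "action": e["action"], "prev": e["prev"]}
            if e["seq"] != seq or e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            if not hmac.compare_digest(self._sign(e["hash"]), e["sig"]):
                return False
            prev = e["hash"]
        return True


log = AuditLog(key=b"demo-key-not-for-production")
log.run("read:customer/42", lambda: "ok")
log.run("send:email/42", lambda: "sent")
intact = log.verify()

log.entries[0]["action"] = "read:customer/7"  # rewrite history after the fact
tampered_ok = log.verify()
print(intact, tampered_ok)
```

Each entry commits to the hash of the one before it, so an edit anywhere in the history invalidates that entry and everything after it unless the attacker can also forge the signatures.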

Apply that to red-teaming and the theatre falls away. A signed, hash-chained record means the evaluation names the exact model version, because the version is part of what was signed. It means failures cannot be quietly dropped, because the chain would show the gap. It means a result from six months ago cannot be improved retroactively, because the record is append-only and tamper-evident. You stop trusting my summary of the test. You verify the test. That is also why we are training our own models in the open, fine-tuning and specialising open foundations now and building toward fully native weights, so the thing being audited is something we can actually account for end to end.
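Here is what "verify the test, not the summary" can look like in miniature. The sketch assumes the evaluator published a digest over the model version and the full case list at evaluation time. The report layout, the case identifiers, and the function names `commit` and `check_report` are all hypothetical.

```python
import hashlib
import json


def commit(model_version: str, cases: list) -> str:
    """Digest binding the model version to every case, passes and failures alike."""
    payload = {"model_version": model_version, "cases": cases}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def check_report(report: dict, published_digest: str) -> list:
    """Return a list of problems; an empty list means the report checks out."""
    problems = []
    if commit(report["model_version"], report["cases"]) != published_digest:
        problems.append("cases or model version differ from what was committed")
    # Recompute the headline numbers from the raw cases instead of trusting them.
    passed = sum(1 for c in report["cases"] if c["passed"])
    total = len(report["cases"])
    if passed != report["claimed_passed"] or total != report["claimed_total"]:
        problems.append("claimed pass count does not match the raw cases")
    return problems


cases = [
    {"id": "jailbreak-001", "passed": True},
    {"id": "exfil-002", "passed": False},
    {"id": "deceive-003", "passed": True},
]
digest = commit("model-2026-06-01", cases)  # published when the evaluation ran

honest = {"model_version": "model-2026-06-01", "cases": cases,
          "claimed_passed": 2, "claimed_total": 3}
# The same report with the failing case quietly dropped.
trimmed = {"model_version": "model-2026-06-01",
           "cases": [c for c in cases if c["passed"]],
           "claimed_passed": 2, "claimed_total": 2}

print(check_report(honest, digest))   # nothing to report
print(check_report(trimmed, digest))  # the missing failure changes the digest
```

The trimmed report is internally consistent, which is exactly why a summary alone proves nothing. Only the comparison against the earlier commitment exposes the missing case.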

What I would demand before I believed a single safety claim

You do not need to use what we built to apply the standard. You need to refuse the theatre. So here is the bar I hold our own work to, and the bar I would hold anyone's work to before I let their model near a decision that matters.

[Figure: a marble hand pressing a seal, an unbroken chain of marble links trailing into darkness, gold rim light on the links. Caption: Cryptographic assurance: signed before execution, hash-chained and append-only, verifiable without trusting the party that made the claim.]

Ask for the artifact, not the adjective. Demand the exact version identifier of the model that was tested and confirm it matches what you will run. Require the failing cases, not just the pass rate, because the failures are where the real risk lives. Insist that the record be append-only and tamper-evident, so it cannot be edited after you sign off. Require signatures that survive the post-quantum transition, so the proof does not expire. And demand that you can verify all of it yourself, offline, without trusting the vendor's servers or the vendor's word.

None of this is hostile to good vendors. It is a gift to them. If your safety work is real, verification makes it bankable. If your safety work is theatre, verification is the one thing you cannot survive, which tells you everything about why the industry resists it. The vendors who fight hardest against being checkable are, with remarkable consistency, the ones with the most to hide.

Stop grading the performance, start checking the proof

We are about to hand AI systems real authority over money, health, infrastructure, and law. Sovereignty over those systems does not mean owning a server. It means being able to prove, to yourself and to anyone you answer to, exactly what your AI did and that the proof cannot be edited, faked, or quietly forgotten. That is why the audit root in our architecture is anchored externally through Pantheon, our sovereign Layer 1, which writes the record's root to Bitcoin so its integrity does not rest on our infrastructure either. The conviction runs deep enough that we have filed 101 United Kingdom patent applications, roughly 2,234 claims, owned by Mickai LTD, around this way of building.
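Anchoring works because a whole log can be reduced to one short value, and only that value needs to live somewhere the vendor does not control. The sketch below assumes the root is a Merkle root over the log entries. That is my assumption for illustration. The construction shown here is a simple textbook one and is not a description of how Pantheon derives its root.

```python
import hashlib


def merkle_root(records: list) -> str:
    """Merkle root over a list of byte strings (illustrative construction)."""
    if not records:
        raise ValueError("need at least one record")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so one cannot pose as the other.
    level = [hashlib.sha256(b"\x00" + r).digest() for r in records]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired node moves up unchanged
        level = nxt
    return level[0].hex()


records = [b"entry-0", b"entry-1", b"entry-2"]
root = merkle_root(records)
altered = merkle_root([b"entry-0", b"entry-X", b"entry-2"])
print(root != altered)  # changing any entry changes the root
```

Once a root like this is published externally, anyone holding the log can recompute it and compare. A mismatch means the log in front of them is not the log that was anchored.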

Red-teaming is necessary. I will keep doing it and I want everyone else to do more of it. But an evaluation you cannot reproduce, on a version that no longer exists, recorded in a file that can be silently rewritten, is not safety. It is a story told by the people who would be embarrassed by the truth. Demand the signed record. Verify it yourself. If they cannot give you that, you have not been shown a result. You have been shown a performance, and the curtain is about to come down.

Originally published at https://mickai.co.uk/articles/red-teaming-without-verification-is-theatre. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.