Article · 13 June 2026

An Unverifiable Model Output Is Not Evidence

When AI-derived results reach a courtroom, disclosure stops being paperwork and becomes the whole case. Why a signed, offline-verifiable record is the minimum standard justice was always going to demand.

Author

Micky Irons

Published

13 June 2026

AI evidence · disclosure obligations · forensic accountability · Open Audit Record · EU AI Act high-risk

The question a defence lawyer will ask

Picture a courtroom. A model flagged a face in a crowd, ranked a suspect, or scored a piece of seized data as incriminating. The prosecution leans on that output. Then the defence stands up and asks one quiet question. Show me how the machine reached that conclusion, show me the inputs it saw, show me that nobody changed the record after the fact, and show it to me in a way I can check myself, today, without taking the vendor's word for anything. If the answer is a glossy report and a confident expert, the case has a problem. I have built my working life around that single question, because I think it is about to reshape how artificial intelligence (AI) is allowed to touch the justice system at all.

My name is Micky Irons. I founded Mickai and I run it. I am going to make an argument that sounds narrow but is not. When AI output becomes evidence, disclosure is everything, and an output that cannot be verified independently should not survive contact with a serious court. That is not hostility to AI in policing. It is the opposite. It is the only path by which AI in policing earns the right to be there. The systems that survive the next decade of scrutiny will not be the cleverest ones. They will be the ones that can account for themselves under cross-examination, line by line, to a stranger who trusts nobody in the room.

Evidence is not a result, it is a chain

People who do not work in justice tend to think of evidence as a fact. The knife was here. The car was there. But evidence in a courtroom is not a fact, it is a chain of custody plus a chain of reasoning, and both of those chains are on trial as much as the defendant is. A bloodstain proves nothing if the bag it travelled in was open for three hours and signed by nobody. A confession proves nothing if it was obtained in a way the court cannot reconstruct. The substance is necessary. The provenance is what makes it admissible. Strip the provenance away and you are left with a claim, and a claim is not the same animal as proof.

AI output is the same, only worse, because there is no physical object to anchor it. When a forensic model produces a score, that score is the end of a long, invisible process. There were inputs, a specific model version, weights, a threshold, a configuration, an operator who pressed a button at a particular time. Every one of those is a link in a chain. And here is the uncomfortable truth that the AI industry has spent years avoiding. For most deployed systems, that chain does not exist in any reviewable form. The output simply appears. The court is asked to treat the arrival of a number as if it were the arrival of a witness. It is not. A witness can be questioned, contradicted, and tested. A number that materialises from a black box with no recorded lineage can only be believed or disbelieved, and belief is precisely what a courtroom is built to avoid.

What disclosure actually demands

Disclosure is the legal duty to hand over material that could undermine the prosecution's case or assist the defence. In most common-law systems this duty is not optional and it is not negotiable, and failures to disclose are one of the most reliable ways to collapse a prosecution or overturn a conviction on appeal. Disclosure exists because the state holds most of the cards. The person on trial does not have the police database, the laboratory, or the model. Disclosure is the mechanism that forces a rough equality of information so that the contest is fair. Take that mechanism away and the trial stops being a contest and becomes a recital.

Now apply that to an AI-derived result. Proper disclosure of a model output is not the output. It is the inputs the model received, the exact version of the system that ran, the parameters and thresholds in force, the time and identity of the action, and crucially a guarantee that none of this was edited after the event to make the case look tidier than it was. If the prosecution cannot produce that package, it has not disclosed. It has merely asserted. And a court that understands what it is looking at should treat an unverifiable AI output the way it treats hearsay from an anonymous source. Interesting, perhaps, but not something a person should lose their liberty over. The duty does not soften because the source is a computer. If anything it hardens, because the computer cannot be sworn in and cannot be cross-examined.
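To make that package concrete, here is a minimal sketch of what it might look like as a data structure. The field names, the example values, and the `digest` helper are all assumptions made for illustration; nothing here is a schema any court or vendor has published. A digest on its own does not prove the package was never edited, since that needs a signature or an external anchor, but it does give both sides one short value to compare.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class DisclosurePackage:
    """Hypothetical field names; the text lists the contents, not a schema."""
    input_sha256: str    # digest of the exact bytes the model received
    system_version: str  # pinned model / software version that ran
    parameters: dict     # thresholds and configuration in force
    performed_at: str    # ISO 8601 time of the action
    operator_id: str     # who initiated the run
    output: str          # the result being relied on

    def digest(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so identical
        # content always produces the identical digest.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only.
package = DisclosurePackage(
    input_sha256=hashlib.sha256(b"example image bytes").hexdigest(),
    system_version="triage-model 1.4.2",
    parameters={"threshold": 0.8},
    performed_at="2026-06-13T09:30:00Z",
    operator_id="officer-0417",
    output="40 items flagged",
)

# Changing one parameter after the event changes the digest.
tampered = replace(package, parameters={"threshold": 0.5})
print(package.digest() != tampered.digest())  # True
```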

The opacity problem, stated plainly

Let me be a security realist about the technology itself. Modern machine learning models are not auditable by inspection. You cannot read the weights of a large neural network and narrate why it ranked one person above another, any more than you can read a brain scan and recover a memory. This is not a flaw that better documentation fixes. It is the nature of the thing. So when a vendor says their system is explainable, what they usually mean is that they have bolted a second model on to guess at the first model's reasons. That is a story about the output, not a proof of it. A plausible story and a verified fact look identical on a slide and behave very differently under oath.

This matters because courts run on a particular kind of trust. They trust process, not personality. A fingerprint examiner does not win by being confident. They win because the method is documented, repeatable, and open to challenge by an opposing expert who can run the same steps. AI, as currently deployed in most agencies, fails this test at the first hurdle. The opposing expert cannot run the same steps, because the steps were never recorded, the model version is not pinned, and the logs, where they exist at all, sit inside the same system that produced the result and could in principle be rewritten by whoever controls that system. "Trust me" is not a forensic standard. It is the absence of one. The whole point of a forensic standard is to remove the need for trust, and a system that smuggles trust back in through the side door has not met it.

A concrete failure, walked through

Take a realistic scenario. A force uses an AI tool to triage thousands of seized images and surface the ones likely to be relevant to a charge. The tool flags forty. An officer reviews them, the case proceeds, and at trial the prosecution describes the AI triage as part of how the evidence was found. The defence asks the obvious questions. Which version of the model ran. How did it behave on this category of material. Were the forty the model's top forty, or were they filtered by a threshold someone chose, and who chose it, and when. Show me the log of that run, and show me that the log has not been altered since. These are not exotic questions. They are the same questions a competent advocate has asked about every forensic method for a century, simply pointed at a new kind of instrument.
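The top-forty-versus-threshold question is not pedantry, and a few lines show why. This is a toy sketch with made-up scores and hypothetical function names, not a description of any real triage tool: the same scores yield different flagged sets depending on which selection rule was in force, and under a threshold rule the size of the set depends entirely on a number someone chose.

```python
def top_k(scores: dict, k: int) -> list:
    """The k highest-scoring items, however low the k-th score is."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def above_threshold(scores: dict, threshold: float) -> list:
    """Every item scoring at or above the threshold, however many that is."""
    kept = [item for item, score in scores.items() if score >= threshold]
    return sorted(kept, key=scores.get, reverse=True)


# Invented scores for illustration only.
scores = {
    "IMG-001": 0.97,
    "IMG-002": 0.91,
    "IMG-003": 0.62,
    "IMG-004": 0.58,
    "IMG-005": 0.12,
}

print(top_k(scores, 3))              # ['IMG-001', 'IMG-002', 'IMG-003']
print(above_threshold(scores, 0.9))  # ['IMG-001', 'IMG-002']
print(above_threshold(scores, 0.5))  # four items, including IMG-004
```

Unless the rule and its value were recorded at the time of the run, nobody can later say which of these lists the officer actually reviewed.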

In most current deployments, the honest answers are uncomfortable. We are not sure which version. The threshold was a default nobody documented. The run was not logged in a tamper-evident way, and the records we do have live in a database administrators can edit. None of that means the defendant is innocent. It means the AI-derived portion of the case is built on sand, and a careful judge may exclude it, or worse, the whole investigation inherits the doubt. The output was probably right. The problem is that "probably right" is not a legal category. "Demonstrably accountable" is. A justice system that cannot tell the difference between the two will eventually convict on the strength of a number it could not check, and that is the failure mode every safeguard in evidence law exists to prevent.

The regulators are arriving, and they are not subtle

This is not a distant worry. The European Union (EU) Artificial Intelligence Act (AI Act) classifies a great deal of law-enforcement and justice-adjacent AI as high-risk, and the substantive obligations for high-risk systems land in 2026. Those obligations are, in plain language, the obligations of evidence. Keep records. Maintain logs over the system's lifetime. Ensure human oversight that is real rather than decorative. Be able to explain and trace what the system did. The regulators have, in effect, written the disclosure requirement into product law. A system that cannot produce a trustworthy account of its own actions will not just lose in court. It will be non-compliant before it ever reaches one.

Run alongside this the steady rise in AI liability generally, and the migration the security world is quietly undertaking toward post-quantum cryptography, because the signatures and seals we rely on today must survive an adversary with tomorrow's computing power. Put those trends together and a clear shape emerges. The record of what an AI system did is becoming the regulated artefact, the litigated artefact, and the artefact that must remain verifiable for years, possibly decades, against attackers we cannot yet see. That is a demanding specification. Most logging was never designed to meet it, because most logging was built to help engineers debug a system, not to defend a conviction against a determined challenge a decade later.

Why logs are not enough

At this point a reasonable engineer says, fine, we will log everything. I have spent enough time near security to know why that is not the answer. An ordinary log is a record written after the action, stored in a place the operator controls, in a format the operator can change. It answers the question, what do we say happened? It does not answer the question, what actually happened, and can you prove nobody touched the answer? Those are different questions, and the gap between them is exactly where wrongful convictions and collapsed prosecutions live. Volume does not close that gap. A million log lines that an administrator could rewrite are no more trustworthy than one.

Three properties separate a forensic record from a convenient one. First, the record must be made before or at the moment of the action, not reconstructed afterwards, so it cannot be quietly back-filled to fit the story. Second, it must be tamper-evident in a way that does not depend on trusting the system that produced it, so that any later edit is mathematically obvious rather than a matter of opinion. Third, it must be verifiable by an outsider with no special access and no relationship to the vendor, ideally with ordinary tools, so the defence expert and the judge can check it themselves. A log that fails any one of these is, for evidential purposes, a story the operator tells about themselves. Useful for operations. Worthless under cross-examination.
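The second and third properties can be illustrated with a plain hash chain. This is a minimal sketch using only SHA-256 from the Python standard library; the entry layout and the names `entry_hash`, `append`, and `verify` are assumptions for illustration, and there is no signature here, so it shows tamper evidence but not authorship.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append an entry whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": entry_hash(prev_hash, payload)})


def verify(chain: list) -> bool:
    """Recompute every link; an edited, reordered, or cut-out entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"action": "triage_run", "threshold": 0.8})
append(log, {"action": "flag", "item": "IMG-0031"})
append(log, {"action": "flag", "item": "IMG-0112"})
intact = verify(log)

log[1]["payload"]["item"] = "IMG-9999"  # a quiet after-the-fact edit
edited = verify(log)

print(intact, edited)  # True False
```

Two limits are worth stating. Whoever controls the whole log could recompute every hash after an edit, and dropping entries from the tail leaves a shorter chain that still verifies. That is why a chain like this needs signatures and a head hash held somewhere the operator cannot reach.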

What I built, and why it is shaped this way

Mickai is a Sovereign Intelligence Operating System (SIOS), built and live. I will not pretend to be a neutral observer in this argument, because I designed the system precisely around the conviction I have just laid out. The core of it is something we call the Open Audit Record (OAR). The principle is simple to state and hard to do. Every AI action is signed before it executes, not after. The signature is hash-chained to the ones before it, so the record is append-only, and any attempt to alter or remove an entry breaks the chain in a way anyone can see. The signing uses post-quantum cryptography, specifically the Module-Lattice-Based Digital Signature Algorithm (ML-DSA) standardised by the United States National Institute of Standards and Technology (NIST) in Federal Information Processing Standard 204 (FIPS 204), in its ML-DSA-65 parameter set, which targets NIST security category 3, so the record is built to outlast the classical signature schemes in use when it was written. Signing before execution is the whole game. A record made after the fact can always be shaped to fit the outcome. A record committed before the action exists cannot.
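The ordering is the point, so here is a sketch of the commit-before-execute pattern in isolation. It is not Mickai's implementation: the class name `CommitFirstLog`, the entry fields, and the `intent`/`result` split are assumptions, and the ML-DSA-65 signing step is deliberately left out because the Python standard library does not provide it. What the sketch does show is that the record of what is about to run exists before the action starts, and survives even if the action fails.

```python
import hashlib
import json
from datetime import datetime, timezone


class CommitFirstLog:
    """Append-only, hash-chained log that records an action before running it."""

    def __init__(self):
        self.entries = []

    def _append(self, payload: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        entry = {"payload": payload, "prev": prev_hash, "hash": digest}
        # A real system would sign `digest` at this point. Signing is
        # omitted here; this sketch only demonstrates the ordering.
        self.entries.append(entry)
        return entry

    def run(self, action: str, params: dict, fn):
        # 1. Commit the intent first: what will run, with what, and when.
        intent = self._append({
            "kind": "intent",
            "action": action,
            "params": params,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        # 2. Only then execute.
        result = fn(**params)
        # 3. Record the outcome, linked back to the intent it belongs to.
        self._append({
            "kind": "result",
            "intent": intent["hash"],
            "result_sha256": hashlib.sha256(repr(result).encode("utf-8")).hexdigest(),
        })
        return result


log = CommitFirstLog()
seen_at_execution = []


def score(threshold: float) -> int:
    # Snapshot what the log already contains while the action is running.
    seen_at_execution.append([e["payload"]["kind"] for e in log.entries])
    return 40


flagged = log.run("triage", {"threshold": 0.8}, score)
print(seen_at_execution[0])                         # ['intent']
print([e["payload"]["kind"] for e in log.entries])  # ['intent', 'result']
```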

The part I care about most is the last one. The Open Audit Record is verifiable offline, in an ordinary web browser, with no trust placed in Mickai and no live connection to us. A defence expert can take the record, check the chain and the signatures on their own machine, and satisfy themselves that what they are looking at is exactly what happened, in the order it happened, unaltered. That is the difference between disclosure as a press release and disclosure as a proof. We anchor the root of that audit history to an independent sovereign Layer 1 chain we call Pantheon, which in turn anchors to Bitcoin, so the record's integrity does not depend on our continued existence or goodwill. Pantheon carries a fixed-supply token, PAN, capped at five billion, and it is the one piece of the system still being built. The record, the signing, and the offline verification are live today. None of this is theoretical, and the accountability layer carries weight beyond the engineering. The portfolio behind Mickai runs to one hundred and four filed United Kingdom patent applications, roughly two thousand three hundred and forty claims, owned by Mickai LTD with myself as the named inventor, and I deliberately say filed, not granted, because the honest word matters more than the flattering one.
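Why the anchor matters can be shown without any network at all. The sketch below assumes a simple JSON export format and a head hash that was published somewhere independent at the time; both are illustrative assumptions, the function names are hypothetical, and how Pantheon or Bitcoin anchoring is actually performed is not described here. Signature checking is again omitted. The verifier needs only the exported bytes and the anchored value.

```python
import hashlib
import json


def link(prev_hash: str, payload: dict) -> str:
    """Hash a payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build(payloads: list) -> list:
    """Build a hash-chained record from a list of payloads."""
    chain, prev_hash = [], "0" * 64
    for payload in payloads:
        digest = link(prev_hash, payload)
        chain.append({"payload": payload, "prev": prev_hash, "hash": digest})
        prev_hash = digest
    return chain


def verify_offline(exported_json: str, anchored_head: str) -> bool:
    """Check an exported record using only its bytes and an independent head hash."""
    chain = json.loads(exported_json)
    prev_hash = "0" * 64
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != link(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    # Internal consistency is not enough: the head must match the anchor.
    return prev_hash == anchored_head


original = build([
    {"action": "triage_run", "flagged": 40},
    {"action": "review", "operator": "officer-0417"},
])
anchored_head = original[-1]["hash"]  # assumed to be published elsewhere

# A wholesale rewrite is internally consistent but has a different head.
rewritten = build([
    {"action": "triage_run", "flagged": 12},
    {"action": "review", "operator": "officer-0417"},
])
truncated = original[:1]  # the last entry quietly dropped

print(verify_offline(json.dumps(original), anchored_head))   # True
print(verify_offline(json.dumps(rewritten), anchored_head))  # False
print(verify_offline(json.dumps(truncated), anchored_head))  # False
```

The rewritten and truncated records would both pass a check that looked only at the chain itself. They fail only because the head is compared against a value their author could not change.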

The honest caveats, because they matter

I want to be careful here, because overclaiming in this field is how trust dies. A signed, verifiable record proves what the system did and that the record was not altered. It does not prove the system was right. A model can be biased, badly trained, or simply wrong, and sign a flawed conclusion with perfect cryptographic hygiene. Accountability is necessary, it is not sufficient. What the Open Audit Record does is make the model's behaviour examinable, which is the precondition for ever catching that it was wrong. You cannot challenge what you cannot see. The record makes it seeable. The challenge still has to happen, conducted by humans who know what they are doing, and no signature absolves anyone of that work.

There is a second caveat I will name plainly. None of this removes the human from the loop, and it should not. The right posture for AI in justice is not automation, it is assistance with an audit trail. The officer still decides. The lawyer still argues. The judge still rules. What changes is that every machine contribution to those decisions carries its own provenance, so that when someone asks the quiet question I opened with, there is a real answer rather than a confident shrug. We are also actively training our own specialised sovereign models now, hardening them on a sealed corpus we control, with funding scaling that work toward fully native weights, and I would rather say that honestly than pretend the substrate is finished in every dimension. The accountability layer is the part I am willing to stake the argument on.

What this means for anyone deploying AI near a courtroom

If you run a force, a laboratory, a regulator, or a company selling into any of them, the practical test is now short. Before an AI output is allowed anywhere near a charging decision or a courtroom, ask whether you can produce, on demand, the inputs, the exact system version, the parameters, the time, and the operator, sealed so that tampering is visible, and verifiable by an outsider who trusts nobody in the room. If the answer is yes, you have evidence. If the answer is no, you have an opinion wearing the costume of evidence, and a competent defence will undress it. This is not a procurement nicety to be added later. It is the difference between a tool that strengthens a case and one that quietly poisons it.

So I will end where I began, with the quiet question. When AI output becomes evidence, disclosure is everything, and disclosure that cannot be independently verified is not disclosure at all. The future of AI in policing does not belong to the most powerful model. It belongs to the most accountable one. A signed record that any stranger can check, offline, against an anchor no single party controls, is not a feature I am selling. It is the minimum standard the justice system was always going to demand, the moment it understood what it had let into the building. I would rather build for that moment now than apologise for missing it later, and I would rather the people in the dock be judged on records that can be checked than on numbers that can only be believed.

Originally published at https://mickai.co.uk/articles/unverifiable-model-output-is-not-evidence. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.