Article · 13 June 2026

Provenance for a Model You Did Not Train

A February 2026 audit found 95.8 per cent of public models missing the records needed to know their origin, so the only provenance left to win is provenance of the action.

Author

Micky Irons

ai-provenance · eu-ai-act · native-weights · audit-record · model-supply-chain

A February 2026 audit found 95.8 per cent of public models cannot tell you where they came from

In February 2026, an audit of 124,278 artificial intelligence supply chains found that 95.8 per cent of models on the Hugging Face hub were missing the licence text, attribution and provenance records needed to know where they came from. The gap is not cosmetic. Regulators have stopped accepting statements of intent and started demanding evidence, and the European Union (EU) Artificial Intelligence Act now requires supply-chain documentation for high-risk systems. The awkward position for most organisations is this: you are accountable for what a model does in your environment, but you did not train it, and it arrived with no usable record of its own lineage. The audit quantified a problem that was already structural. The thing you are answerable for cannot account for itself.

This is the ordinary condition of enterprise artificial intelligence, not an edge case. The overwhelming majority of deployed systems run on models whose training data provenance the operator cannot independently vouch for. The honest position is to admit that you cannot reconstruct a clean origin story for the weights after the fact. The useful position is to notice that you do not need to. You can sign what the model did at the moment it did it. Provenance of the action is achievable even when provenance of the weights is not.

Two different claims that the industry keeps confusing

There are two separate provenance questions, and conflating them is why so much of the current debate goes nowhere. The first is provenance of the weights: where did this model come from, what data trained it, under what licence, and with what attribution? The February audit measured this first question and found it largely unanswerable for public models. The second is provenance of the action: on this input, at this time, under this authority, this exact output was produced by this exact model version. The first is a property of the artefact's past. The second is a property of an event you control.
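To make the second claim concrete, here is a minimal sketch of what an action-provenance record could look like as a data structure. The class name, field names and example values are illustrative assumptions, not Mickai's schema. The point is only that every element of the claim (input, output, model version, authority, time) can be pinned to a single reproducible digest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ActionRecord:
    """One model action. Field names are hypothetical, chosen for illustration."""
    input_sha256: str    # digest of the exact input the model saw
    output_sha256: str   # digest of the exact output it produced
    model_version: str   # identifier of the deployed weights
    authority: str       # who or what permitted the action
    timestamp: str       # ISO 8601 time of execution, in UTC

    def digest(self) -> str:
        # Sorted keys and fixed separators give a canonical encoding,
        # so the same record always hashes to the same value.
        blob = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return sha256_hex(blob.encode("utf-8"))


record = ActionRecord(
    input_sha256=sha256_hex(b"summarise patient note 17"),
    output_sha256=sha256_hex(b"summary text"),
    model_version="example-model-1.0",
    authority="clinician:alice",
    timestamp="2026-06-13T09:00:00Z",
)

# Changing any single field yields a different digest.
tampered = replace(record, authority="clinician:bob")
```

Nothing in the record depends on knowing how the weights were trained. It describes the event, not the artefact.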

The mistake is to treat a failure on the first question as fatal to the second. It is not. A hospital that runs a model it did not train still controls the moment the model touches a patient record. A bank still controls the moment a model approves a transaction. The lineage of the weights may be murky, but the event is yours, it happens on your hardware, and it can be recorded with the same rigour whether the model was trained in your own laboratory or sourced elsewhere. You cannot retroactively clean a model's origin. You can refuse to let its actions be unaccountable.

Why you cannot fix the weights after the fact

It is worth being precise about why the first question resists a clean answer. Training data for most published models was assembled from sources that were never catalogued for downstream legal use. Licences were attached loosely or not at all. Fine-tunes were stacked on fine-tunes, each layer obscuring the one beneath. By the time a model reaches a hub as a set of weights, the information needed to reconstruct its provenance has mostly evaporated, and no certificate bolted on afterward can recover what was never recorded. A maker's mark added retroactively is a claim, not evidence.

This is the trap the audit exposed. Organisations are being asked, increasingly under law, to document a supply chain whose upstream links were never built to be documented. Demanding perfect weight provenance from a public model is, in many cases, asking for a record that does not exist and cannot be manufactured honestly. The responsible engineering move is to stop trying to forge the past and start sealing the present. The question shifts from "where did this come from?", which you often cannot answer, to "what exactly did it do here?", which you can always answer if you record it at the moment it happens.

Signing the action, not the artefact

Mickai treats the action as the unit of provenance. Mickai is a Sovereign Intelligence Operating System (SIOS), built, live, and production-ready today. It runs on our own specialised sovereign models, hardened on a sealed corpus through fine-tuning and distillation, and Mickai is actively training its own models now. Crucially, the provenance guarantee does not depend on which of these models the operator is running. Whatever the lineage of the weights, every action is sealed into the Open Audit Record (OAR) at the moment it executes.

The Open Audit Record is an append-only, hash-chained audit ledger. Before an action runs, it is signed with ML-DSA-65, a parameter set of FIPS 204, the National Institute of Standards and Technology (NIST) post-quantum digital signature standard, using operator keys held in a Trusted Platform Module (TPM) on owned hardware. The record captures the input, the output, the model version that produced it, and the authority under which it ran. A browser-resident verifier checks any record offline, so a regulator, an auditor, or a court can confirm the seal without trusting Mickai's word for it. The model's origin may be uncertain. The event is not. You can prove what was produced, on what input, under whose authority, even for a model you did not train.
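The mechanics of an append-only, hash-chained ledger can be sketched in a few lines. This is a toy model under stated assumptions, not the Open Audit Record itself. The `Ledger` class, the entry layout and the `verify` function are hypothetical, and HMAC-SHA256 stands in for ML-DSA-65 so that the example needs only the Python standard library. HMAC is symmetric, so unlike a real signature scheme the verifier here must hold the same key as the signer.

```python
import copy
import hashlib
import hmac
import json

GENESIS = "0" * 64  # prev_hash used by the first entry


def _canonical(obj: dict) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(key: bytes, message: bytes) -> str:
    # STAND-IN for ML-DSA-65: a keyed hash, not a public-key signature.
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class Ledger:
    def __init__(self, key: bytes):
        self._key = key
        self.entries = []

    def append(self, payload: dict) -> dict:
        # Each entry commits to the hash of the entry before it.
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {"index": len(self.entries), "prev_hash": prev_hash, "payload": payload}
        entry_hash = hashlib.sha256(_canonical(body)).hexdigest()
        entry = {**body, "entry_hash": entry_hash,
                 "signature": sign(self._key, entry_hash.encode("ascii"))}
        self.entries.append(entry)
        return entry


def verify(entries: list, key: bytes) -> bool:
    """Recompute every hash and signature; needs no network access."""
    prev_hash = GENESIS
    for i, entry in enumerate(entries):
        body = {"index": entry["index"], "prev_hash": entry["prev_hash"],
                "payload": entry["payload"]}
        expected = hashlib.sha256(_canonical(body)).hexdigest()
        if entry["index"] != i or entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != expected:
            return False
        if not hmac.compare_digest(entry["signature"], sign(key, expected.encode("ascii"))):
            return False
        prev_hash = expected
    return True


key = b"demo-key-not-for-production"
ledger = Ledger(key)
for step in ("read_record", "draft_summary", "send_summary"):
    ledger.append({"action": step, "model_version": "example-model-1.0",
                   "authority": "clinician:alice"})

# Editing any historical entry breaks verification of the chain.
tampered = copy.deepcopy(ledger.entries)
tampered[1]["payload"]["authority"] = "unknown"
```

Here `verify(ledger.entries, key)` succeeds and `verify(tampered, key)` fails, because the edited payload no longer matches its stored hash. With a public-key scheme such as ML-DSA, the verifier would need only the operator's public key, which is what makes checking by a third party practical.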

What the regulator actually needs to see

The EU Artificial Intelligence Act asks for supply-chain documentation for high-risk systems, and the instinct is to read that as a demand for impossible weight provenance. Read more carefully, the obligation that bites in practice is the ability to show what a system did, to whom, and under what controls, when something goes wrong. That is an action-provenance requirement wearing a supply-chain label. An operator who can produce a signed, tamper-evident record of every model action, including the model version, the input, and the human or machine authority that permitted it, has answered the question that matters for accountability, even where the upstream training record is thin.

This is the difference between a statement of intent and evidence, which is precisely the shift the regulators have made. A policy document that says the organisation uses models responsibly is intent. A hash-chained ledger of every action, each one signed before execution and verifiable offline by the regulator, is evidence. The second survives an adversarial audit. The first does not. The February finding that most public models lack provenance records is a reason to make the action layer unimpeachable, not a reason to abandon the field.

Authority, not just a log

A record of what happened is necessary but not sufficient, because some actions should never happen at all. Mickai pairs the audit record with authority-at-execution. Dangerous actions are gated, and several brains must agree before such an action runs. Mickai's architecture comprises fifty brains, twenty-five domain specialists and twenty-five operational, including the eight-brain Chronus Kernel and the two Custodians, MNEMOSYNE and AESCULAPIUS, running on the Poseidon silicon substrate. Sentinel, a Mickai capability, stops agents wiping or exfiltrating data. So the record does not merely note that a model did something irreversible. It demonstrates that the action passed an authority gate before it executed, and that the gate, like the action, is sealed.
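The idea of an authority gate that is itself recorded can be sketched as a simple quorum check. Everything here is an assumption made for illustration: the reviewer functions stand in for independent "brains", the threshold of two is arbitrary, and `run_gated` is a hypothetical helper, not a Mickai interface. The property worth noticing is that the decision is logged whether or not the action is allowed to run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    action: str
    approvals: int
    required: int
    allowed: bool


def gate(action: dict, reviewers: list, required: int) -> GateDecision:
    # Each reviewer votes independently; the action passes on a quorum.
    approvals = sum(1 for review in reviewers if review(action))
    return GateDecision(action["name"], approvals, required, approvals >= required)


def run_gated(action: dict, reviewers: list, required: int, execute, audit_log: list):
    decision = gate(action, reviewers, required)
    audit_log.append(decision)  # recorded before anything executes
    if not decision.allowed:
        return None             # refused actions never reach execute()
    return execute(action)


# Hypothetical reviewers, each checking one narrow condition.
def not_destructive(action: dict) -> bool:
    return action["name"] not in {"delete_records", "export_all"}


def has_ticket(action: dict) -> bool:
    return bool(action.get("ticket"))


def within_scope(action: dict) -> bool:
    return action.get("scope") == "own_department"


reviewers = [not_destructive, has_ticket, within_scope]
audit_log = []


def execute(action: dict) -> str:
    return f"ran {action['name']}"


read = {"name": "read_record", "ticket": "T-1", "scope": "own_department"}
wipe = {"name": "delete_records", "ticket": None, "scope": "all"}

read_result = run_gated(read, reviewers, 2, execute, audit_log)
wipe_result = run_gated(wipe, reviewers, 2, execute, audit_log)
```

In this run the read passes with three approvals and the wipe is refused with none, yet both decisions appear in `audit_log`. A production design would seal each `GateDecision` into the same signed ledger as the actions, so the refusal is as provable as the execution.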

This matters most precisely when the model's lineage is unknown. If you cannot fully vouch for how a downloaded model will behave, the defensible posture is to constrain what it is permitted to do and to seal every permission decision alongside every action. Murky provenance upstream is a reason for stronger authority controls downstream, not weaker ones. The operator who owns the hardware, the keys, and the audit chain holds the controls that the model's original trainer never could.

Anchoring the record beyond the operator

A sealed record is strongest when its integrity does not rest on the operator's own goodwill. Mickai anchors the audit chain to Pantheon, a sovereign Layer 1 written in Rust on the Polkadot software development kit (SDK), where the audit record is a native consensus object rather than an afterthought. Fifteen Layer-2 application chains sit above it, and the audit root is anchored to Bitcoin. The native token, PAN, has a fixed supply of five billion. The practical consequence is that the question of whether a record was altered after the fact stops being a matter of trusting the organisation that holds it. The chain settles it.
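One common way to reduce a whole ledger to a single value suitable for external anchoring is a Merkle root over the entry hashes. The article does not say how Pantheon derives its audit root, so the construction below is an assumption shown only to illustrate the principle: publish one short digest externally, and any later change to any entry becomes detectable.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Fold a list of byte strings into one 32-byte root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so that a leaf
    # can never be mistaken for an interior node.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        paired = [_h(b"\x01" + level[i] + level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # an unpaired node is promoted unchanged
        level = paired
    return level[0]


# Stand-ins for ledger entry hashes.
leaves = [b"entry-0", b"entry-1", b"entry-2"]
root = merkle_root(leaves)

# Altering a single entry changes the root.
altered = merkle_root([b"entry-0", b"entry-X", b"entry-2"])
```

Only `root` needs to be published to the external chain. Anyone holding the entries can recompute it and compare, which is why the operator cannot quietly rewrite history once the root is anchored.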

This closes the loop the February audit opened. The hub cannot tell you where most models came from, and no honest retrofit will change that. What you can build is a substrate where every action a model takes, regardless of its origin, is signed before it runs, gated by authority, recorded in an append-only ledger, and anchored to an external consensus that the operator cannot quietly rewrite. The provenance of the weights remains, for now, an industry-wide gap. The provenance of the action becomes a solved engineering problem.

The achievable standard

The lesson of the February 2026 audit is not that sovereign artificial intelligence is unaccountable. It is that the industry has been asking the wrong artefact to account for itself. You cannot reconstruct a model's training history from its weights, and pretending otherwise produces certificates worth nothing under scrutiny. But you are not accountable for the model's past. You are accountable for what it does under your authority, on your hardware, today. That action can be signed, gated, recorded, and anchored, and it can be verified by anyone, offline, without trusting you. Provenance of the weights may be lost. Provenance of the action is a choice. Mickai makes the achievable standard the operating standard, so the operator can stand behind every output a model produces, even a model they did not train.

Originally published at https://mickai.co.uk/articles/provenance-for-a-model-you-did-not-train. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.