MICKAI
Article · 21 June 2026

Model Collapse and the Provenance of the Action

When data cannot be trusted, the fix is not a better filter. It is a signed, anchored record of where every action came from.

Author: Micky Irons
Published: 21 June 2026
Tags: model collapse, AI provenance, synthetic data, Open Audit Record, post-quantum cryptography

Train a model on the world, and it learns the world. Train the next model on the first model's output, and it learns a copy of a copy. Do this a few times and the distribution narrows, the rare cases vanish, and the system grows confident about a world that no longer exists. Researchers call this model collapse. It is what happens when synthetic data quietly contaminates the pool that future models drink from, and nobody kept a record of which water came from which spring.
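The narrowing can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any real training run: the "model" is just a table of token frequencies, each generation is refit by counting samples drawn from the previous one, and the vocabulary size, sample count, and function names are all hypothetical.

```python
import random
from collections import Counter


def next_generation(weights, n_samples, rng):
    """Sample from the current model, then refit by counting frequencies."""
    tokens = list(weights)
    draws = rng.choices(tokens, weights=[weights[t] for t in tokens], k=n_samples)
    counts = Counter(draws)
    # A token that was never drawn gets no entry, so it can never return.
    return {token: count / n_samples for token, count in counts.items()}


def simulate(n_tokens=50, n_samples=100, generations=20, seed=0):
    """Return the number of distinct tokens the model still knows, per generation."""
    rng = random.Random(seed)
    # Long-tailed starting distribution: token t has weight proportional to 1/(t+1).
    raw = {t: 1.0 / (t + 1) for t in range(n_tokens)}
    total = sum(raw.values())
    weights = {t: w / total for t, w in raw.items()}

    support = [len(weights)]
    for _ in range(generations):
        weights = next_generation(weights, n_samples, rng)
        support.append(len(weights))
    return support


if __name__ == "__main__":
    print(simulate())
```

The support size can only stay the same or shrink from one generation to the next, because a token that draws zero samples has zero weight in every later generation. That one-way loss of rare cases is the behaviour the paragraph above describes.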

The usual framing treats this as a data hygiene problem. Filter harder, watermark the outputs, keep a clean human corpus in reserve. Those are sensible tactics, but they miss the deeper failure. The real question is not whether a given datapoint is synthetic. It is whether you can prove where any datapoint, decision, or model action came from at all. When provenance is unknown, every downstream guarantee is a guess. Model collapse is the headline symptom. The disease is the loss of a trustworthy record of action.

[Image: a white marble figure of Mnemosyne, goddess of memory, holding a tablet as fragments of lesser copies dissolve into haze behind her.]
Mnemosyne, memory as the antidote to the copy of a copy. When the record of origin survives, collapse has nowhere to hide.

Why filtering is not enough

Watermarking and classifier-based filtering both assume the contamination is legible. In practice it often is not. Synthetic text gets paraphrased, translated, summarised, and re-run through other models until any signal degrades. A watermark that survives one hop rarely survives five. Worse, the incentives run the wrong way. The cheapest way to scale a dataset is to generate more of it, so the share of machine-origin content rises precisely where budgets are tightest and scrutiny is weakest.

The structural fix is to stop asking each datapoint to prove its own innocence and instead bind every consequential action to a record that cannot be quietly rewritten. If a model generated a sample, that event should leave an immutable trace. If a human curated, labelled, or approved it, that should too. Provenance becomes a property of the system, not a forensic guess made after the fact.
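One common way to make a record hard to rewrite quietly is a hash chain, where each entry commits to the one before it. The sketch below is illustrative only; the field names (`actor`, `action`, `subject`) and the helper functions are assumptions, not Mickai's actual record format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body):
    """Hash a canonical JSON encoding so the same body always hashes the same."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append(log, actor, action, subject):
    """Append an entry that commits to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "actor": actor, "action": action,
            "subject": subject, "prev": prev}
    log.append({**body, "hash": entry_hash(body)})


def verify(log):
    """Recompute every hash and link; any edit, insert, or delete breaks the chain."""
    prev = GENESIS
    for index, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["seq"] != index or entry["prev"] != prev:
            return False
        if entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append(log, "model:generator", "generated", "sample-001")
    append(log, "human:curator", "approved", "sample-001")
    print(verify(log))            # True
    log[0]["actor"] = "human:curator"
    print(verify(log))            # False
```

A chain like this only makes tampering detectable by someone who holds a trusted copy of the latest hash. It does not by itself stop the holder from rebuilding the whole chain, which is why signing and external anchoring still matter.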

From clean data to accountable action

This is the shift Mickai is built around. Mickai is a Sovereign Intelligence Operating System (SIOS) that runs fifty specialised AI brains (twenty-five domain and twenty-five operational) on the operator's own hardware and can run fully offline. Because the substrate owns the full lifecycle of an action rather than renting it from an opaque API, it can do something an ordinary pipeline cannot. It can seal the action itself.

[Image: a white marble figure of Themis, blindfolded, holding bronze scales level and weighing a glowing token against a featureless copy.]
Themis weighing the sealed action against the unaccountable one. Provenance is not a label on the data, it is a verdict the system can defend.

The Open Audit Record

Inside Mickai, every consequential action is written to the Open Audit Record, the OAR. Each entry is sealed and signed with ML-DSA-65, a parameter set of FIPS 204, the post-quantum digital signature standard NIST published in 2024. Mickai did not invent that standard; it adopts it, which is the point. The cryptography is open, reviewable, and designed to resist the quantum attacks expected to eventually break the classical signatures protecting today's data. A generation event, a labelling decision, a model update: each one carries a signature that says, verifiably, this happened, in this order, under this key.
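The shape of a sealed entry can be sketched as a payload, a key identifier, and a signature over the exact payload bytes. The Python standard library has no ML-DSA implementation, so the example below swaps in HMAC-SHA256 as a clearly labelled stand-in. HMAC is a symmetric MAC, not a public-key signature, and it is not what the OAR uses. The class names, the payload fields, and the `seal` and `check` helpers are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedEntry:
    payload: bytes      # canonical JSON bytes that were signed
    key_id: str         # which key produced the signature
    signature: bytes


class HmacStandInSigner:
    """STAND-IN ONLY: HMAC-SHA256 in place of ML-DSA-65.

    A real deployment would plug in an ML-DSA-65 implementation exposing
    the same sign/verify shape, with verification done by public key.
    """

    def __init__(self, key_id, secret):
        self.key_id = key_id
        self._secret = secret

    def sign(self, message):
        return hmac.new(self._secret, message, hashlib.sha256).digest()

    def verify(self, message, signature):
        return hmac.compare_digest(self.sign(message), signature)


def seal(signer, seq, brain, action, sample_hash):
    """Serialise the event canonically, then sign those exact bytes."""
    payload = json.dumps(
        {"seq": seq, "brain": brain, "action": action, "sample": sample_hash},
        sort_keys=True, separators=(",", ":"),
    ).encode()
    return SealedEntry(payload, signer.key_id, signer.sign(payload))


def check(signer, entry):
    """Accept the entry only if the key matches and the signature verifies."""
    return entry.key_id == signer.key_id and signer.verify(entry.payload, entry.signature)
```

Signing the serialised bytes rather than the in-memory object avoids disputes about field ordering or whitespace. The `seq` field stands in for ordering here; in practice ordering would more plausibly come from chaining entries together than from a bare counter.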

Apply that to the collapse problem and the picture changes. You no longer need to detect synthetic data after it has already poisoned the pool. The record tells you, for any sample, whether it was produced by a model or curated by a person, when, and by which brain. Training sets can be filtered on signed provenance rather than on a fragile statistical guess. The clean spring stays clean because every cup is stamped at the source.
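Filtering on provenance rather than on a statistical guess could look roughly like the sketch below. It assumes each sample has an identifier that maps to a provenance record, and that signature verification has already happened upstream and is exposed as a boolean. The record fields and the `select_training_samples` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    sample_id: str
    origin: str          # assumed values: "human" or "model"
    producer: str        # which person or brain produced the sample
    signature_ok: bool   # result of an upstream signature check (assumed)


def select_training_samples(samples, records, allowed_origins=("human",)):
    """Split samples into those with acceptable provenance and those without.

    Returns (kept, rejected), where rejected maps sample id to a reason.
    """
    kept, rejected = {}, {}
    for sample_id, text in samples.items():
        record = records.get(sample_id)
        if record is None:
            rejected[sample_id] = "no provenance record"
        elif not record.signature_ok:
            rejected[sample_id] = "signature check failed"
        elif record.origin not in allowed_origins:
            rejected[sample_id] = f"origin '{record.origin}' not allowed"
        else:
            kept[sample_id] = text
    return kept, rejected
```

The default here is to reject anything without a record, so an unlabelled sample is treated as untrusted rather than as presumed human. Widening `allowed_origins` to include `"model"` would let an operator mix in synthetic data deliberately, with the proportion known.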

[Image: Hephaestus at a dim forge, carved in marble and bronze, hammer caught mid-strike as he seals a glowing mark into a tablet.]
Hephaestus stamping the action at the moment it is forged. A signature applied at creation cannot be argued with later.

Anchoring the record so it cannot be rewritten

A signed record is only as trustworthy as its resistance to tampering by the operator who holds it. This is where Pantheon comes in. Pantheon is Mickai's own sovereign, Bitcoin-anchored Layer 1, with a native token, PAN, and a fixed supply of five billion. At intervals it takes a hash commitment of the audit record and anchors that single fingerprint to Bitcoin, borrowing the most expensive-to-forge ledger in existence as a permanence guarantee.

It is worth being precise about what this does and does not do, because the distinction is the whole architecture. Pantheon does not move Bitcoin and it is not a Bitcoin Layer 2. It commits a hash, nothing more. Anchoring is not spending. The only thing that crosses to Bitcoin is a fingerprint of the record, which is enough to prove after the fact that the history was not altered, and useless for anything else. The provenance of every action becomes checkable against a timeline no single party can quietly rewrite.
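A hash commitment of this kind can be sketched as a Merkle root over the serialised audit entries: one 32-byte value that changes if any entry changes. The construction below is an illustrative assumption, not Pantheon's actual scheme, and it makes no Bitcoin calls. It only shows how a fingerprint is computed and later rechecked against a previously anchored value. The leaf and node prefixes follow the general idea used in RFC 6962 to keep leaves and interior nodes distinct.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: list[bytes]) -> bytes:
    """Reduce a list of serialised entries to a single 32-byte commitment."""
    if not entries:
        return _h(b"")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so one cannot
    # be passed off as the other.
    level = [_h(b"\x00" + entry) for entry in entries]
    while len(level) > 1:
        paired = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            paired.append(level[-1])  # odd node is carried up unchanged
        level = paired
    return level[0]


def matches_anchor(entries: list[bytes], anchored_root_hex: str) -> bool:
    """Recompute the commitment and compare it with the anchored fingerprint."""
    return merkle_root(entries).hex() == anchored_root_hex


if __name__ == "__main__":
    record = [b"entry-0", b"entry-1", b"entry-2"]
    anchored = merkle_root(record).hex()   # the only value that would be published
    print(matches_anchor(record, anchored))                               # True
    print(matches_anchor([b"entry-0", b"EDITED", b"entry-2"], anchored))  # False
```

The published value reveals nothing about the entries themselves, which matches the point above: the fingerprint is enough to detect alteration after the fact and useless for anything else.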

What this buys against collapse

Stack the layers and the failure mode closes. The OAR makes each action attributable. Post-quantum signatures keep that attribution durable. Pantheon's anchor makes the whole sequence tamper-evident. A model trained on this substrate can refuse to learn from samples whose provenance does not check out, and an auditor can later prove which data shaped which weights. Collapse depends on contamination going unnoticed. Here it cannot go unnoticed, because nothing consequential happens without leaving a signed, anchored trace.

[Image: Poseidon rising from black water, carved in white marble, trident driven into stone, with a single bright point anchoring a chain of fainter marks into the depths.]
Poseidon anchoring the record to bedrock. Once committed, the history of every action holds against the tide.

Provenance as the real moat

The accountability is engineered, not asserted. Mickai's approach is backed by a portfolio of 101 filed UK patent applications carrying around 2,234 claims, owned by Mickai LTD, with Micky Irons named as inventor. The portfolio is evidence that the sealing, signing, and anchoring described here are specified mechanisms rather than marketing, but the mechanisms are what matter.

Model collapse will get worse before the industry takes provenance seriously, because the economics reward generation and punish bookkeeping. The systems that survive the synthetic flood will not be the ones with the cleverest filters. They will be the ones that can prove, action by action, where everything came from. Mickai treats that proof as the foundation rather than an afterthought, which is why the question it answers is not how do we spot fake data, but how do we make every action accountable in the first place.

Originally published at https://mickai.co.uk/articles/model-collapse-and-the-provenance-of-the-action-when-data-cannot-be-trusted.