A Watermark You Can Crop Is Not Provenance
On 10 June 2026 the European Commission published its Code of Practice on marking and labelling artificial-intelligence-generated content under Article 50 of the European Union Artificial Intelligence Act. The engineering question it raises is whether a label that survives a screenshot is a fact or a hope.
The label arrives before the proof
On 10 June 2026 the European Commission published its Code of Practice on marking and labelling artificial-intelligence-generated content under Article 50 of the European Union (EU) Artificial Intelligence Act. The Act becomes fully applicable on 2 August 2026, and the transparency duty at its centre is simple to state. When a machine produces an image, an audio clip, a video, or a body of text, a person interacting with that output should be able to know a machine made it. The Code gives providers and deployers a route to demonstrate they have met that duty. It is a serious and welcome instrument. It also exposes a question the engineering community has avoided for years, because the most common way to satisfy a labelling rule is to add a watermark, and a watermark you can crop, recompress, or strip is a label, not provenance.
What Article 50 actually asks for
Article 50 of the Act covers transparency obligations for certain artificial-intelligence systems. Providers of systems that generate synthetic audio, image, video, or text must ensure their outputs are marked in a machine-readable format and detectable as artificially generated or manipulated. Deployers who use such systems to produce deepfakes, or text published to inform the public on matters of public interest, carry their own disclosure duties. The newly published Code of Practice is the soft-law bridge between that statutory language and the things an engineer can build. It points toward technical solutions such as watermarking, metadata, and provenance signals, and it asks that those solutions be robust, interoperable, and reliable as far as the state of the art allows.
The phrase that matters is "reliable as far as the state of the art allows", because it quietly admits the hard part. A marking scheme is only as trustworthy as its resistance to removal. If the mark can be erased by an ordinary user with ordinary tools, the obligation has been met on paper and defeated in practice. The Code is honest enough to gesture at this gap. The industry response, so far, has not been. A great deal of the labelling discussion treats the mark as the deliverable, when the deliverable a regulator actually needs is an answer that holds up when the file is contested.
Why most watermarks do not survive contact with the world
There are two families of marking in common use, and both fail in instructive ways. The first is metadata, a field written into the file header that records the generating model and a timestamp. Metadata is trivial to read and equally trivial to discard. A screenshot strips it. A re-encode strips it. Uploading to a platform that rewrites files on ingest strips it. The mark was never bound to the pixels or the samples, only stapled to the container, and containers are disposable.
The second family is the perceptual or steganographic watermark, a pattern woven into the content itself so that it survives mild edits. This is more durable, and the better schemes are genuinely clever. But durability is a spectrum, not a guarantee. Cropping removes the regions that carry the signal. Heavy recompression degrades it. Adversarial passes designed to scrub watermarks already exist, and they will improve faster than the watermarks do, because the attacker only has to win once per file while the defender has to win every time. A perceptual watermark answers one question: is this content marked? It does not answer the questions that liability turns on: who produced this, with what system, at what moment, and has it been altered since?
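The fragile end of that spectrum can be illustrated with a deliberately naive scheme. The sketch below is a toy, not any production watermark: it hides a hypothetical 8-bit pattern (`MARK`) in the least significant bits of a small grayscale grid, then shows that a one-row crop or a crude requantization (a stand-in for lossy recompression) is enough to lose it. Real perceptual watermarks are far more robust than this, but they sit on the same spectrum.

```python
# Toy illustration only: a naive least-significant-bit (LSB) watermark on a
# grayscale "image" held as a list of rows of 0-255 integers.

MARK = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-bit watermark pattern


def embed(image, mark=MARK):
    """Write the mark into the LSBs of the first len(mark) pixels of row 0."""
    out = [row[:] for row in image]
    for i, bit in enumerate(mark):
        out[0][i] = (out[0][i] & ~1) | bit
    return out


def detect(image, mark=MARK):
    """Return True if the LSBs of the first pixels of row 0 match the mark."""
    if not image or len(image[0]) < len(mark):
        return False
    return [p & 1 for p in image[0][:len(mark)]] == mark


def crop(image, top, left):
    """Drop `top` rows and `left` columns, as a user trimming the frame would."""
    return [row[left:] for row in image[top:]]


def requantize(image, step=4):
    """Crude stand-in for lossy recompression: round pixels down to a multiple of step."""
    return [[(p // step) * step for p in row] for row in image]


# A synthetic 16x16 gradient stands in for generated content.
original = [[(37 * r + 11 * c) % 256 for c in range(16)] for r in range(16)]
marked = embed(original)

print("marked:", detect(marked))
print("after crop:", detect(crop(marked, 1, 0)))
print("after requantize:", detect(requantize(marked)))
```

The detector can only report whether the pattern is still present. Once the pattern is gone, the file carries no evidence that it was ever marked.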
Provenance is a property of creation, not a sticker added afterwards
The distinction the Code of Practice circles without naming is the distinction between a label and a record. A label is applied to a finished artefact and travels with it only as long as nothing removes it. A record is created at the moment the artefact is, signed by the producer, and bound to the bytes by a cryptographic hash, so that any later change breaks the binding and the break is detectable by anyone who checks. Labels degrade. Records either verify or they do not. There is no middle state to exploit.
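A minimal sketch of the binding step, assuming SHA-256 as the hash and a hypothetical record layout (`producer`, `created_at`, `sha256` are illustrative field names, not taken from the Code or from any product). The signature over the record is deliberately left out here so the example isolates one idea: the record names the exact bytes, so any change to the bytes is detectable.

```python
import hashlib


def make_record(content: bytes, producer: str, created_at: str) -> dict:
    """Create a record at generation time that names the exact bytes produced."""
    return {
        "producer": producer,          # hypothetical field: who generated it
        "created_at": created_at,      # hypothetical field: when it was generated
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def content_matches(record: dict, content: bytes) -> bool:
    """Recompute the hash of the file in hand and compare it to the record."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


content = b"bytes of a generated image"
record = make_record(content, "example-operator", "2026-08-02T00:00:00Z")

print(content_matches(record, content))        # untouched file
print(content_matches(record, content[:-1]))   # file with one byte trimmed
```

The check has only two outcomes. A cropped or re-encoded file does not carry a weaker version of the record; it simply fails to match it.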
Genuine compliance with the spirit of Article 50, not merely its letter, has four properties. The provenance must be signed at the moment of creation, not reconstructed afterwards from logs that can be edited. It must be signed under keys the operator holds, so the proof belongs to the party who is accountable and cannot be forged by a third party. It must be verifiable offline by anyone, with no call back to the vendor who issued it, because a proof you can only check by asking the issuer is not a proof, it is a permission slip. And it must be bound to the content, so that cropping, recompressing, or editing the output is not a way to escape the mark but a way to invalidate it visibly. A regulator, a journalist, or a court should be able to take the file and the record, run a check on their own machine, and get a yes or a no.
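The four properties can be sketched with nothing but a hash function. The example below uses a Lamport one-time signature built on SHA-256 purely as a standard-library stand-in for a production signature scheme: it is not ML-DSA, each Lamport key pair may sign only one message, and the record fields and the helper names (`keygen`, `sign`, `verify`, `check`) are illustrative assumptions. What it does show is the shape of the check: the holder of the private key signs at creation, and anyone holding the public key, the record, and the file can verify with no network call.

```python
import hashlib
import json
import secrets


def _digest_bits(message: bytes) -> list:
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    d = hashlib.sha256(message).digest()
    return [(d[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    """Generate a Lamport key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in private]
    return private, public


def sign(private, message: bytes) -> list:
    """Reveal one secret per digest bit. A key must never sign two messages."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public, message: bytes, signature: list) -> bool:
    """Check each revealed secret against the public hash for that bit."""
    bits = _digest_bits(message)
    if len(signature) != len(bits):
        return False
    return all(
        hashlib.sha256(sig).digest() == public[i][bit]
        for i, (bit, sig) in enumerate(zip(bits, signature))
    )


def canonical(record: dict) -> bytes:
    """Serialize the record deterministically so signer and verifier agree."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def check(public, record: dict, signature: list, content: bytes) -> bool:
    """Offline check: the signature covers the record, the record covers the bytes."""
    return (verify(public, canonical(record), signature)
            and hashlib.sha256(content).hexdigest() == record["sha256"])


# At creation time, on the operator's machine (hypothetical record fields).
content = b"synthetic image bytes"
record = {
    "sha256": hashlib.sha256(content).hexdigest(),
    "system": "example-model",
    "created_at": "2026-08-02T00:00:00Z",
}
private_key, public_key = keygen()
signature = sign(private_key, canonical(record))

# Later, on anyone's machine, with only public_key, record, signature, content.
print(check(public_key, record, signature, content))
print(check(public_key, record, signature, content[:10]))  # cropped content
```

A production system would swap the Lamport functions for a standardized scheme and keep the private key in hardware. The verification logic, and the fact that it needs nothing from the issuer, stays the same.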
How Mickai signs the act, not the artefact
Mickai is a Sovereign Intelligence Operating System (SIOS), built, live, and production-ready today. Its answer to content provenance is not a watermarking module bolted onto the output stage. It is the Open Audit Record (OAR), an append-only, hash-chained ledger in which every generated output and every action is signed before it executes. The signature scheme is the Module-Lattice-Based Digital Signature Algorithm (ML-DSA) specified in Federal Information Processing Standards Publication 204 (FIPS 204), a post-quantum standard from the United States National Institute of Standards and Technology (NIST), used with the ML-DSA-65 parameter set, which targets NIST security category 3. The operator's keys live in a Trusted Platform Module (TPM) on hardware the operator owns. The record is the proof, and the proof is made at the instant of creation, which is the only instant when it can be made honestly.
Because the OAR is hash-chained, each record is bound to the content it describes and to the record before it. Alter the output and the hash no longer matches. Try to rewrite a past record and the chain after it breaks. Verification does not depend on Mickai being online or even existing: a browser-resident verifier compiled to WebAssembly checks any record entirely offline, with no network, so a third party can confirm what a machine made without trusting the party who made it. The same discipline runs through the rest of the system. The fifty brains, twenty-five domain specialists and twenty-five operational brains, act through that audit layer rather than around it, and Sentinel applies authority-at-execution, gating dangerous actions at the moment they run and requiring several brains to agree before anything destructive proceeds. Provenance here is a property of the substrate that produced the content, fixed at the moment of production, not a marking applied to the content afterwards in the hope that nobody peels it off.
From record to public ledger
A record signed and verifiable on the operator's own machine answers the local question of provenance. The harder question, for content that circulates beyond any one operator, is where the record anchors when the file has travelled far from its source. Mickai's answer is Pantheon, a sovereign Layer 1 blockchain written in Rust with the Polkadot software development kit, with the audit record as a native consensus object rather than a payload bolted on top. Fifteen Layer-2 application chains run above it, and the audit root is periodically anchored to Bitcoin, so a provenance claim can be checked against a public, tamper-evident root that no single party controls. The network token is PAN, with a fixed total supply of five billion. Pantheon is the one part of the architecture still being brought to mainnet; the SIOS that produces the records, and the OAR that signs them, are already live.
What this means for 2 August and after
When the Act becomes fully applicable on 2 August 2026, organisations that generate synthetic media at scale will reach for the cheapest mechanism that lets them say they marked their output. For many, that will be a metadata field and a perceptual watermark, and for a while it will pass. The trouble is that the standard the Code gestures toward (robust, reliable, and as good as the state of the art allows) is a moving target, and the state of the art for stripping marks is moving faster than the art of embedding them. The first time a marked deepfake circulates with its mark cleanly removed, the gap between a label and a record stops being an engineering footnote and becomes a question of accountability under the Act.
The honest engineering answer was always cryptographic provenance signed at creation, under keys the operator holds, verifiable offline by anyone, bound to the content so removal is detectable. The Code of Practice has now given that answer a regulatory reason to exist. A watermark tells you a mark is present until someone removes it. A signed, hash-chained record tells you what was made, by what system, at what moment, and whether it has changed since, and it keeps telling you that on any machine, with the cable pulled, long after the system that produced it has moved on. One of those is provenance. The other is a sticker, and the difference is the whole point of the law.


