Watermarks Wash Out. Signed Logs Do Not: The Real Provenance Stack For 2026
The EU's marking regime leans on metadata and watermarking. Both degrade in transit. The durable layer is the tamper-evident log held inside your own perimeter.
By Micky Irons
Take an image with a SynthID watermark and a C2PA manifest. Upload it to LinkedIn. Download it again. The manifest is gone, stripped in the platform's compression pipeline, and depending on how the image was resized, the watermark may be degraded too. You are now holding a file that two of the sanctioned provenance layers can no longer reliably speak for. This is not a corner case. For the major social platforms in 2026 it is the normal path a piece of content takes to reach a human being.
I build Mickai, a Sovereign Intelligence Operating System that regulated organisations own and run inside their own walls. Provenance is not an abstract debate for us. It is the thing an auditor asks us to produce when they want to know what a model did, when, and on whose instruction. So I want to be precise about which layers of the emerging provenance stack survive contact with the real world, and which do not.
What the EU is actually asking for
The relevant instrument is Article 50 of the EU AI Act, Regulation (EU) 2024/1689. It requires providers of AI systems that generate synthetic audio, image, video or text to mark the output in a machine-readable format and make it detectable as artificially generated. Those obligations apply from 2 August 2026. The AI Omnibus provisional agreement of May 2026 gave generative systems already on the market before that date until 2 December 2026 to meet the machine-readable marking requirement, so there is a short runway for existing systems, not a reprieve. New systems entering the market on or after 2 August 2026 get no grace period.
To turn the article into practice, the Commission's AI Office is drafting a Code of Practice on the marking and labelling of AI-generated content. The second draft, published on 5 March 2026, sets out a layered approach with secured metadata and digital watermarking at the core, plus optional measures such as fingerprinting, logging and verification protocols to help detect AI-generated material. The draft points to open standards for marking, and C2PA Content Credentials and Google's SynthID sit squarely inside that technical picture.
Read that carefully. The regulator is stacking methods because it already knows no single one holds. That is the honest reading of a layered mandate. It is an admission, written into policy, that each individual method is fragile.
The metadata and watermark layers degrade in transit
The fragility is not a slur; it is measured.
C2PA metadata is cryptographically signed and genuinely useful at the moment of creation. But it lives in the file's metadata, and metadata is the first thing distribution pipelines throw away. Instagram, X, LinkedIn, TikTok and Facebook all reprocess uploaded images and strip C2PA manifests as part of that processing. A screenshot removes them. A format conversion removes them. Re-encoding removes them. C2PA has a sidecar model that stores the manifest outside the file in a separate repository, which helps, but it moves the trust problem to whoever hosts the sidecar and whether the link back to the asset survives.
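To make the stripping concrete, here is a minimal sketch using only the Python standard library. It builds a one-pixel PNG carrying a placeholder manifest chunk, then runs it through a toy "re-encoder" that keeps only the chunks needed to display the image. The chunk name `caBX` reflects my understanding of how C2PA embeds manifests in PNG; treat it, the placeholder payload and the `strip_ancillary` function as illustrative assumptions, not a model of any particular platform's pipeline.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialise one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk after the 8-byte signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def strip_ancillary(png: bytes) -> bytes:
    """Toy re-encoder: keep critical chunks only, drop everything else.

    In PNG, an uppercase first letter in the chunk type marks a critical
    chunk (IHDR, IDAT, IEND); lowercase marks an ancillary one.
    """
    out = PNG_SIGNATURE
    for chunk_type, data in iter_chunks(png):
        if chunk_type[:1].isupper():
            out += make_chunk(chunk_type, data)
    return out


def chunk_types(png: bytes) -> list:
    return [chunk_type for chunk_type, _ in iter_chunks(png)]


# A 1x1 8-bit greyscale image: width, height, depth, colour type, and
# compression, filter and interlace methods all set to 0.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x7f")  # filter byte 0, then one grey pixel

original = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", ihdr)
    + make_chunk(b"caBX", b"placeholder manifest bytes")  # not a real manifest
    + make_chunk(b"IDAT", idat)
    + make_chunk(b"IEND", b"")
)
processed = strip_ancillary(original)

print(chunk_types(original))
print(chunk_types(processed))
```

The pixels are untouched and the image still renders, but the chunk that carried the provenance claim is simply not there any more. Nothing had to attack the signature; the container was rebuilt without it.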
Invisible watermarking like SynthID is more robust because it lives in the pixels rather than the header, and Google reports having marked over ten billion images and video frames with it. But robust is not permanent. Published testing shows strong survival after high-quality compression and minor cropping, and sharp drop-off under severe cropping and low-quality compression, with a meaningful share of watermarks degraded beyond recognition under harsh conditions. On the text side, SynthID-Text is vulnerable to meaning-preserving attacks: paraphrase it, back-translate it, or splice it, and detectability falls. Separate research has shown that invisible image watermarks are removable by regenerating the image from clean noise. An adversary who wants the mark gone can get it gone.
So the metadata and watermark layers are real, worth deploying, and structurally lossy. They answer the question "can a casual viewer tell this was AI-generated" reasonably well. They do not answer "can a regulator reconstruct what happened" reliably, because by the time content is in dispute it has usually been through exactly the pipelines that erode them.
The durable layer is the log, and it lives with whoever ran the model
There is a further signal in the Code, and it is the one that does not wash out: logging. A tamper-evident record of what the model did, held by the party that ran it.
The reason the log is durable is structural. The watermark and the manifest travel with the artefact into a hostile world of re-encoders and screenshots. The log does not travel. It stays in the perimeter of the operator, written once, signed, and never edited. Nothing a downstream platform does to the file can reach back and alter the record of generation. When someone asks whether a given output came from your system, on what prompt, under which model version and policy, the answer is not recovered from the degraded file in the wild. It is read from your own signed record.
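One common way to get "written once and never edited" is a hash chain, where each entry commits to the one before it. The sketch below is a minimal illustration of that idea under stated assumptions, not a description of any particular product's log: the class name `AppendOnlyLog` and the record fields (`prompt_sha256`, `model_version`, `policy_id`, `output_sha256`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AppendOnlyLog:
    """In-memory hash chain; a real log would persist entries durably."""

    def __init__(self):
        self._entries = []  # list of (record, chained_hash)

    def append(self, record: dict) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS
        digest = entry_hash(prev, record)
        self._entries.append((dict(record), digest))  # store a copy
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks every later hash."""
        prev = GENESIS
        for record, digest in self._entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


log = AppendOnlyLog()
log.append({
    "prompt_sha256": hashlib.sha256(b"example prompt").hexdigest(),
    "model_version": "v1",     # hypothetical field values
    "policy_id": "policy-7",
    "output_sha256": hashlib.sha256(b"example output").hexdigest(),
})
log.append({"event": "output exported", "model_version": "v1"})

intact = log.verify()

# Simulate someone quietly rewriting history inside the store.
log._entries[0][0]["model_version"] = "v2"
after_tampering = log.verify()

print(intact, after_tampering)
```

A hash chain on its own only makes edits detectable to someone who knows the latest hash. An insider who can rewrite the whole store can also rebuild the chain, which is why the head of the chain is normally signed or anchored somewhere the writer cannot quietly change.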
This is where owning the substrate matters. If you run inference on someone else's cloud endpoint, your log is a courtesy they extend to you, on their retention schedule, in their format, mediated by their access controls. If you run the model inside your own walls, the log is a first-class artefact you control. In Mickai every action a model takes writes a cryptographically-signed entry to an append-only audit record that the operator holds. It is the same discipline behind our work on tamper-evident audit records, and it is the layer a regulator can actually walk in and inspect, because it sits on infrastructure the regulated organisation governs rather than on a vendor's backend.
That is the sovereignty argument, stated honestly. Almost every regime here, the EU AI Act included, permits running these workloads in the cloud with the right controls. The genuine no-cloud line is workload-specific: classified material, ITAR-controlled data, isolated operational technology, cases where a data protection assessment comes back negative. The broader case for holding the log yourself is not a legal prohibition on cloud. It is a preference for control. The provenance layer regulators can most reliably inspect is the one you own, and ownership is easiest to demonstrate when the model and its log never leave your perimeter.
What to build
Deploy the outward layers. Mark outputs with C2PA and a watermark, because Article 50 asks for machine-readable marking and because those signals do real work while the content is fresh. But do not mistake them for your evidentiary base. Treat the watermark and the manifest as the outward-facing courtesy to viewers, and treat the signed log as the inward-facing record you will actually stand on when a regulator or a court asks you to prove what your system did.
Then hold that log where you can produce it on demand, under your own keys, on infrastructure you run. That is the part of the provenance stack that is still standing after the file has been through the internet.
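As a rough illustration of "produce it on demand, under your own keys", the sketch below tags each log entry with a key the operator holds and answers the question "did this exact output come from our system?". It uses HMAC from the standard library as a stand-in for a real digital signature; a production log would more likely use asymmetric signatures so that a verifier never needs the secret. The key, field names and function names are all hypothetical.

```python
import hashlib
import hmac
import json


def canonical(entry: dict) -> bytes:
    """Stable byte encoding so the same entry always yields the same tag."""
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")


def tag_entry(key: bytes, entry: dict) -> str:
    """Keyed tag over the entry (HMAC-SHA256 standing in for a signature)."""
    return hmac.new(key, canonical(entry), hashlib.sha256).hexdigest()


def verify_entry(key: bytes, entry: dict, tag: str) -> bool:
    return hmac.compare_digest(tag_entry(key, entry), tag)


def lookup(records, key: bytes, output_bytes: bytes) -> list:
    """Return entries whose recorded output hash matches and whose tag verifies."""
    digest = hashlib.sha256(output_bytes).hexdigest()
    return [
        entry
        for entry, tag in records
        if entry["output_sha256"] == digest and verify_entry(key, entry, tag)
    ]


OPERATOR_KEY = b"demo-key-held-by-the-operator"  # hypothetical; never hard-code real keys

original_output = b"bytes of the generated image as it left the system"
entry = {
    "model_version": "v1",    # hypothetical field values
    "policy_id": "policy-7",
    "output_sha256": hashlib.sha256(original_output).hexdigest(),
}
records = [(entry, tag_entry(OPERATOR_KEY, entry))]

found = lookup(records, OPERATOR_KEY, original_output)
missed = lookup(records, OPERATOR_KEY, original_output + b" re-encoded")
forged = lookup(records, b"wrong-key", original_output)

print(len(found), len(missed), len(forged))
```

Note the limit this exposes: an exact hash matches only the byte-identical file that left the system. Tying a recompressed or screenshotted copy back to its log entry needs a similarity measure such as the fingerprinting the draft Code lists among its optional measures, which this sketch does not attempt.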
Frequently asked questions
Does the EU AI Act ban AI content that has no watermark?
Not exactly. Article 50 of Regulation (EU) 2024/1689 requires providers to mark synthetic output in a machine-readable format and make it detectable as AI-generated, with obligations applying from 2 August 2026 and a transitional deadline of 2 December 2026 for systems already on the market. It is a marking-and-detection duty, not a blanket prohibition. Nor does it dictate where the model runs: cloud deployment is permitted with appropriate controls.
If C2PA and SynthID are fragile, why deploy them at all?
Because they work well at the point of creation and while content is fresh, and because the Code of Practice expects them. A C2PA manifest and a watermark let a viewer and a platform recognise AI-generated content in the common case. They are worth having. The point is that they are not sufficient on their own as an evidentiary record, which is exactly why the second-draft Code adds logging as a further signal.
Why is the log more durable than the watermark?
The watermark and manifest travel with the file into pipelines that compress, re-encode and screenshot, all of which degrade or remove them. The log does not travel. It is written once inside the operator's perimeter and never leaves, so nothing done to the file downstream can alter it. That is what makes it the layer a regulator can inspect with confidence.
Do we have to run models on-premise to get a trustworthy log?
No, but ownership is the cleanest way to demonstrate that the log can be trusted. A log on a third party's endpoint depends on their retention, format and access controls. A log written to an append-only, signed record inside your own walls is an artefact you govern outright, which is how Mickai handles model provenance and every audited action.
Takeaway
Watermarks wash out. Manifests get stripped. The regulator stacked its methods precisely because the outward ones are lossy. The durable one is the tamper-evident log held by whoever ran the model, and the surest way to make that log something a regulator can inspect is to own the model and the log inside your own perimeter. Build the outward signals for viewers. Build the signed log for evidence. Keep the evidence where you control it.
Sources: EU AI Act Article 50; Commission publishes second draft of Code of Practice on marking and labelling of AI-generated content; Kennedys on the draft marking and labelling Code; SynthID-Image: watermarking at internet scale; C2PA manifest stripping on social platforms.


