Article · 3 May 2026

AudioSeal: a dual-layer watermark for AI-generated audio that survives codec, compression, and re-recording.

AI-generated voice, music, and ambient audio crossed the indistinguishability threshold in 2025. The 2026 problem is provenance: every downstream party needs to know whether a clip was generated, where, by whom, and under what authority. AudioSeal is the Mickai dual-layer watermark primitive (Patent 11) that survives the realistic transformations broadcast and platform pipelines apply, and ties every generated clip back to the operator who authorised it.

Author: Micky Irons
Published: 3 May 2026
AudioSeal · Audio Watermark · AI Provenance · Sovereign AI · Mickai

By the end of 2025 the first wave of generative-audio models reached the indistinguishability threshold against trained human listeners. By the end of 2026 the same models are running on consumer hardware. The downstream problem is not detection; the downstream problem is provenance. Every party that handles an audio clip (a journalist, a courtroom, a fact-checker, an automated content-moderation system, a parent listening to a voicemail from their child) needs to know whether the clip was generated, by whom, when, and under what authority. The detection-side approach (does this clip look generated?) has the same half-life ChatClone defends against. The provenance-side approach (does this clip carry a verifiable provenance record?) is the structural answer.

AudioSeal is the Mickai sub-component that implements that provenance record at the audio-watermark layer. It is filed under Patent 11 of the Mickai portfolio at the UK Intellectual Property Office (application UK00004373277, sole inventor Micky Irons). This article sets out the payload format and the survival profile.

Why a dual-layer watermark

Single-layer watermarks (a perceptual mark embedded only in the frequency or time domain) die at the first hostile transformation. The hostile transformation does not have to be malicious; the transformations broadcast pipelines apply (Opus codec re-encoding, MP3 transcoding, dynamic-range normalisation, automatic-gain control on a phone speaker, re-recording through a microphone in a noisy room) destroy single-layer marks routinely. A watermark scheme designed against a research benchmark of clean transformations does not survive contact with the platform-upload pipeline of any major audio host.

AudioSeal carries two payloads, in two domains, with different survival characteristics. The verifier reads either or both, and combines the readings to produce a confidence score and a provenance record.
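A minimal sketch of how the two readings might combine into a single confidence score. The data structure, the 0.95 cap on perceptual-only reads, and the function names are illustrative assumptions, not the patented scheme:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayerReading:
    """Result of scanning one watermark layer (hypothetical structure)."""
    recovered: bool                      # was a payload recovered at all?
    confidence: float                    # 0.0-1.0 recovery confidence
    manifest_hash: Optional[str] = None  # hash used to fetch the manifest

def combine_readings(perceptual: LayerReading,
                     cryptographic: LayerReading) -> float:
    """Combine the two layer readings into one confidence score.
    A valid cryptographic read is definitive; a perceptual-only read
    is capped below certainty because FEC recovery is probabilistic.
    (The 0.95 cap is an assumed illustration.)"""
    if cryptographic.recovered:
        return 1.0
    if perceptual.recovered:
        return min(perceptual.confidence, 0.95)
    return 0.0
```

The asymmetry mirrors the article's point: the cryptographic layer carries certainty, the perceptual layer carries probabilistic recovery.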

Layer 1: perceptual-domain payload (survives lossy transport)

A frequency-domain payload spread across the spectrum at perceptually masked locations, with a forward-error-corrected encoding so partial reads are recoverable. The payload itself is small (256 bits) and carries the AudioSeal manifest hash, not the manifest itself. The full manifest is fetched via the hash from the public AudioSeal directory or, for sensitive operators, from an operator-side mirror. The perceptual layer is designed to survive Opus re-encode, MP3 transcoding, AGC normalisation, light noise addition, and re-recording through a moderate-quality microphone at moderate distance. It does not survive deliberate adversarial re-synthesis; nothing perceptual survives that, by definition.
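The 256-bit payload construction can be sketched as follows. The 3x repetition code is a deliberately simple stand-in for the forward error correction (a real scheme would use something like BCH or Reed-Solomon), and the function names are illustrative:

```python
import hashlib

def perceptual_payload(manifest_bytes: bytes) -> list[int]:
    """Build the 256-bit perceptual-layer payload: SHA-256 of the
    manifest (the hash, not the manifest itself), expanded with a 3x
    repetition code so the decoder can recover bits by majority vote
    after lossy transport. Repetition stands in for a real FEC."""
    digest = hashlib.sha256(manifest_bytes).digest()   # 32 bytes = 256 bits
    bits = [(byte >> i) & 1 for byte in digest for i in range(8)]
    return [b for b in bits for _ in range(3)]          # 768 coded bits

def decode_payload(coded: list[int]) -> bytes:
    """Majority-vote decode back to the 32-byte manifest hash."""
    bits = [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]
    out = bytearray()
    for i in range(0, len(bits), 8):
        out.append(sum(bit << j for j, bit in enumerate(bits[i:i + 8])))
    return bytes(out)
```

The key property shown is partial-read recovery: flip any single coded bit and the majority vote still returns the correct hash, which is then used to fetch the full manifest from the directory.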

Layer 2: cryptographic-domain payload (survives integrity-preserving transport)

A cryptographic-domain payload carried in a sidecar manifest (the WAV BWF chunk, a FLAC APPLICATION metadata block, the MP4 udta atom, or for streaming protocols an out-of-band attestation channel). The sidecar carries the full provenance record signed under the originator's hardware-bound key (Patent 12) using ML-DSA-65 (Patent 08). The cryptographic payload survives any transport that preserves file integrity (file copy, lossless codec conversion, archive packing) but is destroyed by a re-encode that strips metadata. The two layers are complementary: the perceptual layer survives where the cryptographic layer dies, and the cryptographic layer carries cryptographic certainty where the perceptual layer can only offer probabilistic recovery.
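The canonical-serialise-and-sign step can be sketched as follows. HMAC-SHA256 stands in here purely to keep the example standard-library-only; the article's scheme is an ML-DSA-65 public-key signature (FIPS 204) under a hardware-bound key, which a symmetric MAC does not replicate:

```python
import hashlib
import hmac
import json

def sign_sidecar(manifest: dict, key: bytes) -> dict:
    """Attach a signature over the canonical JSON serialisation of the
    manifest. HMAC-SHA256 is a stdlib-only stand-in for ML-DSA-65."""
    canonical = json.dumps(manifest, sort_keys=True,
                           separators=(",", ":")).encode()
    return {"manifest": manifest,
            "signature": hmac.new(key, canonical, hashlib.sha256).hexdigest()}

def verify_sidecar(sidecar: dict, key: bytes) -> bool:
    """Re-derive the canonical bytes and check the signature; any edit
    to any manifest field invalidates the record."""
    canonical = json.dumps(sidecar["manifest"], sort_keys=True,
                           separators=(",", ":")).encode()
    expected = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sidecar["signature"])
```

The canonical serialisation (sorted keys, no whitespace) is the load-bearing detail: signer and verifier must derive byte-identical input or every signature fails.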

What the manifest contains

  • originator: the operator identity that authorised the generation. Bound to a hardware-attested actor identity (Patent 12).
  • model: the specific Mickai brain (or external model registered through the Mickai composition layer) that produced the audio.
  • prompt_hash: SHA-256 of the prompt or input that produced the audio. The prompt itself is not stored; only the hash. A regulator with the original prompt can verify that this clip came from that prompt.
  • input_audio_hash: where the generation was conditioned on input audio (voice cloning, source separation, accompaniment), the hash of the input. Required for any voice-clone generation; absence of this field on a voice-clone output fails the audit.
  • consent_attestation: where the generation produced a voice in the likeness of a specific human, the cryptographic attestation that the human consented to the generation (their hardware-bound signature over the prompt and the use). No consent attestation present on a voice-clone output is a deniable generation; downstream verifiers can refuse it.
  • generated_at: ISO-8601 UTC timestamp.
  • audit_run_id: opaque identifier mapping to the post-quantum signed audit ledger record for this generation.
  • signature: ML-DSA-65 signature over the canonical serialisation of every preceding field, made under the operator's hardware-bound key.

What the verifier does, and what it tells the user

The verifier ingests an audio file. It scans for the perceptual layer, scans for the cryptographic layer, and presents one of four outcomes:

  • VERIFIED: both layers present, manifest signature valid, originator key in the user's trust store, content hash matches the audio. The user sees who generated it, when, under what consent.
  • PARTIAL_VERIFIED: only the perceptual layer is recoverable; the manifest can still be fetched and the originator identified, but the cryptographic chain is incomplete. This often happens after platform re-upload. The user sees the originator and a confidence score.
  • UNVERIFIED: no layers recoverable. The audio carries no AudioSeal payload. This does not prove the audio is not generated; it proves it does not carry an AudioSeal provenance record. The user sees an UNVERIFIED indicator and treats the content accordingly.
  • REVOKED: a layer is recoverable but the manifest has been revoked in the public directory. Used when an operator detects misuse of their own key and pulls a generation. The user sees the revocation notice and the reason.
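The decision order implied by the four outcomes can be sketched as follows. The ordering and the boolean inputs are assumptions read off the list above, not the verifier's actual logic:

```python
from enum import Enum

class Outcome(Enum):
    VERIFIED = "VERIFIED"
    PARTIAL_VERIFIED = "PARTIAL_VERIFIED"
    UNVERIFIED = "UNVERIFIED"
    REVOKED = "REVOKED"

def classify(perceptual_ok: bool, crypto_ok: bool,
             signature_valid: bool, key_trusted: bool,
             hash_matches: bool, revoked: bool) -> Outcome:
    """Map layer scan results onto the four verifier outcomes
    (simplified decision order, assumed from the description)."""
    if not perceptual_ok and not crypto_ok:
        return Outcome.UNVERIFIED        # no AudioSeal payload at all
    if revoked:
        return Outcome.REVOKED           # recoverable but pulled by operator
    if crypto_ok and signature_valid and key_trusted and hash_matches:
        return Outcome.VERIFIED          # full chain intact
    return Outcome.PARTIAL_VERIFIED      # perceptual-only or broken chain
```

Note that revocation is checked before verification: a pulled manifest must surface as REVOKED even when every signature still validates.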

What this is not

AudioSeal is not a content moderation tool. It does not decide whether a clip is misleading, defamatory, or harmful. It produces a provenance record. Downstream tools, human reviewers, and regulators decide what to do with that record. The structural property AudioSeal provides is that decisions about a clip can be made on the basis of who generated it, not on the basis of guesses about its detectability.

AudioSeal is also not a defence against re-synthesis from scratch by a non-Mickai model that does not embed any watermark. There is no defence against an open-source model that simply does not cooperate. The defence available is downstream: high-trust contexts (broadcast, court, fact-checking, financial transactions) can require AudioSeal-verified audio and refuse anything else. That refusal is a policy choice the verifier makes, not a property of the watermark itself.

Where this composes with the rest of Mickai

Every Mickai audio brain (voice synthesis, music generation, voice cloning under explicit consent, ambient generation) writes AudioSeal payloads on every output by default. The signing key is the operator's hardware-bound identity (Patent 12). The signature primitive is ML-DSA-65 (Patent 08). The manifest is appended to the post-quantum signed audit ledger (Patent 16). The consent attestation, where applicable, ties to the voice biometric (Patent 02) and the ChatClone attestation (Patent 09) of the consenting human. Provenance is not a feature added on; it is the by-product of the architecture every Mickai audio output already passes through.

Where this sits

Mickai is the sovereign AI operating system. Twenty-one filed UK patent applications. Six hundred and seventy-five cryptographically signed claims. Sole inventor Micky Irons. Application reference UK00004373277. AudioSeal (Patent 11) is the audio-provenance primitive that makes Mickai-generated audio verifiable downstream without requiring the verifier to trust Mickai's infrastructure. Mickai is held privately by its founder; the engagement model is direct.

Sovereign means the audio carries its own provenance. Two layers. One key in hardware. The verifier holds the proof.


Sources

  • Mickai patent portfolio: mickai.co.uk/patents (Patent 11, AudioSeal dual-layer watermark).
  • FIPS 204 (ML-DSA): NIST post-quantum digital signature standard.
  • Previous Mickai articles: mickai.co.uk/articles/chatclone-anti-deepfake-voice-attestation, mickai.co.uk/articles/multi-brain-cooperative-intelligence-why-one-llm-is-not-enough.
Originally published at https://mickai.co.uk/articles/audioseal-dual-layer-watermark-for-ai-generated-audio. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.