Sovereign Multi-Modal Avatar with Per-Modality Watermarking and Cross-Modality Consistency Verification.
Independent watermarks across face, video, audio, and lip-sync, bound under one operator signing key.
A method and system for synthesising a multi-modal avatar (face image, head video, spoken-audio track, lip-sync alignment) and binding its modalities to a single operator-attested signing key through independent per-modality watermarks and a cross-modality consistency oracle. Each modality is watermarked at synthesis time with a scheme suited to it: a per-pixel Merkle tree for the face, per-frame chained Merkle trees for the video, AudioSeal for the audio, and a signed phoneme alignment for the lip-sync track. All four watermarks are derived from a common generative-session identifier and the operator-bound module-lattice key. Before the single post-quantum signature is emitted, the consistency oracle verifies that all four watermarks identify the same operator and session. A verifier can therefore detect splicing between any pair of modalities, and can verify a fragment carrying only some of the modalities without trusting the originating system at verification time. Filed 21 May 2026 as GB2611895.0.
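The binding and splice check can be illustrated with a small Python sketch. It is a hypothetical model, not the filed method. HMAC-SHA256 tags stand in for the embedded face, video, AudioSeal and phoneme watermarks, and SHA-256 digests stand in for the modality content. The module-lattice signature is not computed: the oracle only returns the value that would be signed. The names `derive_modality_key`, `mark`, `consistency_oracle` and the key-derivation layout are all assumptions. Because HMAC is symmetric, this stand-in needs the operator key to check tags. In the described system, third-party verification would presumably rest on the post-quantum signature instead.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

MODALITIES = ("face", "video", "audio", "lipsync")


def derive_modality_key(operator_key: bytes, session_id: bytes, modality: str) -> bytes:
    """Hypothetical per-modality key: HMAC-SHA256(operator_key, session_id | modality)."""
    return hmac.new(operator_key, session_id + b"|" + modality.encode(), hashlib.sha256).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary Merkle root; 0x00/0x01 prefixes separate leaf and node hashes, odd nodes are duplicated."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def chained_video_digest(frames: list[list[bytes]]) -> bytes:
    """Chain per-frame Merkle roots so dropping or reordering frames changes the digest."""
    link = b"\x00" * 32
    for frame in frames:
        link = hashlib.sha256(link + merkle_root(frame)).digest()
    return link


@dataclass
class Mark:
    modality: str
    session_id: bytes
    tag: bytes  # stand-in for the watermark payload embedded in that modality


def mark(operator_key: bytes, session_id: bytes, modality: str, content_digest: bytes) -> Mark:
    """Tag one modality's content under its session- and modality-specific key."""
    key = derive_modality_key(operator_key, session_id, modality)
    return Mark(modality, session_id, hmac.new(key, content_digest, hashlib.sha256).digest())


def consistency_oracle(operator_key: bytes, marks: dict[str, Mark],
                       digests: dict[str, bytes]) -> Optional[bytes]:
    """Return the value to sign only if all four marks share one session under one operator key."""
    if set(marks) != set(MODALITIES) or set(digests) != set(MODALITIES):
        return None  # missing or unexpected modality
    sessions = {m.session_id for m in marks.values()}
    if len(sessions) != 1:
        return None  # marks claim different sessions: splice
    session_id = sessions.pop()
    for name in MODALITIES:
        expected = mark(operator_key, session_id, name, digests[name]).tag
        if not hmac.compare_digest(expected, marks[name].tag):
            return None  # wrong operator, wrong session, or altered content
    # The single module-lattice signature would be computed over this digest (not modelled here).
    return hashlib.sha256(session_id + b"".join(digests[n] for n in MODALITIES)).digest()
```

Substituting an audio mark from another session, or relabelling that mark with this session's identifier, makes the oracle refuse to produce a signing digest. The 0x00/0x01 prefixes follow the RFC 6962 convention, so a leaf hash cannot be passed off as an interior node.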