The AI Act Is Live and the Logs Already Don't Match
Article 12 record-keeping meets the first wave of August 2026 enforcement, and most stacks fail on replay.
A regulator does not ask whether your system was compliant. A regulator asks you to prove it. The distinction sounds academic right up until a supervisory authority sits across the table, names a single automated decision from eight months ago, and asks you to reconstruct exactly how the machine reached it. In that moment the word compliance stops meaning a policy on a shelf and starts meaning a thing you can replay, line by line, in front of an examiner who has the legal power to fine you and the technical literacy to know when you are bluffing.
The European Union's Artificial Intelligence Act is no longer a regulation that legal teams can defer to next quarter's roadmap. The high-risk obligations are biting now, and the first wave of enforcement landing in August 2026 has a forensic character most organisations have not prepared for. The provision doing the quiet damage is not one of the headline bans. It is the record-keeping duty. It is the part everyone read as plumbing.
The Provision Everyone Skimmed
Article 12 requires that high-risk systems technically allow for the automatic recording of events, what the drafters call logs, over the lifetime of the system. The text is short. The intent is not. The logging exists so the functioning of the system can be traced, so situations that create risk or amount to a substantial modification can be identified, and so post-market monitoring has something real to monitor. The logs are not for your dashboards. They are for the people who will one day audit you.
This is where the reading goes wrong. Engineering teams have treated Article 12 as a feature flag: switch on logging, point it at the observability stack, tick the box. Legal signed off because the architecture diagram showed an arrow labelled logs. Everyone moved on. But the obligation is not satisfied by the existence of logs. It is satisfied by the existence of records, and a record has a property that a log, as most stacks produce it, does not.
A record can be independently verified to be what it claims to be. A log, sitting mutable in a cloud telemetry pipeline, cannot. The gap between those two things is the whole essay, and it is about to become the whole enforcement story.
A Log You Cannot Replay Is Not a Record
Here is the test I would apply if I were auditing you. Take a single consequential decision your system made. Hand me the inputs the system saw, the model and version that ran, the configuration in force, and the chain of intermediate steps. Now let me run it again and arrive at the same output. If I can, you have a record. If I cannot, you have a story about a record, and the story is exactly as strong as my willingness to take your word for it.
Under enforcement, my willingness to take your word for it is zero. That is the entire point of an audit. So the operative standard for Article 12 is not whether you logged the event. It is whether the event can be reconstructed and replayed by someone who does not trust you. Replayability is the line between evidence and assertion.
“A log you cannot independently replay is not a record. It is a claim. And a regulator does not fine claims into or out of existence. A regulator tests them.”
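As a minimal sketch of that replay test, assuming the decision function is deterministic and that every name below (DecisionRecord, replay, the toy credit model and its version string) is hypothetical rather than drawn from any real system: the record carries the inputs, the model version and the configuration, and the check passes only if re-running them reproduces the recorded output.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class DecisionRecord:
    """What the replay test asks for (field names are illustrative assumptions)."""
    inputs: Dict[str, Any]
    model_version: str
    config: Dict[str, Any]
    output: Any


def fingerprint(value: Any) -> str:
    """Stable SHA-256 digest of a JSON-serialisable value."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay(record: DecisionRecord,
           registry: Dict[str, Callable[[Dict[str, Any], Dict[str, Any]], Any]]) -> bool:
    """Re-run the recorded decision and compare it with the recorded output."""
    model = registry.get(record.model_version)
    if model is None:
        # Version drift: the model that made the decision cannot be retrieved.
        return False
    reproduced = model(record.inputs, record.config)
    return fingerprint(reproduced) == fingerprint(record.output)


def credit_model_v1(inputs: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Toy deterministic model: approve when income meets the configured floor."""
    return {"approved": inputs["income"] >= config["min_income"]}


registry = {"credit-model:1.0.0": credit_model_v1}
record = DecisionRecord(
    inputs={"income": 42000},
    model_version="credit-model:1.0.0",
    config={"min_income": 50000},
    output={"approved": False},
)

print(replay(record, registry))  # True: the recorded output is reproduced
print(replay(record, {}))        # False: the model version no longer exists
```

A system with sampling or other nondeterminism would also have to record seeds or intermediate outputs for this comparison to be meaningful. The sketch leaves that out.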
Most stacks fail this test for a reason that has nothing to do with negligence and everything to do with how modern telemetry is built. Cloud logging is designed to be cheap, lossy, sampled, retention-capped, and editable. It is a brilliant tool for finding out why your service was slow on Tuesday. It is a catastrophic tool for proving, two years later, precisely why an automated system declined someone's mortgage, flagged someone for fraud, or filtered someone out of a hiring pipeline.
Why Mutable Telemetry Collapses on Contact
Consider what has to be true for a cloud-telemetry log to count as forensic evidence. The log line must not have been altered after the fact. It must not have been silently dropped by a sampler under load. The model weights and version it references must still exist and be retrievable. The configuration it implies must be reconstructable. The clock that timestamped it must be trustworthy. And you must be able to demonstrate all of this to a third party rather than merely assert it.
Now consider how a typical stack actually behaves. Logs are mutable by default, because operators need to scrub secrets and personal data. Sampling discards events precisely when the system is under the load that makes incidents likely. Retention policies delete the oldest records first, and those are exactly the ones a long-tail complaint will ask about. Model versions get overwritten on the next deployment. And the whole edifice sits inside infrastructure controlled by the same party being audited, which is the structural definition of a conflict an examiner is trained to distrust.
- Mutability: telemetry is editable by design, so an unaltered state cannot be proven, only promised.
- Sampling: events are dropped under load, so the most consequential moments are the most likely to be missing.
- Retention windows: the oldest records vanish first, and complaints arrive on the longest timelines.
- Version drift: the model and configuration that produced the decision are gone by the next deploy.
- Self-custody: the audited party also controls the evidence, which is exactly what an auditor is paid to discount.
Any one of these would weaken a compliance posture. Together they guarantee that the first serious replay request collapses the record into a claim. The organisation says the decision was made correctly. The regulator asks to see it reconstructed. The stack produces a partial, mutable, unverifiable approximation. And the burden of proof, which under this regime sits with the operator, has not been met.
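One way to make the first two failures detectable is a hash chain, in which each entry commits to the one before it. The sketch below is an illustration under assumptions, not a description of any particular product: the entry layout and the names GENESIS, entry_hash, append and verify are all hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first entry


def entry_hash(seq: int, payload: Dict[str, Any], prev_hash: str) -> str:
    """Digest over the sequence number, the payload and the previous hash."""
    body = json.dumps({"seq": seq, "payload": payload, "prev": prev_hash},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Add an entry that commits to the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    seq = len(chain)
    chain.append({"seq": seq, "payload": payload, "prev": prev_hash,
                  "hash": entry_hash(seq, payload, prev_hash)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Walk the chain and recompute every link."""
    prev_hash = GENESIS
    for expected_seq, entry in enumerate(chain):
        if entry["seq"] != expected_seq or entry["prev"] != prev_hash:
            return False  # an entry was dropped, reordered or re-linked
        if entry["hash"] != entry_hash(entry["seq"], entry["payload"], entry["prev"]):
            return False  # the entry's content changed after it was written
        prev_hash = entry["hash"]
    return True


chain: List[Dict[str, Any]] = []
for decision in ("approve", "decline", "refer"):
    append(chain, {"decision": decision})

edited = [dict(e) for e in chain]
edited[1]["payload"] = {"decision": "approve"}  # after-the-fact edit
sampled = [chain[0], chain[2]]                  # middle event dropped

print(verify(chain))    # True
print(verify(edited))   # False
print(verify(sampled))  # False
```

A chain like this does not solve self-custody on its own. Whoever holds the whole chain can rebuild it from scratch, and a truncated tail still verifies, so the latest hash has to be anchored somewhere the audited party does not control.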
The August 2026 Wave Has a Different Temperature
There is a tendency to treat each compliance deadline as the same kind of event: a date passes, a checklist is filed, life continues. The enforcement now arriving is colder than that. It is forensic rather than declarative. The earlier phases of the AI Act asked organisations to classify their systems and assert conformity. This phase asks them to substantiate it on demand, against a specific decision, with evidence that survives adversarial scrutiny.
That shift changes what good looks like. A conformity assessment you wrote once is a snapshot of intent. A record-keeping regime that can replay any decision across the lifetime of the system is a standing capability. The first is a document. The second is an architecture. The organisations that built the document and called it the architecture are the ones whose logs will not match when the request comes, and in this wave the request comes against named decisions, not general policies.
I want to be precise about the failure mode, because it is not dramatic. Nobody's system blows up. The logs are there. They look fine on the dashboard. The failure is quiet and total: asked to reproduce one specific output from one specific moment, the stack produces something close, or something partial, or something it cannot prove was not edited. Close is a finding. Partial is a finding. Cannot prove is the finding that ends the conversation.
What a Real Record Requires
If replayability is the standard, the record has to carry everything needed to reproduce the decision and nothing that can be silently changed afterwards. That is a demanding combination, and it is why bolting compliance onto observability never quite works. You cannot make a system designed to be edited behave as if it were designed to be sealed. You have to seal it at the moment the event happens, with the full context, in a form a third party can verify without your cooperation.
A real record binds the inputs, the model identity and version, the configuration, the intermediate reasoning, and the output into a single object, then seals that object cryptographically so any later alteration is detectable by anyone. Crucially, the verification must not depend on trusting the operator's infrastructure. The whole value of the seal is that it lets an adversary confirm the record is genuine using only the record and a public verification step.
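The binding step can be sketched as follows, with loud caveats. The field names and the functions digest, seal and is_intact are hypothetical, and the example stops at a SHA-256 content digest because the Python standard library has no asymmetric signature scheme. In a real seal that digest would be signed with a private key and checked against the public key.

```python
import copy
import hashlib
import json
from typing import Any, Dict, List


def digest(body: Dict[str, Any]) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON serialisation."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(inputs: Dict[str, Any], model_version: str, config: Dict[str, Any],
         steps: List[str], output: Any) -> Dict[str, Any]:
    """Bind every field into one object and attach the digest of that object."""
    body = {
        "inputs": inputs,
        "model_version": model_version,
        "config": config,
        "steps": steps,
        "output": output,
    }
    # Assumption: a production seal would sign this digest, not merely store it.
    return {"body": body, "digest": digest(body)}


def is_intact(sealed: Dict[str, Any]) -> bool:
    """Recompute the digest from the body and compare it with the stored one."""
    return digest(sealed["body"]) == sealed["digest"]


sealed = seal(
    inputs={"income": 42000},
    model_version="credit-model:1.0.0",
    config={"min_income": 50000},
    steps=["income below configured floor"],
    output={"approved": False},
)

tampered = copy.deepcopy(sealed)
tampered["body"]["config"]["min_income"] = 40000  # quiet configuration edit

print(is_intact(sealed))    # True
print(is_intact(tampered))  # False
```

A bare digest stored beside the record only catches careless edits, because anyone who can rewrite the body can recompute the digest too. The signature is what moves verification out of the operator's hands: only the key holder can produce it, and anyone with the public key can check it.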
Sealed, Not Stored
The difference between storing a log and sealing a record is the difference between writing something down and notarising it. Storage answers the question: can I find this later? Sealing answers the much harder question: can someone who distrusts me confirm this was not touched? Under Article 12 enforcement, only the second question matters, because the first answer is always available and always insufficient.
How We Built For This
I am the founder and chief executive of Mickai, and I will describe how our Sovereign Intelligence Operating System treats this problem, because we built the architecture before the enforcement gave us a reason to defend it, not after. The SIOS runs fifty specialised brains on the operator's own hardware, fully offline-capable, and every consequential action it takes is sealed into a post-quantum Open Audit Record. The seal uses FIPS 204 ML-DSA-65, the standardised lattice signature scheme, so the record's integrity survives even the arrival of cryptographically relevant quantum computers.
What matters for Article 12 is not the cryptographic vocabulary. It is the property that vocabulary buys. An Open Audit Record is replayable by design. It carries the inputs, the model and version, the configuration, and the chain of reasoning, and it is signed at the instant of the action. A regulator can take that record, verify the signature independently, then replay the decision and arrive at the same output. The operator's trustworthiness never enters the calculation, which is precisely what an auditor wants and precisely what mutable cloud telemetry can never deliver.
And because the brains run on the operator's own hardware rather than someone else's cloud, the evidence does not live inside infrastructure controlled by a third party with its own retention economics and its own incentives. The records are the operator's, sealed and portable, which dissolves the self-custody conflict instead of papering over it. Sovereignty here is not a slogan. It is the structural condition that makes the audit trail believable.
The Spirit and the Letter Are the Same Thing Here
Compliance discourse loves to separate the letter of a regulation from its spirit, usually so that someone can satisfy one while quietly ignoring the other. Article 12 does not grant you that luxury. The letter requires automatic logging over the system's lifetime so functioning can be traced and risks identified. The spirit requires that an automated decision affecting a human being can be reconstructed and examined. A sealed, replayable record satisfies both, because tracing the functioning and reconstructing the decision are the same operation performed by the same evidence.
This is why I am unbothered by the objection that sealed records are heavier engineering than firing events at a telemetry endpoint. Of course they are heavier. They are doing a heavier job. The lightness of cloud logging is exactly the property that makes it forensically worthless under adversarial replay. You are not being asked to log more. You are being asked to prove what you logged, to someone who assumes you might be lying, about a decision you would rather forget. Lightweight evidence is not evidence. It is optimism with a timestamp.
What I Would Do Before August
If I ran a high-risk system today and this enforcement wave were bearing down, I would not start by buying more compliance documentation. I would run the replay test on my own stack, in private, against the most consequential decision I could find. I would hand the inputs to a colleague instructed to distrust me, and ask them to reproduce the output and prove the record had not been altered. Whatever broke in that exercise is what the regulator will break in public.
Most teams who run that test honestly will discover their logs are claims. Sampled in the wrong places. Mutable where it counts. Pointing at model versions that no longer exist. The discovery is uncomfortable, but it is far cheaper to make across your own desk than across a supervisory authority's. The organisations that survive this wave will be the ones who treated the record-keeping duty as a forensic capability from the start, and built systems where the evidence is sealed at the moment of action and verifiable by anyone, including the people most motivated to find it wanting.
The AI Act is live. The blindfold is off. And somewhere in your telemetry there is a decision a regulator is going to ask you to replay. The only question that matters is whether what you hand them is a record or a story. I know which one survives contact with the scales, because I built for the day the scales would tip. That day is now.




