Article · 13 June 2026

The Black Box AI Never Built: Why Every Machine Decision Needs a Flight Recorder

Aviation turned catastrophe into evidence with one orange box. Artificial intelligence still runs on faith. Here is the model that fixes it.

Author

Micky Irons

Published

13 June 2026

AI accountability · flight data recorder · audit trail · AI governance · post-quantum cryptography

An orange box that changed an entire industry

There is an object bolted into the tail of every commercial aircraft you have ever flown on. It is painted high-visibility orange, not black, despite the nickname. It is built to survive an impact of thousands of times the force of gravity, a fire that burns for an hour, and months at the bottom of the ocean. It records what the aircraft was doing and what the crew was saying in the minutes before everything went wrong. We call it the black box, and it is one of the most quietly important pieces of safety engineering ever deployed.

Here is the part that matters. Aviation did not become the safest way to travel because crashes stopped happening. They did not stop. Aviation became safe because every crash became investigable. When an aircraft goes down, a team recovers the recorder, replays the final moments, reconstructs the chain of events, and publishes findings the whole industry is obliged to act on. The accident becomes evidence. The evidence becomes a rule. The rule prevents the next one. That feedback loop, not luck and not heroics, is why you can board a plane today without thinking twice about it.

Now hold that picture in your mind and look at artificial intelligence (AI). We have built systems that approve loans, triage patients, flag fraud, screen job applicants, steer vehicles, and feed recommendations into custody and sentencing decisions. They make millions of consequential calls a day. And when one of those calls goes wrong, in most deployments, there is no recorder to recover. There is no replayable account of what the system saw, what it weighed, and why it landed where it did. We are flying a very large fleet with no black box on board, and we have somehow convinced ourselves that this is normal.

What we keep is not a record, it is a rumour

People assume that because software logs things, AI systems already keep good records. They do not, at least not in the sense that matters. Most production logging captures application events: a request arrived, a response left, an error was thrown. That tells you the plumbing worked. It does not tell you what the model actually reasoned over: the specific inputs, the version of the weights, the retrieved context, the configuration, the policy in force at that moment, and the intermediate steps. Those are what you would need to answer the only question anyone cares about after a bad outcome: would it happen again, and why?
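
To make that list concrete, here is a minimal sketch in Python of what one decision record might hold. The class `DecisionRecord` and its field names are hypothetical, chosen to mirror the items above rather than taken from any Mickai interface. The design point worth copying is the canonical serialisation: sorted keys and fixed separators mean anyone holding the same record produces the same bytes, and therefore the same SHA-256 digest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical schema for a single AI decision; field names are illustrative."""

    model_version: str                  # exact identifier of the weights that ran
    inputs: dict[str, Any]              # the features the model actually received
    retrieved_context: list[str]        # documents or rows pulled in at decision time
    config: dict[str, Any]              # runtime configuration, e.g. thresholds
    policy_id: str                      # the policy in force at that moment
    intermediate_steps: list[str] = field(default_factory=list)

    def canonical_bytes(self) -> bytes:
        # Sorted keys (applied recursively) and fixed separators:
        # identical content always yields identical bytes.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")

    def digest(self) -> str:
        return hashlib.sha256(self.canonical_bytes()).hexdigest()


record = DecisionRecord(
    model_version="underwriting-2026.05.1",      # hypothetical identifiers throughout
    inputs={"income": 52000, "loan_amount": 240000},
    retrieved_context=["credit-file:abc123"],
    config={"approve_threshold": 0.62},
    policy_id="lending-policy-v14",
)
print(record.digest())
```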

Even where richer logging exists, it has three fatal properties for accountability. First, it is written by the same system that made the decision, so it is marking its own homework. Second, it is usually mutable, which means anyone with sufficient access can change it after the fact, and you would struggle to prove they had. Third, it is held by the vendor, so to trust the record you have to trust the party with the strongest incentive to shade it. An aviation investigator does not phone the airline and ask them to describe the crash from memory. The whole point of the recorder is that it does not depend on the goodwill of the operator.

This is the gap I want to be precise about. The problem with AI accountability is not that we lack data. We are drowning in data. The problem is that we lack an account that is contemporaneous, tamper-evident, independent, and replayable. A flight recorder is valuable not because it stores bytes, but because of the guarantees attached to those bytes. It captures at the moment of action. It resists being altered. It can be read by an investigator who was not party to the flight. Strip any one of those properties away and the orange box becomes a souvenir, a heavy reassuring object that proves nothing.

Why a screenshot of the answer is not evidence

When something goes wrong with an automated decision today, the usual artefact produced in the aftermath is a screenshot, an exported spreadsheet, or a summary email from the team that ran the system. Treat that with the suspicion it deserves. A screenshot shows an output. It says nothing trustworthy about the process that produced it, and it can be staged, cropped, or regenerated. Asking a deployer to explain their own model's decision after the harm has occurred is asking the pilot to write the crash report from the bar afterwards. Sometimes you get the truth. You have no structural reason to believe it.

Consider a realistic case. An applicant is declined for a mortgage by an automated underwriting model. Six months later a regulator asks whether a protected characteristic influenced the decision. The lender pulls the logs. The logs show a score and a timestamp. They do not show which model version ran, what reference data it pulled, whether a feature acted as a proxy for ethnicity, or whether the policy thresholds were changed the week before and changed back. The lender cannot prove innocence and the regulator cannot prove fault. Everyone is now arguing about a decision nobody can replay. That is not an edge case. That is Tuesday.

Now run the same scenario with a real flight recorder for the decision. You hold a sealed account, made at the instant of the decision, that names the exact model version, the inputs, the retrieved context, the active policy, and the result. An independent party reads it without trusting the lender. If the decision was clean, the lender is exonerated in minutes instead of months. If it was not, the failure is precise and fixable. Notice that this cuts both ways. A trustworthy record protects the honest operator just as firmly as it exposes the dishonest one. The absence of a record protects no one except the careless, and the careless are exactly the people you do not want protected.
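
What "replayable" buys you can be sketched in a few lines, assuming the model is deterministic and every version that ever ran stays available. `MODEL_REGISTRY`, `SealedDecision` and `replay` are hypothetical names, and the scoring function is a toy linear model standing in for a real underwriting model; real models are not always bit-for-bit deterministic, which is why the comparison uses a tolerance. Because the threshold in force is sealed into the record, nobody has to reconstruct it from memory, and a claim that a different threshold applied can be tested directly: replaying under that threshold either reproduces the recorded outcome or it does not.

```python
from dataclasses import dataclass, replace
from typing import Callable

ModelFn = Callable[[dict[str, float]], float]

# Hypothetical registry: every model version that ever served a decision stays resolvable.
MODEL_REGISTRY: dict[str, ModelFn] = {
    # Toy linear score standing in for a real underwriting model.
    "underwriting-v1": lambda x: 0.5 * x["income_ratio"] + 0.5 * x["credit_score_norm"],
}


@dataclass(frozen=True)
class SealedDecision:
    model_version: str
    inputs: dict[str, float]
    approve_threshold: float   # policy threshold in force at decision time
    recorded_score: float
    recorded_outcome: str      # "approve" or "decline"


def replay(rec: SealedDecision, tol: float = 1e-9) -> bool:
    """Re-run the recorded model version on the recorded inputs and policy;
    return True only if both the score and the outcome are reproduced."""
    score = MODEL_REGISTRY[rec.model_version](rec.inputs)
    outcome = "approve" if score >= rec.approve_threshold else "decline"
    return abs(score - rec.recorded_score) <= tol and outcome == rec.recorded_outcome


sealed = SealedDecision("underwriting-v1",
                        {"income_ratio": 0.4, "credit_score_norm": 0.6},
                        approve_threshold=0.62, recorded_score=0.5,
                        recorded_outcome="decline")
print(replay(sealed))                                   # True
print(replay(replace(sealed, approve_threshold=0.45)))  # False: a 0.5 score would have been approved
```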

Sign before you fly, not after you land

The single most important design decision in this whole field is when the record gets made. If you write the account after the action, you have built a diary, and diaries can be edited, backdated, and selectively forgotten. The aviation recorder does not work that way. It captures continuously, in flight, before anyone knows whether this is a routine landing or the worst day of their life. The integrity comes from the timing. You cannot retrofit honesty onto a record that was assembled once the consequences were already visible. The order of operations is the safety property.

This is the principle we built into Mickai. Every action one of Mickai's brains takes is signed before it executes, not after. The system commits to what it is about to do, cryptographically, and only then does it act. That ordering is the entire game. It is the difference between a commitment and an alibi. A commitment made before the outcome is known carries weight precisely because the system did not yet know whether it would need to defend itself. An explanation produced afterwards carries the permanent suspicion that it was shaped to fit the result. Sign before you fly, and the record cannot have been edited to flatter the landing.
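
The ordering itself can be expressed directly in code. In this minimal sketch the commitment is signed and appended to the log before the action function is called, so the action cannot run without a prior record. The signer is a stand-in: HMAC-SHA256 from Python's standard library, used only because it needs no third-party package. It is not ML-DSA-65, and because HMAC relies on a shared secret it cannot provide the outsider-verifiable property discussed below. `CommitLog`, `commit_then_act` and the demo key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Callable

# STAND-IN signer with a hypothetical demo key: HMAC-SHA256 from the standard library.
# This is NOT ML-DSA-65; an HMAC can only be checked by someone holding the same secret.
DEMO_KEY = b"demo-key-not-for-production"


def sign(message: bytes) -> str:
    return hmac.new(DEMO_KEY, message, hashlib.sha256).hexdigest()


class CommitLog:
    """Hypothetical in-memory log of signed commitments."""

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def commit(self, action: str, params: dict[str, Any]) -> dict[str, Any]:
        body = json.dumps({"action": action, "params": params}, sort_keys=True).encode("utf-8")
        entry = {"action": action, "params": params, "signature": sign(body)}
        self.entries.append(entry)
        return entry


def commit_then_act(log: CommitLog, action: str, params: dict[str, Any],
                    act: Callable[[dict[str, Any]], Any]) -> Any:
    """Sign and record what is about to happen, then, and only then, do it."""
    log.commit(action, params)  # step 1: the commitment exists before the outcome is known
    return act(params)          # step 2: runs second; if step 1 raised, it never runs


log = CommitLog()
outcome = commit_then_act(log, "decline_application", {"application_id": 7},
                          lambda p: f"declined {p['application_id']}")
```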

Each of these signed entries is then linked to the one before it in an append-only hash chain. Every entry carries the hash of its predecessor, so altering an earlier entry after the fact breaks every link that follows it. You cannot quietly remove an embarrassing decision from the middle of the chain without the break being obvious to anyone who checks. This is the digital equivalent of the recorder's crash-hardened casing: not a promise that nothing bad will ever be written, but a structural guarantee that nothing can be silently unwritten. We call this the Open Audit Record, and the word open is doing real work, which I will come to.
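
A minimal sketch of an append-only hash chain, assuming SHA-256 over a canonical JSON encoding. `HashChain`, `entry_hash` and `first_break` are illustrative names, not the Open Audit Record's actual format, and the per-entry signature is omitted to keep the chaining logic visible. Editing a payload, or deleting an entry from the middle, makes `first_break` report the first position where the links stop matching.

```python
import hashlib
import json
from typing import Any, Optional

GENESIS = "0" * 64  # stand-in "previous hash" for the very first entry


def entry_hash(prev_hash: str, payload: dict[str, Any]) -> str:
    # Each hash covers the previous hash, so every entry depends on all entries before it.
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class HashChain:
    """Append-only: this API can add entries but offers no way to edit or delete them."""

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def append(self, payload: dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest


def first_break(entries: list[dict[str, Any]]) -> Optional[int]:
    """Index of the first entry whose link or hash does not check out; None if intact."""
    prev = GENESIS
    for i, entry in enumerate(entries):
        if entry["prev"] != prev or entry_hash(prev, entry["payload"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


chain = HashChain()
for n in range(3):
    chain.append({"decision": n})
print(first_break(chain.entries))                 # None: every link checks out
dropped = chain.entries[:1] + chain.entries[2:]   # silently remove the middle decision
print(first_break(dropped))                       # 1: the gap is exposed
```

The chain alone does not stop someone who holds the whole log from rewriting every entry from the edit onwards and recomputing all the hashes. That is why the latest hash has to be published or anchored somewhere the operator cannot quietly change.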

The investigator must not have to trust the airline

Here is the test that separates real accountability from theatre. Can someone who does not trust the vendor, who has no special access, no shared key, and no relationship with the operator, verify the record themselves? In aviation the answer is yes by design. The recorder is read by an independent body using standard equipment. The airline does not get to be the sole interpreter of its own black box. If accountability for artificial intelligence is going to mean anything, it has to clear the same bar, and almost nothing on the market today does.

Most so-called audit features fail this test instantly. They produce a dashboard you can only see if the vendor lets you in, backed by a database only the vendor controls, verified by a process only the vendor runs. That is not verification. That is a tour of the gift shop. The Open Audit Record is built so that any entry can be checked offline, in an ordinary web browser, with no connection back to us and no trust placed in us. You take the record, you take the public verification, and you confirm for yourself that the entry is authentic, unaltered, and in its correct place in the chain. We could vanish tomorrow and the records would still verify. That is the property that matters, and almost nobody else offers it because almost nobody else wants to give up being the indispensable middleman.
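
An outsider's check can run entirely offline with nothing but the exported entries and a head hash obtained independently of the vendor. This sketch assumes a hypothetical export format in which each entry carries `prev`, `payload` and `hash` fields, and a head value the verifier got through some channel the vendor does not control; `verify_export` is an illustrative name. Checking each entry's ML-DSA-65 signature against the operator's public key would be a further step, left out here because Python's standard library has no ML-DSA implementation.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # assumed "previous hash" of the first entry


def entry_hash(prev_hash: str, payload: dict[str, Any]) -> str:
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def verify_export(export_json: str, published_head: str) -> bool:
    """Offline check of an exported chain: no network calls, no vendor credentials.

    1. every entry points at the hash of the entry before it;
    2. every entry's hash recomputes from its own contents;
    3. the last hash equals a head value obtained independently of the vendor.
    (Per-entry signature checks against the operator's public key are omitted.)
    """
    entries: list[dict[str, Any]] = json.loads(export_json)
    prev = GENESIS
    for entry in entries:
        if entry["prev"] != prev:
            return False
        if entry_hash(prev, entry["payload"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return prev == published_head
```

Step 3 is what catches a wholesale rewrite, where every link has been recomputed consistently and only the independently held head gives it away.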

I want to be honest about why this is uncomfortable for the industry. A vendor whose record can be independently verified has surrendered a great deal of power. They can no longer be the final word on what their own system did. They have made themselves accountable in a way that cannot be walked back with a support ticket. Most companies will tell you they support transparency right up to the moment it removes their ability to control the narrative. The offline, vendor-independent check is the line. On one side of it you have marketing. On the other you have evidence. We chose the side that is harder to live with, on purpose.

Building for a threat that has not fully arrived

There is a further problem that the aviation analogy does not capture, because aircraft recorders do not have to survive a future adversary armed with a fundamentally more powerful machine. A record meant to settle disputes years from now has to be readable and trustworthy years from now. That forces a question most teams are still avoiding, which is what happens to today's digital signatures when quantum computing matures enough to break the cryptography they rely on. A signature that is unforgeable today but forgeable in a decade is not a foundation for a long-lived legal record. It is a time bomb under your evidence, and it is ticking quietly whether or not you choose to listen.

This is why the migration to post-quantum cryptography is not an academic exercise, and why standards bodies have already published the algorithms organisations are expected to adopt. We signed the Open Audit Record using a post-quantum standard: the Module-Lattice-Based Digital Signature Algorithm (ML-DSA), specified by the United States National Institute of Standards and Technology (NIST) as Federal Information Processing Standard 204 (FIPS 204), with the ML-DSA-65 parameter set, which targets NIST security category 3. Plainly put, the signatures are designed to remain unforgeable even against an adversary with a mature quantum computer. We did this now, ahead of the curve, because a record you cannot trust in ten years is not worth making today. You do not retrofit the black box after the crash. You build it in before the first flight.
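
The "65" in ML-DSA-65 describes the shape of the public matrix (k = 6 rows, l = 5 columns), not a security level; the parameter set targets NIST security category 3. The sizes below are the public-key and signature sizes published in FIPS 204. The helper `signature_storage_bytes` is purely illustrative and shows one practical consequence for a long-lived record: post-quantum signatures are large, so one million signed entries carry about 3.3 GB of signature data alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLDSAParams:
    matrix_rows: int        # k in FIPS 204
    matrix_cols: int        # l in FIPS 204
    nist_category: int      # NIST post-quantum security category
    public_key_bytes: int
    signature_bytes: int


# Parameter sets as published in FIPS 204 (August 2024).
ML_DSA = {
    "ML-DSA-44": MLDSAParams(4, 4, nist_category=2, public_key_bytes=1312, signature_bytes=2420),
    "ML-DSA-65": MLDSAParams(6, 5, nist_category=3, public_key_bytes=1952, signature_bytes=3309),
    "ML-DSA-87": MLDSAParams(8, 7, nist_category=5, public_key_bytes=2592, signature_bytes=4627),
}


def signature_storage_bytes(param_set: str, entries: int) -> int:
    """Storage needed for the signatures alone on `entries` audit records."""
    return ML_DSA[param_set].signature_bytes * entries


print(signature_storage_bytes("ML-DSA-65", 1_000_000))  # 3309000000
```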

The law is moving toward the recorder, not away from it

None of this is happening in a vacuum. The regulatory direction of travel is unmistakable. The European Union (EU) Artificial Intelligence Act brings a wave of obligations for high-risk systems, with significant requirements landing through 2026, and at their core sits a demand that should sound familiar by now: keep records, enable traceability, make decisions auditable, be able to show what the system did and why. Liability regimes across multiple jurisdictions are shifting the burden toward those who deploy automated systems, which in practice means the operator who cannot produce a credible account of a decision is increasingly the operator who loses the argument, and then loses the case.

You can read this as a compliance headache, and many will. I read it as the law slowly arriving at a conclusion aviation reached decades ago. If a system can do serious harm at scale, society will eventually insist on a recorder. The only open questions are whether your record is contemporaneous or reconstructed, tamper-evident or editable, independently verifiable or vendor-controlled, and durable against tomorrow's attacks or brittle against them. Organisations that build the strong version now will find that compliance falls out as a by-product. Organisations that build the weak version will keep producing screenshots and hoping nobody looks too closely, which is a strategy only until the day somebody does.

The honest limits of the analogy

I should be careful not to oversell the parallel, because a sloppy analogy does more harm than no analogy. A flight recorder captures physical telemetry from a system governed by the fixed laws of aerodynamics. An AI decision is a statistical act over messy, high-dimensional inputs, and a perfect replay of those inputs does not always yield a single tidy reason a human can read like a paragraph. A record proves what was decided, on what basis, under what configuration. It does not magically make every model interpretable, and anyone promising that is selling you something. Capturing the account and explaining the reasoning are related but distinct problems, and conflating them is how vendors overpromise and then quietly underdeliver.

There is also a hard truth about completeness. A recorder that only captures the easy ninety percent of decisions, or that can be switched off for the inconvenient ones, is worse than useless because it manufactures false confidence. The discipline has to be total: every action recorded, no exceptions, no quiet off-switch for the decisions you would rather not have on record. That completeness is precisely what most bolt-on logging cannot offer, because it was added as a feature rather than built as a foundation. A black box that the crew can disable before the crash is not a black box. It is a prop, and props are for reassuring passengers, not for finding out what actually happened.
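
One way to make completeness structural rather than optional is to route every action through a single gateway that fails closed: if the record cannot be written, the action does not run, and there is no parameter for skipping the recorder. The sketch below assumes that design. `AuditedExecutor`, `RecorderUnavailable` and the injected `record` callable are hypothetical, and in practice `record` would write to a tamper-evident log rather than a Python list.

```python
from typing import Any, Callable


class RecorderUnavailable(Exception):
    """Raised when an action is refused because it could not be recorded."""


class AuditedExecutor:
    """Hypothetical gateway: the only path to run an action, with no flag to skip recording."""

    def __init__(self, record: Callable[[str, dict[str, Any]], None]) -> None:
        self._record = record  # e.g. appends to a hash chain; raises on failure

    def run(self, action: str, params: dict[str, Any],
            act: Callable[[dict[str, Any]], Any]) -> Any:
        try:
            self._record(action, params)
        except Exception as exc:
            # Fail closed: an action that cannot be recorded is not executed at all.
            raise RecorderUnavailable(f"refusing to run {action!r}: {exc}") from exc
        return act(params)


audit_log: list[tuple[str, dict[str, Any]]] = []
executor = AuditedExecutor(lambda a, p: audit_log.append((a, p)))
result = executor.run("flag_fraud", {"tx": 1}, lambda p: "flagged")
```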

What I am actually asking you to demand

I am not asking you to take my word for any of this. The whole argument collapses if it rests on trusting me, which is the very disease it claims to cure. So here is what I would put to anyone deploying or relying on automated decisions. Ask four questions of whatever system you use, and accept nothing softer than a yes. Is the record made before the action, or reconstructed after? Is it tamper-evident, so that any alteration of the past is visible? Can an outsider verify it without trusting the vendor, offline, with no special access? And will it still be trustworthy a decade from now against far stronger attackers than exist today?

If the answer to any of those is no, you do not have a flight recorder. You have a diary, and diaries do not stand up when it matters. Mickai is a Sovereign Intelligence Operating System (SIOS), built and running, and the Open Audit Record is the part I would stake the whole thing on, because it is the part that asks you to trust nothing and verify everything. Fifty brains do the work, twenty-five domain and twenty-five operational, on the Poseidon silicon substrate, and we are actively training our own models now, specialising and hardening them on a sealed corpus while we build toward fully native weights. Behind it all sit 2,340 claims across 104 filed United Kingdom (UK) patent applications, owned by Mickai LTD. But the heart of it is simple, and it is borrowed openly from an orange box in the tail of an aircraft.

The Pantheon chain we are building will anchor the audit root to Bitcoin, so the record's integrity ultimately rests on something no single party owns. The Pantheon chain itself is settled by its native token, PAN, which has a fixed supply of five billion. That last layer matters because it removes even us from the chain of trust. Aviation did not earn your confidence by promising its machines would never fail. It earned your confidence by guaranteeing that when they did, the truth would survive the wreckage, and the truth would not belong to the people who built the machine. That is the standard. Artificial intelligence is nowhere near it yet. It is long past time we started building the recorder, signing every action before it flies, and handing the means of verification to everyone rather than no one. The technology to do this exists. We have shipped it. The only thing still missing is the will to insist on it.
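
How the Pantheon chain computes and commits its audit root is not specified here, so the following is only a generic sketch of one common approach: fold the log's entries into a single 32-byte Merkle root, which is small enough to publish or anchor on a public chain while still committing to every entry. Leaf and internal-node hashes get different one-byte prefixes, as in RFC 6962, so a leaf cannot be passed off as an internal node. `merkle_root` is a hypothetical helper, and the step of writing the root into a Bitcoin transaction is not shown.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()        # 0x00 prefix marks a leaf


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 prefix marks an inner node


def merkle_root(entries: list[bytes]) -> bytes:
    """Fold serialised audit entries into one 32-byte root that commits to all of them."""
    if not entries:
        raise ValueError("cannot anchor an empty log")
    level = [_leaf(e) for e in entries]
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is carried up unchanged
        level = nxt
    return level[0]


root = merkle_root([b"entry-0", b"entry-1", b"entry-2"])
print(root.hex())
```

Changing any single entry changes the root, so once the root is anchored somewhere the operator does not control, the operator can no longer rewrite the log behind it unnoticed.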

Originally published at https://mickai.co.uk/articles/flight-recorder-model-ai-decision-evidence. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.