Article · 13 June 2026

Prompt Injection Is Not a Bug You Patch

When a system mixes instructions and data in the same channel of natural language, manipulation is not a defect. It is the design. The defence is not a better filter. It is bounded permission and a record of every act.

Author

Micky Irons

Published

13 June 2026

Follow Micky Irons

LinkedIn X

prompt injectionAI securityagentic AIverifiable auditsovereign AI

You cannot sanitise your way out of prompt injection. I want to state that plainly at the start, because most of the industry is still spending money as if you can.

A large language model reads instructions and data through the same door. The system prompt that tells it how to behave, the user's question, the email it was asked to summarise, the web page it fetched, the pull request it reviewed, all of it arrives as one undifferentiated stream of natural language. The model has no reliable way to know which words are orders and which words are merely content. That is not a flaw in any single product. It is the operating principle of the technology. When you build on a system that treats commands and information as the same substance, manipulation is not a bug that slipped through. It is a property of the design.

The incidents are not edge cases, they are the shape of the thing

In June 2025, EchoLeak was disclosed, catalogued as Common Vulnerabilities and Exposures identifier CVE-2025-32711 and scored 9.3 on the Common Vulnerability Scoring System. It was a zero-click attack on Microsoft 365 Copilot. An attacker sent an ordinary-looking email containing hidden instructions. When the recipient later asked Copilot to summarise their inbox, the assistant read those instructions as commands and quietly exfiltrated internal documents from OneDrive, SharePoint, and Teams through a trusted Microsoft domain. The victim clicked nothing. There was no malware to scan for, no signature to match, no log of an intrusion in any conventional sense. Copilot did exactly what it was built to do. It read text and acted helpfully on it.

EchoLeak was the first documented case of prompt injection turned into concrete data theft in a production system, and it was not the last. In 2025, GitHub Copilot carried CVE-2025-53773, where hidden instructions in a pull request description could achieve remote code execution on a developer's machine. In May 2026, a single prompt was shown to become a shell, walking a path through a popular agent framework until it launched a process on the host. These are not three unrelated mistakes by careless teams. They are the same structural fact appearing in three places.

Why the filter approach keeps losing

The standard response is a classifier. Microsoft built one. It is called the Cross-Prompt Injection Attempt classifier, and its job is to read incoming content and flag malicious instructions before they reach the model. EchoLeak defeated it by writing the payload as if it were addressed to the human reader rather than to the machine, then slipping data out through reference-style links and auto-fetched images that the content security policy permitted. Every layer of sanitisation was a wall, and the attacker simply wrote prose that walked around each wall in turn.

This is the heart of the matter. A filter is a guess about intent expressed in language, defending against an attack also expressed in language, by an adversary who can rephrase indefinitely. The defender must block every phrasing. The attacker needs one that works. That asymmetry does not improve with a bigger model or a cleverer prompt. The Open Worldwide Application Security Project ranks prompt injection as LLM01 in its 2025 Top Ten for large language model applications, the single most critical risk, and its own guidance concedes that neither retrieval augmentation nor fine-tuning fully removes the class. In December 2025 the United Kingdom's National Cyber Security Centre put it more bluntly still, warning that prompt injection may be a problem that is never fully fixed, because it stems from how these models interpret language at the most basic level. When the national authority and the standards body both tell you a class of attack is unfixable at the input layer, the responsible move is to stop pretending the input layer is where you win.

The trifecta, and what turns a trick into a breach

There is a useful way to see why these incidents are dangerous rather than merely embarrassing. A breach needs three things to coexist in one agent. The agent must have access to private data. It must be exposed to untrusted content. And it must be able to communicate to the outside world. Hold any one of those away from the others and a successful injection produces nothing of value. Put all three in the same process and a single poisoned email can read your files and ship them out, with no flaw in any line of conventional code.

Look again at the incidents through that lens. EchoLeak had private mailbox data, an untrusted inbound email, and an exfiltration path through allowed domains. The agent-framework remote code execution had untrusted input and the capability to run commands on the host. The breach is never the injection alone. The breach is the injection meeting unbounded capability. That is the seam, and it is where defence belongs.

The scale is now industrial. Across late 2025 and 2026, a single actor used commercial coding assistants to breach nine Mexican government agencies and lift roughly 195 million taxpayer records. In March 2026 a backdoored release of LiteLLM, the model gateway underneath CrewAI, DSPy, Microsoft GraphRAG and dozens of other frameworks, sat on the Python package index for three hours and was pulled nearly 47,000 times, carrying an autonomous attack bot into every project that updated. Survey work through the year found that 88 percent of organisations running agents reported a confirmed or suspected incident, while only 6 percent of security budgets went to defending those agents. We are wiring autonomous systems into databases, payments, and code, and we are guarding them with the one technique we already know cannot hold.

Bound the action, sign the act

I built Mickai because I concluded the input layer is the wrong battlefield. You will not stop the model from being persuaded. So you assume it will be persuaded, and you design so that persuasion buys the attacker almost nothing. Two principles carry the entire load. Constrain what an agent is permitted to do, to the least capability its task requires and no more. And record every action it takes before that action runs, in a form an outsider can verify without trusting you. Do both and a successful injection becomes bounded and visible rather than unlimited and silent. The attacker who slips a command through still hits a wall of permission, and whatever does happen leaves an indelible mark.

In Mickai, that record is the Open Audit Record. Every action the system takes is signed before it executes, not after, and written to an append-only ledger whose entries are hash-chained so that no past entry can be altered or removed without breaking the chain. The signatures are post-quantum, using the Module-Lattice Digital Signature Algorithm standardised as Federal Information Processing Standard 204, ML-DSA-65, so the record holds even against an adversary with a future cryptanalytic machine. The chain's root is anchored through Pantheon, Mickai's sovereign Layer 1 blockchain, out to Bitcoin, so the ledger's integrity does not rest on any server I control. Most importantly, the record is verifiable offline by a verifier that runs inside a plain web browser. No network call. No appeal to my honesty. You check the mathematics yourself.

What the record changes

Mickai is a Sovereign Intelligence Operating System, built and in production, fifty brains across twenty-five domain and twenty-five operational functions running on the Poseidon silicon substrate. The architecture is the subject of 104 filed United Kingdom patent applications, 2,340 claims, owned by Mickai LTD, named inventor Micky Irons, the company registered at Companies House under number 17166618. I name the specifics not to advertise but to make a claim falsifiable, because the entire point of this essay is that claims you cannot check are worthless.

Return to EchoLeak. Its quiet menace was the absence of evidence. No log, no alert, no signature, the assistant behaving exactly as designed while data left the building. Now imagine that same injection against a system where the agent may only touch what its task strictly needs, and where every read and every outbound action is signed before it happens and chained into a ledger you can audit from your own laptop. The attack does not vanish. The model can still be fooled. But the fooling is fenced, and it is written down in ink that cannot be erased and that you did not have to take my word for. The injection that succeeds is no longer invisible. It is bounded and it is on the record.

Stop treating prompt injection as a defect awaiting a patch. It is a permanent condition of mixing instruction and data in natural language, and the standards bodies and national authorities now say so out loud. The durable answer is not a smarter filter. It is least privilege and a signed, append-only, independently verifiable account of everything the machine does. Make the blast radius small and make the truth checkable. That is the only ground that holds when the next phrasing arrives.

ShareLinkedIn X Hacker News Reddit Mastodon Bluesky Email

Originally published at https://mickai.co.uk/articles/prompt-injection-is-not-a-bug-you-patch. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.