Agentic Memory Poisoning (ASI06): The Record the Agent Cannot Rewrite
When an AI agent's own memory becomes the attack surface, the only durable defence is a record the agent is not allowed to edit.
The vulnerability that lives between the prompts
Most discussion of agentic risk fixates on the single prompt. The real exposure is what survives between prompts. An autonomous agent keeps state: scratchpads, vector stores, conversation history, retrieved documents, tool outputs it has decided to trust. That persistent memory is what lets an agent plan across hours instead of seconds. It is also the quietest attack surface in the stack.
Agentic memory poisoning (catalogued here as ASI06) is the deliberate corruption of that retained state. The attacker does not need to win an argument with the model in real time. They need only to plant a false fact, a forged instruction, or a tainted document where the agent will later read it back as its own settled belief. The model then reasons correctly over a premise that was tampered with, and acts on it with full confidence.
“You do not have to jailbreak an agent if you can edit its memory. It will jailbreak itself, politely, and log the result as normal operation.”
Why poisoned memory is worse than a bad prompt
A malicious prompt is loud and momentary. It arrives, it is handled, it is gone. Poisoned memory is patient. It persists across sessions, propagates into summaries, and gets copied into every downstream context the agent assembles. One tainted entry in a retrieval store can quietly steer a thousand later decisions, because retrieval-augmented agents treat their own memory as ground truth rather than as untrusted input.
The failure compounds in multi-agent systems. When one agent writes to a shared memory that others read, a single poisoned record becomes a shared delusion. Each agent independently confirms the false fact because it sees its peers relying on it. There is no single moment of compromise to point at, only a slow drift away from reality that looks, from inside the system, like consensus.
The defences that do not hold
The instinctive answer is to harden the model: better alignment, tighter guardrails, a classifier that sniffs out poisoned inputs. These help at the margin and fail at the core. They all ask the agent to police its own memory, which is precisely the surface under attack. An agent cannot reliably tell a real memory from a planted one when the planting was designed to be indistinguishable. Asking the compromised party to audit itself is not a control. It is a hope.
Encryption and access control narrow who can write to memory, which matters, but they say nothing about what was written or whether it later changed. The question that actually protects an operator is not who had access. It is: can I prove, after the fact, exactly what the agent saw and did, in an order no one was able to rewrite once it happened? That is an evidentiary question, and it needs an evidentiary answer.
A record the agent is not allowed to edit
Mickai treats this as a property of the substrate rather than a feature of the model. Mickai is a Sovereign Intelligence Operating System (SIOS): fifty specialised AI brains, twenty-five domain and twenty-five operational, running on the operator's own hardware and fully offline-capable. The brains can be wrong. The record of what they did is built so that being wrong cannot be hidden.
Every consequential action is written to the Open Audit Record (OAR). The OAR is append-only and each entry is sealed and signed with FIPS 204 ML-DSA-65, the published NIST post-quantum signature standard. Mickai did not invent that standard. It adopts it, so the seals stay verifiable even against an adversary with a quantum computer. The agent can read its memory. It cannot reach back and silently alter the record of what that memory was when it acted. Poison the working memory and you still cannot poison the receipt.
This inverts the usual trust model. Instead of asking you to trust that the agent's memory is clean, the SIOS gives you a signed, ordered, tamper-evident account you can check independently. If a memory was poisoned, the divergence shows up against the record. The attacker can corrupt what the agent believes. They cannot corrupt the proof of what the agent believed, because that proof was sealed the moment it happened and the signing key the agent runs under does not grant edit rights over history.
Anchoring the record beyond the operator's reach
A signed local record answers most threats, but a determined adversary will ask the harder question: what stops someone with full control of the machine from discarding the whole log and starting fresh? Local signatures prove integrity. They do not, on their own, prove the log was not replaced wholesale.
This is where Pantheon does its work. Pantheon is Mickai's own sovereign, Bitcoin-anchored Layer 1, with a native token (PAN) and a fixed supply of five billion. Periodically it commits a hash of the record to Bitcoin, binding the OAR's state to the most expensive-to-rewrite ledger in existence. Pantheon does not move bitcoin and it is not a Bitcoin Layer 2. It publishes a fingerprint, not a payment. Anchoring is not spending. To forge history you would now have to rewrite Bitcoin itself, which no operator, including the operator running the agent, can do.
The chain of custody is therefore complete. The agent acts. The OAR seals the action with a post-quantum signature the agent cannot reissue. Pantheon anchors the record's hash to Bitcoin so even a wholesale replacement is detectable. Memory poisoning may still occur upstream, but its effects become provable rather than deniable, and provable harm is harm you can detect, contain, and answer for.
What this buys an operator
Agentic memory poisoning will not be solved by making models smarter, because the smarter the agent, the more convincingly it acts on a belief that was planted. The durable defence is structural. Keep the agent's reasoning where it belongs, in fast mutable memory, and keep the account of what it did somewhere the agent is not permitted to rewrite: append-only, post-quantum signed, and anchored beyond the operator's own reach.
That is the line Mickai draws. The brains stay improvable and, yes, fallible. The record does not. An operator running the SIOS on their own hardware can be deceived about a fact, but cannot be quietly deceived about what their system did with it. In a world of autonomous agents, the record the agent cannot rewrite is not a nice-to-have. It is the difference between an incident you can investigate and a corruption you never see.
This work sits inside a wider portfolio: 101 filed UK patent applications, around 2,234 claims, owned by Mickai LTD, with named inventor Micky Irons. The patents are evidence of how deep the architecture goes. The point of the architecture is simpler. When an agent can be lied to, make sure it cannot lie to you about what it did next.




