MICKAI
Article · 4 July 2026

On-Premise Retrieval-Augmented Generation and Knowledge Sovereignty

Why your knowledge base should never leave the building, and how sealed provenance makes on-premise retrieval provable

Author: Micky Irons
Published: 4 July 2026
Tags: knowledge-sovereignty, on-premise-rag, retrieval, data-residency, provenance

Retrieval-augmented generation has become the default way to make an intelligence system speak with authority about your own documents. You point a model at your contracts, your case files, your engineering archives, and it retrieves the relevant passages before it answers. The problem is where that retrieval happens. In most deployments your most sensitive documents are chunked, embedded, and shipped to a vendor's cloud, indexed on hardware you will never see, and queried by a model you cannot inspect. For a regulated organisation that is not a convenience. It is a breach waiting for a subject access request.

Knowledge sovereignty is the principle that your knowledge base never leaves the building. At Mickai we treat on-premise retrieval-augmented generation as a first-class subsystem of the Sovereign Intelligence Operating System (SIOS), not a feature bolted on afterwards. The index lives on hardware you own. The embeddings are computed locally. Every retrieved source carries sealed provenance. Nothing crosses your perimeter, and you can prove it.

Why cloud retrieval breaks the regulated boundary

When you send a document to a third-party retrieval service you have created a copy you no longer control. The vendor's embedding model has read it. The vendor's vector store holds a mathematical fingerprint of it that can, under the right conditions, be partially reconstructed. The vendor's logs may retain your queries, which are themselves sensitive: the questions a legal team asks reveal the shape of a case long before any answer is returned.

This is exactly the boundary the public cloud cannot cross on the customer's own terms. Under the General Data Protection Regulation (GDPR), a data controller must know where personal data sits and be able to erase it on request. Under the Health Insurance Portability and Accountability Act (HIPAA), protected health information may be disclosed to a vendor only under a business associate agreement with safeguards attached. Under the International Traffic in Arms Regulations (ITAR), controlled technical data may not be exported or released to foreign persons without authorisation. Retrieval that quietly phones home strains all three. The cloud giants are allies at a different layer, and they are candid that some workloads simply cannot run on shared infrastructure. That regulated boundary is the one we serve directly.

[Image: A colossal marble figure of Prometheus cupping a small ember of gold light against a black void. Caption: Like Prometheus guarding the flame, sovereign retrieval keeps the source of knowledge in your own hands.]

Muse of history: retrieval as remembered truth

Clio, the Muse of history, did not invent the past. She recorded it faithfully and could recite its sources. That is precisely the discipline retrieval-augmented generation demands and so rarely delivers. A good answer is not one that sounds plausible. It is one that can point to the exact passage it came from, in a document you can open, unaltered.

Our retrieval subsystem is built around that principle. Every answer is grounded in retrieved chunks, and every chunk is traceable to its origin file, its page, and its position within that page. If a brain cannot find supporting evidence in your own corpus, it says so rather than inventing an answer. Faithful memory is the whole point, and faithful memory has a location: your building, your drives, nowhere else.
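
To make the traceability requirement concrete, here is a minimal sketch of a chunk that carries its origin file, page, and position, together with an answer function that declines when nothing clears a relevance threshold. The names `Chunk`, `Answer`, `grounded_answer`, and the `min_score` cut-off are hypothetical and chosen for illustration; they are not taken from the Mickai system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chunk:
    text: str
    source_file: str  # origin file the passage came from
    page: int         # page within that file
    offset: int       # character position within the page


@dataclass(frozen=True)
class Answer:
    text: str
    citations: List[Chunk]


NO_EVIDENCE = "No supporting passage was found in the local corpus."


def grounded_answer(scored: List[Tuple[float, Chunk]], min_score: float = 0.5) -> Answer:
    """Return the supporting passages, or an explicit refusal if none clear the bar."""
    support = [chunk for score, chunk in scored if score >= min_score]
    if not support:
        # Say so rather than invent an answer.
        return Answer(NO_EVIDENCE, [])
    # A real system would hand `support` to a local model; here we only quote it.
    return Answer(" ".join(c.text for c in support), support)
```

Because every `Answer` holds the chunks it stands on, a reader can walk from any sentence back to a file, a page, and an offset.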

How on-premise retrieval-augmented generation actually works here

The pipeline runs end to end on hardware the customer owns, air-gapped or on-premise, with zero data egress. Documents are ingested, chunked with structure preserved, and embedded by a sovereign brain running locally on your central processing unit (CPU) or graphics processing unit (GPU), your choice. The resulting vectors are written to an index that never leaves the machine. At query time the same local brain embeds the question, retrieves the closest passages, and a second sovereign brain composes the grounded answer. No network call reaches the open internet at any stage.
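
The shape of that pipeline can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the production design: `embed` is a toy hashed bag-of-words vector standing in for a locally run embedding model, `chunk` simply splits on blank lines, and `LocalIndex` is a hypothetical in-memory index. Nothing in it opens a network connection.

```python
import hashlib
import math
import re
from typing import List, Tuple

DIM = 256  # illustrative vector size, not a recommendation


def embed(text: str) -> List[float]:
    """Toy stand-in for a local embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int.from_bytes(hashlib.sha256(token.encode()).digest()[:4], "big") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk(document: str) -> List[str]:
    """Split on blank lines so each paragraph stays intact."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


class LocalIndex:
    """In-memory index: vectors and passages stay in this process."""

    def __init__(self) -> None:
        self.entries: List[Tuple[List[float], str]] = []

    def ingest(self, document: str) -> None:
        for passage in chunk(document):
            self.entries.append((embed(passage), passage))

    def search(self, question: str, k: int = 3) -> List[Tuple[float, str]]:
        q = embed(question)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), passage)
                  for vec, passage in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

The question is embedded by the same `embed` function as the documents, mirroring the point above that one local model handles both sides of the lookup.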

[Image: A colossal marble figure of Hermes mid stride carrying a sealed tablet through darkness. Caption: Hermes the faithful messenger carries only what is sealed, never what is exposed.]

High-fidelity retrieval over proprietary documents is the hard part, and it is where generic tooling falls down. Legal clauses, engineering tolerances, and clinical notes are dense with meaning that shallow chunking destroys. We preserve document structure through ingestion, apply a local reranking stage so the passages that reach the brain are the truly relevant ones rather than merely the nearest, and keep the whole retrieval graph inspectable. Because the brains are revocable, a brain that misbehaves can be pulled and replaced without tearing down your index.
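
The difference between "nearest" and "truly relevant" is easiest to see as a second scoring pass over the first-stage candidates. The sketch below is an assumption-laden illustration: `coverage` is a deliberately simple term-overlap score standing in for whatever local reranking model is actually used, and `rerank` and its `keep` parameter are hypothetical names.

```python
import re
from typing import Callable, List, Set


def terms(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(question: str, passage: str) -> float:
    """Toy relevance score: fraction of question terms present in the passage."""
    q = terms(question)
    return len(q & terms(passage)) / len(q) if q else 0.0


def rerank(question: str,
           candidates: List[str],
           scorer: Callable[[str, str], float] = coverage,
           keep: int = 2) -> List[str]:
    """Re-order first-stage candidates with a second scorer and keep the best few."""
    ranked = sorted(candidates, key=lambda p: scorer(question, p), reverse=True)
    return ranked[:keep]
```

Passing the scorer in as a parameter keeps the reranker replaceable without touching the index, which fits the idea that a misbehaving component can be swapped out on its own.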

Sealed provenance on every retrieved source

Grounding is worth little if you cannot prove where the grounding came from. Every source our system retrieves is stamped with sealed provenance: which document, which version, which passage, and when it was last modified. That record is written into a tamper-evident, cryptographically-signed audit ledger, so the chain from answer to evidence cannot be quietly rewritten after the fact.

This matters most when an answer is challenged. A regulator, an auditor, or opposing counsel can ask how a conclusion was reached, and you can show the precise passages the system stood on, with cryptographic assurance that the trail has not been edited. Provenance is signed with post-quantum signatures using the Federal Information Processing Standard 204 (FIPS 204) ML-DSA-65 scheme, and it can be verified offline, on an air-gapped machine, years later, with no dependence on any vendor still being in business.
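
One common way to make a ledger tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below shows that idea with offline verification. It rests on two loud assumptions: the record fields are hypothetical, and the signature is an HMAC from the Python standard library used purely as a stand-in. HMAC is symmetric, so the verifier must hold the secret key, whereas the ML-DSA-65 signatures described above are public-key and need only the public key to check.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _canonical(body: Dict) -> bytes:
    # Canonical JSON so the same record always serialises to the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def append(ledger: List[Dict], record: Dict, key: bytes) -> Dict:
    """Append a provenance record, linking it to the previous entry's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"prev": prev, "record": record}
    entry = {
        **body,
        "hash": hashlib.sha256(_canonical(body)).hexdigest(),
        # Stand-in signature (HMAC-SHA256), NOT the ML-DSA-65 scheme.
        "sig": hmac.new(key, _canonical(body), hashlib.sha256).hexdigest(),
    }
    ledger.append(entry)
    return entry


def verify(ledger: List[Dict], key: bytes) -> bool:
    """Recompute every link and signature; needs no network access."""
    prev = GENESIS
    for entry in ledger:
        body = {"prev": entry["prev"], "record": entry["record"]}
        expected_hash = hashlib.sha256(_canonical(body)).hexdigest()
        expected_sig = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected_hash:
            return False
        if not hmac.compare_digest(entry["sig"], expected_sig):
            return False
        prev = entry["hash"]
    return True
```

Editing any stored record changes its hash, which breaks both that entry and every link after it, so a quiet rewrite is detectable by anyone who reruns `verify`.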

[Image: A colossal marble figure of Themis standing with scales held level in deep shadow. Caption: Themis holds the scales level, the way a signed audit ledger holds the record true.]

Attestation before the answer, not after

Retrieval is a read, but retrieval systems increasingly trigger actions: drafting a filing, updating a record, escalating a case. In the Sovereign Intelligence Operating System, every action is described by an Operation Attestation Record (OAR), and that record is signed before the action executes, not logged after it. The system commits to what it is about to do, cryptographically, and only then does it act.
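
The ordering is the whole point, so it is worth seeing in code. In this minimal sketch the record is signed and written first, and the action runs only if that succeeded. The function name `attest_then_execute`, the shape of the `operation` dictionary, and the HMAC stand-in for a real signature are all assumptions made for illustration; the actual OAR format is not described in this article.

```python
import hashlib
import hmac
import json
from typing import Callable, Dict, List


def attest_then_execute(operation: Dict,
                        action: Callable[[], object],
                        key: bytes,
                        log: List[Dict]) -> object:
    """Sign and record the operation first; run the action only afterwards."""
    payload = json.dumps(operation, sort_keys=True).encode()
    record = {
        "operation": operation,
        # Stand-in signature (HMAC-SHA256) for illustration only.
        "signature": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }
    log.append(record)  # the commitment exists before any side effect
    return action()     # never reached if signing or logging raised
```

Contrast this with after-the-fact logging, where a crash between the action and the log write leaves a side effect with no record of it.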

For high-stakes operations we require more than a signature. Multi-brain agreement plus voice-biometric approval means a sensitive action needs both several independent brains to concur and a verified human voice to authorise it. Applied to retrieval-driven workflows, this closes the gap that catches most deployments: the moment a system moves from answering a question to taking an irreversible step on the strength of that answer.
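
The two-part gate can be expressed as a small predicate: enough independent brains must propose the same decision, and a human must have approved. This is a sketch under assumptions. `approved_decision` and `quorum` are hypothetical names, and `voice_verified` is taken to be a boolean produced by a separate voice-biometric check that is not modelled here.

```python
from collections import Counter
from typing import List, Optional


def approved_decision(brain_votes: List[str],
                      voice_verified: bool,
                      quorum: int) -> Optional[str]:
    """Return the agreed decision only if enough brains concur and a human approved."""
    if not voice_verified or not brain_votes:
        return None
    decision, count = Counter(brain_votes).most_common(1)[0]
    return decision if count >= quorum else None
```

Returning `None` in every failing case means the caller has a single thing to check before taking an irreversible step.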

What sovereignty buys a regulated organisation

The practical payoff is that compliance stops being a negotiation with a vendor's terms of service and becomes a property of your own architecture. Under the European Union Artificial Intelligence Act (EU AI Act), you must be able to explain, document, and oversee high-risk automated decision-making, and under the Digital Operational Resilience Act (DORA), financial entities must manage the risk posed by third-party technology providers. When the index, the brains, and the audit ledger all sit inside your perimeter, much of that burden is met by design rather than by contract. There is no third-party sub-processor to assess, no cross-border transfer to justify, no data-processing agreement standing between you and your own knowledge.

[Image: A colossal marble figure of Argus with many watchful eyes emerging from shadow. Caption: Argus of the many eyes watches every retrieval, so nothing crosses the perimeter unseen.]

This capability is covered by our filed intellectual property. Mickai LTD holds 104 filed United Kingdom patent applications covering about 2,340 claims across sovereign retrieval, attestation, and the signed audit ledger, among other subsystems. We frame that portfolio by what it lets a customer do: retrieve over the most sensitive corpus in the country without a single document leaving the room.

The bottom line

Retrieval-augmented generation is only as trustworthy as the place it runs. Send your documents to someone else's cloud and you have traded knowledge sovereignty for convenience, and inherited a compliance liability you cannot fully see. Keep retrieval on hardware you own, with sealed provenance on every source and attestation before every action, and the same technology becomes an asset a regulator can inspect with confidence. Your knowledge base never leaves the building. That is the whole promise, and we have built it to be provable.

Originally published at https://mickai.co.uk/articles/on-premise-rag-knowledge-sovereignty. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.