MICKAI
Article · 29 June 2026

Self-Hosted Enterprise Search AI: Air-Gapped RAG Over Decades of Records

The on-premise alternative to cloud enterprise search, connecting decades of un-redacted corporate data to a local AI with zero data egress

Author: Micky Irons
Published: 29 June 2026

What self-hosted enterprise search AI is

Self-hosted enterprise search AI connects decades of un-redacted corporate records (contracts, correspondence, board papers, research, case files) to a local model running on hardware the organisation owns, so that a single natural-language question can be answered from the entire archive without any of it leaving the building. It is the on-premise alternative to cloud enterprise search: the model is brought to the records, the records are not shipped to the model, and what happens in the server room stays in the server room.

For a chief information officer, a chief information security officer or a general counsel, that is the proposition in one sentence. An organisation's most valuable asset is often its own accumulated memory, and that memory is exactly what a retrieval-augmented generation (RAG) system must read in full to be useful. The cloud route asks the organisation to expose that un-redacted memory to a third-party processor and, where the search runs offshore, to a cross-border transfer. The Mickai Sovereign Intelligence Operating System (SIOS) removes the cross-border transfer and third-party processing path, because the retrieval and the inference both sit inside the organisation's own perimeter.

The cloud tools it replaces, and why on-premise wins

The enterprise-search market has produced capable cloud products, Glean and Box AI among them, and an honest comparison should acknowledge how well they index a connected estate. The Mickai distinction is not about connectors or relevance. It is about where the corporate memory lives at the moment of retrieval, and who owns the engine reading it.

A cloud search tool, however well secured, indexes the organisation's content on infrastructure the organisation does not control. Self-hosted enterprise search inverts that on the dimensions that decide a regulated procurement.

  • **The archive stays put.** Every document is indexed and retrieved in place by a local engine, with zero data egress. There is no transit path to intercept and no offshore copy to account for.
  • **The model and the index are owned.** Both the search brain and the Mickai sovereign vector store are snapshots the organisation holds, so they are insulated from a cloud vendor changing its data-use policy, or changing a hosted service in response to the European Union Artificial Intelligence Act.
  • **Context is ingested without a throttle.** Owned compute allows unthrottled context ingestion across millions of historical documents, where a cloud service would meter the same work expensively per token.
  • **The economics flip to capital.** Indexing and querying decades of records runs at near-zero marginal cost on owned hardware rather than as a recurring consumption bill.

The point of air-gapped RAG is not to protect the pipeline that carries your archive to the cloud. It is to remove the pipeline, so the archive never travels at all.
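
One way to make "no transit path" checkable is to validate the connector configuration before any indexing starts. The sketch below is an illustration under stated assumptions, not Mickai's implementation: the function `find_egress_risks` and the connector names are hypothetical, and it assumes every endpoint is configured as a literal IP address rather than a hostname.

```python
import ipaddress


def find_egress_risks(endpoints: dict[str, str]) -> list[str]:
    """Return the names of endpoints whose address is not in a private range.

    `endpoints` maps a hypothetical connector name to a literal IP address.
    """
    risky = []
    for name, address in endpoints.items():
        ip = ipaddress.ip_address(address)
        # is_private is True for RFC 1918 ranges such as 10.0.0.0/8
        # and 192.168.0.0/16, and False for publicly routable addresses.
        if not ip.is_private:
            risky.append(name)
    return sorted(risky)


# Hypothetical connector configuration, for illustration only.
connectors = {
    "contracts_store": "10.20.0.15",
    "board_papers": "192.168.4.20",
    "external_ocr": "8.8.8.8",
}
print(find_egress_risks(connectors))  # prints ['external_ocr']
```

A check like this only inspects configuration. It complements, and does not replace, network-level controls on the perimeter itself.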

The compliance barrier it clears

Enterprise search touches everything an organisation holds, which is precisely why the most regulated firms have hesitated. A sovereign deployment clears the barriers at the level of architecture.

Data protection under UK GDPR and the GDPR

A full corporate archive is dense with personal data, and often with special-category data. Indexing it through an external model adds a third-party processor and, where retrieval runs offshore, a cross-border transfer. Running the system on-premise means data residency holds and the records never leave the building. The organisation keeps its own controller obligations on a fully contained footprint.

Privilege, confidentiality and fiduciary duty

For a law firm, a bank or a professional-services firm, the archive contains privileged, confidential and fiduciary material. Containment is what preserves it: the records are read locally, and every material answer is wrapped in an Open Audit Record, a signed, inspectable account of which documents the model drew on and what it concluded.
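
The idea of a signed, inspectable record can be sketched in a few lines. This is a minimal illustration, not the Open Audit Record format: the field names and the functions `seal_record` and `verify_record` are hypothetical, and HMAC-SHA256 with a locally held key stands in for whatever signature scheme the product actually uses.

```python
import copy
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_record(question: str, answer: str, sources: dict[str, bytes], key: bytes) -> dict:
    """Record the question, the answer and a digest of each source, then sign it."""
    body = {
        "question": question,
        "answer": answer,
        # A SHA-256 digest per source identifies the document without copying it.
        "sources": {doc_id: hashlib.sha256(text).hexdigest() for doc_id, text in sources.items()},
    }
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_record(record: dict, key: bytes) -> bool:
    """Recompute the signature over the body and compare in constant time."""
    expected = hmac.new(key, _canonical(record["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


key = b"demo-key-held-on-premise"  # hypothetical key, for illustration only
record = seal_record(
    "Which contract governs the warehouse lease?",
    "Contract C-12.",
    {"C-12": b"full text of contract C-12"},
    key,
)
tampered = copy.deepcopy(record)
tampered["body"]["answer"] = "Contract C-99."
print(verify_record(record, key), verify_record(tampered, key))  # prints True False
```

HMAC is symmetric, so anyone who can verify a record can also forge one. A record meant for an external auditor would more plausibly use an asymmetric signature, where the verifier holds only a public key.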

Sector-specific regimes

Where the organisation also sits under financial secrecy rules, the Network and Information Systems regime, or export controls, keeping the corpus and the inference inside the perimeter removes the external exposure those regimes are most concerned with. The organisation still holds its own obligations, but the structural transfer risk is gone.

The Mickai studio that delivers it: Pinakes

Within the Mickai SIOS, knowledge management and enterprise search are delivered by Pinakes, named for the ancient catalogue of the Library of Alexandria, the first systematic index of recorded knowledge. Pinakes is a horizontal capability the organisation composes into a vertical pack: full-archive indexing, natural-language search across un-redacted records, citation-grounded answers, and knowledge-base generation, paired with the organisation's own domain knowledge base and a compliance crosswalk.

Pinakes works alongside the other relevant studios. It pairs with Astraea, the contract-review studio, for legal-ops retrieval, with Nomos, the compliance studio, for regulator-facing evidence, and with Documents for ingestion of mixed-format archives. The retrieval layer is the Mickai sovereign vector store, which holds the embeddings locally and has no external route. The whole institutional memory is searchable, and none of it is exposed.

What makes Mickai different

Many providers will offer a private index. The Mickai difference is that the guarantees are engineered into the system rather than promised in a contract.

  • **The Open Audit Record.** Every consequential answer is sealed into a signed, inspectable record showing the sources the model used: evidence that an auditor, a regulator or a risk committee can examine.
  • **A defensible patent moat.** The architecture rests on 101 filed United Kingdom patent applications owned by Mickai LTD, covering the sovereign substrate, its audit machinery and its identity model. The barrier is intentional.
  • **Hardware-bound identity.** The instance's identity is bound to the silicon it runs on, so the index cannot be quietly cloned or relocated off the organisation's estate.
  • **Built and owned, not rented.** The organisation owns the model, the vector store and the compute. Search runs independently of cloud outages because the organisation owns the machine, and the index is insulated from a vendor rewriting the terms.

Mickai's own sovereign brains do the reasoning, and the Mickai sovereign vector store does the retrieval. There is no dependency on an external public model or a third-party store, and the corporate archive is never harvested to train someone else's model.

How a sovereign deployment actually runs

The pattern is undramatic by design. The organisation provisions local compute inside its own data centre, sized to the scale of its archive and its query load. Its record systems are connected to the Mickai sovereign vector store in place, where the content is embedded and indexed without a copy leaving the perimeter. Pinakes answers questions locally, grounding each response in retrieved sources and sealing it into the Open Audit Record. Nothing in that loop needs an internet path to the archive, so search runs independently of cloud outages, and the attack surface is reduced to the organisation's own perimeter.
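
The embed, index and retrieve steps of that loop can be sketched with the standard library alone. This is a toy under stated assumptions, not the Mickai sovereign vector store: the class `LocalIndex`, the functions `embed` and `cosine`, and the sample documents are all hypothetical, word counts stand in for a real local embedding model, and the answer-generation step is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts, standing in for a local embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LocalIndex:
    """In-memory stand-in for a vector store; nothing here opens a network connection."""

    def __init__(self) -> None:
        self._vectors: dict[str, Counter] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._vectors[doc_id] = embed(text)

    def search(self, query: str, k: int = 2) -> list[tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, best first, dropping zero-score documents."""
        q = embed(query)
        scored = [(doc_id, cosine(q, vec)) for doc_id, vec in self._vectors.items()]
        scored = [(doc_id, score) for doc_id, score in scored if score > 0]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical archive entries, for illustration only.
index = LocalIndex()
index.add("lease-1998", "Lease agreement for the Leeds warehouse signed in 1998")
index.add("board-2004", "Board paper on pension scheme funding 2004")
index.add("memo-2011", "Memo on warehouse fire safety inspection")

hits = index.search("warehouse lease agreement")
print([doc_id for doc_id, _ in hits])  # prints ['lease-1998', 'memo-2011']
```

The document identifiers returned by `search` are what a grounded answer would cite, and they are the natural input to an audit record of which sources were used.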

The honest boundary: this delivers zero data egress for the archive and removes the cross-border transfer and third-party processing path, and it reduces the external attack surface. It does not remove the organisation's own data-protection and confidentiality obligations, and insider and physical access remain the organisation's to control. The promise is data residency, ownership of model and index, and an institutional memory that stays in-house.

Request a private demonstration

If you are a chief information officer, chief information security officer, chief financial officer, chief operating officer or general counsel deciding how to put decades of un-redacted records to work with artificial intelligence without any of it leaving the building, the right next step is to watch the system answer real questions from data that never leaves the room.

Mickai was built by Micky Irons, founder, chief executive and named inventor, on a single principle: bring the intelligence to the archive and keep both inside the institution. Request a private demonstration, and we will show you self-hosted enterprise search AI indexing your records, answering with citations and sealing an Open Audit Record entirely behind your own firewall.

Originally published at https://mickai.co.uk/articles/self-hosted-enterprise-search-ai. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.