Article · 5 May 2026

The twenty five brain architecture inside Mickai, and why it is structurally not a Mixture of Experts. SENTINEL, CORTEX, HIPPOCAMPUS, AMYGDALA, and twenty one specialists, each in its own process, with its own queue, signing its own audit.

The academic Mixture of Experts (MoE) literature describes a token-routing technique inside a single model. Mickai's twenty five cooperating brains are independent processes, each with its own LLM instance, its own message queue, and its own signed audit ledger. The two are different categories. This article walks the difference, explains why the microservice topology is the right choice for sovereignty, and walks the role of each of the four core brains.

Author

Micky Irons

Published

5 May 2026

mickaimulti-brainarchitecturemixture-of-expertsmoe

Two architectures, often confused

When a reader first encounters the phrase twenty five cooperating brains in the Mickai documentation, the first reach is to the academic literature on Mixture of Experts (MoE). The reach is understandable. MoE is the prominent recent term in the deep-learning literature for routing a token through a subset of expert sub-networks. Switch Transformer, GShard, GLaM, Mixtral 8x7B, DeepSeek-MoE, and a long line of follow-up papers describe model architectures where a router gate selects, for each token, the relevant experts to compute against. The papers are excellent, the technique is genuine, and the published benchmarks are credible.

Mickai's twenty five brain topology is not Mixture of Experts. It is not a variant of MoE, not a re-implementation of MoE, not a deployment-time wrapper over MoE, and not a marketing rebranding of MoE. The two are different architectural categories with different constraints, different failure modes, different sovereignty properties, and different audit characteristics. This article walks the difference, because the conflation is common and the structural difference matters for anyone evaluating the substrate for procurement, integration, or technical depth.

What MoE actually is

Mixture of Experts is a deep-learning technique. The technique sits inside a single neural-network model, runs in a single address space, executes against a single set of model weights, and produces a single forward pass per inference. The router gate, a small learned network at the start of each MoE layer, selects which of the candidate expert sub-networks to send the current token to. The selected experts compute, the outputs are combined (typically by weighted sum), and the next layer of the model proceeds. The full set of experts shares the same training run, the same context window, the same tokeniser, the same vocabulary, the same input embedding, and the same output projection. The model is one model. The experts are layers inside it.

MoE is a real technique with real benefits. Conditional computation reduces the active parameter count per token, which reduces compute cost while expanding the total parameter capacity. Expert specialisation can produce empirically better results on heterogeneous corpora, because the router learns to send linguistically similar tokens to the same experts. The downsides are also real: load imbalance during training, expert collapse, increased communication cost in distributed settings, and the engineering complexity of routing logic that has to be differentiable end to end. None of this is in dispute in the academic literature. None of it applies to the Mickai topology, because the Mickai topology is not running an MoE forward pass.

What the twenty five Mickai brains actually are

Each of Mickai's twenty five brains is an independent process. Each has its own operating-system process identifier, its own memory address space, its own LLM runtime instance, its own model weights file (which may differ in size, architecture, and base model from the other brains), its own message queue (with its own backpressure characteristics), and its own append-only signed audit ledger. The brains do not share weights. They do not share a forward pass. They do not share a tokeniser. They do not share a context window. The cooperation between brains is an inter-process message protocol, not a token-routing decision.

The cooperation pattern is closer to a microservice topology than to a model-internal architecture. SENTINEL receives a request, runs its privacy and policy filter, and either rejects, redacts, or forwards the message to CORTEX. CORTEX selects a brain-cooperation profile (which other brains to engage, in which order, with which gating). HIPPOCAMPUS supplies relevant memory context. AMYGDALA supplies the personality and tone profile. The relevant specialist brains supply domain output (legal, medical, code, scheduling, robotics, voice, vision, and seventeen others). Each brain processes its part, signs the result with its own attestation key, and returns the signed fragment via the queue. CORTEX composes the fragments into the final response. The audit ledger records every fragment, every signature, every gating decision, every memory retrieval, and every personality adjustment as a hash-linked entry.

This is not a token-routing decision. It is twenty five separate inference calls against twenty five separate processes, with a coordination protocol that makes them appear coherent to the caller. The coordination is the engineering. The brains are the components.

The four core brains

Four of the twenty five brains have canonical names that describe their role in the cognitive architecture. The names are deliberately drawn from the human nervous system, because the cooperation pattern parallels the cognitive division of labour that biology has settled on. The parallel is illustrative, not literal.

SENTINEL: the privacy and policy filter

SENTINEL is the perimeter brain. Every request entering the Mickai substrate passes through SENTINEL first, and every response leaving the substrate passes through SENTINEL last. SENTINEL runs the privacy filter (PII detection and redaction, regulated-data class detection, cross-jurisdiction transfer policy), the prompt-injection filter (adversarial-content detection, untrusted-source provenance check, operator-policy gating), and the data-egress filter (regulated-content escape detection, cryptographic-key escape detection, audit-record completeness check). SENTINEL is the brain that decides what gets in and what gets out. It is also the brain whose signed audit entries appear at both the start and the end of every request lifecycle.

SENTINEL is intentionally compartmentalised from the reasoning brains. The compartmentalisation is the architectural commitment that makes self-marking-its-own-homework impossible. CORTEX cannot bypass SENTINEL. HIPPOCAMPUS cannot bypass SENTINEL. The specialists cannot bypass SENTINEL. The bypass is structurally not possible because SENTINEL is a separate process with its own queue. A request that reaches a specialist has been through SENTINEL by construction.

CORTEX: the reasoning and orchestration brain

CORTEX is the brain that runs the cooperation. It receives a SENTINEL-cleared request, decomposes the request into the relevant sub-tasks, selects the brain-cooperation profile (which brains to engage, in what order, with what gating thresholds), dispatches the sub-tasks to the relevant brains via the queue, awaits the signed fragment responses, and composes the fragments into the final response. CORTEX is the brain whose throughput dominates the latency of the substrate, because every request transits CORTEX twice (once on the way in, once on the way out), and CORTEX has the most active reasoning load per request.

CORTEX is the brain that has the OpenAI-compatible model identifier exposed to the compatibility shim. When an external client requests gpt-4o or o3 or any other OpenAI model name, the shim maps the name to a CORTEX cooperation profile. CORTEX then engages the relevant brains for that profile. The external client sees a single response. The internal reality is a multi-brain composition.

HIPPOCAMPUS: the memory brain

HIPPOCAMPUS holds the memory layer. The memory has three temporal granularities: short-term (the current conversation), long-term (the persistent profile of the user, the organisation, and the deployment), and episodic (the audit-grade record of past interactions, retrievable by hash, by signature, or by time window). The short-term memory is held in process. The long-term memory is held in an embedding index with a structured side-store. The episodic memory is held in the signed audit ledger and is retrievable through a queryable interface that respects the original SENTINEL clearance of each entry.

HIPPOCAMPUS is the brain that gives the substrate continuity. A user returning after a month does not have to re-establish context. The user's existing context is retrieved through HIPPOCAMPUS, gated by SENTINEL on the basis of the current attested authority of the user, and supplied to CORTEX as part of the cooperation profile. The continuity is not a hidden layer in a single model. It is a separate process with its own queue, its own audit signatures, and its own retention policies.

AMYGDALA: the personality and tone brain

AMYGDALA holds the personality and tone profile. The profile is the set of preferences, idioms, formality settings, response-length norms, and tone characteristics that distinguish one Mickai deployment from another. A medical-deployment AMYGDALA has a calibrated clinical tone and an explicit distance from speculative claims. A creative-deployment AMYGDALA has a different calibration. A government-deployment AMYGDALA holds a tone profile aligned with British civil-service writing norms (clear, plain, neutral, accountable). The profile is a brain in its own right because the calibration is a continuous adjustment that benefits from a dedicated process with its own observation and adjustment loop, not a static prompt prefix.

AMYGDALA is the brain that makes the substrate feel coherent across many interactions. The personality is not a single prompt. It is a learned and audited profile that AMYGDALA applies to every CORTEX composition before SENTINEL clears the egress.

The twenty one specialists

Beyond the four core brains, twenty one specialist brains cover domain-specific work. The list below is the canonical taxonomy as of the May 2026 substrate release. The list is not exhaustive of all possible specialists; it is the operational set in the current release.

LEGAL specialist for British and European legal-text reasoning, with an audit chain calibrated for tribunal admissibility.
MEDICAL specialist for clinical-text reasoning, with an audit chain calibrated for MHRA and CQC inspection.
FINANCIAL specialist for regulated financial-text reasoning, with an audit chain calibrated for FCA and PRA inspection.
CODE specialist for software engineering, with the dry-run pre-commit simulation gating from the Mickai patent corpus integrated.
VOICE specialist for speech recognition, voice biometrics, and speech synthesis, with chained audio attestation.
VISION specialist for image and video understanding, with provenance verification on inputs.
SCHEDULER specialist for calendar, deadline, and dependency reasoning, with attested external-system writes.
RESEARCH specialist for literature retrieval, citation construction, and prior-art mapping, with provenance retention.
TRANSLATION specialist for cross-language work, with British-English calibration and dialect retention preferences.
WRITER specialist for long-form composition, with brand-voice gating against the deployment's tone profile.
EDITOR specialist for revision, copy-editing, and structural rework, paired with WRITER.
TUTOR specialist for educational delivery, with curriculum-state retention and pedagogical pacing.
ANALYST specialist for structured-data analysis, with explicit calculation provenance.
PLANNER specialist for multi-step task decomposition, with pre-commit dry-run gating.
ROBOTICS specialist for embodied-AI action sequencing, paired with per-actuator cryptographic signing.
SECURITY specialist for adversarial reasoning, paired with SENTINEL's perimeter rules.
DEVOPS specialist for deployment, observability, and operational reasoning.
DATA specialist for ETL, schema, and data-pipeline reasoning, with sovereignty-domain awareness.
MARKETING specialist for outbound communication, gated against AMYGDALA's brand-voice profile.
SUPPORT specialist for customer-facing interaction, with regulated-data redaction at the SENTINEL boundary.
META specialist for self-observation, audit summarisation, and substrate-internal reasoning.

Each specialist is a process. Each has its own LLM instance (which may be a smaller and more domain-tuned model than the CORTEX instance, in line with the small-language-model sovereignty argument). Each has its own queue. Each signs its own audit entries. The total resource footprint of the twenty five brains is calibrated to fit on a single mid-sized server in a small deployment, and to scale horizontally across multiple servers in a larger deployment.

Why this topology is the right choice for sovereignty

Three properties make the microservice topology the right architectural choice for a sovereign AI substrate. Each property is hard to achieve, or in some cases structurally impossible to achieve, in a single-model MoE architecture.

**Property one, isolation.** Each brain is a separate process. A failure in one brain (out-of-memory, model crash, adversarial input that produces a runaway loop) does not bring down the others. SENTINEL continues to enforce the perimeter even if a specialist fails. The isolation is enforced by the operating-system process boundary, not by a soft assumption inside a single model. Sovereign deployments running in regulated environments require this kind of isolation by procurement contract; the topology supplies it by construction.

**Property two, audit.** Each brain signs its own audit entries with its own attestation key. The signatures are not a single chain controlled by a single signing process. They are a distributed set of chains that compose into a directed acyclic graph of signed evidence. A third-party verifier can walk the graph and check the signatures independently per brain. This composability is the architectural foundation of the OAR audit standard. A single-model architecture cannot produce per-brain signatures because there is no per-brain boundary inside the model. The signatures collapse into a single signature over the whole forward pass, and the granularity required for tribunal-grade audit is lost.

**Property three, swap-out.** Any brain can be replaced with a different model implementation, a different architecture, or a different operator-controlled image, without rewriting the rest of the substrate. The interface between brains is a queue protocol, not a tensor-shape contract. A LEGAL specialist tuned on UK case law can be swapped for a LEGAL specialist tuned on EU case law without affecting CORTEX, SENTINEL, HIPPOCAMPUS, or AMYGDALA. A small-deployment ROBOTICS specialist can be swapped for a larger one without affecting the rest of the system. The swap-out is what gives the operator durable control over the substrate. A single-model architecture cannot offer this; an upgrade is a re-train of the entire model, not a per-brain replacement.

Why MoE is the wrong frame

MoE is a clever technique for compute efficiency inside a single model. It is not a sovereignty technique. It cannot be a sovereignty technique because the experts inside an MoE layer are not separable artefacts. They are sub-networks that share the same forward pass, the same training run, and the same parameter file. There is no way for an operator to swap one expert for an operator-controlled alternative without retraining the whole model. There is no way for one expert to sign its own audit entries because the expert does not exist as a separable runtime entity. There is no way for the experts to fail in isolation because a failure inside the forward pass is a failure of the whole pass.

An organisation evaluating Mickai for sovereign deployment, that arrives at the documentation expecting an MoE-style architecture, is going to misread the system. The misreading typically goes like this: the organisation looks for the router gate, finds none, concludes that Mickai is just orchestrating multiple model calls in a heavy way, and moves on. The conclusion is wrong because the architectural goal is not compute efficiency. It is operator control. The orchestration is not a heavy version of MoE. It is a different category of system. The right comparison is not Switch Transformer and Mixtral. The right comparison is a microservice control plane with attested signing at every service boundary.

The frame matters because procurement officers, regulatory inspectors, and chief information security officers are evaluating substrates against the sovereignty criteria, not against the MoE benchmark suite. The benchmark for Mickai is not GLUE, not MMLU, not the LMSYS Arena. The benchmark is whether a regulator can independently verify the action chain, whether the operator can swap out a brain without re-training, and whether the substrate keeps running when one brain fails. MoE has nothing to say on these questions. The microservice topology has direct answers.

What the twenty five brain choice costs

Three honest costs, stated clearly because the architectural choice is a trade-off and the costs are real.

First, the latency floor. A request that engages five brains has at minimum five sequential inference calls, plus the queue-coordination overhead. The latency floor is higher than a single-model call. The Mickai engineering response is that the latency floor is reduced by parallelising the brain calls where the cooperation profile allows, by tuning each brain's model size to its task, and by caching the audit-signing operations. The result is competitive with single-model latency for typical interactive workloads, but not for the lowest-latency single-token streaming benchmarks. The trade-off is deliberate.

Second, the operational complexity. Twenty five processes is more to monitor, deploy, and patch than one process. The Mickai engineering response is that the topology is shipped with a complete observability and patching pipeline, and that the operator does not have to manage each brain individually. The deployment is a single-host or multi-host install with an operator-facing dashboard. The complexity is absorbed by the substrate, not pushed onto the operator. The trade-off is that the operator-facing surface is wider than a hosted-API surface, but the gain is full operator control of every brain.

Third, the resource footprint. Twenty five processes use more memory and CPU than one process, even when the per-brain models are smaller. The Mickai engineering response is that the resource footprint is calibrated for current commodity server hardware, that a small-deployment configuration runs on a single sixty four core server with sufficient RAM, and that horizontal scaling is supported for larger deployments. The trade-off is that low-end laptop deployments run a reduced-brain configuration with the core four brains plus a subset of specialists, rather than the full twenty five.

Closing the conflation

The conflation between Mickai's twenty five brains and Mixture of Experts is mostly an honest reading mistake by readers familiar with the recent deep-learning literature. The mistake is fixed by stating the architectural facts. Mickai is microservice. MoE is single-model. Mickai is operator-controlled per-brain. MoE is operator-controlled per-model. Mickai's audit is per-brain signed. MoE's audit is per-model signed. The two are not in conflict. They are not competitors. They are different answers to different questions. MoE is the right answer to the compute-efficiency question. Microservice brains with signed audit is the right answer to the sovereignty question.

The Mickai substrate, designed and filed by Micky Irons in Workington Cumbria, addressed at the sovereignty question, is documented at mickai.co.uk and on the public IPO register. The portfolio of thirty one filed UK patent applications totalling nine hundred and fourteen claims includes the specifications for the brain-cooperation protocol, the per-action signing, the trust-domain externalisation, and the OAR audit format. The architectural choice is on the public record. The conflation with MoE is the kind of misreading the public record exists to correct.

“Mixture of Experts is a routing decision inside one model. Mickai is twenty five processes with twenty five queues and twenty five signatures. The first is an efficiency technique. The second is a sovereignty topology. Conflating them is the kind of mistake that costs procurement teams six months of evaluation against the wrong benchmark.”

Sources and references

Shazeer et al., Outrageously Large Neural Networks: The Sparsely-Gated Mixture of Experts Layer, 2017.
Fedus et al., Switch Transformer: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021.
Lepikhin et al., GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020.
Du et al., GLaM: Efficient Scaling of Language Models with Mixture-of-Experts, 2022.
Mistral AI, Mixtral 8x7B technical report, 2023.
DeepSeek AI, DeepSeek-MoE technical report, 2024.
Mickai patent portfolio, mickai.co.uk/patents (31 filed UK patent applications, 914 claims, sole inventor of record Micky Irons / Mickarle Wagstaff-Irons).
GB2608806.2 / MWI-PA-2026-008, PQ-Safe Attestation and ML-DSA Signed Tool-Invocation Ledger.
GB2610413.3 / MWI-PA-2026-022, Open Inter-Vendor Audit Record (OAR) Format.
GB2610415.8 / MWI-PA-2026-024, Trust-Domain Externalisation Architectural Pattern.
Mickai brain topology technical specification, mickai.co.uk/docs/architecture.

ShareLinkedIn X Hacker News Reddit Mastodon Bluesky Email

Originally published at https://mickai.co.uk/articles/the-twenty-five-brain-architecture-and-why-it-is-not-a-mixture-of-experts. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.