Mickai Subsystem
Mickai Lama™
Mickai Lama is the subsystem of the Mickai SIOS that serves local models from 1.1B to 32.5B parameters on x86-64 and ARM64. It ships ten rebranded SKUs, an OpenAI-compatible API on port 11438, dynamic model loading, in-app benchmarking, and RAG-ready context windows. Mickai can be downloaded from mickai.co.uk/download and runs on Windows, Linux, or macOS.
The Mickai SIOS
Mickai is a Sovereign Intelligence Operating System (SIOS). It runs entirely on your own hardware, on Windows, Linux, or macOS. No cloud, no telemetry. This page describes one subsystem of the Mickai SIOS. Download Mickai at mickai.co.uk/download.
A subsystem of the Mickai SIOS. Local model serving from 1.1B to 32.5B parameters, on x86-64 and ARM64. Weights stay on the operator's machine.
Local model serving, your hardware, your weights.
What Lama serves
Seven primitives that turn a workstation into a model server. Ten rebranded SKUs, OpenAI-compatible API on port 11438, dynamic loading, in-app benchmarking, RAG-ready context windows up to 128k tokens.
01 / SKUs
Ten rebranded model SKUs
The general-purpose line spans mickai-tiny (1.1B), mickai-small (3B), mickai-base (7B), mickai-medium (14B), and mickai-large (32.5B); specialist variants cover code, reasoning, embedding, and routing. Every SKU ships with a signed manifest and a deterministic inference seed.
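A signed manifest with a digest over the weights is, at minimum, a hash check. The sketch below shows the shape of such a check; the manifest field names (`sku`, `params`, `sha256`, `seed`) are illustrative assumptions, not Mickai's actual manifest format, and the real scheme would verify a signature over the manifest as well.

```python
import hashlib
import json

def verify_manifest(weight_bytes: bytes, manifest_json: str) -> bool:
    """Check that the weights match the digest recorded in the manifest.

    Field names here are illustrative, not the actual Mickai format.
    """
    manifest = json.loads(manifest_json)
    digest = hashlib.sha256(weight_bytes).hexdigest()
    return digest == manifest.get("sha256")

# Hypothetical example: a toy "weights" blob and its matching manifest.
weights = b"\x00" * 1024
manifest = json.dumps({
    "sku": "mickai-tiny",
    "params": "1.1B",
    "sha256": hashlib.sha256(weights).hexdigest(),
    "seed": 42,  # stands in for the deterministic inference seed
})
print(verify_manifest(weights, manifest))  # True
```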
02 / API
OpenAI-compatible API on port 11438
/v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models. Drop-in for any client that already speaks OpenAI. The shim translates aliases (gpt-4 to mickai-large, gpt-3.5-turbo to mickai-base) so existing code switches over with one base URL change.
03 / Loading
Dynamic model loading
Models load on first request and stay resident under an LRU policy. The runtime advertises memory headroom so a workstation with 64 GB can hold mickai-large alongside two specialist brains, while a 16 GB laptop swaps between SKUs without manual intervention.
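The residency policy above can be sketched with an ordered map and a memory budget. The per-SKU sizes in GB and the eviction bookkeeping are illustrative assumptions, not the runtime's actual accounting:

```python
from collections import OrderedDict

class ModelCache:
    """Keep loaded models resident under an LRU policy with a memory budget."""

    def __init__(self, budget_gb: float):
        self.budget = budget_gb
        self.resident = OrderedDict()  # sku -> size_gb, least recent first

    def load(self, sku: str, size_gb: float) -> list:
        """Load a model, evicting least-recently-used SKUs to make room.

        Returns the list of SKUs evicted.
        """
        if sku in self.resident:
            self.resident.move_to_end(sku)  # mark as recently used
            return []
        evicted = []
        while self.resident and sum(self.resident.values()) + size_gb > self.budget:
            victim, _ = self.resident.popitem(last=False)
            evicted.append(victim)
        self.resident[sku] = size_gb
        return evicted

cache = ModelCache(budget_gb=16)
cache.load("mickai-base", 5)
cache.load("mickai-medium", 9)
print(cache.load("mickai-large", 14))  # evicts both resident SKUs
```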
04 / Benchmarking
In-app benchmarking
Run a benchmark suite against the local installation. Tokens per second, time-to-first-token, prompt-cache hit rate, KV-cache memory, all reported and signed into the audit chain. No third-party benchmark site, no opaque scoring.
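Tokens per second and time-to-first-token fall out of straightforward timing over a token stream. A minimal sketch; `stub_generate` is a hypothetical stand-in for the real runtime, and the signing-into-the-audit-chain step is omitted:

```python
import time

def benchmark(generate, prompt: str) -> dict:
    """Time a generation call and report TTFT and tokens per second."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in generate(prompt):
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
        count += 1
    elapsed = time.perf_counter() - start
    return {"ttft_s": ttft, "tokens": count, "tok_per_s": count / elapsed}

def stub_generate(prompt):
    """Stand-in generator emitting a fixed token stream."""
    for tok in ["Hello", ",", " world"]:
        yield tok

report = benchmark(stub_generate, "greet me")
print(report["tokens"])  # 3
```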
05 / Context
RAG-ready context windows
Up to 128k tokens on the larger SKUs, with KV-cache compression and grouped-query attention for memory-bound hardware. Hippocampus retrievals stream straight in without copying, so a multi-document RAG query finishes inside one inference call.
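Fitting a multi-document query into one call comes down to packing retrieved chunks under a token budget. A minimal sketch using a whitespace word count as a stand-in tokenizer; the real runtime would use the model's own tokenizer and the zero-copy Hippocampus path rather than string joins:

```python
def pack_context(chunks, budget_tokens, count=lambda s: len(s.split())):
    """Greedily pack retrieved chunks into a single context window.

    `count` is a stand-in tokenizer (whitespace words), not the model's own.
    Returns the packed context and the number of tokens used.
    """
    packed, used = [], 0
    for chunk in chunks:
        n = count(chunk)
        if used + n > budget_tokens:
            break  # budget exhausted; remaining chunks are dropped
        packed.append(chunk)
        used += n
    return "\n\n".join(packed), used

docs = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
context, used = pack_context(docs, budget_tokens=5)
print(used)  # 5
```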
06 / Architectures
x86-64 and ARM64
Builds for Windows, Linux, and macOS on Intel, AMD, Apple Silicon, and ARM64 server hardware. AVX-512, AVX2, NEON, and Apple Metal back-ends. Quantised weights (Q4_K_M, Q5_K_M, Q8_0) included for low-memory machines.
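The link between parameter count, quantisation, and memory is simple arithmetic. This sketch estimates weight size from approximate bits-per-weight figures published for llama.cpp-style GGUF quantisations; the figures and the 25% headroom rule are assumptions, not Mickai's exact accounting:

```python
# Approximate bits-per-weight for common GGUF quantisations (rough
# llama.cpp-style estimates, not Mickai's exact numbers).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q5_K_M": 5.5, "Q4_K_M": 4.5}

def weight_size_gb(params_billion: float, quant: str) -> float:
    """Estimate weight file size in GB for a given parameter count."""
    bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return bits / 8 / 1e9

def pick_quant(params_billion: float, ram_gb: float):
    """Pick the highest-fidelity quantisation whose weights fit in RAM,
    leaving ~25% headroom for KV cache and runtime overhead."""
    for quant in ["Q8_0", "Q5_K_M", "Q4_K_M"]:
        if weight_size_gb(params_billion, quant) <= ram_gb * 0.75:
            return quant
    return None

print(pick_quant(7, 16))  # a 7B model fits at Q8_0 in 16 GB
```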
07 / Sovereignty
Weights on your hardware
Model weights live on the operator's machine. No cloud calls, no model-update telemetry, no per-token billing. Plug a different model in if you wish; Lama treats foreign GGUF weights as first-class.
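Treating foreign GGUF weights as first-class implies at least recognising the format. Per the public GGUF specification, every file opens with the four-byte magic `GGUF` followed by a little-endian uint32 version; a minimal recognition sketch (the helper names are illustrative, not Lama's loader API):

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file, per the spec

def is_gguf(header: bytes) -> bool:
    """Cheap check that a byte stream looks like a GGUF weight file."""
    return header[:4] == GGUF_MAGIC

def gguf_version(header: bytes) -> int:
    """Read the little-endian uint32 version that follows the magic."""
    if not is_gguf(header) or len(header) < 8:
        raise ValueError("not a GGUF header")
    return struct.unpack_from("<I", header, 4)[0]

# Hypothetical header: magic plus version 3 (the current GGUF version).
header = GGUF_MAGIC + struct.pack("<I", 3)
print(gguf_version(header))  # 3
```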
Patent anchors
Lama rests on three of the 31 UK patent applications filed behind the Mickai SIOS. Patent 02 anchors multi-brain routing, patent 04 multi-tenant brain isolation, and patent 05 the privacy-preserving RAG primitive.
- 02 · Multi-Brain Cooperative Intelligence: the routing primitive across the local model ensemble.
- 04 · Adaptive Multi-Tenant OS: brain isolation across tenants on a single host.
- 05 · Privacy-Preserving Sovereign RAG: retrieval streamed into the inference call without copy-out.
GB2607309.8 to GB2610422.4 · 31 filed UK patent applications · 914 claims
Wired with
- Ten Mickai SKUs (1.1B to 32.5B parameters)
- OpenAI-compatible API on port 11438
- Dynamic model loading with LRU residency
- In-app benchmark suite, results signed into the chain
- Up to 128k context with KV-cache compression
- AVX-512, AVX2, NEON, Apple Metal back-ends
- Q4_K_M, Q5_K_M, Q8_0 quantised weights
- 100 percent on-device, weights stay on the operator's machine
Serve sovereign models on your hardware.
Lama serves the ten Mickai SKUs locally on Windows, Linux, or macOS. Read the multi-brain patent, or download Mickai and run mickai-large against the OpenAI-compatible API on port 11438.
Engineered by Micky Irons in Cumbria, United Kingdom · @mickyirons