Embeddings.
On-device embedding, indexing, and semantic neighbourhood maintenance.
The Embeddings brain produces and maintains the dense vector representations of the operator's corpus. Every embedding is stored on-device and signed by an operator-controlled key. The brain rotates indexes, prunes stale chunks, and exposes neighbourhood queries to Retrieval. No embedding ever leaves the machine, which is the operational rebuttal to vendor RAG patterns that silently sync embeddings to a cloud store.
- 01 Local embedding production with signed provenance
- 02 Index lifecycle (rotation, pruning, compaction)
- 03 Semantic neighbourhood queries for Retrieval
- 04 Embedding deletion on operator revocation
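The capabilities above can be illustrated with a minimal, standard-library-only sketch. It is not the Mickai implementation: `LocalIndex`, `sign_embedding`, and `verify_embedding` are hypothetical names, the brute-force cosine scan stands in for a real vector index, and HMAC-SHA256 (a symmetric MAC) stands in for whatever signature scheme the operator-controlled key actually uses.

```python
import hashlib
import hmac
import math
import struct
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class SignedEmbedding:
    chunk_id: str
    vector: Tuple[float, ...]
    signature: bytes  # HMAC tag; an assumed stand-in for a real signature


def _payload(chunk_id: str, vector: Tuple[float, ...]) -> bytes:
    # Length-prefix the id so different (id, vector) pairs cannot serialise identically.
    cid = chunk_id.encode("utf-8")
    return struct.pack(">I", len(cid)) + cid + struct.pack(f">{len(vector)}d", *vector)


def sign_embedding(key: bytes, chunk_id: str, vector: Sequence[float]) -> SignedEmbedding:
    vec = tuple(float(x) for x in vector)
    tag = hmac.new(key, _payload(chunk_id, vec), hashlib.sha256).digest()
    return SignedEmbedding(chunk_id, vec, tag)


def verify_embedding(key: bytes, emb: SignedEmbedding) -> bool:
    expected = hmac.new(key, _payload(emb.chunk_id, emb.vector), hashlib.sha256).digest()
    return hmac.compare_digest(expected, emb.signature)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class LocalIndex:
    """In-memory index: nothing is written or sent anywhere outside this process."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._items: Dict[str, SignedEmbedding] = {}

    def add(self, chunk_id: str, vector: Sequence[float]) -> None:
        self._items[chunk_id] = sign_embedding(self._key, chunk_id, vector)

    def revoke(self, chunk_id: str) -> bool:
        # Deletion on operator revocation; returns False if the id was not present.
        return self._items.pop(chunk_id, None) is not None

    def neighbours(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        # Skip any entry whose tag no longer verifies, then rank by cosine similarity.
        scored = [
            (emb.chunk_id, cosine(query, emb.vector))
            for emb in self._items.values()
            if verify_embedding(self._key, emb)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = LocalIndex(key=b"demo-key-not-for-production")
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 1.0])
index.add("c", [0.9, 0.1])
print([cid for cid, _ in index.neighbours([1.0, 0.0], k=2)])  # ['a', 'c']
index.revoke("a")
print([cid for cid, _ in index.neighbours([1.0, 0.0], k=2)])  # ['c', 'b']
```

Verifying the tag at query time means a tampered or unsigned vector never reaches Retrieval. A production index would replace the linear scan with an approximate nearest-neighbour structure and add rotation and compaction, which this sketch omits.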
Authoritative external corpora and standards this brain treats as canonical. Every retrieval against these sources is signed into the audit ledger so a regulator can prove which evidence drove which output.
- 01 Mickai Patent 05, 18
- 02 Word2Vec, GloVe, BERT, SBERT papers
- 03 HNSW, FAISS, Annoy vector-index literature
- 04 MTEB benchmark
- 05 Cohere and OpenAI embedding model cards
- 06 arXiv cs.CL papers
- 07 Stanford NLP embedding research
- 08 Hugging Face Sentence-Transformers library
- 09 ScaNN scalable nearest-neighbour search
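One way to picture "signed into the audit ledger" is an append-only, hash-chained log in which each entry's tag covers the previous entry's tag. The sketch below is an assumption-laden illustration, not the product's ledger format: `AuditLedger`, `record_retrieval`, and the record fields are hypothetical, and HMAC-SHA256 again stands in for the real signing scheme.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous tag" for the first entry


def _entry_mac(key: bytes, prev: str, record: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so the same record always yields the same tag.
    body = json.dumps({"prev": prev, "record": record}, sort_keys=True).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


class AuditLedger:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self.entries: List[dict] = []

    def record_retrieval(self, source: str, query: str, output_id: str) -> str:
        prev = self.entries[-1]["mac"] if self.entries else GENESIS
        record = {"source": source, "query": query, "output_id": output_id}
        mac = _entry_mac(self._key, prev, record)
        self.entries.append({"prev": prev, "record": record, "mac": mac})
        return mac

    def verify(self) -> bool:
        # Walk the chain: every entry must link to its predecessor and carry a valid tag.
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            expected = _entry_mac(self._key, prev, entry["record"])
            if not hmac.compare_digest(expected, entry["mac"]):
                return False
            prev = entry["mac"]
        return True

    def sources_for(self, output_id: str) -> List[str]:
        # Which sources were retrieved while producing a given output.
        return [e["record"]["source"] for e in self.entries
                if e["record"]["output_id"] == output_id]


ledger = AuditLedger(key=b"demo-key-not-for-production")
ledger.record_retrieval("MTEB benchmark", "retrieval scores", "out-1")
ledger.record_retrieval("HNSW literature", "graph index parameters", "out-1")
print(ledger.verify())               # True
print(ledger.sources_for("out-1"))   # ['MTEB benchmark', 'HNSW literature']
```

Because each tag depends on the one before it, editing or removing an earlier entry invalidates every later tag, which is what would let a reviewer check that the recorded evidence trail has not been altered.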
Mickai-native tooling primitives this brain implements internally. Codex for sovereign plain-text graph PKM, Lectern for spaced-repetition memory, Stele for citation-provenance, and domain-native primitives layered on top. No external services in the trust path; data stays on operator-personalised hardware.
- 01 Lodestone (vector index)
- 02 Cataloguer (corpus management)
- 03 Cipher (per-embedding signing key custody)