Synthetic Cohorts in Drug Development Need Audit-Ready Provenance
Generative control arms can shorten trials and spare patients placebo, but a regulator will only accept what it can trace. Provenance, not plausibility, is the gate.
A synthetic control arm is a tempting idea. Instead of recruiting hundreds of patients to receive a placebo, you generate a statistically faithful cohort from historical trial data, registries and real-world records, then compare the treatment arm against that cohort. Fewer patients sit in the placebo group. Trials read out faster. Rare-disease studies that could never assemble a real control become feasible. Human cost and capital cost point the same way.
The problem is not whether the synthetic patients look real. Modern generative models clear that bar easily. The problem is whether a regulator can believe them. A reviewer at the agency is not asking 'is this plausible'. They are asking 'show me exactly how this was produced, from which sources, under which model, with which parameters, and prove none of it changed after the fact'. Plausibility is cheap. Provenance is scarce, and it is what most synthetic-data pipelines cannot supply.
Why a beautiful cohort is still inadmissible
Evidentiary standards in drug development are conservative by design, and rightly so. A submission that a reviewer cannot reconstruct is a submission they cannot approve. With a real control arm, the chain is legible: consent forms, case report forms, lab results, an audit trail of every edit. With a synthetic arm, the equivalent chain has to cover the data that trained the generator, the model version, the random seeds, the conditioning, the acceptance criteria, and every human who signed off. Miss one link and the whole arm is suspect.
Most teams discover this late. They build a generator, validate it against held-out data, produce a gorgeous cohort, and only then realise they cannot answer the auditor's first question: which exact version of which model, fed which exact records, produced this exact patient. The records have since been updated. The model has been retrained twice. The notebook that ran it lives on a laptop that has been reimaged. The cohort is plausible and entirely untraceable, which means it is worthless as evidence.
Provenance is an engineering requirement, not a paperwork afterthought
Treating provenance as documentation you assemble at the end is the original sin. By then the inputs have drifted and the environment is gone. Provenance has to be captured at the moment each consequential action happens, sealed so it cannot be altered later, and bound to the artefact it describes. That means recording the input dataset by content hash, the model weights by version, the generation parameters, the validation results and the named approver, all at the instant the synthetic patient is created, not reconstructed from memory a year on.
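Capturing provenance at the moment of generation can be sketched in a few lines. The following is a minimal illustration, not a description of any particular product: the function names, field names and sample values are all hypothetical, and SHA-256 is assumed as the content hash.

```python
import hashlib
import json
from datetime import datetime, timezone


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the exact input bytes."""
    return hashlib.sha256(data).hexdigest()


def build_provenance_record(dataset_bytes, model_version, params, validation, approver):
    """Assemble a provenance record at generation time.

    All field names here are hypothetical, chosen for illustration.
    """
    return {
        "input_sha256": content_hash(dataset_bytes),
        "model_version": model_version,
        "generation_params": params,
        "validation": validation,
        "approver": approver,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def canonical_bytes(record: dict) -> bytes:
    """Serialise deterministically so the same record always hashes the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Illustrative values only; none of these come from a real trial.
record = build_provenance_record(
    dataset_bytes=b"patient_id,age,arm\n001,54,control\n",
    model_version="generator-v1.3.0",
    params={"seed": 42, "n_patients": 200},
    validation={"ks_statistic": 0.04, "passed": True},
    approver="j.doe",
)
record_digest = hashlib.sha256(canonical_bytes(record)).hexdigest()
print(record_digest)
```

The point of hashing the dataset bytes rather than storing a file name is that a later update to the source records produces a different digest, so drift is detectable instead of silent.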
There is a second requirement that teams underestimate: this evidence often cannot leave the building. Patient-level training data is among the most regulated material there is. Shipping it to a third-party platform, whether to generate the cohort or to prove its provenance, widens the exposure surface and adds a custodian a sponsor may not be permitted to trust. The pipeline that produces the cohort and the system that seals its provenance need to run on infrastructure the sponsor controls, offline where required, with the raw data never crossing a boundary it should not cross.
What a sealed record actually looks like
This is the gap Mickai is built to close. Mickai is a Sovereign Intelligence Operating System (SIOS): fifty specialised AI brains, twenty-five domain and twenty-five operational, that run on the operator's own hardware and are fully offline-capable. A pharmaceutical sponsor can run cohort generation, statistical validation and record-keeping inside their own perimeter, with no patient-level data leaving it.
Every consequential action in that pipeline is written to the Open Audit Record (the OAR). The OAR seals and signs each step with ML-DSA-65, a parameter set of FIPS 204, the published NIST post-quantum digital signature standard, so the signature is designed to remain verifiable even against an attacker with a quantum computer. The signature binds the input hashes, the model version, the parameters and the approver into one tamper-evident entry. A reviewer does not take the sponsor's word that nothing changed. They verify the signature and read the chain.
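The general idea of a sealed, chained log can be illustrated with a short sketch. This is not the OAR format, which the article does not specify. It assumes a simple hash chain, and because the Python standard library has no ML-DSA implementation it uses HMAC-SHA256 purely as a stand-in for the signature step. The key, function names and field names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. HMAC is a stand-in here, NOT ML-DSA-65.
SIGNING_KEY = b"demo-key-not-for-production"
GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _canonical(obj) -> bytes:
    """Deterministic serialisation so hashing is reproducible."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(chain: list, payload: dict) -> dict:
    """Append an entry bound to its predecessor by hash, then tag it."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    body = {"prev_hash": prev_hash, "payload": payload}
    entry_hash = hashlib.sha256(_canonical(body)).hexdigest()
    tag = hmac.new(SIGNING_KEY, entry_hash.encode("ascii"), hashlib.sha256).hexdigest()
    entry = {**body, "entry_hash": entry_hash, "tag": tag}
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every hash and tag; any edit to any entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        body = {"prev_hash": entry["prev_hash"], "payload": entry["payload"]}
        expected_hash = hashlib.sha256(_canonical(body)).hexdigest()
        expected_tag = hmac.new(
            SIGNING_KEY, expected_hash.encode("ascii"), hashlib.sha256
        ).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected_hash:
            return False
        if not hmac.compare_digest(entry["tag"], expected_tag):
            return False
        prev_hash = entry["entry_hash"]
    return True


# Illustrative payloads only.
chain = []
seal(chain, {"step": "generate", "model_version": "generator-v1.3.0", "seed": 42})
seal(chain, {"step": "validate", "passed": True, "approver": "j.doe"})
print(verify(chain))
```

One caveat on the stand-in: HMAC needs the verifier to hold the same secret key, whereas a public-key signature scheme such as ML-DSA lets a reviewer verify with a public key alone. A real pipeline would swap the tagging step for an actual signature from a vetted library.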
Permanence without exposure
A signed local record answers 'was this altered after signing'. It does not, on its own, answer 'did this record exist on the date you claim'. For that you need an external, immutable reference point. Mickai anchors a hash commitment of the record to Bitcoin through Pantheon, its own sovereign Layer 1 (native token PAN, fixed supply of five billion). Only the hash is anchored, never the underlying data, so the patient records and the trial evidence stay inside the sponsor's perimeter while a one-way fingerprint of the sealed record gains an independent timestamp that no party can backdate.
It is worth being precise about what this is and is not. Pantheon does not move Bitcoin and it is not a Bitcoin Layer 2. It commits a cryptographic fingerprint of the audit record for permanence. Anchoring is not spending. The result is a record that is local where it must be private and externally provable where it must be trusted, which is exactly the posture a regulator's chain-of-custody question demands.
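The "only the hash is anchored" property can be made concrete with a small sketch. It assumes SHA-256 as the commitment function and uses hypothetical names and sample fields. The anchoring step itself is deliberately not shown, since the article does not describe Pantheon's interface.

```python
import hashlib
import json


def commitment(sealed_record: dict) -> str:
    """One-way fingerprint of a sealed record (SHA-256 is an assumption)."""
    canonical = json.dumps(sealed_record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_anchor(sealed_record: dict, anchored_digest: str) -> bool:
    """Check a locally held record against a previously published digest."""
    return commitment(sealed_record) == anchored_digest


# Illustrative record; field names are hypothetical.
sealed_record = {
    "input_sha256": "ab" * 32,
    "model_version": "generator-v1.3.0",
    "approver": "j.doe",
}

# Only this 64-character hex string would ever leave the perimeter.
anchored_digest = commitment(sealed_record)

# Any later change to the record no longer matches the published digest.
tampered = dict(sealed_record, model_version="generator-v1.3.1")

print(matches_anchor(sealed_record, anchored_digest))  # True
print(matches_anchor(tampered, anchored_digest))       # False
```

The digest reveals nothing about the record's contents, yet anyone holding the original record can later show that it matches what was published on the anchoring date.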
The shape of an admissible synthetic arm
Put the pieces together and the admissible version of a synthetic control arm has a clear shape. Generation runs on hardware the sponsor owns. Every input is captured by content hash before a single synthetic patient is produced. Each generation and validation step is sealed and signed in the Open Audit Record at the moment it happens. A hash of that record is anchored externally for an unforgeable timestamp. When the auditor arrives, the answer to 'show me exactly how this was produced' is a verifiable chain, not a slide deck.
This discipline is not bureaucracy bolted onto the science. It is what lets the science count. The methods that make synthetic cohorts work, the engineering that makes them defensible, and the provenance that makes them admissible are covered by Mickai's portfolio of 101 filed UK patent applications (around 2,234 claims), owned by Mickai LTD, named inventor Micky Irons. The technology shortens trials and spares patients placebo. Provenance is what turns that benefit into evidence a regulator will accept.
Synthetic cohorts will keep getting more convincing, but plausibility was never the bottleneck. The bottleneck is trust, and trust in this field is built from traceability. Generate on your own hardware, seal every step, anchor the proof. A regulator can only accept what it can follow, and provenance is how you let it follow.