AI Financial Advice Needs a Suitability Record a Regulator Can Rebuild
Why the defensible core of AI wealth and pensions advice is not the recommendation, but a tamper-evident, offline-verifiable trail of how it was reached.
The question a regulator will actually ask
Picture a complaint landing on a compliance desk two years from now. A retired teacher moved most of a pension pot into a portfolio that a digital adviser recommended. The market fell, the client felt the loss, and now the regulator wants to know one thing. Was that recommendation suitable for this person, on this day, given what was known about them? Not whether the artificial intelligence (AI) was clever. Whether it was suitable. That single word carries the whole weight of financial advice, and it is where AI advice tends to fall apart. The model can produce a confident, fluent recommendation in a second. What it usually cannot produce, weeks later, is the record that proves why that recommendation was right for that client.
I have spent the last few years building the thing that closes that gap, so I will be blunt about where the industry actually stands. Suitability is not a feeling. In the United Kingdom and across the European Union (EU), it is a legal standard with structure. An adviser, human or machine, has to gather the client's circumstances, knowledge and experience, capacity for loss, attitude to risk, and objectives, then show that the recommendation follows from those inputs. The test is reconstruction. If you cannot rebuild the reasoning from the evidence that existed at the moment of advice, you do not have a suitable recommendation. You have a guess that happened to be written down well.
Why advice is different from chat
Most discussion of AI in finance blurs two very different jobs. One is generic information, explaining what an Individual Savings Account (ISA) is, or how drawdown differs from an annuity. The other is a personal recommendation, telling a named person what to do with their money. The first is low stakes and forgiving. The second is regulated advice, and it triggers duties that do not care whether a human or a model produced the words. The gap between the two is not a matter of degree. It is a hard legal line, and a system that does not know which side of it a given output sits on is a system that cannot be trusted with either.
The harder part is that large language models are built to be persuasive, not accountable. They generate the most plausible next sentence, and plausibility is exactly the quality that makes a bad recommendation dangerous. A model will tell a sixty-three-year-old to concentrate a pension in a single volatile sector with the same calm authority it uses to recommend a diversified, age-appropriate allocation. Fluency is not suitability. The regulator knows this, which is why the burden has always been on the firm to evidence the match between client and advice, not on the client to disprove it. So the real engineering question is not how to make the model smarter. It is how to make every recommendation leave a trail a regulator can walk back down, step by step, and arrive at the same conclusion the firm did. That trail is the product. The advice is just the visible tip of it.
Anatomy of a suitability record
Strip away the jargon and a defensible suitability record has a small number of parts. It captures the client inputs that were known at the time: the fact-find, the risk profile, the stated objectives, and the capacity for loss. It captures the recommendation itself and the specific products or allocations involved. It captures the reasoning that connects the two, including the trade-offs that were weighed and the options that were rejected and why. And it captures the moment with an honest timestamp, because suitability is judged against what was knowable then, not what turned out to be true later.
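To make those four parts concrete, here is a minimal sketch in Python. The class name, field names and example values are illustrative assumptions, not a regulatory schema and not a description of any production system. The point it shows is narrow: once the parts are serialised in one canonical form, any later revision to the inputs produces a different digest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class SuitabilityRecord:
    # Field names are hypothetical; they mirror the four parts described above.
    client_inputs: dict   # fact-find, risk profile, objectives, capacity for loss
    recommendation: dict  # the products or allocations recommended
    reasoning: dict       # trade-offs weighed, options rejected and why
    recorded_at: str      # ISO 8601 UTC timestamp of the moment of advice

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators give one byte string per record,
        # so identical content always yields an identical digest.
        return json.dumps(asdict(self), sort_keys=True,
                          separators=(",", ":")).encode("utf-8")

    def digest(self) -> str:
        return hashlib.sha256(self.canonical_bytes()).hexdigest()


record = SuitabilityRecord(
    client_inputs={"age": 63, "risk_profile": "cautious",
                   "capacity_for_loss": "low",
                   "objectives": ["retirement income"]},
    recommendation={"allocation": {"bonds": 0.6, "equities": 0.3, "cash": 0.1}},
    reasoning={"rejected": [{"option": "single-sector equity fund",
                             "why": "volatility exceeds capacity for loss"}]},
    recorded_at="2025-01-15T10:30:00Z",
)

# A silently revised risk profile no longer matches the original digest.
revised = replace(record, client_inputs={**record.client_inputs,
                                         "risk_profile": "adventurous"})
print(record.digest() == revised.digest())  # False
```

A digest on its own proves nothing about who computed it or when. It only gives later steps, such as signing and chaining, a fixed value to commit to.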
Each of these has a failure mode when a machine produces it. Client inputs can be silently revised after the fact to make the advice look better, which is the oldest trick in mis-selling. Recommendations can drift between what was shown to the client and what is stored in the file. Reasoning, with a language model, is the worst offender, because the explanation a model gives after the fact is often a fresh story, not a faithful record of the computation that actually drove the output. And timestamps, if the firm controls them and can rewrite them, prove nothing at all.
This is the gap that ordinary logging does not fill. A database row that says the advice was given is only as trustworthy as the party who can edit the database. When the firm is also the defendant, self-reported logs are evidence of intention, not of fact. A regulator reading them has to take the firm's word that nothing was touched. That is precisely the trust we should stop asking anyone to extend, because in a genuine dispute the one party with both the motive and the ability to alter the record is the one holding it.
The reasoning trace problem
Here is the part the AI industry would rather not dwell on. When you ask a model to explain its recommendation, you do not get the reasoning. You get a plausible-sounding reconstruction generated after the answer, often disconnected from whatever statistical path actually produced the output. In casual use that is harmless. In regulated advice it is a landmine, because the explanation written into the client file may not correspond to anything the system genuinely did. A firm that treats the model's self-narration as the audit trail has built its defence on a story the model invented to sound coherent.
The honest response is not to pretend the model's narration is a true account of its internals. It is to record the things that are actually verifiable. The exact inputs that entered the system. The model and configuration that processed them. The retrieved facts and rules that were in scope. The constraints that were applied. The output that came out. You bind those together so the record reflects the real pipeline rather than a flattering story. You can still let the model write a client-friendly explanation, but you mark it for what it is, a presentation layer, and you anchor the file to the verifiable inputs and outputs underneath. That distinction, between what the system genuinely did and what it later said about itself, is the difference between a record that survives scrutiny and one that collapses under it.
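A minimal sketch of that separation, in Python, under stated assumptions: the function names, the dictionary keys and the example model identifier are all hypothetical. It digests each verifiable component of the pipeline, binds them into one core digest, and stores the model's explanation alongside, labelled as presentation rather than as evidence of reasoning.

```python
import hashlib
import json


def sha256_json(value) -> str:
    """Digest of a JSON-serialisable value in a canonical, sorted-key form."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def bind_pipeline(inputs, model, retrieved, constraints, output, narration):
    # Verifiable core: what went in, what processed it, what came out.
    core = {
        "inputs": sha256_json(inputs),
        "model": sha256_json(model),
        "retrieved": sha256_json(retrieved),
        "constraints": sha256_json(constraints),
        "output": sha256_json(output),
    }
    return {
        "core": core,
        "core_digest": sha256_json(core),
        # The explanation is kept and fingerprinted, but it sits outside the
        # core digest and is labelled for what it is.
        "presentation": {"kind": "model_generated_explanation",
                         "text_digest": sha256_json(narration)},
    }


# Example values are invented for illustration only.
example = dict(
    inputs={"risk_profile": "cautious", "capacity_for_loss": "low"},
    model={"name": "example-model", "version": "hypothetical-1", "temperature": 0.0},
    retrieved=["fund factsheet A", "suitability rule 7"],
    constraints={"max_equity_weight": 0.4},
    output={"allocation": {"bonds": 0.6, "equities": 0.3, "cash": 0.1}},
    narration="This portfolio balances income with limited volatility.",
)
bound = bind_pipeline(**example)
```

In this sketch, rewording the explanation leaves `core_digest` untouched, while changing an input, a rule or the model configuration changes it. That is the property the paragraph above asks for: the file is anchored to what the system did, not to what it later said.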
Tamper-evidence is the whole game
Suppose you capture all of that faithfully. You still have one problem left, and it is the one that matters most in a dispute. How does anyone trust that the record was not edited afterwards? The client says the risk profile in the file is not the one they gave. The firm says it is. Without tamper-evidence, that is one word against another, and the regulator is left weighing credibility instead of reading proof. Credibility contests are exactly what a serious record-keeping system should make impossible.
The fix is to make alteration detectable by anyone, not just by the firm that holds the data. You sign each record before it is acted on, so the commitment exists ahead of the consequence rather than being assembled after a complaint arrives. You chain each record to the one before it with a cryptographic hash, so removing or reordering or editing any single entry breaks the chain in a way an outside party can see. You make the log append-only, so the natural operation is to add, never to quietly overwrite. And you use signatures built to last, because a record that has to stand for the lifetime of a pension cannot rely on cryptography that a future computer will be able to forge.
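The chaining step can be sketched in a few lines of Python. This is an illustration under assumptions, not a specification: the entry layout, the all-zero genesis value and the function names are invented here, and signatures are left out entirely. That omission matters, because a party who can rewrite the whole log can also recompute every hash, which is why the chain has to be combined with signatures and an externally held head.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder predecessor for the first entry


def entry_hash(index: int, prev_hash: str, payload: dict) -> str:
    body = json.dumps({"index": index, "prev_hash": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(body).hexdigest()


def append(chain: list, payload: dict) -> dict:
    # The only write operation offered is "add to the end".
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    entry = {"index": index, "prev_hash": prev_hash, "payload": payload,
             "hash": entry_hash(index, prev_hash, payload)}
    chain.append(entry)
    return entry


def first_broken_index(chain: list):
    """Return the position of the first inconsistent entry, or None if intact."""
    prev_hash = GENESIS
    for position, entry in enumerate(chain):
        expected = entry_hash(position, prev_hash, entry["payload"])
        if (entry["index"] != position
                or entry["prev_hash"] != prev_hash
                or entry["hash"] != expected):
            return position
        prev_hash = entry["hash"]
    return None


chain = []
append(chain, {"event": "fact_find", "risk_profile": "cautious"})
append(chain, {"event": "recommendation", "allocation": "60/30/10"})
append(chain, {"event": "rule_check", "result": "pass"})
```

With this layout, editing an entry, dropping one from the middle, or swapping two all cause `first_broken_index` to return a position instead of `None`, and anyone holding a copy of the log can run the check.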
That last point is not paranoia. The migration to post-quantum cryptography is already underway across serious institutions, and the relevant standards now exist. We use the United States National Institute of Standards and Technology (NIST) Module-Lattice-Based Digital Signature Algorithm, published as Federal Information Processing Standard 204 (FIPS 204), at the ML-DSA-65 parameter set, precisely because a suitability record may need to be verifiable decades from now, long after today's ordinary signature schemes are considered fragile. A pension dispute can surface twenty years after the advice. The proof has to outlive the technology that created it.
The regulatory clock is not theoretical
None of this is a thought experiment about some distant future. The EU AI Act classifies specified uses of AI in essential financial services, including creditworthiness assessment and risk pricing for life and health insurance, as high-risk, and under the Act's published timetable the obligations for those systems begin to bite from August 2026. Those obligations are not about tone or branding. They are about logging, traceability, human oversight, and the ability to demonstrate after the fact how a system reached an outcome that affected a person. The direction of travel is the same in the United Kingdom, where the Consumer Duty already pushes firms to evidence good outcomes rather than merely assert them, and where the regulator has been explicit that deploying AI does not dilute accountability.
Run those two trends together, rising AI liability and a hardening expectation of traceability, and the conclusion is unavoidable. Firms that adopt AI advice without a verifiable record are not saving cost. They are accumulating a liability that becomes visible only when a complaint or a market downturn forces the file open. At that point the absence of a reconstructable trail is not a paperwork gap. It is the case for the other side. I would rather build the record now than explain its absence later, and the firms that wait will find the cost of retrofitting evidence into a system that was never designed to produce it far exceeds the cost of designing for it from the start.
What a defensible AI advice stack looks like
Put the pieces together and the architecture is not exotic. Every consequential action the system takes, whether ingesting a fact-find, scoring a risk profile, generating a recommendation, or applying a suitability rule, is treated as an event that must be signed before it executes. The signing happens first, so the commitment is genuine and not a story written after the result is known. Each event is hash-chained to its predecessor, producing an append-only ledger where any later edit is mathematically obvious. The whole thing is verifiable offline, in an ordinary browser, by someone who does not trust the vendor and should not have to.
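The ordering is the part worth seeing in code. The sketch below is a hypothetical Python illustration, not any vendor's implementation: `commit_then_execute`, the ledger layout and the client reference are invented names. The signer is passed in as a function, and the demo signer uses HMAC-SHA256 purely as a stand-in so the example runs with the standard library. HMAC is a shared-secret scheme that an outside party cannot verify without the key, so a real system would substitute a public-key signature here.

```python
import hashlib
import hmac
import json
from typing import Callable


def commit_then_execute(ledger: list, event: dict,
                        sign: Callable[[bytes], str],
                        action: Callable[[], object]):
    """Record and sign the event first; run the action only afterwards."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = json.dumps({"prev_hash": prev_hash, "event": event},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")
    entry = {"prev_hash": prev_hash, "event": event,
             "hash": hashlib.sha256(body).hexdigest(),
             "signature": sign(body)}  # if signing fails, nothing below runs
    ledger.append(entry)               # the commitment exists before the result
    return action()


DEMO_KEY = b"demo-key-not-for-production"


def demo_sign(message: bytes) -> str:
    # Stand-in only: NOT a post-quantum or publicly verifiable signature.
    return hmac.new(DEMO_KEY, message, hashlib.sha256).hexdigest()


ledger = []
ledger_size_seen_by_action = []


def generate_recommendation():
    # Records how many ledger entries existed at the moment the action ran.
    ledger_size_seen_by_action.append(len(ledger))
    return {"allocation": {"bonds": 0.6, "equities": 0.3, "cash": 0.1}}


result = commit_then_execute(
    ledger,
    {"type": "generate_recommendation", "client_ref": "hypothetical-001"},
    demo_sign,
    generate_recommendation,
)
```

By the time the action runs, its ledger entry already exists, and if the signer raises an error the action never executes. The design choice is that an unsigned action is treated as an action that must not happen, rather than one to be logged later.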
That offline, vendor-independent check is the detail that changes the power balance. A regulator, an auditor, an ombudsman, or the client's own representative can take the record and confirm its integrity without asking the firm for anything and without trusting any server the firm controls. The proof stands on its own mathematics. This is the design principle behind the Open Audit Record (OAR) in the Mickai Sovereign Intelligence Operating System (SIOS), which is built and running today. Actions are signed before they execute, hash-chained, append-only, post-quantum, and checkable by anyone with no trust in us required. We anchor the audit root onward to Pantheon, a sovereign Layer 1 chain that pins the chain's history all the way to Bitcoin. Pantheon, with its fixed-supply token PAN of five billion units, is the one part of the system still being built. Everything that produces and verifies the record itself is live.
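What an outside verifier does can be sketched as follows. This is a simplified Python illustration, not the OAR format or its browser verifier: the export layout and function names are assumptions, and signature checking is omitted. It shows why an independently held root matters. A hash chain on its own cannot reveal that the most recent entries were quietly dropped, but comparing the recomputed head with a root obtained from somewhere the firm does not control can.

```python
import hashlib
import json


def link_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps({"prev_hash": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(body).hexdigest()


def verify_export(exported_json: str, anchored_root: str) -> bool:
    """Recompute the chain from an exported file and compare its head with a
    root obtained independently. Needs no network and no vendor code."""
    prev_hash = "0" * 64
    for entry in json.loads(exported_json):
        if (entry["prev_hash"] != prev_hash
                or entry["hash"] != link_hash(prev_hash, entry["payload"])):
            return False
        prev_hash = entry["hash"]
    return prev_hash == anchored_root


def build_export(payloads):
    # Helper for the demo: produces an export and the head hash to anchor.
    entries, prev_hash = [], "0" * 64
    for payload in payloads:
        current = link_hash(prev_hash, payload)
        entries.append({"prev_hash": prev_hash, "payload": payload, "hash": current})
        prev_hash = current
    return json.dumps(entries), prev_hash


export, root = build_export([
    {"event": "fact_find", "risk_profile": "cautious"},
    {"event": "recommendation", "allocation": "60/30/10"},
    {"event": "client_objection", "note": "risk profile disputed"},
])

# A truncated export is internally consistent but fails the root comparison.
truncated = json.dumps(json.loads(export)[:-1])
print(verify_export(export, root), verify_export(truncated, root))  # True False
```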
The honest caveats
A verifiable record does not make a recommendation suitable. It proves what was recommended, on what inputs, with what reasoning available, at what time. If the inputs were wrong, the record faithfully preserves a wrong recommendation, and that is a feature, because it lets a regulator see exactly where the failure entered. Tamper-evidence is about honesty, not about correctness, and conflating the two would be its own kind of mis-selling. I will not pretend the signature fixes the advice. It fixes the argument about the advice.
Human oversight stays in the loop for the same reason. The point of a reconstructable trail is to make meaningful review possible, to let a qualified person see the basis of a recommendation and intervene before harm, and to give the firm a true account when something goes wrong. A record that no human ever reads is just storage. The value appears when the trail is used, in supervision, in audit, in the moment a client asks why. And none of this removes the firm's duty to design the model, the rules, and the products responsibly in the first place. The record is the floor under accountability, not a substitute for it.
Why I built it this way
I am a security realist before I am anything else, and the security realist's instinct is simple. Assume the record will be challenged, and build it so it wins the challenge on mathematics rather than on trust. Most AI deployments in finance are being built the other way around, optimised for a smooth demonstration and a fast answer, with the audit trail bolted on as an afterthought that the firm can quietly edit. That is exactly backwards. The audit trail is not the afterthought. In regulated advice it is the entire defensible core, and the recommendation is the easy part that sits on top of it. This is also why we are training our own models now, fine-tuning and specialising open foundations such as Llama 3.2 and Qwen 2.5 and building a sealed corpus, so that the system producing the advice is as accountable as the record that proves it. The same conviction runs through the portfolio behind the SIOS, 101 filed UK patent applications carrying about 2,234 claims, owned by Mickai LTD, with me named as the inventor.
So the test I would put to any firm deploying AI for wealth or pensions advice is the one the regulator will eventually put to you. Take any recommendation your system made last quarter. Can an outside party, who does not trust you, reconstruct from a tamper-evident record the inputs, the reasoning that was available, and the moment it was given, and verify that none of it was touched since? If the answer is yes, you have suitability you can defend. If the answer is no, you do not have AI advice. You have a confident machine and a story, and a story is what the other side gets to rewrite. The signed, offline-verifiable record is how you make sure the story stays yours.


