Watts Per Decision: The Energy Envelope of Sovereign AI
The unit cost that frontier labs hide and the edge cannot.
Ask a frontier lab how much energy your last question cost and you will not get an answer. You will get a token count, perhaps a latency figure, and a bill that quietly folds the electricity into a per-million-token rate. The joule is the one number nobody quotes. I think that silence is the most revealing thing in modern computing, and I want to break it.
Every answer an AI gives is a transaction in physics before it is a transaction in language. Electrons move, transistors switch, heat is shed, and somewhere a power station burns to make it possible. The model does not care that you find this unromantic. The grid certainly does not. So the honest question is not how many tokens that took. It is how many watts that decision cost, and whether the decision was worth the watt.
The unit the bill is designed to hide
Cloud AI is sold by the token because the token is abstract enough to be elastic and concrete enough to be billable. It is a brilliant unit for a vendor. It floats free of the data centre, the cooling plant, the carbon, and the sheer scale of silicon it took to produce. You pay for output volume and you are encouraged never to look behind it.
The trouble is that the real energy varies by orders of magnitude depending on the model, the hardware, the batching, and whether the answer was actually any good. A trivial question routed through a vast frontier model can draw the same power as a hundred well placed questions through a small one. The token price erases that difference. It charges you the same shape of money whether the work was proportionate or wildly wasteful.
This is not an accident of accounting. It is a design choice that keeps the customer downstream of the meter. You cannot optimise what you cannot see, and you cannot see what the invoice was built to obscure.
Watts per decision is the honest unit
I want to replace the token with a unit that cannot lie. Watts per decision. The energy drawn to turn a question into a useful, defensible answer. Not per token, because tokens are filler as often as they are substance. Per decision, because a decision is the thing a regulator, an auditor, or an operator actually cares about.
A decision has a boundary. It starts when the question arrives and ends when an answer fit to act on is produced. Inside that boundary you can measure the draw of the accelerator, the host, the retrieval step, the verification step. You can attribute joules to outcomes. Once you can do that, the whole economics of intelligence changes, because waste stops being invisible and becomes a line you are accountable for.
“The token is what the vendor wants to sell you. The watt per decision is what the planet, the auditor, and your own balance sheet actually bill you for.”
There is a moral dimension here, and I will not pretend otherwise. An industry that refuses to state its energy per useful answer has decided its costs are someone else's problem. Sovereign AI starts from the opposite premise. If it runs on your hardware, it draws on your power, and you are the one who must answer for it. That accountability is not a burden. It is the beginning of honesty.
Why frontier scale is the wrong tool for most questions
The frontier model is a magnificent instrument and a terrible default. It is built to be capable of almost anything, which means it carries the energy cost of almost everything on every single query, including the ninety per cent of questions that needed a fraction of its capacity. You are paying to raise a cathedral to post a letter.
Most real work is narrow. A clinical coding query, a contract clause check, a network anomaly triage, a citation lookup. These are bounded problems with bounded knowledge. Routing them through one enormous generalist is energetically illiterate. The model activates an immense parameter space to do something a specialist a fraction of its size would do faster, cheaper, and on less power.
The buried truth of scale is that capability and efficiency pull in opposite directions. The bigger the model, the more it can do and the more every answer costs, whether or not that answer needed the headroom. For a regulated operator paying for their own electricity, that trade is not abstract. It shows up on the meter every hour of every day.
Fifty small brains against one large one
This is the architecture I have built Mickai around, and energy is one of the strongest reasons it exists. Rather than one frontier model answering everything, the Sovereign Intelligence Operating System runs fifty specialised brains, twenty five domain and twenty five operational, each a model tuned tightly to its remit. A router sends every question to the smallest brain that can answer it well.
The energy logic is simple and unforgiving. A small specialised model activated for the question it was built for draws a fraction of the power of a giant generalist activated for the same question. Multiply that across a working day of thousands of decisions and the difference stops being a rounding error. It becomes the entire operating cost.
- Route to the smallest competent brain, not the most capable one, so most queries never touch a large model at all.
- Keep specialist models resident and warm, so there is no cold start tax, no spin up, no idle draw waiting for work.
- Verify cheaply: a small model that is right the first time costs less energy than a large model that needs a second pass.
- Cap the envelope: every brain runs inside a known power profile on owned silicon, so the worst case draw is bounded, not open ended.
- Scale by hardware, not by reaching for a bigger model in someone else's data centre.
I am not claiming the fifty brains beat a frontier model on raw ceiling. They do not, and they are not meant to. I am claiming they beat it decisively on energy per useful answer across the real distribution of work, which is overwhelmingly narrow. For the regulated operator, energy per useful answer is the figure that matters, and it is the figure the cloud will never print for you.
The edge cannot hide its joules
There is a constraint at the heart of sovereign AI that frontier labs never have to face, and I have come to see it as the system's greatest virtue. When the model runs on your own hardware, in your own building, on your own power feed, the energy is not abstract. It is the heat coming off the rack and the figure on your electricity meter. You cannot route it through a vendor and forget it exists.
The cloud can hide a profligate model behind the scale of a data centre and a blended per token rate. The edge cannot. Every watt a brain draws is a watt you provisioned, paid for, and must cool. That visibility is exactly the discipline the industry has been avoiding. It forces the architecture to be honest, because waste has nowhere to hide.
So the constraint that looks like a limitation, you only have the silicon you own, is the very thing that makes the system trustworthy. It cannot pretend its answers are free. It has to earn each one against a real and visible power budget. That is not a tax on sovereignty. It is the proof of it.
Bounded silicon, bounded promise
Owned hardware also bounds the promise. A cloud model can be quietly upgraded to a hungrier successor overnight, and your bill drifts with it. A sealed sovereign substrate runs the models you chose, on the silicon you specified, inside the envelope you sized. The energy behaviour is knowable in advance because the hardware does not move under your feet. Predictability is its own form of sovereignty.
Sealing the watt into the record
Here the argument turns from economics into governance. In Mickai, every consequential action is sealed into a post quantum Open Audit Record under FIPS 204 ML-DSA-65. That record already captures what was decided, by which brain, on what evidence. I want it to capture the energy too. The watts that decision drew, sealed in, signed, and unforgeable.
Consider what that does. An energy figure inside a tamper evident audit record is no longer marketing. It is a fact you can be held to. A regulator can read it. An auditor can verify it. An operator can prove their AI did its work inside a stated power budget and did not quietly exceed it. The joule stops being the number nobody quotes and becomes the number you can defend.
“An energy figure you can seal into the audit record is itself a sovereignty claim. It says I know what my intelligence costs, and I will sign my name to it.”
No frontier lab can offer this, because they do not control the hardware your query ran on and they have no incentive to expose the draw. Sovereignty makes it possible, because sovereignty means the whole stack, from silicon to seal, is yours to measure. The audit record is where the watt is finally told the truth.
What this costs and what it buys
I will not pretend the sovereign path is effortless. You buy the silicon, you provision the power, you size the envelope, you do the engineering to route intelligently and keep the specialists warm. The cloud asks you to do none of that. It asks only that you stop looking at the meter. That is the real trade, and operators in regulated sectors are increasingly unwilling to make it.
What you buy back is control of three things the cloud will not give you. The energy, because it is yours to measure and bound. The audit, because the draw is sealed alongside the decision. The predictability, because the hardware does not change beneath you between one quarter and the next. For anyone answerable to a regulator, that triad is not a nicety. It is the price of being allowed to operate at all.
The economics compound in the operator's favour over time. A frontier subscription is a meter that only ever runs up, on hardware you will never see, drawing power you will never measure. Owned silicon running fifty efficient brains is a capital cost that amortises against an energy bill you can finally optimise, because for once you can finally see it.
The forge, not the theft
Prometheus is remembered for stealing the whole fire. I have always thought the more useful figure is the one who crouches at the banked forge and measures the exact ember he needs to carry down, the flame reflected in a single dialled gauge, taking precisely what the task requires and not a watt more. That restraint is the entire philosophy. Intelligence proportioned to the question, energy proportioned to the intelligence, every joule accounted for.
This is why I am building Mickai the way I am, and why the Pantheon Layer 1 and the thirty million pound PAN round exist to fund silicon you own rather than capacity you rent. Sovereignty is not only about who holds the keys and where the data sleeps. It is about whether you can answer the simplest hard question anyone will ever put to an AI system. What did that decision cost in the physical world, and can you prove it.
The frontier labs sell you the token and bury the joule. The edge can bury nothing, and that is precisely why it can be trusted. Measure the watt, seal it, sign it. An intelligence that knows what it costs is the only kind worth running on hardware you call your own.




