Article · 15 June 2026

The Appealable Grade: Why AI Assessment Needs a Record You Can Argue With

Automated marking is arriving in classrooms without the one thing that makes a grade fair. A mark you cannot contest is not an assessment but a verdict, and the record is what turns it back into something a student can challenge.

Author: Micky Irons

Tags: AI in education, automated assessment, appealable grading, audit record, EU AI Act

A red mark with no memory

Picture a sixteen-year-old who opens a results portal and finds an essay scored a grade lower than every mock she sat. She is sure something is wrong. She asks her teacher, who pulls up the dashboard and sees a number with a confidence band next to it. The teacher cannot tell her which sentences the model weighted, which clause of the rubric it applied, which version of the system graded her, or whether the same model would produce the same mark tomorrow. The school asks the vendor. The vendor offers a support ticket and a reassurance that the model is ninety-something percent accurate on a benchmark nobody at the school has ever seen. The appeal, such as it is, dies there. Not because the girl was wrong, but because there was nothing to appeal against. The mark had no memory of how it was made.

This is the quiet failure mode of automated assessment, and it is spreading faster than the law and faster than most school leaders realise. The interesting question is not whether artificial intelligence (AI) can grade. It plainly can, often well. The question is whether a grade it produces can be contested. A mark you cannot challenge is not an assessment. It is a verdict. In education we have spent a century building the right to appeal a verdict into the system, precisely because we know that graders, human and machine alike, get things wrong. Remove the right to appeal and you have not made grading more efficient. You have made it final in a way it was never meant to be.

What an appeal actually requires

We use the word appeal loosely, so it is worth being precise about what one needs in order to function. An appeal is not a complaint. A complaint says I am unhappy. An appeal says here is the decision, here is the basis on which it was made, here is where the basis was misapplied, and here is the standard against which you should re-examine it. Every one of those clauses depends on a record. You cannot appeal a decision whose reasoning was never written down, any more than you can appeal a coin toss. The record is not paperwork around the edges of the decision. It is the substance the appeal acts on.

In traditional marking, the record is messy but real. There is a script with annotations. There is a mark scheme. There is a marker who can be asked what they were thinking, and a moderator who second-marked a sample. There is a paper trail, sometimes literally on paper. When a school or an examination board re-marks, it is re-running a process whose inputs still exist. The student can see the rubric. An advocate can point to clause three and say the marker penalised a feature the rubric explicitly permits. The process is imperfect, but it is contestable, because the things you would need to contest it have been preserved.

Automated assessment quietly removes most of this and replaces it with a number. The script may still exist, but the reasoning that produced the score usually does not, or exists only as transient computation that vanished the moment the response was returned. So the appealable record, the thing that makes a challenge meaningful, has to be deliberately created. It does not come for free with the model. If you do not build it, you do not have it, and you will discover that on the worst possible day, when a parent arrives with a lawyer and asks you to show your work.

The five things the record has to capture

From the perspective of a student who wants to challenge a mark, and an administrator who has to defend or overturn it, a usable assessment record has to answer five concrete questions. First, what exactly was submitted: the precise text or artefact the model saw, byte for byte, not a paraphrase or a truncated version. Second, what graded it: the specific model, its version, its weights or a fingerprint of those weights, and the configuration in force at that moment. Third, against what standard: the rubric or mark scheme as it existed when the grade was assigned, not the rubric as edited three weeks later. Fourth, what the model actually did: the score, any sub-scores, the features it flagged, the rationale it produced, and any confidence or uncertainty signal. Fifth, when, and in what order relative to everything else the system did that day.
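As a rough illustration of those five questions, the record can be sketched as a single immutable structure with a stable digest. This is a minimal sketch, not Mickai's actual schema: the class name `AssessmentRecord`, every field name, and the choice of SHA-256 over canonical JSON are assumptions made for the example. Here the submission and rubric are represented by digests of their exact bytes, on the assumption that the bytes themselves are stored alongside.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class AssessmentRecord:
    # Hypothetical schema for illustration only.
    # 1. What was submitted: digest of the exact bytes the model saw.
    submission_sha256: str
    # 2. What graded it: model, version, weights fingerprint, configuration.
    model_name: str
    model_version: str
    weights_fingerprint: str
    config: dict
    # 3. Against what standard: digest of the rubric as it stood at grading time.
    rubric_sha256: str
    # 4. What the model did.
    score: float
    sub_scores: dict
    flagged_features: list
    rationale: str
    confidence: float
    # 5. When, and in what order.
    graded_at_utc: str
    sequence_number: int

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators, so the same record always
        # serialises to the same bytes.
        text = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return text.encode("utf-8")

    def digest(self) -> str:
        return sha256_hex(self.canonical_bytes())
```

Because the digest covers every field, changing the score, the rubric digest, or the timestamp after the fact yields a different digest, which is the property the later sections build on.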

Notice that none of this is exotic. It is the same information a careful human department would keep, written down properly. The difference is that with software, capturing it is a design decision someone has to make and pay for, and the temptation is always to skip it because the model gives an answer either way. The grade looks the same on the screen whether or not anyone recorded how it was reached. The cost of skipping shows up later, and it lands on the student first. A record that is not captured at the moment of the decision cannot be reconstructed honestly afterwards, because reconstruction is just a polite word for guessing what probably happened.

Why a log file is not a record

Most systems that claim to have an audit trail have a log file. A log file is better than nothing, and it is also nearly worthless as the basis of a contested appeal, for one reason: it can be edited, and everyone knows it can be edited. If the institution that produced the grade also controls the log, and the log can be changed after the fact without trace, then the log proves only what the institution is currently willing to say. That is not evidence. It is testimony from an interested party. A security realist assumes that any record which can be quietly altered will, under enough pressure, eventually be altered, if not maliciously then through a well-meaning correction that nobody bothers to document.

Consider the asymmetry. When a student appeals, the institution is being asked to investigate itself. The same body that ran the model, and that may have an interest in not having graded thousands of other students the same flawed way, is the body holding the records. Even with the best intentions, this is a conflict. A re-mark that draws on an editable internal log is a re-mark the student has to take on trust. For a single disputed essay, trust might be enough. For a cohort, a national examination, or any setting where the stakes are high and the numbers are large, trust is exactly what is missing and exactly what cannot be manufactured by assertion.

[Image: classical marble scales of justice in perfect balance. Caption: An appeal needs a standard to weigh the decision against. Without a fixed, recorded rubric, the scales have nothing to hold.]

So the record needs a property that ordinary logs do not have. It needs to be tamper-evident. Not merely stored, but stored in a way where any later alteration is detectable by someone who does not trust the storer. This is a cryptographic requirement, not a policy one. Policies can be waived, quietly, by the same people they are meant to bind. Mathematics cannot be waived after the fact. That distinction is the whole reason the record has to be built on cryptography rather than on a promise to behave well.

Sign before you grade, not after

There is a subtle point here that most logging misses, and it matters enormously for fairness. The natural instinct is to grade first and then write down what happened. But a record written after the action can be shaped by the action. If the model produces an embarrassing result, the after-the-fact account can be tidied. The honest design inverts this. You commit to what you are about to do, cryptographically, before you do it. You record the submission, the model and version, the rubric, and the intent to grade, and you sign that commitment first. Then the grade executes. Then the result is bound to the commitment that already exists, so the commitment cannot be quietly rewritten to flatter the outcome.
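The ordering described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Open Audit Record's implementation: the function names `make_commitment`, `bind_result`, and `result_matches` are hypothetical, SHA-256 stands in for the binding, and the signing step is deliberately left out because it needs a signature library that the Python standard library does not provide.

```python
import hashlib
import json


def digest(obj: dict) -> str:
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_commitment(submission: bytes, model_version: str, rubric: bytes) -> dict:
    # Recorded BEFORE the model runs: inputs and intent only, no score.
    return {
        "intent": "grade",
        "submission_sha256": hashlib.sha256(submission).hexdigest(),
        "model_version": model_version,
        "rubric_sha256": hashlib.sha256(rubric).hexdigest(),
    }


def bind_result(commitment: dict, score: float, rationale: str) -> dict:
    # Recorded AFTER the model runs: the result points at the digest of
    # the commitment that already exists.
    return {
        "commitment_digest": digest(commitment),
        "score": score,
        "rationale": rationale,
    }


def result_matches(commitment: dict, result: dict) -> bool:
    # True only if the result was bound to exactly this commitment.
    return result["commitment_digest"] == digest(commitment)
```

If someone later swaps in a different rubric and presents the "tidied" commitment, `result_matches` returns `False`. On its own this only demonstrates the binding; it becomes evidence only when the commitment is signed and appended to the record before grading starts, so that it cannot be regenerated along with the result.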

This is the discipline we built into Mickai as the Open Audit Record (OAR). Mickai is a Sovereign Intelligence Operating System (SIOS), built and in production, not a dashboard bolted onto somebody else's model. Every action the system takes is signed before it executes, then hash-chained into an append-only sequence where each entry locks the one before it. You cannot insert a grade retroactively, because the chain would break. You cannot rewrite an earlier entry, because every later entry depends on its hash. And the signature uses post-quantum cryptography, specifically ML-DSA-65, a parameter set of the Module-Lattice-Based Digital Signature Algorithm standardised by the United States National Institute of Standards and Technology (NIST) in Federal Information Processing Standard 204 (FIPS 204), so the signatures are designed not to become forgeable the day a sufficiently capable quantum computer arrives. That matters when a student's record may need to stand up to scrutiny a decade after they leave school.
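To make the chaining concrete, here is a minimal hash-chain sketch. It illustrates the general technique only and is not Mickai's code: the entry layout, the all-zero `GENESIS` value, and the helper names are assumptions, and the ML-DSA-65 signature on each entry is omitted because it is not available in the Python standard library.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    # Each new entry commits to the hash of the entry before it.
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def first_broken_index(chain: list):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash:
            return i
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return i
        prev_hash = entry["hash"]
    return None
```

Editing the payload of an earlier entry makes its stored hash wrong, and inserting an entry in the middle leaves the next entry pointing at the wrong predecessor. A bare hash chain can still be recomputed from the altered point onwards by whoever holds it, which is why the signatures and an external anchor matter.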

Verifiable offline, by someone who does not trust you

The hardest and most important property is the one institutions resist most, because it removes their privileged position. The record must be verifiable offline, in an ordinary web browser, by someone with no access to and no trust in the vendor or the school. A parent should be able to take the signed record of their child's grade, open it on their own laptop with no connection to the school's servers, and confirm for themselves that this submission, graded by this model version against this rubric, produced this score, and that nothing in the chain has been altered since. They should not have to ask the school to confirm it. They should not have to trust us. The verification should hold even if the school disappears, the vendor goes bankrupt, or both parties would prefer the record said something else.
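One part of that check, confirming that the essay and rubric a parent holds are the ones the record says were graded, needs nothing beyond a hash function and can be sketched as follows. This is an assumption-laden illustration: the exported record is taken to be JSON with hypothetical fields `submission_sha256` and `rubric_sha256`, the example is in Python rather than a browser, and verifying the signature and the chain links would be separate steps not shown here.

```python
import hashlib
import json


def check_claim(record_json: str, essay: bytes, rubric: bytes) -> list:
    """Compare local copies of the essay and rubric with an exported record.

    Returns a list of human-readable mismatches; an empty list means the
    local files match what the record says was graded. No network access
    is used at any point.
    """
    record = json.loads(record_json)
    problems = []
    if hashlib.sha256(essay).hexdigest() != record["submission_sha256"]:
        problems.append("essay differs from the submission that was graded")
    if hashlib.sha256(rubric).hexdigest() != record["rubric_sha256"]:
        problems.append("rubric differs from the one in force at grading time")
    return problems
```

The point of the design is that every input to the check is something the parent already holds: the exported record, the essay, and the rubric.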

This is the difference between transparency and accountability. Transparency is the institution showing you what it chooses to show. Accountability is you being able to check, independently, whether what it showed you is true. Most education technology offers the first and calls it the second. An appeal built on transparency is an appeal the institution can win by controlling the dashboard. An appeal built on independent verification is one where the facts do not depend on who owns the server. For a contestable grade, only the second is worth anything, because the whole point of an appeal is that the student and the institution disagree, and a referee the institution can edit is no referee at all.

The regulatory floor is rising under everyone

None of this will remain merely good practice for much longer. Automated assessment that materially affects a person's access to education sits squarely in the territory regulators are now fencing off. Under the European Union (EU) Artificial Intelligence Act, AI systems used to evaluate learning outcomes and to determine access to education are treated as high-risk, with obligations around record-keeping, transparency, human oversight, and logging. The substantive duties for high-risk systems begin to bite from August 2026. The direction of travel elsewhere is the same even where the statute differs. Data-protection law in many jurisdictions already gives individuals rights against purely automated decisions that have significant effects, including a right to meaningful information about the logic involved and a right to human review.

The blunt implication is that schools, examination boards, and the vendors who sell to them are acquiring a legal duty to keep exactly the kind of record this essay describes, and to be able to produce it on demand. Liability for AI-driven decisions is rising in parallel, and the institution that cannot show how a grade was reached will increasingly find that the absence of a record is not neutral. It is held against them. A vendor who hands a school a grading model with no defensible audit trail is handing it a liability dressed as a convenience. The honest reading of the next couple of years is that the appealable record stops being a feature and becomes a condition of operating at all.

Honest caveats, because the record is necessary and not sufficient

I want to be careful not to oversell. A perfect record does not make a model fair. You can sign and hash-chain a biased grade with full post-quantum rigour and end up with a beautifully verifiable injustice. The record proves what happened. It does not prove that what happened was right. That work, validating the model against diverse student populations, checking that it does not penalise dialect or non-native phrasing, keeping a human in the loop for consequential decisions, remains exactly as hard as it always was. The record is what makes that work auditable, and what makes errors discoverable and challengeable. It is the floor, not the ceiling, and anyone who sells you the floor as the ceiling has misunderstood the problem.

[Image: a marble hand pressing a seal onto a stone tablet. Caption: Signing before the action, not after. The commitment is sealed first, then the grade executes and binds to it.]

There are also real tensions to manage honestly. A complete record of every submission is also a sensitive store of children's work, and it must be governed and protected accordingly. That is an argument for data minimisation in what you capture and strong control over who can read it, not an argument against capturing the decision trail at all. And independent verifiability has to be built so that confirming one grade does not require exposing the full text of every other student's answer. These are solvable engineering problems, and we treat them as such, but anyone who tells you the record is free of trade-offs is selling something. The point is that the trade-offs are worth making, because the alternative is the unappealable verdict.
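One well-known way to confirm a single record without revealing the others is a Merkle inclusion proof, sketched below. The article does not say which construction Mickai uses, so treat this purely as an illustration of the idea: the function names, the `leaf:` and `node:` prefixes, and the handling of odd levels by duplicating the last node are all assumptions, and a production design would treat odd levels more carefully.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: list, index: int):
    """Return (root, proof) for leaves[index]; leaves must be non-empty.

    proof is a list of (sibling_hash, sibling_is_left) pairs, one per level.
    """
    level = [h(b"leaf:" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # simplification: duplicate the last node
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    # Rebuild the path from the leaf to the root using only sibling hashes.
    node = h(b"leaf:" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"node:" + pair)
    return node == root
```

The verifier sees the one record being checked plus a handful of sibling hashes, never the text of any other student's answer, and the proof grows with the logarithm of the number of records rather than with the cohort size.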

Why we built our own substrate to hold the record

A record this consequential cannot sit on infrastructure you do not control, because control over the infrastructure is control over the evidence. That is why Mickai runs the fifty brains, twenty-five domain and twenty-five operational, on its own Poseidon silicon substrate, and why we are actively training our own models now, fine-tuning and specialising open foundations such as Llama 3.2 and Qwen 2.5 while building a sealed corpus, with funding scaling that work toward fully native weights. A grading decision and the record that explains it are produced in the same governed environment, by a system whose behaviour we can account for, rather than handed off to a remote model whose version may change under you without notice.

For the longest-lived records, where a mark may be questioned years later, the audit chain needs an anchor that no single institution can rewrite. Pantheon, our sovereign Layer 1 chain (the only part of the stack still in build), periodically anchors the audit root to Bitcoin, so the integrity of the whole sequence can be checked against the most widely witnessed ledger there is. Its token, PAN, has a fixed supply of five billion. This is also where the patent position matters in practice rather than as a slogan: the methods behind the signed, hash-chained, offline-verifiable record are covered by 101 filed United Kingdom patent applications, around 2,234 claims, owned by Mickai LTD with Micky Irons as named inventor. We did not assemble this to decorate a pitch deck. We assembled it because a record you can argue with has to be a record nobody, including us, can quietly change.

Give the student something to point at

Return to the sixteen-year-old. The version of her story worth building toward is mundane. She opens her result, disagrees, and clicks to see the record. She sees the exact text she submitted, the model and version that graded it, the rubric clause by clause, the features the model flagged, its rationale, and a confidence signal. She, or a teacher, or an advocate, notices that the model marked down a perfectly valid argument structure the rubric expressly allows. She points at it. The grade is re-examined against a fixed, signed, independently checkable account of what was actually done. If the model erred, the error is visible and the appeal succeeds on the facts. If it did not, she can verify that for herself, on her own machine, and put the matter to rest. Either way, she had something to point at.

That is the whole argument. Automated assessment is coming to classrooms whether we are ready or not, and most of it will ship without the record that makes a grade contestable, because the grade looks identical on the screen with or without it. The cost of that omission is paid by students, quietly, one unappealable mark at a time. A grade that cannot be challenged is not assessment. It is authority without recourse. The remedy is not to slow the technology down. It is to insist that every grade it produces arrives with a signed, hash-chained, post-quantum, offline-verifiable account of how it was reached. Build the appealable record, and AI grading becomes something a student can argue with. Skip it, and you have automated the one thing education was never supposed to be: a mark you simply have to accept.

Originally published at https://mickai.co.uk/articles/appealable-grade-signed-record-ai-marking. If you operate in a regulated sector or want sovereign AI on your own hardware, the audit form on mickai.co.uk is the entry point.