The Regulator Is Already in the Building
While most of the AI industry watches the EU AI Act's deadlines slide into 2027, a quieter regulator has stopped waiting. As of this spring, insurance examiners in twelve states are already asking carriers to account for their AI. The instrument exists. It is in use. And your AI vendor cannot answer it for you.
The most repeated reassurance in regulated-vertical AI right now is some version of "compliance is well covered in our responsible AI report." It is a reasonable thing to believe. For most of the AI industry, in most of 2026, it is even true. The forcing functions everyone points to are deferred: the EU AI Act's high-risk obligations were pushed to December 2027, federal AI rules are tangled in a preemption fight, and no enforcement action anywhere has yet penalized a company for being unable to prove what its model did.
Insurance is the exception worth watching, because in insurance the gap between "we have a governance report" and "show me what the model did" has already started to close, and it has closed through a specific, named instrument that is live right now.
What actually happened this spring
On March 2, 2026, the National Association of Insurance Commissioners launched a pilot of its AI Systems Evaluation Tool across twelve states. This is not a white paper or a request for comment. It is a standardized examination instrument, and participating state insurance departments have begun sending inquiries to carriers domiciled in their states, with an emphasis on property and casualty and life insurers.
California · Colorado · Connecticut · Florida · Iowa · Louisiana · Maryland · Pennsylvania · Rhode Island · Vermont · Virginia · Wisconsin
The tool operationalizes the NAIC's 2023 Model Bulletin on the Use of AI Systems by Insurers, which had been adopted by roughly 25 states as of March 2026. The bulletin set the principle: insurers must maintain a written program governing how they develop, deploy, and monitor AI. The evaluation tool is the part that turns that principle into questions an examiner can actually ask, on the record, during an exam.
Two things about the timeline matter. First, the inquiries are real and arriving now, in roughly a quarter of the states. Second, the adoption slated for November is the moment the tool stops being a twelve-state pilot and becomes something every one of the fifty-plus U.S. insurance jurisdictions can pick up and use. If you are a carrier outside the pilot states, your runway is not "someday." It is roughly six months.
What the tool actually asks
The instrument is built as four exhibits, and an examiner typically starts with the first and escalates based on what comes back.
First, the inventory: how many AI and machine-learning systems are in use, in which functions (underwriting, pricing, claims, fraud, marketing), affecting which decisions, and whether each was built in-house, bought from a vendor, or some hybrid of the two.
Second, the governance and risk-assessment framework: who is accountable, how effectiveness is assessed, what monitoring exists, and what third-party due diligence is done on vendor systems.
Third, model-level detail for systems the carrier classifies as high-risk: model design, training data, validation procedures, performance metrics, and bias-testing results. This is the documentation an examiner needs to understand what the model does and how you know it works.
Fourth, the data: sources, quality controls, representativeness, lineage, and screening for proxies for race and ethnicity. The NAIC is specifically focused on rate-setting data, social-media data, and aerial imagery that can correlate with protected characteristics.
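The inventory questions are easier to answer on demand when the inventory lives as structured data rather than prose. Here is a minimal sketch in Python, assuming a hypothetical `AISystem` record. The field names and example entries are my own illustration and are not taken from the NAIC tool.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical inventory record. Field names are illustrative assumptions,
# not the NAIC tool's actual schema.
@dataclass(frozen=True)
class AISystem:
    name: str
    function: str    # e.g. "underwriting", "pricing", "claims", "fraud", "marketing"
    decision: str    # the decision the system affects
    sourcing: str    # "in-house", "vendor", or "hybrid"
    high_risk: bool  # the carrier's own classification


def summarize(inventory):
    """Count systems by function and by sourcing, and list the high-risk ones."""
    return {
        "total": len(inventory),
        "by_function": dict(Counter(s.function for s in inventory)),
        "by_sourcing": dict(Counter(s.sourcing for s in inventory)),
        "high_risk": sorted(s.name for s in inventory if s.high_risk),
    }


# Made-up entries, for illustration only.
inventory = [
    AISystem("auto-rate-model", "pricing", "premium quote", "in-house", True),
    AISystem("claims-triage", "claims", "adjuster routing", "vendor", True),
    AISystem("lead-scoring", "marketing", "outreach priority", "vendor", False),
]
summary = summarize(inventory)
```

The point of the sketch is only that vendor-supplied systems sit in the same inventory as in-house ones, so the count an examiner asks for comes from one place.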
Notice what these questions assume. They assume you can produce, after the fact, a coherent account of which model ran, what data it saw, how it was validated, and what it produced. They assume the model you documented last year is the model that actually scored the policy this year. They assume the lineage is reconstructible. For a lot of production AI, those assumptions do not hold, and a governance report written in the present tense is not the same artifact as an evidence trail that can be reconstructed in the past tense.
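To make the second assumption concrete, here is a rough sketch of an evidence record captured at decision time, plus a check that the model that scored a decision is the one that was documented. It is an illustration under stated assumptions, not a prescribed format: the record schema, the function names, and the idea of fingerprinting the model artifact with SHA-256 are all mine.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Serialize to stable JSON so identical content always hashes identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_record(model_bytes: bytes, model_version: str, inputs: dict, output, when=None):
    """Build an evidence record at decision time (hypothetical schema)."""
    when = when or datetime.now(timezone.utc)
    return {
        "model_version": model_version,
        "model_sha256": sha256_hex(model_bytes),      # fingerprint of the artifact that ran
        "input_sha256": sha256_hex(canonical(inputs)),  # fingerprint of what it saw
        "output": output,
        "scored_at": when.isoformat(),
    }


def matches_documented(record: dict, documented_sha256: str) -> bool:
    """Was the model that scored this decision the one that was documented?"""
    return record["model_sha256"] == documented_sha256


# Stand-in bytes for a validated model artifact; real code would read the file.
documented = sha256_hex(b"model-weights-v1")
inputs = {"age": 41, "zip": "50309"}
record = make_record(b"model-weights-v1", "v1", inputs, 0.83)
# Same version label, different artifact: the label alone would hide the change.
drifted = make_record(b"model-weights-v1-hotfix", "v1", inputs, 0.79)
```

Here `matches_documented(record, documented)` holds and `matches_documented(drifted, documented)` does not, even though both records carry the version label "v1". That is the gap between a present-tense governance report and a past-tense evidence trail: the label is an assertion, the fingerprint is something you can check.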
The part most carriers are missing
Here is the part that turns this from a documentation exercise into a structural problem. The NAIC framework makes no distinction between AI a carrier builds and AI a carrier buys. Accountability stays with the insurer either way. If a third-party vendor's model contributed to an underwriting or claims decision, it is the carrier, not the vendor, that has to answer the examiner's questions about it.
The NAIC is building a parallel track for this. A separate working group has circulated a draft framework that would have third-party data and model providers register with state departments, file model documentation, and submit an annual attestation. But the regulators have been explicit about what that registry is and is not. It creates visibility, not a safe harbor. In their own framing, registration is a prerequisite, not a substitute, for the carrier's own diligence. Accountability does not transfer to the registered vendor.
This is the gap. A carrier can collect every SOC report and responsible-AI summary its vendors will provide and still have nothing that answers the question an exam is built to ask: what did this model do, on this data, on this date, and how do you know. The vendor wrote those summaries about itself. They are the vendor's assertion that the right thing happened, not independent evidence that it did.
Where the standard is going
The tool today is principle-based. It does not use the words "tamper-evident receipt" or "reproduce this inference." I want to be careful not to claim a rule exists that does not yet exist. But the direction is not subtle, and it is not only an insurance story.
Across regulated industries in 2026, the audit conversation has shifted from "show me your logs" to something stricter: reproduce this outcome, name every input the model saw, tell me which model version and which policy produced it, and prove the record has not been altered. Internal-controls guidance published this year calls for a complete, non-editable audit trail, sufficient to reconstruct what an AI system acted on and tamper-evident enough to stand as evidence. The demand is converging on reconstruction, and a log a vendor keeps about itself does not meet a reconstruction standard.
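For readers who want to see what "tamper-evident" can mean mechanically, one common technique is a hash chain, where each log entry commits to the one before it. The sketch below is a minimal illustration of that technique using only the Python standard library. The entry layout and function names are assumptions of mine, and nothing here is drawn from the NAIC tool or any specific guidance.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's content."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Add an entry that commits to everything recorded before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": _digest(prev_hash, payload)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited payload or reordering breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Made-up decision records, for illustration only.
chain = []
append(chain, {"policy": "A-1001", "model_version": "v1", "output": "refer"})
append(chain, {"policy": "A-1002", "model_version": "v1", "output": "approve"})
intact = verify(chain)

# Simulate an after-the-fact edit on a copy of the log.
tampered = copy.deepcopy(chain)
tampered[0]["payload"]["output"] = "approve"
tampered_detected = not verify(tampered)
```

One caveat on the design: a chain like this only detects edits if its latest hash is anchored somewhere the record keeper cannot rewrite. Whoever controls the whole log can alter an entry and recompute every hash after it, which is the mechanical version of the problem with a log a vendor keeps about itself.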
Insurance is simply the vertical where a regulator has built the instrument first. The NAIC tool is a working template for what "examine the AI" looks like in practice: an inventory, a governance file, model-level detail, data lineage, and vendor accountability that does not transfer. It is not hard to see other regulators, in healthcare, in banking, in any domain where a third-party model drives a consequential decision, arriving at the same shape of question. That is a forecast, not a fact. But it is the forecast I would bet on, because the question underneath it is the same everywhere: when the model is not yours, how do you prove what it did?
What to do with six months
For carriers in the pilot states, the inquiries are arriving now. For everyone else, the November adoption is the moment the instrument goes national. Either way, the useful framing is this: six months is enough time to build infrastructure that produces evidence continuously, from this point forward. It is not enough time to reconstruct an evidence trail you never captured.
That asymmetry is the whole argument for treating verifiability as something you build in rather than document after. The carrier that can produce a tamper-evident, independently checkable record of what each model did, on what data, when, can answer the exam. The carrier relying on its vendors' self-written assurances is betting that the examiner never asks the one question those assurances were never designed to answer.