I'm Not the Engineer. I'm the One Who Has to Defend It.
The people who actually decide whether an AI tool gets bought at most companies look more like me than they look like the people pitching it. The industry has not fully noticed yet.
I am the first employee of a deep-tech startup. I am not an engineer. My background is finance and management, and my job is operations, compliance, and the parts of the business that have to hold up under outside scrutiny.
The demo and the audit are not the same meeting
Every AI vendor demo I have sat through has been impressive. The model answers the question. The output is polished. The accuracy numbers are competitive and the references check out.
Then the meeting ends and I go back to my actual job, which is making sure that six months from now, when an auditor or a customer's procurement team asks us a specific question, we have an answer that will hold up.
I have yet to meet an AI vendor who has a real answer.
What I can and cannot verify
If I buy a database, I can test it. If I buy a payment processor, every transaction has a receipt. If I buy cloud storage, I can checksum what goes in and what comes out and know they match.
AI inference is different. The model runs on someone else's hardware. The output is a paragraph of text or a string of numbers. There is no checksum that tells me whether the version of the model we evaluated last quarter is the version that produced today's answer. There is no receipt. The vendor's word is the only artifact.
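A minimal sketch of the storage check described above, and of what the missing inference "receipt" would have to contain. It assumes SHA-256 as the checksum. The `InferenceReceipt` fields and the idea that a vendor publishes a digest of the model it ran are hypothetical, not something any vendor is known to offer.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def round_trip_matches(uploaded: bytes, downloaded: bytes) -> bool:
    """Storage check: what came out is byte-for-byte what went in."""
    return sha256_hex(uploaded) == sha256_hex(downloaded)


@dataclass(frozen=True)
class InferenceReceipt:
    """Hypothetical receipt a vendor could attach to each answer."""
    model_digest: str   # digest of the model artifact that ran (assumed to be published)
    input_digest: str   # digest of the request we sent
    output_digest: str  # digest of the response we received


def same_model(evaluated_model_digest: str, receipt: InferenceReceipt) -> bool:
    """True if the receipt names the same model artifact we evaluated."""
    return receipt.model_digest == evaluated_model_digest
```

The storage check works because the buyer holds both ends of the comparison. The receipt check only means something if the `model_digest` comes from a source the buyer can trust independently of the vendor's own say-so, which is the gap this section describes.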
For most of the last decade that was acceptable, because the consequences of being wrong were small. That is changing across every market we look at. Financial regulators are publishing AI model risk guidance. Insurance commissioners are auditing automated decisions. Healthcare payers are being sued over algorithmic denials. Hiring tools are facing class actions under anti-discrimination and fair-credit-reporting law. The specifics differ by jurisdiction but the direction is the same everywhere.
When the question of which model made which decision finally lands on a desk, it will not land on the engineer's desk. It will land on mine.
The questions I ask vendors now
I have learned which questions separate vendors who have thought about this from vendors who have not.
Can the model that produced a given output be cryptographically identified, not just named in a contract? Is the execution environment the same across every run, or can it drift? Does the vendor verify its own results, or is a third party involved? What gets stored, who controls it, and how long does it survive?
The honest answers, almost universally, come down to: we log the request and the response, we control the logs, and there is no independent verification because we are the ones running the model. That position is going to get harder to defend every quarter from now on.
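To make the weakness of vendor-controlled logs concrete, here is a minimal sketch of a hash-chained log, in which each entry commits to the one before it. It is an illustration under assumptions, not a description of any vendor's product: the record fields and function names are hypothetical, and the chain only helps if someone other than the vendor holds the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> str:
    """Append a record and return the new head hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})
    return log[-1]["hash"]


def verify(log: list, trusted_head: str) -> bool:
    """Recompute the chain and compare its head to an independently held hash."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return prev == trusted_head


# Usage: the buyer or a third party keeps `head`; the vendor keeps the log.
log = []
append(log, {"model": "model-a", "request_id": 1})
head = append(log, {"model": "model-a", "request_id": 2})
```

If the vendor later edits an old record, `verify(log, head)` fails, because the recomputed chain no longer ends at the hash held outside the vendor's control. Without that outside copy, the vendor could simply rebuild the whole chain, which is why logs alone do not amount to independent verification.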
This is becoming an operations problem
I think this whole category of risk has been miscategorized as a technical issue, which is why so few companies have addressed it. It gets handed to engineering, the way uptime and latency get handed to engineering. But this is not a technical problem in the engineering sense. It is a verifiability problem in the procurement sense, and it sits squarely on the operations side of the org chart.
The vendors who win the next round of enterprise AI procurement will be the ones whose answer to "prove it" is something more than their own logs. The companies that get this right early will look, in two years, like they had foresight. They didn't. They just listened to the people in their organization who already knew the question was coming.
I am one of those people. I am writing because I know there are others.