10 Jun 2026 · ai-security

A Few Reasons to Develop AI Verification Technology

Even if you're not shooting for an international agreement.

We are working on verification mechanisms that can be used for lab-to-lab and government-to-government coordination on safe AI development and deployment. Along the way, we’ve also been asked what other reasons there are to be interested in verification mechanisms.

Whilst the arguments below support the case for verification in their own right, there is now also an appetite for working directly on technologies that support international agreements.

Prevent a race to the bottom on safety

If AI developers can credibly commit to high safety and security standards, and publicly verify that they are meeting them, then we can facilitate a race to the top, rather than a race to the bottom, on safety standards.

A good example of this is the conditional commitments that Anthropic make in their Responsible Scaling Policy (RSP):

We have also adopted a set of competitor-contingent commitments (see Appendix A) aimed at staying in line with these recommendations in scenarios where we can be confident that other relevant AI developers are doing the same. […] to the extent that other relevant AI developers prioritize safety and invest in legible demonstrations that they are doing so—as we intend to—commitments like this may help avoid an inadvertent “race to the bottom” on safety.

— Anthropic RSP, Appendix A

Verification technologies reduce races to the bottom by allowing companies to verify that their competitors are meeting high safety standards.

Unilateral declarations help stabilise things

Whether it precedes an international agreement or just a race to the top on safety, we might expect coordination to start with unilateral declarations: one party making a declaration about what they’re doing (or not doing).

The first step in coordination can be to send a costly signal that you are doing the right thing. If good actors can make trustworthy declarations that they are being responsible, then this can cool an arms race (even without an actual agreement).

We might want verification tech that allows frontier AI developers to say: “We have paused training”, “we’re not training on the chain of thought”, “we have implemented x classifiers”.

Trustworthy unilateral declarations might have a cooling effect on dangerous races.

Export controls/American Data Centres in the Middle East

If you consider AI systems geostrategically important, but still want to sell GPUs to adversaries or partner states, then you might want to impose rules over how these GPUs are used.

This is what has happened with US GPU exports to the Middle East. The US government are interested in verification as part of Pax Silica:

By offering this service and promoting cryptographic verification, the United States can ensure its partners lead the AI revolution using an end-to-end secure architecture, with American-made technology. (link)

G42 have laid out a verification framework intended to reassure the US and allow G42 to build large data centres in the Middle East:

G42 today announced its intent to develop and implement an enhanced assurance framework designed to secure the export, deployment, and stewardship of advanced U.S.-origin artificial intelligence semiconductors operating within its infrastructure. (link)

Improving evals and safety cases

AI developers are already making voluntary declarations about system capabilities: dangerous capability evaluations, RSPs, and safety cases. External evaluators (e.g. UK AISI and METR) run evaluations on API endpoints, trusting that the AI model developers are faithfully serving the right model to the evaluator.

As the stakes of AI systems increase, so too do the stakes of AI evaluations. This means it may no longer be possible to simply trust AI developers. Instead, AI evaluators may want stronger guarantees that the results they report really do come from the model they are aiming to evaluate.

This could look like the secure enclaves work from OpenMined.
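
As a rough illustration of the kind of check an evaluator might want, the sketch below compares a digest of the weights reported by the serving environment against a digest the developer committed to in advance. This is not OpenMined's API or any developer's actual mechanism. The function names and the toy weight shards are hypothetical, and the sketch assumes that some trusted component (for example a secure enclave) reports the digest honestly, which is the hard part in practice.

```python
import hashlib
import hmac
from typing import Iterable


def digest_weights(chunks: Iterable[bytes]) -> str:
    """Stream weight shards through SHA-256 and return a hex digest.

    Note: shards are simply concatenated, so shard boundaries are not
    part of the digest. A real scheme would likely be more careful.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def matches_commitment(reported_digest: str, committed_digest: str) -> bool:
    """Compare the reported digest with the published commitment."""
    return hmac.compare_digest(reported_digest, committed_digest)


# Toy stand-ins for real weight shards (hypothetical data).
approved = [b"layer-0-weights", b"layer-1-weights"]
swapped = [b"layer-0-weights", b"layer-1-weights-modified"]

# Assumption: the developer publishes this digest before the evaluation.
committed = digest_weights(approved)

print(matches_commitment(digest_weights(approved), committed))  # True
print(matches_commitment(digest_weights(swapped), committed))   # False
```

A digest comparison only helps if the evaluator has reason to believe the reported digest describes the model actually answering the API calls. Providing that link is what hardware-backed approaches such as secure enclaves aim to do.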

Sovereignty

There are two sides to this:

(1) How does a country trust foreign models being run in its data centres (e.g. a US model running in an EU data centre)?

The EU, UK, etc. are using American and Chinese models in their data centres. How can these countries be confident that the foreign companies are not taking their IP?

Additionally, how can a country tell that foreign companies are really using its data centres in the way they said they would? If a country only wants its data centres to be used for inference and not training, then how can it ensure that a company is following this requirement?

(2) How does an AI company trust that the model it has exported to a foreign data centre won’t leak IP or be used against its usage terms?

Assurance & the market

Companies and regulators want to know that the models being served are safe/correct.

If you are a healthcare professional, then you may need to trust that the tokens you are receiving really do come from the medical model that has received regulatory approval. Signing the tokens with a proof that they come from the approved model may be useful for this application, and would be part of the same tech tree as verification tech for international agreements.
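
To make the idea of signed outputs concrete, here is a minimal sketch that binds an output to a digest of the weights that supposedly produced it. It is an illustration under strong assumptions, not any vendor's scheme: all names are hypothetical, and HMAC is used only because it is in the Python standard library. HMAC is symmetric, so anyone who can verify a tag could also forge one. A real system would more plausibly use an asymmetric signature with the private key held in attested hardware.

```python
import hashlib
import hmac
import json


def _tag(key: bytes, payload: dict) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding."""
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_output(key: bytes, weights_digest: str, prompt: str, output: str) -> dict:
    """Bundle an output with the weights digest and a tag over both."""
    payload = {
        "weights_digest": weights_digest,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output": output,
    }
    return {"payload": payload, "tag": _tag(key, payload)}


def verify_output(key: bytes, signed: dict, approved_digest: str) -> bool:
    """Check the tag, then check the output claims the approved weights."""
    tag_ok = hmac.compare_digest(_tag(key, signed["payload"]), signed["tag"])
    digest_ok = signed["payload"]["weights_digest"] == approved_digest
    return tag_ok and digest_ok


# Hypothetical values for illustration only.
KEY = b"demo-key-not-for-production"
APPROVED = hashlib.sha256(b"approved-model-weights").hexdigest()

signed = sign_output(KEY, APPROVED, "What is the dose?", "Example answer.")
tampered = {
    "payload": {**signed["payload"], "output": "Different answer."},
    "tag": signed["tag"],
}

print(verify_output(KEY, signed, APPROVED))    # True
print(verify_output(KEY, tampered, APPROVED))  # False
```

The signature here only shows that whoever holds the key asserted the output came from those weights. Proving that the weights were actually used for the computation is a separate and harder problem.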

An interesting case study here is that people keep accusing Anthropic of swapping out their models for worse versions. Anthropic may quite like to be able to prove that this hasn’t happened! Indeed, their Frontier Safety Roadmap mentions provable inference, which seems aimed at this very problem:

We will develop a prototype by September 30, 2026 of provable inference, a technique for reliably, provably “signing” AI model outputs in a way that makes them attributable to a specific set of model weights. In the future, it’s possible that very sophisticated attackers will seek to infiltrate our systems and modify our models after we’ve trained them - whether to sabotage our work or co-opt our models into serving their own goals. If we could reliably and systematically verify that model outputs were coming from a specific set of model weights, we believe this threat would be significantly reduced.

— Anthropic Frontier Safety Roadmap

These aren’t the primary reasons that we’re driving forward verification mechanisms, but they could be a jumping-off point for other people’s work.