The White House has announced an update to the Executive Order on Safe, Secure, and Trustworthy AI, introducing mandatory safety testing standards developed by NIST for large AI models.
Key Requirements
Models trained using more than 10^26 FLOP of compute (total floating-point operations, not operations per second) must undergo:
- Pre-deployment red-teaming by an independent third party approved by NIST
- Bias and fairness audits covering demographic performance differences
- Safety incident reporting within 72 hours of discovering a model vulnerability
- Public transparency reports detailing model capabilities, limitations, and safety measures
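The 72-hour reporting window above is simple date arithmetic. The following is a minimal sketch of how a compliance tool might compute the deadline. The function names `reporting_deadline` and `is_overdue` are hypothetical, and the sketch assumes the clock starts at the moment of discovery and that timestamps are timezone-aware; the standards as described here do not specify either detail.

```python
from datetime import datetime, timedelta, timezone

# Reporting window as stated in the requirements list (72 hours).
REPORTING_WINDOW = timedelta(hours=72)


def reporting_deadline(discovered_at: datetime) -> datetime:
    """Return the latest time a report may be filed (assumed: discovery + 72 h)."""
    if discovered_at.tzinfo is None:
        # Naive timestamps are ambiguous across time zones, so reject them.
        raise ValueError("discovered_at must be timezone-aware")
    return discovered_at + REPORTING_WINDOW


def is_overdue(discovered_at: datetime, now: datetime) -> bool:
    """True if the current time is past the assumed reporting deadline."""
    return now > reporting_deadline(discovered_at)


if __name__ == "__main__":
    found = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(reporting_deadline(found).isoformat())
```

Requiring timezone-aware timestamps is a design choice in this sketch: it avoids a deadline shifting by several hours depending on where the report is processed.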
Which Models Are Affected
The 10^26 FLOP threshold captures models in the GPT-5 class and above. GPT-4o (~10^25 FLOP) falls roughly an order of magnitude below the threshold. GPT-5, Claude 4 Opus, and Gemini 2.5 Pro all exceed it and must comply.
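To make the threshold concrete, here is a rough sketch of how training compute could be estimated and compared against 10^26. It uses the common rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per training token. That approximation is an assumption brought in for illustration, not a method named in the standards, and the parameter and token counts below are hypothetical rather than figures for any real model.

```python
# Threshold from the article: more than 10^26 total floating-point operations.
THRESHOLD_FLOP = 1e26


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute as 6 * parameters * tokens.

    This is a widely used rule of thumb for dense transformers; it is an
    assumption of this sketch, not part of the standards themselves.
    """
    return 6.0 * n_params * n_tokens


def exceeds_threshold(total_flop: float, threshold: float = THRESHOLD_FLOP) -> bool:
    """True only if compute is strictly greater than the threshold."""
    return total_flop > threshold


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to land on either side of the line.
    small = estimate_training_flop(n_params=70e9, n_tokens=15e12)   # ~6.3e24
    large = estimate_training_flop(n_params=1e12, n_tokens=20e12)   # ~1.2e26
    print(exceeds_threshold(small), exceeds_threshold(large))
```

The comparison is strict because the requirement is worded as "more than" 10^26, so a model at exactly the threshold would not be covered under this reading.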
Industry Response
OpenAI and Anthropic both expressed support for the standards, noting that they already conduct red-teaming that meets or exceeds the NIST requirements. Meta took a different stance, arguing that the standards should apply equally to open-source and proprietary models, a position that has created tension with regulators.
Enforcement
The standards are enforced through existing federal procurement rules: companies that fail to comply cannot sell AI services to the US government. For companies like OpenAI and Anthropic, which derive significant revenue from government contracts, this is a meaningful enforcement mechanism.