OpenAI debuted its most capable model yet under pressure from a mass user exodus tied to the company's controversial Pentagon contract.

In brief:

  • OpenAI launched GPT-5.4 amid the growing QuitGPT backlash over its Pentagon AI contract.

  • GPT-5.4 adds a 1-million-token context window, stronger reasoning, and agentic capabilities.

  • Enterprise users benefit most as GPT-5.4 delivers faster AI agents with fewer tokens.

OpenAI began rolling out GPT-5.4, its most capable model to date, as the company scrambles to contain a PR crisis in which an estimated 2.5 million users have taken action against the company, either by canceling their subscriptions or by sharing the boycott on social media.

The so-called QuitGPT movement exploded after OpenAI revealed a deal with the U.S. Department of Defense hours after Anthropic publicly walked away from the same contract—earning the Claude maker the public scorn of President Trump and other government officials.

The new model consolidates reasoning, coding, and agentic capabilities into a single release. It also supports a 1-million-token context window, which lets users work with much larger amounts of information in a single session.

On paper, the numbers look promising. On GDPval, a benchmark testing knowledge work across 44 occupations, GPT-5.4 matches or beats industry professionals in 83.0% of comparisons, up from 70.9% for GPT-5.2. Computer use shows the biggest leap: on OSWorld-Verified, which measures a model's ability to operate a desktop through screenshots and keyboard and mouse actions, GPT-5.4 hits a 75.0% success rate versus GPT-5.2's 47.3%, clearing the human baseline of 72.4%. On BrowseComp, a test of deep web research, it scores 17 percentage points higher than GPT-5.2. The 1-million-token context window and a mid-response steering feature, which lets users redirect the model while it's still thinking, round out the headline features.

Steering saves time and computation: when a user spots an error mid-response, they can correct course without forcing the model to discard all the tokens it has already generated.

Who will benefit from GPT-5.4?

Coders have the most reason to temper expectations: On SWE-Bench Pro, the improvement from GPT-5.3-Codex (56.8%) to GPT-5.4 (57.7%) is barely a rounding error. OpenAI also says the model needs significantly fewer tokens to complete tasks than GPT-5.2.

"GPT-5.4 is our most token-efficient reasoning model yet, using significantly fewer tokens to solve problems when compared to GPT-5.2," OpenAI said. Even so, any improvement on this front is good news for developers who access OpenAI models via the API and are charged per token. A model with an efficient chain of thought can deliver the same results at a fraction of the cost of one that tends to overthink problems to make sure it reaches the right conclusion.
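Per-token billing makes the cost argument simple arithmetic: at a fixed price, cost scales linearly with the number of tokens a request consumes, so a model that uses 40% fewer tokens costs 40% less for the same task. The minimal Python sketch below assumes a single hypothetical price per million tokens for both models and made-up token counts; the names `request_cost` and `PRICE_PER_MILLION` are illustrative only. Real OpenAI pricing differs by model and bills input and output tokens at separate rates, so this is a sketch of the reasoning, not a cost calculator.

```python
# Hypothetical illustration of per-token billing arithmetic.
# PRICE_PER_MILLION and the token counts are made-up placeholders,
# not OpenAI's actual prices or measured GPT-5.2 / GPT-5.4 usage.

PRICE_PER_MILLION = 10.0  # hypothetical USD per 1M tokens, same for both models


def request_cost(tokens: int, usd_per_million: float) -> float:
    """Return the cost of a request billed linearly on token count."""
    return tokens * usd_per_million / 1_000_000


verbose_tokens = 20_000    # hypothetical: a model that "overthinks"
efficient_tokens = 12_000  # hypothetical: 40% fewer tokens for the same answer

verbose_cost = request_cost(verbose_tokens, PRICE_PER_MILLION)
efficient_cost = request_cost(efficient_tokens, PRICE_PER_MILLION)
savings = 1 - efficient_cost / verbose_cost  # fraction of cost saved

print(f"verbose model:   ${verbose_cost:.2f} per request")
print(f"efficient model: ${efficient_cost:.2f} per request")
print(f"savings:         {savings:.0%}")
```

Because the price is held constant, the savings fraction equals the token reduction; in practice, a per-token price difference between models can offset or amplify that effect.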

The clearest beneficiaries are enterprise users doing document-heavy work. On an internal spreadsheet modeling benchmark, GPT-5.4 scored 87.3% against GPT-5.2's 68.4%. Legal research firm Harvey said it scored 91% on its BigLaw Bench eval. Mainstay, which runs agents across 30,000 property tax portals, reported a 95% first-attempt success rate and sessions running "~3x faster while using ~70% fewer tokens."

That's the kind of efficiency argument that might matter to enterprise procurement teams—but it's a harder sell to the individual user reconsidering whether to delete their account.