In September 2025, OpenAI launched the most ambitious shopping experiment in AI history: Instant Checkout, a feature that let ChatGPT's 800+ million weekly users buy products without leaving the conversation. Six months later, they killed it.

Only 12 merchants ever went live. Near-zero completed purchases. OpenAI hadn't even built sales tax collection. TD Cowen analysts called it "a stunning admission" that AI platforms replacing apps as the "new OS" is "either not playing out, or at a minimum is pushed back significantly."

But here's the thing the hot takes missed: the demand was real. Eighteen percent of US adults were already using ChatGPT to shop for discretionary products. Fifty million shopping queries hit ChatGPT daily. AI-referred retail traffic grew 1,200% year-over-year. Users loved the discovery. They just couldn't complete the checkout.

This isn't a story about agentic commerce failing. It's a story about infrastructure failing to keep up with demand. And every merchant implementing ACP or UCP endpoints right now should study what went wrong — because these same failure modes will break your agent-driven checkout flows too.

The Timeline

September 29, 2025: OpenAI announces Instant Checkout in partnership with Stripe, Etsy, and Shopify. Simultaneously launches the open-source Agentic Commerce Protocol (ACP). The vision: any ChatGPT user can discover a product in conversation and buy it without leaving the chat.

October 2025: Stripe begins pushing merchants to enable agentic payments through ChatGPT. Forrester's Market Research Online Community finds only 8% of US online adults have actually used Instant Checkout — despite 18% using ChatGPT for shopping.

December 8, 2025: Instacart becomes the first major third-party app partner, launching embedded shopping within ChatGPT. This is the first sign of the pivot: why would you need third-party apps if native checkout worked?

February 2026: Reporting reveals OpenAI still hasn't built a system for collecting and remitting state sales taxes. The feature can't legally process transactions in most US states. "Buy it in ChatGPT" launches February 16, but few merchants have integrated.

March 2026: OpenAI announces Instant Checkout is being deprecated. Purchases will route through third-party apps (Instacart, Target, Expedia, Booking.com) or redirect to merchant websites. The native checkout dream is dead.

The 6 Root Causes

1. Stale Inventory Data

When a human browses a website, they see the current inventory because the page renders in real-time from the merchant's database. When an AI agent queries an API endpoint, it sees whatever the product feed last reported.

OpenAI's Instant Checkout relied on merchant product feeds that updated anywhere from every 15 minutes to every 60 minutes. In fast-moving categories (fashion drops, electronics deals, limited-edition products), this created a fundamental problem: agents were showing products that no longer existed.

As The Drum reported: "When a chatbot becomes the interface for shopping, inventory synchronization must happen instantly. If a user asks ChatGPT to buy a product that has just gone out of stock or whose price has changed, the system must detect that change before completing payment."

This gets much harder at scale. Retail systems run on different commerce software, inventory tools, and pricing structures, so keeping them in sync requires large-scale data standardization and constant updates across millions of merchants.

What a test would have caught: A simple "staleness check" — comparing the ACP endpoint's reported inventory against the merchant's live storefront — would have revealed the gap immediately. You don't need fancy simulation for this. You need a script that queries both sources and diffs them.
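A minimal sketch of that diff, assuming both sources have already been fetched into dictionaries keyed by SKU. The `Listing` type, the `find_stale` helper, and the sample SKUs are hypothetical and not part of ACP or UCP; a real script would fill the two dictionaries from the agent-facing endpoint and the storefront.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One product as reported by a single source (feed or storefront)."""
    sku: str
    in_stock: bool
    price_cents: int


def find_stale(feed: dict[str, Listing], live: dict[str, Listing]) -> list[str]:
    """Describe every place where the agent-facing feed disagrees with the live store."""
    problems = []
    for sku, fed in feed.items():
        current = live.get(sku)
        if current is None:
            problems.append(f"{sku}: in feed but missing from storefront")
        elif fed.in_stock and not current.in_stock:
            problems.append(f"{sku}: feed says in stock, storefront says sold out")
        elif fed.price_cents != current.price_cents:
            problems.append(
                f"{sku}: feed price {fed.price_cents} != live price {current.price_cents}"
            )
    return problems


# Hypothetical snapshots; in practice both would come from real HTTP calls.
feed = {
    "JKT-01": Listing("JKT-01", True, 40000),
    "TEE-02": Listing("TEE-02", True, 2500),
}
live = {
    "JKT-01": Listing("JKT-01", False, 40000),
    "TEE-02": Listing("TEE-02", True, 2500),
}
for problem in find_stale(feed, live):
    print(problem)
```

Run on a schedule shorter than the feed refresh interval, a check like this gives a rough measure of how often agents are shown products they cannot actually buy.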

2. Missing Sales Tax Collection

This one is almost comical in its simplicity: as of February 2026 — five months after launch — OpenAI had not built a system for collecting and remitting state sales taxes across the United States.

This isn't a minor technical detail. It's a legal requirement. US states have complex, overlapping tax jurisdictions. Nexus rules determine which merchants owe sales tax where. Tax rates vary by product category, jurisdiction, and sometimes even by street address. Companies like Avalara and TaxJar exist specifically because this problem is so hard.

OpenAI, a company focused on AGI research, apparently didn't budget for this. Lengow's analysis called it "a live regulatory problem and a sign that the underlying commercial infrastructure was less complete than the press releases suggested."

What a test would have caught: A protocol compliance test checking whether the checkout response includes proper tax calculation in the totals array. If the tax field returns 0 for a taxable item shipped to a state with sales tax, that's an immediate red flag.
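One way to sketch that red-flag check, assuming the totals array is a list of entries with a `type` and an `amount` in cents. That shape, and the helper names, are assumptions for illustration; check your protocol version for the actual schema. The set of states with no statewide sales tax is real, but it is only a coarse filter and no substitute for a tax engine.

```python
# US states with no statewide sales tax. Local taxes can still apply in some of them.
NO_STATEWIDE_SALES_TAX = {"AK", "DE", "MT", "NH", "OR"}


def tax_total(totals: list[dict]) -> int:
    """Sum every entry of type "tax" in the totals array, in cents."""
    return sum(entry["amount"] for entry in totals if entry["type"] == "tax")


def tax_looks_missing(totals: list[dict], ship_to_state: str, taxable: bool) -> bool:
    """True when a taxable item ships to a sales-tax state but tax comes back as 0."""
    if not taxable or ship_to_state in NO_STATEWIDE_SALES_TAX:
        return False
    return tax_total(totals) == 0


# Hypothetical checkout response fragment for a $400 jacket.
totals = [
    {"type": "subtotal", "amount": 40000},
    {"type": "tax", "amount": 0},
    {"type": "total", "amount": 40000},
]
print(tax_looks_missing(totals, ship_to_state="CA", taxable=True))
print(tax_looks_missing(totals, ship_to_state="OR", taxable=True))
```

The check only catches the zero-tax case. Verifying that a nonzero amount is the right amount requires comparing against a tax provider's calculation.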

3. Fraud System False Positives

Existing fraud detection systems are trained on human behavioral patterns: mouse movements, click sequences, session duration, device fingerprinting, geographic consistency. When an AI agent initiates a transaction, none of these signals exist.

The result: legitimate agent-originated transactions were flagged for manual review. Merchants reported that their fraud stacks couldn't distinguish a real AI agent acting on a customer's behalf from a sophisticated bot attempting credential stuffing or card testing.

Checkout.com's analysis identified this as systemic: "Escalation mishandling is a larger revenue risk than payment decline rates." In other words, the fraud system blocking good transactions was costing more than actual fraud.

What a test would have caught: Running synthetic agent transactions against a merchant's live fraud scoring (in sandbox mode) would immediately reveal the false positive rate. If more than 10% of legitimate agent transactions are flagged, that's a measurable problem with a measurable revenue cost.
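The measurement itself is simple arithmetic once the sandbox decisions are collected. The sketch below assumes each synthetic transaction is known to be legitimate and that the fraud system returns one of `approve`, `review`, or `decline`; those labels and the sample counts are illustrative, not taken from any specific fraud vendor.

```python
def false_positive_rate(decisions: list[str]) -> float:
    """Share of known-legitimate transactions that were not approved outright."""
    if not decisions:
        raise ValueError("no decisions to score")
    flagged = sum(1 for decision in decisions if decision != "approve")
    return flagged / len(decisions)


THRESHOLD = 0.10  # the 10% red line discussed above

# Hypothetical sandbox results for 20 legitimate agent-originated orders.
decisions = ["approve"] * 17 + ["review"] * 2 + ["decline"]

rate = false_positive_rate(decisions)
print(f"false positive rate: {rate:.0%}")
if rate > THRESHOLD:
    print("fraud rules are blocking too many legitimate agent transactions")
```

Multiplying the rate by expected agent order volume and average order value turns it into the revenue figure that makes the problem hard to ignore.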

4. Session Escalation Failures

ACP and UCP both have mechanisms for "escalation": cases where a checkout requires human intervention (age verification, complex shipping selection, Strong Customer Authentication). In ACP, this surfaces through the `messages` array. In UCP, it uses the `requires_escalation` state with a `continue_url`.

When escalation happened in OpenAI's implementation, the handoff broke silently. The session would cancel without the user knowing why. The UI didn't properly handle the redirect to the merchant's authentication page. The result: a silent conversion killer.

What a test would have caught: A scenario that triggers escalation (e.g., an age-gated product or a shipping address requiring address verification) would immediately reveal whether the handoff works. If the session transitions to `requires_escalation` but the `continue_url` returns a 404 or the redirect chain breaks, that's a critical failure.
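A rough sketch of that check for the UCP-style flow. The `requires_escalation` state and `continue_url` come from the description above; the `status` field name, the `check_escalation` helper, and the example URLs are assumptions. The HTTP call is injected as a function so the logic can be exercised without a network; in a real run it would be a GET that follows redirects and returns the final status code.

```python
from typing import Callable, Optional


def check_escalation(session: dict, fetch_status: Callable[[str], int]) -> Optional[str]:
    """Return a failure description, or None if the handoff looks usable."""
    if session.get("status") != "requires_escalation":
        return "session never entered requires_escalation"
    url = session.get("continue_url")
    if not url:
        return "requires_escalation without a continue_url"
    status = fetch_status(url)
    if status >= 400:
        return f"continue_url returned HTTP {status}"
    return None


def fake_fetch_status(url: str) -> int:
    """Stand-in for an HTTP GET that follows redirects and returns the final status."""
    return 404 if "expired" in url else 200


# Hypothetical session payloads for an age-gated product.
ok = {"status": "requires_escalation", "continue_url": "https://shop.example/verify-age"}
broken = {"status": "requires_escalation", "continue_url": "https://shop.example/expired"}

print(check_escalation(ok, fake_fetch_status))
print(check_escalation(broken, fake_fetch_status))
```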

5. No Idempotency

Network requests fail. Agents retry. If your checkout endpoint doesn't implement idempotency correctly, those retries create duplicate sessions, duplicate charges, and duplicate orders.

In OpenAI's case, the ACP spec required `Idempotency-Key` header support, but not all merchant implementations enforced it. Sending the same `CreateCheckoutRequest` with the same `Idempotency-Key` created two separate sessions instead of returning the existing one.

The insidious part: idempotency failures "silently reduced conversion without affecting traffic metrics." From the dashboard, everything looked fine. But customers were being double-charged, abandoning in confusion, and generating chargebacks.

What a test would have caught: Send the same request twice with the same `Idempotency-Key`. If you get two different session IDs, the endpoint is broken. This is a five-line test.
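Here is roughly what that test looks like, run against two in-memory stand-ins so it works without a live endpoint. `FakeCheckoutEndpoint`, `BrokenCheckoutEndpoint`, and the request body are hypothetical; against a real merchant, `create_session` would be an HTTP POST carrying the `Idempotency-Key` header.

```python
import uuid


class FakeCheckoutEndpoint:
    """In-memory stand-in for an endpoint that honors the idempotency key."""

    def __init__(self):
        self._sessions_by_key = {}

    def create_session(self, body: dict, idempotency_key: str) -> dict:
        # A repeated key returns the stored session instead of creating a new one.
        if idempotency_key in self._sessions_by_key:
            return self._sessions_by_key[idempotency_key]
        session = {"id": f"cs_{uuid.uuid4().hex[:12]}", "items": body["items"]}
        self._sessions_by_key[idempotency_key] = session
        return session


class BrokenCheckoutEndpoint(FakeCheckoutEndpoint):
    """Ignores the key, so every retry creates a fresh session."""

    def create_session(self, body: dict, idempotency_key: str) -> dict:
        return {"id": f"cs_{uuid.uuid4().hex[:12]}", "items": body["items"]}


def is_idempotent(endpoint) -> bool:
    """Send the same request twice with the same key and compare session IDs."""
    key = str(uuid.uuid4())
    body = {"items": [{"sku": "JKT-01", "quantity": 1}]}
    first = endpoint.create_session(body, idempotency_key=key)
    second = endpoint.create_session(body, idempotency_key=key)
    return first["id"] == second["id"]


print(is_idempotent(FakeCheckoutEndpoint()))
print(is_idempotent(BrokenCheckoutEndpoint()))
```

The body of `is_idempotent` is the five-line test; everything else exists only to give it something to run against.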

6. Limited Feature Set (The Paper-Cut Death)

Real e-commerce is messy. People use coupon codes. They order multiple items. They want to see shipping costs before committing. They need to compare fulfillment options. They want gift wrapping.

Instant Checkout launched without support for multi-item carts, promotional codes, or transparent shipping information. These aren't edge cases — they're table stakes. Every one of them is a reason a customer would click away and buy from a real website instead.
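These gaps are also checkable. The sketch below assumes a test that submits a two-item cart with a coupon code and then inspects the session that comes back. The field names `line_items`, `discounts`, and `fulfillment_options` are illustrative assumptions, not quoted from either spec, so map them to whatever your endpoint actually returns.

```python
def feature_gaps(session: dict) -> list[str]:
    """List table-stakes features missing from a session created with 2 items and a coupon."""
    gaps = []
    if len(session.get("line_items", [])) < 2:
        gaps.append("multi-item cart not reflected in session")
    if not session.get("discounts"):
        gaps.append("promo code not applied")
    if not session.get("fulfillment_options"):
        gaps.append("no shipping options returned before payment")
    return gaps


# Hypothetical response from an endpoint that silently dropped the second item
# and the coupon, but does return shipping choices.
session = {
    "line_items": [{"sku": "JKT-01", "quantity": 1}],
    "discounts": [],
    "fulfillment_options": [{"id": "standard", "amount": 599}],
}
for gap in feature_gaps(session):
    print(gap)
```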

Search Engine Land noted: "Conversion rates across the e-commerce industry hover somewhere between one and four percent." OpenAI apparently expected to achieve those rates while missing basic feature parity with a 2010-era Shopify store.

The Behavioral Gap: Discovery ≠ Conversion

The most important insight from OpenAI's data isn't technical. It's behavioral.

Users loved using ChatGPT for shopping research. They asked what to buy, compared options, got recommendations. OpenAI's own data (reported by The Information) showed "few users were finalizing their purchases inside the chatbot, despite many of them using it to browse for products."

The Silk Road Nexus analysis identified the psychological barrier: "Shopping is a behavior built on habits — people know how to buy on Amazon, trust the checkout flow on their favorite brand's website, and have saved cards and loyalty points embedded in platforms they've used for years. Asking them to complete a transaction inside a chatbot required rewiring a habit loop that commerce companies have spent decades reinforcing."

Lengow reinforced this: "Buying a $400 jacket through a chatbot interface, with payment data stored by OpenAI, is a different psychological proposition than buying it on the retailer's own checkout page."

The industry summary is brutal but accurate: "The discovery layer worked. The transaction layer did not."

The Contrast: Who's Getting It Right?

While OpenAI fumbled, two companies proved agentic commerce works — if you own the infrastructure.

Amazon Rufus reached 300 million users and generated $12 billion in incremental sales in 2025. Amazon succeeded because it controls the entire stack: AI model, marketplace, payment infrastructure, fulfillment, and customer relationships. There's no inventory sync problem when the AI is the store.

Alibaba's Qwen accumulated 10 million downloads in 7 days, surpassed 100 million monthly active users within two months, and completes real purchases inside a single conversational interface — food orders, travel bookings, product purchases. As The Drum noted: "Alibaba owns the AI model, the marketplace, the payment rails, and the logistics."

The pattern is clear: vertical integration (owning platform, payments, fulfillment) enables working agentic commerce. Horizontal approaches (building an AI layer on top of fragmented merchant infrastructure) fail — unless the infrastructure is tested and reliable.

This is exactly where Prova fits. Most merchants aren't Amazon or Alibaba. They need to make their existing endpoints work reliably with the agents that are already sending traffic. The only way to know if they work is to test them.

What This Means for Your ACP/UCP Endpoints

Every root cause of OpenAI's failure maps to a testable condition:

| Failure | Test |
| --- | --- |
| Stale inventory | Compare API inventory vs. live storefront |
| Missing tax | Check `totals.tax` for taxable items by state |
| Fraud false positives | Run synthetic agent transactions through fraud scoring |
| Escalation failures | Trigger `requires_escalation` and verify `continue_url` works |
| Idempotency bugs | Same request + same key = same session? |
| Feature gaps | Multi-item, coupons, shipping options: do they work? |

OpenAI spent six months discovering these problems in production. You can discover them in two minutes with a synthetic agent test.

The lesson isn't that agentic commerce is dead. It's that untested agentic commerce is dead.

Want to test your endpoints before real agents hit them? Get your free Agent Readiness Score

Why OpenAI Killed Instant Checkout: A Technical Post-Mortem