
The 12 Ways Agentic Checkout Flows Break (And How to Test for Each)


After analyzing hundreds of ACP and UCP endpoints, we've cataloged the 12 most common failure patterns in agentic checkout flows. Each one is a silent revenue killer — the agent encounters it, fails, and moves on. No error report. No retry. No second chance.

Here's what breaks, why it breaks, and exactly how to test for it.


1. The 500 on Out-of-Stock

What happens: An agent tries to check out an item that's out of stock. Instead of returning a proper out_of_stock error, the endpoint throws an unhandled exception and returns 500 Internal Server Error.

Why it's common: Developers test the happy path (item in stock) but forget that inventory can hit zero between the agent's browse and checkout request. The backend tries to create an order for a zero-quantity item and crashes.

The fix: Validate inventory before creating the checkout session. Return 400 with {"type": "error", "code": "out_of_stock", "message": "SKU-001 is out of stock", "param": "items[0].id"}.

Test: Send a CreateCheckoutRequest with a known out-of-stock SKU. Expect a 400, not a 500.
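A minimal Python sketch of the fix and the test together. It assumes a hypothetical in-memory `INVENTORY` dict and a handler that returns `(status, body)` rather than a real HTTP response. The request shape (`items` with `id` and `quantity`) mirrors the error example above.

```python
from typing import Any

# Hypothetical stock levels keyed by SKU; in production this is a live inventory lookup.
INVENTORY: dict[str, int] = {"SKU-001": 0, "SKU-002": 5}


def create_checkout(request: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """Validate stock for every line item before creating a session."""
    for index, item in enumerate(request.get("items", [])):
        sku, quantity = item["id"], item.get("quantity", 1)
        available = INVENTORY.get(sku, 0)
        if available < quantity:
            # A structured 400 the agent can act on, instead of an unhandled exception.
            message = f"{sku} is out of stock" if available == 0 else f"Only {available} of {sku} in stock"
            return 400, {
                "type": "error",
                "code": "out_of_stock",
                "message": message,
                "param": f"items[{index}].id",
            }
    # ... create and persist the checkout session here (the id below is hypothetical) ...
    return 201, {"id": "checkout_hypothetical_1"}


# The test from this section: a known out-of-stock SKU must yield 400, never 500.
status, body = create_checkout({"items": [{"id": "SKU-001", "quantity": 1}]})
assert status == 400 and body["code"] == "out_of_stock"
```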

Severity: Critical. Every out-of-stock item becomes an unrecoverable error for every agent that tries to buy it.


2. The Stale Inventory Gap

What happens: Your ACP endpoint says an item is available. The agent starts checkout. By the time the agent completes payment, the item is sold out. The order is created but can't be fulfilled.

Why it's common: Product feeds update every 15–60 minutes. Fast-moving inventory (flash sales, limited drops, restocks) changes faster than the feed syncs.

The fix: Validate real-time inventory at both CreateCheckout and CompleteCheckout. Use Shopify webhooks or similar for near-real-time updates instead of polling.

Test: Compare your ACP endpoint's inventory response against your live storefront for 10 random SKUs. If any differ, measure the staleness gap. Anything over 5 minutes is a risk.
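The comparison is easy to script once you have two snapshots. This sketch assumes hypothetical snapshots mapping each SKU to `(quantity, observed_at)`, one taken from the ACP endpoint and one from the live storefront; fetching them is left out. The gap is measured as how much older the endpoint's value is than the live one.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

# SKU -> (quantity, timestamp at which that quantity was observed)
Snapshot = dict[str, tuple[int, datetime]]


def staleness_report(feed: Snapshot, live: Snapshot, sample_size: int = 10,
                     max_gap: timedelta = timedelta(minutes=5),
                     rng: Optional[random.Random] = None) -> list[dict]:
    """Sample SKUs present in both snapshots and report every quantity mismatch."""
    rng = rng or random.Random()
    shared = sorted(feed.keys() & live.keys())
    findings = []
    for sku in rng.sample(shared, min(sample_size, len(shared))):
        feed_qty, feed_at = feed[sku]
        live_qty, live_at = live[sku]
        if feed_qty != live_qty:
            gap = live_at - feed_at  # how far behind the endpoint's value is
            findings.append({"sku": sku, "feed": feed_qty, "live": live_qty,
                             "gap": gap, "risky": gap > max_gap})
    return findings
```

Any finding marked `risky` is a SKU where the endpoint lagged the storefront by more than the 5-minute threshold from the test above.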

Severity: High. Creates unfulfillable orders, chargebacks, and customer complaints.


3. The Silent Address Rejection

What happens: An agent sends an international or unusual address. The endpoint accepts it without validation, creates the checkout, and proceeds to payment. But the fulfillment system can't actually ship there. The order fails during processing, long after the customer thinks it's confirmed.

Why it's common: Address validation is hard — especially for international addresses. Many implementations defer validation to the fulfillment stage instead of catching it at checkout.

The fix: Validate the address against your shipping zones at checkout creation. If you don't ship to a region, return address_invalid immediately. Don't create a checkout session you can't fulfill.

Test: Send checkout requests with addresses in countries you don't ship to (e.g., country: "GB" if you're US-only). Also test edge cases: PO Boxes, APO/FPO military addresses, US territories (Puerto Rico, Guam).
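A minimal sketch of checkout-time address validation. The policy constants and the `country`/`state`/`line1` field names are hypothetical. One detail worth encoding explicitly: Puerto Rico, Guam, and the other US territories have their own ISO 3166-1 codes (`PR`, `GU`, and so on), so they can arrive either as a country or as a US state code, and an allowlist of just `{"US"}` treats those two forms differently.

```python
import re
from typing import Optional

# Hypothetical shipping policy for a US-only merchant; adjust to your real zones.
SHIPPABLE_COUNTRIES = {"US"}
US_TERRITORIES = {"PR", "GU", "VI", "AS", "MP"}
SHIPPABLE_TERRITORIES = {"PR"}      # e.g. ship to Puerto Rico but not to Guam
SHIPS_TO_PO_BOXES = False
SHIPS_TO_MILITARY = True            # APO/FPO/DPO addresses use state codes AA, AE, AP
MILITARY_STATES = {"AA", "AE", "AP"}
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b", re.IGNORECASE)


def address_invalid(message: str) -> dict:
    return {"type": "error", "code": "address_invalid", "message": message}


def validate_address(address: dict) -> Optional[dict]:
    """Return an address_invalid error, or None if we can actually ship there."""
    country = address.get("country", "").upper()
    state = address.get("state", "").upper()

    # Territories can arrive as their own country code (PR) or as US plus a state code.
    if country in US_TERRITORIES:
        territory = country
    elif country == "US" and state in US_TERRITORIES:
        territory = state
    else:
        territory = None

    if territory is not None:
        if territory not in SHIPPABLE_TERRITORIES:
            return address_invalid(f"We do not ship to {territory}")
    elif country not in SHIPPABLE_COUNTRIES:
        return address_invalid(f"We do not ship to {country or 'an unspecified country'}")
    if state in MILITARY_STATES and not SHIPS_TO_MILITARY:
        return address_invalid("We do not ship to APO/FPO/DPO addresses")
    if not SHIPS_TO_PO_BOXES and PO_BOX.search(address.get("line1", "")):
        return address_invalid("We do not ship to PO Boxes")
    return None
```

Calling `validate_address` before the session is created means a `GB` address is rejected up front instead of producing a confirmation you later have to cancel.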

Severity: High. Creates false confirmations that lead to cancellations and chargebacks.


4. The Idempotency Void

What happens: An agent's network request times out. It retries with the same Idempotency-Key. Your endpoint ignores the key and creates a duplicate checkout session. The agent now has two sessions, potentially leading to a double charge.

Why it's common: Idempotency is mentioned in the ACP spec but not enforced by the protocol itself. Many implementations skip it entirely or implement it incorrectly (checking the key but not comparing the request body).

The fix: Store Idempotency-Key → session_id mappings with a 24-hour TTL. On duplicate key: if params match, return the existing session. If params differ, return 409 Conflict.

Test: Send the same CreateCheckoutRequest twice with the same Idempotency-Key. Verify you get the same id back. Then send a different request body with the same key and verify you get 409.
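A minimal in-memory sketch of that logic. `IdempotencyStore` and the `idempotency_conflict` error code are hypothetical names, and `create` stands in for whatever actually creates the session. The request body is fingerprinted with canonical JSON so a retry with reordered keys still counts as identical.

```python
import hashlib
import json
import time
from typing import Callable

TTL_SECONDS = 24 * 60 * 60  # the 24-hour window from the fix above


class IdempotencyStore:
    """In-memory sketch; production would use Redis or a DB table with a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # key -> (request fingerprint, status, response, stored_at)
        self._entries: dict[str, tuple[str, int, dict, float]] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def handle(self, key: str, body: dict, create: Callable[[dict], dict]) -> tuple[int, dict]:
        now = self._clock()
        fingerprint = self._fingerprint(body)
        entry = self._entries.get(key)
        if entry is not None and now - entry[3] < TTL_SECONDS:
            stored_fingerprint, status, response, _ = entry
            if stored_fingerprint == fingerprint:
                return status, response  # replay the original result; no new session
            return 409, {"type": "error", "code": "idempotency_conflict",
                         "message": "Idempotency-Key reused with a different request body"}
        response = create(body)  # only reached for a new (or expired) key
        self._entries[key] = (fingerprint, 201, response, now)
        return 201, response
```

This version is not safe when two retries of the same key arrive concurrently; a real store needs an atomic insert-if-absent (for example, a unique constraint on the key) so both requests cannot reach `create`.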

Severity: Critical. Double charges are the fastest way to generate chargebacks and lose merchant trust.


5. The Escalation Dead End

What happens: The checkout requires human intervention (age verification, SCA, complex shipping selection). UCP returns requires_escalation with a continue_url. The continue_url is either broken (404), expired, or leads to a page that doesn't properly return to the agent session.

Why it's common: Escalation flows require tight coordination between the agent interface and the merchant's web UI. Most implementations test the API but never test the actual browser redirect experience that the human sees.

The fix: Ensure continue_url is a valid, stable URL that: (a) loads within 3 seconds, (b) includes the session context so the merchant knows which checkout to resume, and (c) returns the user to the agent after completion with an updated session state.

Test: Trigger escalation (age-gated product, or force the requires_escalation state). Open the continue_url in a browser. Verify the page loads, allows the required action, and the checkout session transitions to ready_for_complete afterward.
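Only part of this test can be automated; the page still has to be opened by a person or a headless browser. This sketch covers the rest under stated assumptions: static sanity checks on `continue_url`, and a poller that waits for the session to reach `ready_for_complete`. `fetch_status` is a hypothetical callable you would implement as a GET on the checkout session.

```python
import time
from typing import Callable
from urllib.parse import parse_qs, urlparse


def check_continue_url(continue_url: str, session_id: str) -> list[str]:
    """Static checks only; load time and the return-to-agent hop still need a browser."""
    problems = []
    parsed = urlparse(continue_url)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("continue_url must be an absolute https URL")
    # Session context may live in the path or the query string; accept either.
    in_query = any(session_id in values for values in parse_qs(parsed.query).values())
    if session_id not in parsed.path and not in_query:
        problems.append("continue_url does not identify the checkout session")
    return problems


def wait_for_status(fetch_status: Callable[[], str], target: str = "ready_for_complete",
                    timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll the session after the human finishes the escalation step."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_status() == target:
            return True
        time.sleep(interval)
    return False
```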

Severity: Critical. A broken escalation flow means the agent can never complete the checkout.


6. The Tax Black Hole

What happens: Your checkout totals include a tax field that always returns 0 — even for taxable items shipped to states with sales tax. Or worse, the tax field is missing entirely from the response.

Why it's common: Tax calculation is complex and often handled by a third-party service (Avalara, TaxJar). If the integration is misconfigured, disabled, or throws an error, the checkout proceeds without tax. This was a documented issue for OpenAI's Instant Checkout.

The fix: Always calculate tax for the given shipping address before returning a response. If the tax service is unavailable, return an error rather than zero tax. Never complete a checkout with zero tax for a taxable item in a tax-collecting jurisdiction.

Test: Send checkout requests with addresses in states that have sales tax (CA, NY, TX). Verify the totals array includes a non-zero tax entry. Then test with states that have no statewide sales tax (OR, MT) to verify they correctly return zero.
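A sketch of the test's assertions, assuming ACP-style `totals` entries shaped like `{"type": "tax", "amount": ...}` and an order that contains only taxable items. The state sets are just the examples from this section, not a complete tax map; extend them to the jurisdictions where you actually collect.

```python
# Example states from the test above; not a complete list of tax jurisdictions.
TAXED_STATES = {"CA", "NY", "TX"}
NO_SALES_TAX_STATES = {"OR", "MT"}


def check_tax(totals: list[dict], state: str) -> list[str]:
    """Flag a missing tax line, zero tax in a taxing state, or tax where none is expected."""
    tax_lines = [t for t in totals if t.get("type") == "tax"]
    if not tax_lines:
        return ["no tax entry in totals"]
    tax = sum(t.get("amount", 0) for t in tax_lines)
    if state in TAXED_STATES and tax == 0:
        return [f"zero tax for a taxable order shipped to {state}"]
    if state in NO_SALES_TAX_STATES and tax != 0:
        return [f"non-zero tax ({tax}) for an order shipped to {state}"]
    return []
```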

Severity: Critical. Non-collection of sales tax is a legal compliance issue that creates liability for the merchant.


7. The Missing display_text

What happens: Your totals include amount (in minor units like cents) but omit display_text. The agent needs to tell the user "Your total is $64.99" but only has the number 6499. The agent must guess the currency formatting, and it often gets it wrong for non-USD currencies (JPY, for example, has no minor unit, so 6499 means ¥6,499).

Why it's common: display_text isn't strictly required in all versions of the ACP spec, so developers skip it. But agents heavily rely on it for user-facing price display.

The fix: Always include display_text on every amount field. Format it with the correct currency symbol, decimal separator, and thousands separator for the checkout's currency.

Test: Check every amount field in the response for a corresponding display_text. Flag any missing ones.
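A stdlib sketch of both halves: formatting a minor-unit amount with the currency's real number of decimals, and flagging entries that lack display_text. The exponent and symbol tables are a small illustrative subset of ISO 4217, and the output uses en-US grouping and decimal point; for other locales' separators, a locale-aware library such as Babel (`babel.numbers.format_currency`) is a better fit.

```python
from decimal import Decimal

# ISO 4217 minor-unit exponents for a few currencies (illustrative subset, not exhaustive).
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "GBP": 2, "JPY": 0, "KWD": 3}
SYMBOL = {"USD": "$", "EUR": "€", "GBP": "£", "JPY": "¥", "KWD": "KWD "}


def format_minor_units(amount: int, currency: str) -> str:
    """Turn 6499 USD into "$64.99" using the currency's actual number of decimals."""
    exponent = MINOR_UNIT_EXPONENT[currency]
    value = Decimal(amount).scaleb(-exponent)  # exact decimal shift, no float rounding
    # en-US grouping (",") and decimal point ("."); other locales need a locale-aware library.
    return f"{SYMBOL[currency]}{value:,.{exponent}f}"


def missing_display_text(totals: list[dict]) -> list[str]:
    """The test above: every amount-bearing entry should carry display_text."""
    return [t.get("type", "?") for t in totals if "amount" in t and not t.get("display_text")]
```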

Severity: Medium. Won't break checkout, but degrades the user experience and may lead to currency formatting errors.


8. The Fulfillment Phantom

What happens: Your endpoint returns fulfillment options in the checkout response, but when the agent selects one via update_checkout, the selected option is silently ignored or returns an error.

Why it's common: Fulfillment options are often generated dynamically based on inventory location, carrier availability, and address. By the time the agent selects an option, the underlying conditions may have changed — but the endpoint doesn't communicate this.

The fix: When a fulfillment option is selected, validate that it's still available. If not, return an error with the updated list of available options.

Test: Create a checkout, note the fulfillment_options. Select each option one by one and verify the session updates correctly. Then try selecting an option ID that doesn't exist — verify you get a proper error, not a silent acceptance.
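A sketch of the fix, assuming the handler can recompute `current_options` for the session's address at selection time. The error `code` value and the `fulfillment_option_id` field name are assumptions for illustration; use whatever your spec version defines.

```python
def select_fulfillment(session: dict, option_id: str,
                       current_options: list[dict]) -> tuple[int, dict]:
    """Apply a fulfillment selection only if the option still exists right now."""
    valid_ids = {option["id"] for option in current_options}
    if option_id not in valid_ids:
        # Never silently ignore the selection: explain and return what is available now.
        return 400, {
            "type": "error",
            "code": "invalid",
            "message": f"Fulfillment option {option_id!r} is no longer available",
            "param": "fulfillment_option_id",
            "fulfillment_options": current_options,
        }
    session["fulfillment_options"] = current_options
    session["fulfillment_option_id"] = option_id
    return 200, session
```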

Severity: Medium. Causes checkout failures that are hard for agents (and developers) to diagnose.


9. The Latency Cliff

What happens: Your create_checkout and update_checkout respond in <500ms. But complete_checkout takes 5–10 seconds because it synchronously calls the payment processor, sends webhook notifications, updates inventory, and triggers email confirmation.

Why it's common: Developers optimize read operations but don't profile the full completion pipeline, which involves multiple external service calls.

The fix: Make complete_checkout async where possible. Acknowledge the request quickly (return in_progress state), process the payment, then update the session to completed via polling or webhook. If sync is required, set a strict timeout on external calls.

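Test: Measure p50, p95, and p99 latency for each endpoint across at least 20 requests, and 100 or more if you want a meaningful p99 (with 20 samples, p99 is just the slowest request). If complete_checkout p95 exceeds 3 seconds, agents will start timing out.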
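A small stdlib harness for the measurement. It uses the nearest-rank definition of a percentile; `profile` times any zero-argument callable, so you would wrap a real request to each endpoint in it (that wrapper is not shown and would be your own code).

```python
import math
import time
from typing import Callable


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def profile(call: Callable[[], object], runs: int = 100) -> dict[str, float]:
    """Time `call` repeatedly and report p50/p95/p99 in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

With 20 runs, nearest-rank p99 is simply the slowest request, so collect more samples before drawing conclusions about the tail.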

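Severity: High. A p99 above the agent's timeout means roughly 1% or more of completions fail; a p95 above it means 5% or more do, and every one of those failures lands on the step that turns a session into revenue.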


10. The Signature Bypass

What happens: Your endpoint is supposed to verify HMAC-SHA256 signatures on incoming requests (per ACP spec). But the signature verification is disabled, misconfigured, or only checked in production (not sandbox). Any HTTP client can send requests to your checkout endpoint.

Why it's common: Signature verification is fiddly to implement and easy to skip during development. It stays skipped into production.

The fix: Verify signatures on every request, in every environment. Use the Vercel ACP Handler library, which includes built-in signature verification.

Test: Send a request with an invalid Signature header. Expect 401 Unauthorized. Then send a request with no signature at all. Same expectation. If either succeeds, your endpoint is vulnerable.
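A minimal stdlib sketch of signature checking. The signing scheme here (base64 of HMAC-SHA256 over the raw request body, sent in a `Signature` header) is an assumption; confirm the exact signed string and encoding against the ACP spec before relying on it. The important parts carry over regardless: verify against the raw bytes, reject missing signatures, and compare in constant time.

```python
import base64
import hashlib
import hmac
from typing import Optional


def sign(secret: bytes, raw_body: bytes) -> str:
    """Assumed scheme: base64(HMAC-SHA256(secret, raw_body)). Confirm against the spec."""
    return base64.b64encode(hmac.new(secret, raw_body, hashlib.sha256).digest()).decode()


def verify_request(secret: bytes, raw_body: bytes, signature_header: Optional[str]) -> int:
    """Return the HTTP status to send: 401 for a missing or bad signature, 200 to proceed."""
    if not signature_header:
        return 401
    expected = sign(secret, raw_body)
    # compare_digest avoids leaking, via timing, how many leading bytes matched.
    if not hmac.compare_digest(expected.encode(), signature_header.encode()):
        return 401
    return 200
```

An HMAC over the body alone does not stop replays; that takes a signed timestamp plus a freshness window.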

Severity: Critical (security). An unsigned endpoint is open to request forgery and replay attacks.


11. The Schema Drift

What happens: Your endpoint returns responses that almost match the ACP/UCP spec, but with subtle differences. A field name is slightly off (fulfillment_option instead of fulfillment_options). An amount is a string instead of an integer. A required field is occasionally null. Agents can't parse the response reliably.

Why it's common: Manual implementation without schema validation. Copy-paste from outdated documentation. Different code paths returning slightly different response shapes.

The fix: Validate every response against the official JSON Schema before returning it. Both ACP and UCP publish schemas on GitHub. Use ajv (JavaScript) or jsonschema (Python) for runtime validation.

Test: Run every response through the official JSON Schema validator. Any validation error is a bug — even if it "works" in practice, a spec-compliant agent may choke on it.
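The real check is the official schema run through ajv or jsonschema. As a stdlib-only stand-in (not a replacement), this hypothetical smoke test catches the three drift patterns named above: unexpected field names, string amounts, and null required fields. The `REQUIRED` and `ALLOWED` sets are illustrative, not copied from either spec.

```python
# Illustrative subsets only; take the real field lists from the official ACP/UCP JSON Schemas.
REQUIRED = {"id", "status", "currency", "line_items", "totals"}
ALLOWED = REQUIRED | {"fulfillment_options", "fulfillment_option_id", "messages", "links"}


def drift_problems(response: dict) -> list[str]:
    """Return human-readable drift findings; an empty list means this smoke test passed."""
    problems = []
    for key in sorted(response.keys() - ALLOWED):
        problems.append(f"unexpected field {key!r}")  # e.g. fulfillment_option vs fulfillment_options
    for key in sorted(REQUIRED):
        if response.get(key) is None:
            problems.append(f"required field {key!r} is missing or null")
    for index, total in enumerate(response.get("totals") or []):
        amount = total.get("amount")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(amount, int) or isinstance(amount, bool):
            problems.append(f"totals[{index}].amount should be an integer, got {type(amount).__name__}")
    return problems
```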

Severity: Medium. Usually causes intermittent failures that are extremely hard to debug.


12. The Abandoned Session Leak

What happens: Agents create checkout sessions, browse, compare, and often abandon without completing or canceling. These sessions accumulate in your database, potentially holding inventory reservations and consuming resources.

Why it's common: Human shoppers abandon carts too (roughly 70% of the time), but at least their sessions expire when the browser tab closes. Agent sessions persist until explicitly canceled, and agents rarely cancel.

The fix: Implement session TTLs. Auto-expire sessions after 30 minutes of inactivity. Release any inventory holds when sessions expire. Track abandonment rates by agent source to identify problematic integrations.

Test: Create 100 sessions without completing or canceling them. Verify they expire after your TTL. Verify inventory holds are released. Check that expired sessions return an appropriate status when queried.
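A sketch of idle-TTL expiry with inventory release, using an injectable clock so the test above can run without waiting 30 minutes. `SessionStore`, the `expired` status, and the one-SKU-per-session shape are hypothetical simplifications; a real system would run `expire_idle` from a scheduled job against its database.

```python
import time
from typing import Callable

IDLE_TTL_SECONDS = 30 * 60  # 30 minutes of inactivity, as in the fix above


class SessionStore:
    """Sessions expire after IDLE_TTL_SECONDS idle; expiry releases their inventory hold."""

    def __init__(self, inventory: dict[str, int], clock: Callable[[], float] = time.monotonic):
        self.inventory = inventory  # SKU -> units still available to sell
        self.sessions: dict[str, dict] = {}
        self._clock = clock

    def create(self, session_id: str, sku: str, quantity: int) -> None:
        self.inventory[sku] -= quantity  # reserve stock (assumes availability was validated)
        self.sessions[session_id] = {"sku": sku, "quantity": quantity,
                                     "status": "open", "last_seen": self._clock()}

    def touch(self, session_id: str) -> None:
        """Call on every update so active sessions are not expired."""
        self.sessions[session_id]["last_seen"] = self._clock()

    def expire_idle(self) -> int:
        """Run periodically; returns how many sessions expired on this pass."""
        now, expired = self._clock(), 0
        for session in self.sessions.values():
            if session["status"] == "open" and now - session["last_seen"] > IDLE_TTL_SECONDS:
                session["status"] = "expired"  # keep the record so lookups get a clear status
                self.inventory[session["sku"]] += session["quantity"]
                expired += 1
        return expired

    def status(self, session_id: str) -> str:
        return self.sessions[session_id]["status"]
```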

Severity: Low-Medium. Won't break individual checkouts, but degrades system health over time.


The Prova Testing Matrix

| # | Failure | Test type | Severity |
|---|---|---|---|
| 1 | 500 on out-of-stock | Edge case | Critical |
| 2 | Stale inventory | Data quality | High |
| 3 | Silent address rejection | Edge case | High |
| 4 | Idempotency void | Protocol compliance | Critical |
| 5 | Escalation dead end | Flow completeness | Critical |
| 6 | Tax black hole | Compliance | Critical |
| 7 | Missing display_text | Data quality | Medium |
| 8 | Fulfillment phantom | Edge case | Medium |
| 9 | Latency cliff | Performance | High |
| 10 | Signature bypass | Security | Critical |
| 11 | Schema drift | Protocol compliance | Medium |
| 12 | Abandoned session leak | Operations | Low-Medium |

Five critical, three high, three medium, one low-medium. If you have any of the five critical issues, your agent-driven checkout is fundamentally broken.


Prova tests for all 12 failure patterns automatically. Get your Agent Readiness Score →


Your agents are already shopping. Is your checkout ready?

Agent-driven commerce traffic is projected to grow 1,200% over the next two years. Businesses that aren't ready will lose sales to those that are. Prova gives your checkout the machine-readable layer it needs so AI agents can discover, validate, and complete purchases — without friction.

MCP & A2A protocols supported · Sandbox-only environment · SOC 2 compliance planned