You've implemented the Agentic Commerce Protocol. Your Shopify store has an ACP endpoint. Stripe's Agentic Commerce Suite is configured. But how do you know it actually works when a real AI agent tries to buy something?

This guide walks through testing your ACP endpoints systematically — from basic API validation to full synthetic agent simulation.

Before You Start: What You Need

Your ACP endpoint URL (e.g., https://api.yourstore.com/acp/v1)
A sandbox API key (Bearer token)
Your HMAC signing key (for request signatures)
A test payment token from Stripe sandbox (tok_visa works)
cURL, Postman, or any HTTP client
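
The HMAC signing key is the one item on this list that curl cannot exercise on its own. Here is a minimal sketch of computing and verifying an HMAC-SHA256 signature over the raw request body. The hex encoding and the choice to sign only the body (rather than, say, body plus timestamp) are assumptions for illustration, as is whatever header name you send the result in; check your ACP provider's documentation for the real scheme.

```python
import hashlib
import hmac


def sign_body(signing_key: str, raw_body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(signing_key.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_signature(signing_key: str, raw_body: bytes, received: str) -> bool:
    """Recompute the digest and compare in constant time, as a server would."""
    expected = sign_body(signing_key, raw_body)
    return hmac.compare_digest(expected, received)
```

Sign the exact bytes you send. Re-serializing the JSON after signing (different key order or whitespace) produces a different digest.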

Level 1: Basic API Validation (10 minutes)

Before simulating agents, verify the raw API works.

Test 1: Create a Checkout Session

curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: test-$(date +%s)" \
  -d '{
    "items": [{ "id": "YOUR_SKU", "quantity": 1 }],
    "buyer": {
      "first_name": "Test",
      "last_name": "Buyer",
      "email": "test@example.com"
    },
    "fulfillment_address": {
      "name": "Test Buyer",
      "line_one": "123 Main St",
      "city": "San Francisco",
      "state": "CA",
      "country": "US",
      "postal_code": "94105"
    }
  }'

What to check:

| Check | Expected | If it fails |
| --- | --- | --- |
| Status code | `201 Created` | Your endpoint isn't creating sessions correctly |
| `status` field | `not_ready_for_payment` or `ready_for_payment` | State machine is wrong |
| `line_items` | Non-empty array with correct SKU | Product lookup is broken |
| `line_items[0].total` | Greater than 0 (in minor units) | Pricing calculation is wrong |
| `fulfillment_options` | Non-empty array | Shipping options aren't exposed |
| `totals` | Array includes `subtotal`, `tax`, `total` | Missing totals calculation |
| `payment_handlers` | At least one handler declared | Payment config missing |
| `currency` | `"usd"` (lowercase ISO-4217) | Currency formatting wrong |
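
If you would rather not eyeball these checks, they can be scripted. The sketch below takes the status code and parsed JSON body of the create response and returns a list of failed checks. It assumes that each `totals` entry carries a `type` field, that the SKU is echoed back under `line_items[].item.id`, and that amounts are integers in minor units; adjust those field paths to match your actual responses.

```python
REQUIRED_TOTAL_TYPES = {"subtotal", "tax", "total"}


def check_create_response(status_code, body, expected_sku):
    """Return a list of failed checks; an empty list means every check passed."""
    failures = []
    if status_code != 201:
        failures.append(f"status code is {status_code}, expected 201")
    if body.get("status") not in ("not_ready_for_payment", "ready_for_payment"):
        failures.append("unexpected status field")

    line_items = body.get("line_items") or []
    if not line_items:
        failures.append("line_items is empty")
    else:
        first = line_items[0]
        # Assumption: the SKU is echoed back under line_items[].item.id.
        if (first.get("item") or {}).get("id") != expected_sku:
            failures.append("line_items[0] does not reference the requested SKU")
        total = first.get("total")
        if not isinstance(total, int) or total <= 0:
            failures.append("line_items[0].total is not a positive minor-unit amount")

    if not body.get("fulfillment_options"):
        failures.append("fulfillment_options is empty")

    # Assumption: each totals entry is an object with a "type" field.
    total_types = {entry.get("type") for entry in body.get("totals") or []}
    missing = REQUIRED_TOTAL_TYPES - total_types
    if missing:
        failures.append("totals is missing: " + ", ".join(sorted(missing)))

    if not body.get("payment_handlers"):
        failures.append("no payment_handlers declared")

    currency = body.get("currency")
    if not (isinstance(currency, str) and len(currency) == 3 and currency.islower()):
        failures.append("currency is not a lowercase 3-letter code")
    return failures
```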

Test 2: Idempotency

Send the exact same request with the same Idempotency-Key:

# Send twice with identical key
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: idempotency-test-001" \
  -d '{"items": [{"id": "YOUR_SKU", "quantity": 1}], "buyer": {"email": "test@example.com"}}'

# Same request, same key — should return same session
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: idempotency-test-001" \
  -d '{"items": [{"id": "YOUR_SKU", "quantity": 1}], "buyer": {"email": "test@example.com"}}'

Expected: Both responses return the same id. If you get two different session IDs, your idempotency implementation is broken. This is the bug that silently killed conversion on OpenAI's Instant Checkout.
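
The same comparison is easy to automate. This sketch sends one payload twice under a single key and checks that both responses carry the same session id. The `post` callable is a hypothetical stand-in for whatever HTTP client you use: it takes a payload and a headers dict and returns the parsed JSON body.

```python
def same_session_for_same_key(post, payload, idempotency_key):
    """Send the same request twice with one Idempotency-Key and compare ids."""
    headers = {"Idempotency-Key": idempotency_key}
    first = post(payload, headers)
    second = post(payload, headers)
    first_id, second_id = first.get("id"), second.get("id")
    # A missing id counts as a failure, not as a match.
    return first_id is not None and first_id == second_id
```

Injecting `post` keeps the check independent of any particular HTTP library, and lets you run it against a stub before pointing it at the sandbox.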

Test 3: Error Responses

# Request with an invalid SKU
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: error-test-$(date +%s)" \
  -d '{"items": [{"id": "NONEXISTENT_SKU", "quantity": 1}], "buyer": {"email": "test@example.com"}}'

Expected: 400 Bad Request with a proper ACP error:

{
  "type": "error",
  "code": "out_of_stock",
  "message": "Product NONEXISTENT_SKU not found",
  "param": "items[0].id"
}

If you get a 500 Internal Server Error: Your endpoint is throwing unhandled exceptions on invalid input. This is the most common ACP implementation bug. Agents receive an error they cannot act on and silently abandon the checkout.
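
A small shape check catches both problems at once: the 5xx status and the malformed body. The sketch below treats `type`, `code`, and `message` as required non-empty strings and `param` as optional. Whether `param` is optional is an assumption, so tighten the check if your spec version requires it.

```python
def check_error_response(status_code, body):
    """Return a list of problems with an error response; empty means well-formed."""
    problems = []
    if status_code >= 500:
        problems.append(f"server error {status_code}: invalid input should yield a 4xx")
    elif not 400 <= status_code < 500:
        problems.append(f"status {status_code} is not a client error")

    for field in ("type", "code", "message"):
        value = body.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty '{field}'")

    # Assumption: param is optional, but must be a string when present.
    if "param" in body and not isinstance(body["param"], str):
        problems.append("'param' is present but not a string")
    return problems
```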

Level 2: State Machine Validation (20 minutes)

The ACP checkout is a state machine. Each transition has rules. Test that your states work correctly.

The ACP State Machine

                    ┌──────────────────┐
                    │  not_ready_for   │
    create ────────▶│    _payment      │◀──── update (missing fields)
                    └────────┬─────────┘
                             │ update (all fields provided)
                             ▼
                    ┌──────────────────┐
                    │  ready_for       │
    update ────────▶│    _payment      │◀──── update
                    └────────┬─────────┘
                             │ complete
                             ▼
                    ┌──────────────────┐
                    │   completed      │
                    └──────────────────┘

    Any non-terminal state ────▶ canceled (via cancel endpoint)
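
The diagram translates directly into a transition table, which is handy both for implementing the endpoint and for asserting against it in tests. This is a sketch of the diagram as drawn, treating `completed` and `canceled` as terminal (as the checklist item "Cancel works from any non-terminal state" does). Your implementation may permit more transitions, such as `ready_for_payment` falling back to `not_ready_for_payment`.

```python
# Allowed target states for each current state, following the diagram.
ALLOWED_TRANSITIONS = {
    "not_ready_for_payment": {"not_ready_for_payment", "ready_for_payment", "canceled"},
    "ready_for_payment": {"ready_for_payment", "completed", "canceled"},
    "completed": set(),  # terminal
    "canceled": set(),   # terminal
}


def is_allowed(current, target):
    """Return True if the state machine permits moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```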

Test 4: Full Happy Path

Walk through the complete flow:

Create — Verify state is not_ready_for_payment
Update with fulfillment selection — Verify state transitions to ready_for_payment
Complete with payment token — Verify state is completed and order_id is present
Get the completed session — Verify it returns the final state
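
Those four steps can be scripted as one function. In this sketch, `client` is a hypothetical object with `create`, `update`, `complete`, and `get` methods that wrap your HTTP calls and return parsed JSON. The field names `fulfillment_options[].id` and `order_id` are taken from this guide, not verified against your responses. On create it accepts either status, as the table in Test 1 allows.

```python
def run_happy_path(client, items, buyer, address, payment_token):
    """Walk create -> update -> complete -> get and collect any problems found."""
    problems = []

    session = client.create(items, buyer, address)
    if session.get("status") not in ("not_ready_for_payment", "ready_for_payment"):
        problems.append(f"create: unexpected status {session.get('status')!r}")
    options = session.get("fulfillment_options") or []
    if not options:
        problems.append("create: no fulfillment options to select")
        return problems

    # Select the first fulfillment option offered.
    session = client.update(session["id"], options[0]["id"])
    if session.get("status") != "ready_for_payment":
        problems.append(f"update: status is {session.get('status')!r}")

    session = client.complete(session["id"], payment_token)
    if session.get("status") != "completed":
        problems.append(f"complete: status is {session.get('status')!r}")
    if not session.get("order_id"):
        problems.append("complete: order_id is missing")

    final = client.get(session["id"])
    if final.get("status") != "completed":
        problems.append(f"get: status is {final.get('status')!r}")
    return problems
```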

Test 5: Invalid State Transitions

Try completing a session that's not_ready_for_payment:

# Create a session without a fulfillment selection (it stays not_ready_for_payment),
# then immediately try to complete it.
# SESSION_ID is a shell variable holding the id returned by the create call.
curl -X POST "https://api.yourstore.com/acp/v1/checkout_sessions/$SESSION_ID/complete" \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -d '{"payment_data": {"handler_id": "card_tokenized", ...}}'

Expected: 400 or 422 error — not a 200 that creates an incomplete order.

Test 6: Cancel Flow

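# Create a session, then cancel it.
# SESSION_ID is a shell variable holding the id returned by the create call.
curl -X POST "https://api.yourstore.com/acp/v1/checkout_sessions/$SESSION_ID/cancel" \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "API-Version: 2026-01-30"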

Expected: 200 with status: "canceled". Then verify:

GET the canceled session — still returns canceled (not deleted)
Try to update the canceled session — should return 405 Method Not Allowed
Try to complete the canceled session — should return 405

Level 3: Edge Case Testing (30 minutes)

This is where most implementations break. Real agents trigger these scenarios constantly.

Test 7: Out-of-Stock Mid-Checkout

Create a session with an item that has low inventory
Wait (or manually reduce inventory via your admin)
Try to complete the checkout

What you're looking for: Does the endpoint return a clear out_of_stock error with the specific item ID? Or does it 500?

Test 8: Price Change During Session

Create a session with an item at $49.99
Change the item's price to $59.99 via your admin
Complete the checkout

Question: Which price does the customer pay? The session should either: (a) lock the price at creation time, or (b) return an error on complete indicating the price changed. What it should NOT do: silently charge the new price without informing the agent.
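
To make the pass/fail rule explicit, here is a sketch that classifies the outcome of the complete call. The inputs are the session total recorded at creation, the HTTP status of the complete call, and the amount actually charged, all in minor units. The function and label names are made up for illustration.

```python
def classify_price_outcome(session_total, complete_status_code, charged_total=None):
    """Classify what happened when completing a session after a price change."""
    if complete_status_code >= 400:
        return "rejected"        # option (b): the endpoint refused and reported an error
    if charged_total == session_total:
        return "price_locked"    # option (a): the price from creation time was honored
    return "silent_reprice"      # the failure mode: a different amount, no warning
```

For the $49.99 to $59.99 scenario above, `classify_price_outcome(4999, 200, 5999)` returns `"silent_reprice"`, which is the result you need to treat as a bug.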

Test 9: International Address

curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: intl-test-$(date +%s)" \
  -d '{
    "items": [{"id": "YOUR_SKU", "quantity": 1}],
    "buyer": {"email": "test@example.com"},
    "fulfillment_address": {
      "name": "Test Buyer",
      "line_one": "10 Downing Street",
      "city": "London",
      "state": "",
      "country": "GB",
      "postal_code": "SW1A 2AA"
    }
  }'

If you don't ship internationally: You should get a clear error: {"type": "error", "code": "address_invalid", "message": "We do not ship to GB"}. Not a 500. Not a silent acceptance that creates an unfulfillable order.

Test 10: Quantity Limits

Try ordering 999 of the same item. Try ordering 0. Try ordering -1. Your endpoint should validate quantity bounds and return meaningful errors.
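
On the server side, the validation being tested here might look like the sketch below. The upper bound of 100 and the `invalid_quantity` error code are hypothetical; use your own store policy and whatever code your ACP error vocabulary defines.

```python
MAX_QUANTITY_PER_LINE = 100  # hypothetical store policy


def validate_quantity(quantity, index=0, maximum=MAX_QUANTITY_PER_LINE):
    """Return None if the quantity is acceptable, else an ACP-style error dict."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        message = "quantity must be an integer"
    elif quantity < 1:
        message = "quantity must be at least 1"
    elif quantity > maximum:
        message = f"quantity must not exceed {maximum}"
    else:
        return None
    return {
        "type": "error",
        "code": "invalid_quantity",  # hypothetical code, not taken from the spec
        "message": message,
        "param": f"items[{index}].quantity",
    }
```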

Test 11: Session Timeout

Create a session and don't touch it for 30+ minutes. Then try to complete it. Does your endpoint handle stale sessions gracefully? The ACP spec doesn't mandate a timeout, but real implementations should expire sessions to prevent inventory lock-up.
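
A minimal expiry check, assuming you store a timezone-aware creation timestamp on each session. The 30-minute TTL simply mirrors the test above and is not a value from the spec.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(minutes=30)  # assumed policy, not mandated by ACP


def is_expired(created_at, now=None, ttl=SESSION_TTL):
    """Return True once a session is at least `ttl` old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= ttl
```

Run this check at the top of the update and complete handlers, and return a structured error for expired sessions, so a stale session never reaches payment.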

Level 4: Performance Testing (15 minutes)

Agents have timeout thresholds. If your endpoint takes too long, the agent abandons.

Test 12: Latency Profile

Run 20 requests against each endpoint and measure response times:

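# Simple latency test: 20 requests, printing one total time (in seconds) per line.
# SESSION_ID is a shell variable holding the id of an existing session.
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{time_total}\n" \
    -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
    "https://api.yourstore.com/acp/v1/checkout_sessions/$SESSION_ID"
done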

Target thresholds:

| Endpoint | p50 | p95 | Max acceptable |
| --- | --- | --- | --- |
| create_checkout | <500ms | <1.5s | 3s |
| get_checkout | <200ms | <500ms | 1s |
| update_checkout | <500ms | <1.5s | 3s |
| complete_checkout | <1s | <3s | 5s |

If your complete_checkout p95 exceeds 3 seconds, agents will start timing out. This was a documented issue in the Alpine Gear audit: an 8.9-second p99 on complete_checkout was causing silent session failures.
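
Turning raw timings into p50 and p95 takes a few lines. The sketch below uses the nearest-rank percentile method (one of several common definitions) and compares the results with the targets from the table above, expressed in seconds.

```python
import math

# Targets in seconds, from the table above: (p50, p95, max acceptable).
TARGETS = {
    "create_checkout": (0.5, 1.5, 3.0),
    "get_checkout": (0.2, 0.5, 1.0),
    "update_checkout": (0.5, 1.5, 3.0),
    "complete_checkout": (1.0, 3.0, 5.0),
}


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_targets(endpoint, samples):
    """Return True if the samples satisfy the p50, p95, and max targets."""
    p50_target, p95_target, max_target = TARGETS[endpoint]
    return (
        percentile(samples, 50) < p50_target
        and percentile(samples, 95) < p95_target
        and max(samples) <= max_target
    )
```

With only 20 samples the p95 is effectively your second-slowest request, so treat a single run as a smoke test and collect more samples before drawing conclusions.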

Level 5: Synthetic Agent Simulation (The Full Test)

This is what AgentCheck automates — but here's how to do a basic version manually.

DIY Synthetic Agent Test

Use Claude or GPT-4o with tool-use to simulate a shopping session:

System prompt: You are testing an e-commerce checkout endpoint.
Your goal: buy one pair of running shoes under $100.

You have these tools:
- create_checkout(items, buyer, address)
- update_checkout(session_id, fulfillment_option_id)
- complete_checkout(session_id, payment_token)
- get_checkout(session_id)

Walk through a complete purchase. At each step, examine the response
and report anything unusual: missing fields, unexpected errors, slow
responses, incorrect pricing.

Use this buyer info:
- Name: Alex Test
- Email: alex@test.com
- Address: 456 Oak Ave, Portland, OR 97201

Use payment token: tok_visa

Give the LLM your endpoint URL and sandbox credentials, then let it run. The agent's "unusual findings" are your bug report.
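
To wire this up, the four tools need machine-readable definitions and a dispatcher that routes the model's tool calls to your HTTP client. Tool-use APIs generally take a name, a description, and a JSON Schema for the arguments, but the exact envelope differs by vendor. The key names below are therefore a generic assumption, and `handlers` is a hypothetical dict of functions you supply.

```python
def _tool(name, description, properties):
    """Build one generic tool definition; every listed property is required."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


STRING = {"type": "string"}
OBJECT = {"type": "object"}

TOOLS = [
    _tool("create_checkout", "Create a checkout session.",
          {"items": {"type": "array", "items": OBJECT}, "buyer": OBJECT, "address": OBJECT}),
    _tool("update_checkout", "Select a fulfillment option.",
          {"session_id": STRING, "fulfillment_option_id": STRING}),
    _tool("complete_checkout", "Complete the session with a payment token.",
          {"session_id": STRING, "payment_token": STRING}),
    _tool("get_checkout", "Fetch the current state of a session.",
          {"session_id": STRING}),
]


def dispatch(tool_name, arguments, handlers):
    """Route one tool call from the model to the matching handler function."""
    if tool_name not in handlers:
        raise KeyError(f"unknown tool: {tool_name}")
    return handlers[tool_name](**arguments)
```

Log every call that passes through `dispatch`, including arguments, response, and elapsed time. That log, read next to the agent's own commentary, shows where it got confused.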

What the Agent Will Catch That You Won't

Fields missing from your integration's responses that no human would notice (like display_text on totals)
Error messages that are technically correct but incomprehensible to an AI agent
State transitions that work but are semantically confusing
Fulfillment options that are listed but can't actually be selected
Payment handlers declared but not actually functional

The AgentCheck Checklist

Before going live, verify every item:

Protocol Compliance

All responses match ACP spec v2026-01-30 JSON schema
Status codes are correct (201 on create, 200 on update/complete/get)
Error responses follow { type, code, message, param } format
Idempotency-Key is properly implemented
HMAC-SHA256 signature verification works
Bearer token authentication works (and rejects invalid tokens)

Checkout Flow

Create → Update → Complete happy path works end-to-end
State machine transitions are correct
Cancel works from any non-terminal state
Invalid state transitions return errors (not silent failures)

Data Quality

Product prices match your live storefront
Inventory is current (not stale by >5 minutes)
Tax calculation is correct for test addresses
Fulfillment options include cost and delivery window

Edge Cases

Out-of-stock returns proper out_of_stock error code
Invalid addresses return address_invalid error code
Payment decline returns payment_declined error code
Quantity limits are validated
International addresses handled (accept or reject gracefully)

Performance

All endpoints respond within 3 seconds at p95
complete_checkout responds within 5 seconds at p99
No latency regression under concurrent requests

Don't want to run these tests manually? AgentCheck automates all of this →

How to Test Your ACP Endpoints: A Step-by-Step Guide