You've implemented the Agentic Commerce Protocol. Your Shopify store has an ACP endpoint. Stripe's Agentic Commerce Suite is configured. But how do you know it actually works when a real AI agent tries to buy something?
This guide walks through testing your ACP endpoints systematically — from basic API validation to full synthetic agent simulation.
Before You Start: What You Need
- Your ACP endpoint URL (e.g., https://api.yourstore.com/acp/v1)
- A sandbox API key (Bearer token)
- Your HMAC signing key (for request signatures)
- A test payment token from Stripe sandbox (tok_visa works)
- cURL, Postman, or any HTTP client
Level 1: Basic API Validation (10 minutes)
Before simulating agents, verify the raw API works.
Test 1: Create a Checkout Session
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: test-$(date +%s)" \
  -d '{
    "items": [{ "id": "YOUR_SKU", "quantity": 1 }],
    "buyer": {
      "first_name": "Test",
      "last_name": "Buyer",
      "email": "test@example.com"
    },
    "fulfillment_address": {
      "name": "Test Buyer",
      "line_one": "123 Main St",
      "city": "San Francisco",
      "state": "CA",
      "country": "US",
      "postal_code": "94105"
    }
  }'

What to check:
| Check | Expected | If it fails |
|---|---|---|
| Status code | 201 Created | Your endpoint isn't creating sessions correctly |
| status field | not_ready_for_payment or ready_for_payment | State machine is wrong |
| line_items | Non-empty array with correct SKU | Product lookup is broken |
| line_items[0].total | Greater than 0 (in minor units) | Pricing calculation is wrong |
| fulfillment_options | Non-empty array | Shipping options aren't exposed |
| totals | Array includes subtotal, tax, total | Missing totals calculation |
| payment_handlers | At least one handler declared | Payment config missing |
| currency | "usd" (lowercase ISO-4217) | Currency formatting wrong |
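The body checks in the table can be scripted so they run on every deploy. The sketch below assumes `jq` is installed and that each entry in `totals` identifies itself through a `type` field (an assumption; adjust the filter if your payload keys totals differently). `jq_check` and `check_create_response` are hypothetical helper names. The 201 status code is not part of the body, so check it separately.

```shell
# jq_check BODY LABEL FILTER: prints "FAIL: LABEL" unless FILTER evaluates to true.
jq_check() {
  printf '%s' "$1" | jq -e "$3" >/dev/null 2>&1 || { echo "FAIL: $2"; return 1; }
}

# check_create_response: reads a checkout session JSON document on stdin,
# prints one line per failed check, and returns non-zero if any check failed.
check_create_response() {
  local body failed=0
  body=$(cat)
  jq_check "$body" "status" \
    '.status == "not_ready_for_payment" or .status == "ready_for_payment"' || failed=1
  jq_check "$body" "line_items" \
    '(.line_items | type) == "array" and (.line_items | length) > 0' || failed=1
  jq_check "$body" "line_items[0].total" '.line_items[0].total > 0' || failed=1
  jq_check "$body" "fulfillment_options" '(.fulfillment_options | length) > 0' || failed=1
  # Assumption: every totals entry carries a "type" such as "subtotal".
  jq_check "$body" "totals" \
    '(["subtotal","tax","total"] - [.totals[]?.type]) == []' || failed=1
  jq_check "$body" "payment_handlers" '(.payment_handlers | length) > 0' || failed=1
  jq_check "$body" "currency" '.currency == "usd"' || failed=1
  return "$failed"
}
```

To use it, pipe the response body from Test 1 into the function: append `| check_create_response` to the curl command, adding `-s` so the progress meter stays out of the output.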
Test 2: Idempotency
Send the exact same request with the same Idempotency-Key:
# Send twice with identical key
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: idempotency-test-001" \
  -d '{"items": [{"id": "YOUR_SKU", "quantity": 1}], "buyer": {"email": "test@example.com"}}'

# Same request, same key: should return the same session
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: idempotency-test-001" \
  -d '{"items": [{"id": "YOUR_SKU", "quantity": 1}], "buyer": {"email": "test@example.com"}}'

Expected: Both responses return the same id. If you get two different session IDs, your idempotency implementation is broken. This is the bug that silently killed conversion on OpenAI's Instant Checkout.
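Comparing the two ids by eye is easy to get wrong, so the comparison can be automated. This is a minimal sketch assuming `jq` is installed; `ACP_URL` and `ACP_KEY` are hypothetical environment variables holding your endpoint and sandbox key, and the function names are illustrative only.

```shell
# same_session_id FIRST_JSON SECOND_JSON: succeeds only when both responses
# carry the same non-empty "id".
same_session_id() {
  local first second
  first=$(printf '%s' "$1" | jq -r '.id // empty')
  second=$(printf '%s' "$2" | jq -r '.id // empty')
  [ -n "$first" ] && [ "$first" = "$second" ]
}

# post_session KEY: sends the create request with a fixed Idempotency-Key.
# ACP_URL and ACP_KEY are placeholders for your own values.
post_session() {
  curl -s -X POST "$ACP_URL/checkout_sessions" \
    -H "Authorization: Bearer $ACP_KEY" \
    -H "Content-Type: application/json" \
    -H "API-Version: 2026-01-30" \
    -H "Idempotency-Key: $1" \
    -d '{"items": [{"id": "YOUR_SKU", "quantity": 1}], "buyer": {"email": "test@example.com"}}'
}

# Usage (needs network access and real credentials):
#   key="idempotency-test-$(date +%s)"
#   r1=$(post_session "$key"); r2=$(post_session "$key")
#   same_session_id "$r1" "$r2" && echo "PASS" || echo "FAIL: duplicate session"
```

Generating a fresh key per run avoids hitting a session cached from an earlier test, which would hide a broken implementation.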
Test 3: Error Responses
# Request with an invalid SKU
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "API-Version: 2026-01-30" \
  -H "Idempotency-Key: error-test-$(date +%s)" \
  -d '{"items": [{"id": "NONEXISTENT_SKU", "quantity": 1}], "buyer": {"email": "test@example.com"}}'

Expected: 400 Bad Request with a proper ACP error:
{
  "type": "error",
  "code": "out_of_stock",
  "message": "Product NONEXISTENT_SKU not found",
  "param": "items[0].id"
}

If you get a 500 Internal Server Error: Your endpoint is throwing unhandled exceptions on invalid input. This is the most common ACP implementation bug. Agents will receive a useless error and silently abandon the checkout.
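A small shape check can separate a well-formed error from an unhandled exception. The sketch below assumes `jq` is installed and accepts any 4xx status; tighten it to exactly 400 if you want to match the expectation above strictly. `check_error_response` is a hypothetical helper name.

```shell
# check_error_response STATUS BODY: succeeds when STATUS is 4xx and BODY is a
# JSON object with the { type, code, message, param } error shape.
check_error_response() {
  local status=$1 body=$2
  case "$status" in
    4[0-9][0-9]) ;;
    *) echo "FAIL: expected a 4xx status, got $status"; return 1 ;;
  esac
  printf '%s' "$body" | jq -e '
    .type == "error"
    and (.code | type) == "string"
    and (.message | type) == "string" and (.message | length) > 0
    and has("param")
  ' >/dev/null 2>&1 || { echo "FAIL: body is not a well-formed error object"; return 1; }
}

# Usage: capture the status code and body separately, then check both.
#   status=$(curl -s -o /tmp/acp_body.json -w '%{http_code}' -X POST ... )
#   check_error_response "$status" "$(cat /tmp/acp_body.json)"
```

An HTML error page or stack trace fails the `jq` parse, so it is reported the same way as a missing field.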
Level 2: State Machine Validation (20 minutes)
The ACP checkout is a state machine. Each transition has rules. Test that your states work correctly.
The ACP State Machine
                ┌──────────────────┐
                │ not_ready_for    │
create ────────▶│ _payment         │◀──── update (missing fields)
                └────────┬─────────┘
                         │ update (all fields provided)
                         ▼
                ┌──────────────────┐
                │ ready_for        │
update ────────▶│ _payment         │◀──── update
                └────────┬─────────┘
                         │ complete
                         ▼
                ┌──────────────────┐
                │ completed        │
                └──────────────────┘
Any state ────▶ canceled (via cancel endpoint)

Test 4: Full Happy Path
Walk through the complete flow:
- Create: verify state is not_ready_for_payment
- Update with fulfillment selection: verify state transitions to ready_for_payment
- Complete with payment token: verify state is completed and order_id is present
- Get the completed session: verify it returns the final state
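While walking the happy path, you can record the status returned at each step and check the sequence against the state machine. The sketch below encodes the transitions as a lookup. It treats completed and canceled as terminal, which is an assumption based on the cancel test and checklist in this guide; `valid_transition` and `check_sequence` are hypothetical names.

```shell
# valid_transition FROM TO: succeeds if the state machine allows FROM -> TO.
valid_transition() {
  case "$1:$2" in
    not_ready_for_payment:not_ready_for_payment) return 0 ;;  # update, fields still missing
    not_ready_for_payment:ready_for_payment)     return 0 ;;  # update, all fields provided
    ready_for_payment:ready_for_payment)         return 0 ;;  # update
    ready_for_payment:completed)                 return 0 ;;  # complete
    not_ready_for_payment:canceled)              return 0 ;;  # cancel
    ready_for_payment:canceled)                  return 0 ;;  # cancel
    *) return 1 ;;                                            # everything else is illegal
  esac
}

# check_sequence STATUS...: verifies every consecutive pair of observed statuses.
check_sequence() {
  local prev=$1 next
  shift
  for next in "$@"; do
    if ! valid_transition "$prev" "$next"; then
      echo "FAIL: illegal transition $prev -> $next"
      return 1
    fi
    prev=$next
  done
}

# Usage: pass the status from each response in order.
#   check_sequence not_ready_for_payment ready_for_payment completed
```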
Test 5: Invalid State Transitions
Try completing a session that's not_ready_for_payment:
# Create a session without fulfillment selection (stays not_ready)
# Then immediately try to complete it
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions/{id}/complete \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -d '{"payment_data": {"handler_id": "card_tokenized", ...}}'

Expected: a 400 or 422 error, not a 200 that creates an incomplete order.
Test 6: Cancel Flow
# Create a session, then cancel it
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions/{id}/cancel \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY"

Expected: 200 with status: "canceled". Then verify:
- GET the canceled session: still returns canceled (not deleted)
- Try to update the canceled session: should return 405 Method Not Allowed
- Try to complete the canceled session: should return 405
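The follow-up checks only compare status codes, so they are easy to script. In this sketch, `ACP_URL`, `ACP_KEY`, `SESSION_ID`, and `COMPLETE_BODY` are hypothetical variables you would set yourself, and sending updates as a POST to the session URL is an assumption about your routing.

```shell
# http_status METHOD URL [extra curl args...]: prints only the HTTP status code.
http_status() {
  local method=$1 url=$2
  shift 2
  curl -s -o /dev/null -w '%{http_code}' -X "$method" "$url" \
    -H "Authorization: Bearer $ACP_KEY" "$@"
}

# expect_status LABEL EXPECTED ACTUAL: prints PASS or FAIL and returns accordingly.
expect_status() {
  if [ "$2" = "$3" ]; then
    echo "PASS: $1 ($3)"
  else
    echo "FAIL: $1 expected $2, got $3"
    return 1
  fi
}

# Usage (needs network access and a freshly created session):
#   s="$ACP_URL/checkout_sessions/$SESSION_ID"
#   json='Content-Type: application/json'
#   expect_status "cancel"                200 "$(http_status POST "$s/cancel")"
#   expect_status "get after cancel"      200 "$(http_status GET "$s")"
#   expect_status "update after cancel"   405 "$(http_status POST "$s" -H "$json" -d '{}')"
#   expect_status "complete after cancel" 405 "$(http_status POST "$s/complete" -H "$json" -d "$COMPLETE_BODY")"
```

If your endpoint validates the request body before checking session state, an empty update body may produce a 400 instead of the 405 you are testing for, so send a realistic payload.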
Level 3: Edge Case Testing (30 minutes)
This is where most implementations break. Real agents trigger these scenarios constantly.
Test 7: Out-of-Stock Mid-Checkout
- Create a session with an item that has low inventory
- Wait (or manually reduce inventory via your admin)
- Try to complete the checkout
What you're looking for: Does the endpoint return a clear out_of_stock error with the specific item ID? Or does it 500?
Test 8: Price Change During Session
- Create a session with an item at $49.99
- Change the item's price to $59.99 via your admin
- Complete the checkout
Question: Which price does the customer pay? The session should either: (a) lock the price at creation time, or (b) return an error on complete indicating the price changed. What it should NOT do: silently charge the new price without informing the agent.
Test 9: International Address
curl -X POST https://api.yourstore.com/acp/v1/checkout_sessions \
  -H "Authorization: Bearer YOUR_SANDBOX_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: intl-test-$(date +%s)" \
  -d '{
    "items": [{"id": "YOUR_SKU", "quantity": 1}],
    "buyer": {"email": "test@example.com"},
    "fulfillment_address": {
      "name": "Test Buyer",
      "line_one": "10 Downing Street",
      "city": "London",
      "state": "",
      "country": "GB",
      "postal_code": "SW1A 2AA"
    }
  }'

If you don't ship internationally: You should get a clear error: {"type": "error", "code": "address_invalid", "message": "We do not ship to GB"}. Not a 500. Not a silent acceptance that creates an unfulfillable order.
Test 10: Quantity Limits
Try ordering 999 of the same item. Try ordering 0. Try ordering -1. Your endpoint should validate quantity bounds and return meaningful errors.
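The three quantities can be run in one loop. This is a sketch under assumptions: `ACP_URL` and `ACP_KEY` are hypothetical environment variables, and the helper names are illustrative. A quantity of 999 may be legitimately accepted if you have the stock, so the script only flags it for manual review, while 0 and -1 should always come back as a 4xx.

```shell
# quantity_payload SKU QTY: builds the create-session body for a single item.
quantity_payload() {
  printf '{"items":[{"id":"%s","quantity":%s}],"buyer":{"email":"test@example.com"}}' "$1" "$2"
}

# is_client_error CODE: true for any 4xx status code.
is_client_error() {
  case "$1" in
    4[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# run_quantity_checks SKU: posts each boundary quantity and reports the status.
run_quantity_checks() {
  local qty code
  for qty in 999 0 -1; do
    code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$ACP_URL/checkout_sessions" \
      -H "Authorization: Bearer $ACP_KEY" \
      -H "Content-Type: application/json" \
      -H "Idempotency-Key: qty-test-$qty-$(date +%s)" \
      -d "$(quantity_payload "$1" "$qty")")
    if is_client_error "$code"; then
      echo "quantity=$qty -> $code (rejected)"
    else
      echo "quantity=$qty -> $code (review manually)"
    fi
  done
}

# Usage: run_quantity_checks YOUR_SKU
```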
Test 11: Session Timeout
Create a session and don't touch it for 30+ minutes. Then try to complete it. Does your endpoint handle stale sessions gracefully? The ACP spec doesn't mandate a timeout, but real implementations should expire sessions to prevent inventory lock-up.
Level 4: Performance Testing (15 minutes)
Agents have timeout thresholds. If your endpoint takes too long, the agent abandons.
Test 12: Latency Profile
Run 20 requests against each endpoint and measure response times:
# Simple latency test: 20 requests, one time_total (in seconds) per line
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{time_total}\n" \
    https://api.yourstore.com/acp/v1/checkout_sessions/{id}
done

Target thresholds:
| Endpoint | p50 | p95 | Max acceptable |
|---|---|---|---|
| create_checkout | <500ms | <1.5s | 3s |
| get_checkout | <200ms | <500ms | 1s |
| update_checkout | <500ms | <1.5s | 3s |
| complete_checkout | <1s | <3s | 5s |
If your complete_checkout p95 exceeds 3 seconds, agents will start timing out. This was a documented issue in the Alpine Gear audit — 8.9 second p99 on complete_checkout was causing silent session failures.
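To compare measurements against these thresholds you need percentiles rather than individual timings. The sketch below computes them with the nearest-rank method using only `sort` and `awk`; `percentile` is a hypothetical helper name, and the URL variables in the usage comment are placeholders.

```shell
# percentile P: reads one number per line on stdin and prints the P-th
# percentile using the nearest-rank method (rank = ceil(P/100 * N)).
percentile() {
  LC_ALL=C sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                 # no samples, nothing to report
      rank = int((p * NR + 99) / 100)     # integer ceiling of p*NR/100
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Usage (ACP_URL and SESSION_ID are placeholders):
#   for i in $(seq 1 20); do
#     curl -s -o /dev/null -w '%{time_total}\n' "$ACP_URL/checkout_sessions/$SESSION_ID"
#   done > times.txt
#   echo "p50=$(percentile 50 < times.txt)s p95=$(percentile 95 < times.txt)s"
```

With 20 samples, p95 is the 19th-slowest value and p99 collapses to the maximum, so collect a few hundred samples before drawing conclusions about p99.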
Level 5: Synthetic Agent Simulation (The Full Test)
This is what AgentCheck automates — but here's how to do a basic version manually.
DIY Synthetic Agent Test
Use Claude or GPT-4o with tool-use to simulate a shopping session:
System prompt: You are testing an e-commerce checkout endpoint.
Your goal: buy one pair of running shoes under $100.
You have these tools:
- create_checkout(items, buyer, address)
- update_checkout(session_id, fulfillment_option_id)
- complete_checkout(session_id, payment_token)
- get_checkout(session_id)
Walk through a complete purchase. At each step, examine the response
and report anything unusual: missing fields, unexpected errors, slow
responses, incorrect pricing.
Use this buyer info:
- Name: Alex Test
- Email: alex@test.com
- Address: 456 Oak Ave, Portland, OR 97201
Use payment token: tok_visa

Give the LLM your endpoint URL and credentials, and let it run. The agent's "unusual findings" are your bug report.
What the Agent Will Catch That You Won't
- Missing response fields that no human would notice (like display_text on totals)
- Error messages that are technically correct but incomprehensible to an AI agent
- State transitions that work but are semantically confusing
- Fulfillment options that are listed but can't actually be selected
- Payment handlers declared but not actually functional
The AgentCheck Checklist
Before going live, verify every item:
Protocol Compliance
- All responses match ACP spec v2026-01-30 JSON schema
- Status codes are correct (201 on create, 200 on update/complete/get)
- Error responses follow { type, code, message, param } format
- Idempotency-Key is properly implemented
- HMAC-SHA256 signature verification works
- Bearer token authentication works (and rejects invalid tokens)
Checkout Flow
- Create → Update → Complete happy path works end-to-end
- State machine transitions are correct
- Cancel works from any non-terminal state
- Invalid state transitions return errors (not silent failures)
Data Quality
- Product prices match your live storefront
- Inventory is current (not stale by >5 minutes)
- Tax calculation is correct for test addresses
- Fulfillment options include cost and delivery window
Edge Cases
- Out-of-stock returns proper out_of_stock error code
- Invalid addresses return address_invalid error code
- Payment decline returns payment_declined error code
- Quantity limits are validated
- International addresses handled (accept or reject gracefully)
Performance
- All endpoints respond within 3 seconds at p95
- complete_checkout responds within 5 seconds at p99
- No latency regression under concurrent requests
Don't want to run these tests manually? AgentCheck automates all of this →