Test an Agent Stack

Use the built-in Testing tab to chat with your Agent Stack as a customer, isolate individual specialists, simulate identity verification, and watch the routing decisions in real time.

Updated May 18, 2026 · 5 min read

The Testing tab is a sandbox that lets you chat with your Agent Stack exactly as a customer would, without affecting any real conversations. Use it before going live, after every meaningful change, and any time a customer reports a confusing reply.

Before you start

The stack must exist with at least one specialist.
(Optional) If your stack uses identity-gated capabilities (L2/L3 security levels), have a test user ID and verification secret ready so you can simulate the auth flow.

Steps

Test the whole stack

Open Settings → Agent Stacks → [stack] → Testing.
The default mode is Stack — meaning the router decides which specialist replies, just like in production.
Type a customer-style message in the input at the bottom. Hit Enter or click Send.
The reply appears with the routing decision, showing which specialist took it. Super Admins also see the model, token count, and latency for each reply.
Continue the conversation as the customer would.
Click Reset to clear and start over.

Test a single specialist in isolation

Switch the mode dropdown from Stack to Agent.
A specialist picker appears. Pick the one you want to test.
Type a message. The router is bypassed — every message goes directly to the chosen specialist.

This is the right way to verify a specialist’s instructions, capabilities, and knowledge without the router interfering.
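The difference between the two modes can be pictured with a small sketch. This is only an illustration under assumptions: the `Specialist` type, the keyword-matching `route` function, and the specialist names are hypothetical stand-ins, and the product's actual router logic is not described in this article.

```typescript
// Hypothetical types for illustration; these names do not come from the product.
type Mode = "stack" | "agent";

interface Specialist {
  name: string;
  topics: string[]; // keywords the toy router matches on
}

// Toy router (an assumption): returns the first specialist with a topic that
// appears in the message, falling back to the first specialist in the stack.
function route(message: string, specialists: Specialist[]): Specialist {
  if (specialists.length === 0) {
    throw new Error("The stack needs at least one specialist");
  }
  const text = message.toLowerCase();
  for (const specialist of specialists) {
    for (const topic of specialist.topics) {
      if (text.indexOf(topic) !== -1) return specialist;
    }
  }
  return specialists[0];
}

// Stack mode asks the router; Agent mode bypasses it and uses the picked specialist.
function pickSpecialist(
  mode: Mode,
  message: string,
  specialists: Specialist[],
  selected?: Specialist,
): Specialist {
  if (mode === "agent") {
    if (!selected) throw new Error("Agent mode needs a selected specialist");
    return selected;
  }
  return route(message, specialists);
}

const billing: Specialist = { name: "Billing", topics: ["invoice", "refund"] };
const shipping: Specialist = { name: "Shipping", topics: ["delivery", "tracking"] };
const stack = [billing, shipping];

console.log(pickSpecialist("stack", "Where is my delivery?", stack).name); // Shipping
console.log(pickSpecialist("agent", "Where is my delivery?", stack, billing).name); // Billing
```

In the second call the message is clearly about shipping, yet the Billing specialist answers, which is why Agent mode is useful for checking one specialist's behavior on its own.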

Simulate identity verification

If your stack has capabilities that require L1, L2, or L3 authentication, the Testing tab simulates it.

Click Verify identity in the test panel header.
Enter a test user ID and the verification secret expected by your auth flow. The auth level (L0, L1, L2, L3) updates in the panel.
Continue the conversation. Capabilities that require an auth level higher than the test session’s level will be denied.
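The denial rule in the last step amounts to a simple comparison of levels. The sketch below assumes the levels are ordered L0 < L1 < L2 < L3; the `Capability` type and the capability names are hypothetical and only serve the example.

```typescript
// Levels named in this article; treating them as an ordered scale is an assumption.
type AuthLevel = "L0" | "L1" | "L2" | "L3";

const RANK: Record<AuthLevel, number> = { L0: 0, L1: 1, L2: 2, L3: 3 };

interface Capability {
  name: string; // hypothetical capability name
  requires: AuthLevel; // minimum auth level needed to run it
}

// A capability runs only when the session's level is at least the required level.
function isAllowed(sessionLevel: AuthLevel, capability: Capability): boolean {
  return RANK[sessionLevel] >= RANK[capability.requires];
}

const lookupOrder: Capability = { name: "lookup_order", requires: "L1" };
const issueRefund: Capability = { name: "issue_refund", requires: "L3" };

console.log(isAllowed("L2", lookupOrder)); // true
console.log(isAllowed("L2", issueRefund)); // false
console.log(isAllowed("L0", lookupOrder)); // false
```

With a test session verified at L2, the L1 capability executes and the L3 capability is denied, which is the behavior to look for in the test panel.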

Persistence across visits

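Your test session (mode, messages, selected specialist, and auth level) persists in sessionStorage and is restored when you return to the Testing tab. Because sessionStorage is scoped to a single browser tab, the session does not carry over to other browser tabs and is discarded when you close the browser tab. Click Reset to clear it manually.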
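For readers curious how this kind of persistence typically works, here is a minimal sketch. The storage key, the `TestSession` shape, and the function names are assumptions made for illustration, not the product's actual implementation. The code is written against a small `KeyValueStore` interface so it runs outside a browser; in a browser, `window.sessionStorage` provides the same three methods.

```typescript
// Subset of the Web Storage API used here; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical shape, based on the four fields this article says are persisted.
interface TestSession {
  mode: "stack" | "agent";
  messages: { role: "customer" | "agent"; text: string }[];
  selectedSpecialist: string | null;
  authLevel: "L0" | "L1" | "L2" | "L3";
}

const STORAGE_KEY = "agent-stack-test-session"; // hypothetical key name

function saveSession(store: KeyValueStore, session: TestSession): void {
  store.setItem(STORAGE_KEY, JSON.stringify(session));
}

// Returns null when nothing is stored or the stored value is not valid JSON.
function loadSession(store: KeyValueStore): TestSession | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as TestSession;
  } catch (_err) {
    return null;
  }
}

// Mirrors what clicking Reset does conceptually: drop the stored session.
function resetSession(store: KeyValueStore): void {
  store.removeItem(STORAGE_KEY);
}

// In-memory stand-in for sessionStorage so the sketch runs in Node.
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
  removeItem(key: string): void {
    delete this.data[key];
  }
}

const store = new MemoryStore();
saveSession(store, {
  mode: "agent",
  messages: [{ role: "customer", text: "Where is my order?" }],
  selectedSpecialist: "Shipping",
  authLevel: "L2",
});

const restored = loadSession(store);
console.log(restored !== null ? restored.authLevel : "none"); // L2
resetSession(store);
console.log(loadSession(store)); // null
```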

Verify it worked

The conversation flows naturally and the router picks the specialist you’d expect.
Replies match the personality you configured.
Capabilities that should be available execute. Ones that shouldn’t (wrong auth level, missing scope) are denied with a clear message.
Knowledge citations point to real KB articles in the categories you granted.

Flag bad replies

When a reply isn’t right — wrong specialist, missed information, awkward tone — click the flag icon on that message. The flagged message moves into the Tuning panel on the right, where you can give feedback and review proposed fixes. See Flag and tune bad responses.

Troubleshooting

Symptom: The router routes to the wrong specialist on test messages. Fix: see Define routing topics. The Testing tab uses the same router as production, so a mis-route here is a real mis-route.
Symptom: Replies in test feel fine but production replies feel off. Fix: run a few production conversations through the Monitor module and compare them with your test runs. The test sandbox uses simulated identity, so real conversations may have different identity-gated capabilities available and draw on KB content that the AI retrieves in real time.
Symptom: A capability that works in test fails in production. Fix: confirm the capability’s security level isn’t blocking it in production for unauthenticated callers. The test sandbox lets you set auth levels manually; production won’t elevate without a real verification flow.