Agent Stacks · Intermediate

Flag and Tune Bad Responses

Flag any AI reply that missed the mark, give feedback in plain language, and let Atender propose concrete edits — system prompt changes, routing tweaks — that you can apply with one click or revert.

May 10, 2026

The Tuning flow is how you teach an Agent Stack to behave better without hand-editing prompts. Flag a bad reply, describe what was wrong, and the AI proposes a concrete edit, usually a tweak to a specialist’s instructions or a routing rule, that you apply with one click. Revert it if it doesn’t help.

When to use this

You spotted a reply in the Testing sandbox that’s wrong, off-tone, or missing info.
A live conversation went sideways and you want to fix the underlying behavior so it doesn’t happen again.
You’re iterating on a new specialist and want quick guided improvements rather than editing prompts blindly.

Before you start

The stack must exist with at least one specialist.
You need a bad reply to flag — either generate one in the Testing tab or open a real conversation that surfaced one. Flagging works the same way in both.

Steps

1. Open Settings → Agent Stacks → [stack] → Testing.
2. Generate a conversation that produces the bad reply (or pick an existing flagged message from the Tuning Session History at the top of the right panel).
3. Click the flag icon on the bad reply. The message moves to the Tuning panel on the right.
4. The Tuning panel asks: what went wrong? Type a short, plain-language description. Examples:
   - “Should have asked for the order ID before promising a refund.”
   - “Routed to Tech instead of Billing.”
   - “Tone is too casual for an enterprise customer.”
5. The AI may ask clarifying questions if your feedback is ambiguous. Answer them in the same panel.
6. Once the AI is confident in the diagnosis, it produces a proposal: a concrete edit. Common proposals:
   - Update a specialist’s Instructions with a new rule.
   - Add a routing rule to the orchestrator’s prompt.
   - Adjust the Scope text on a specialist (changing the responsibility).
   - Change a specialist’s temperature (rare; specialists are typically forced to 0 at runtime).
7. Read the proposal. Each one shows the diff: what’s changing, where, and the rationale. Click Apply to commit it, or Dismiss to reject it.
8. After applying, you can re-test the same scenario in the same panel to confirm the fix worked. The Session History at the top of the panel keeps a record of every applied change.
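
To make the idea of a proposal diff concrete, here is a minimal Python sketch of what such an edit might look like as data. The `Proposal` class, its field names, and the sample instruction text are all hypothetical illustrations, not Atender's actual schema or API; only `difflib` from the Python standard library is real.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Proposal:
    """Hypothetical shape of a tuning proposal (not Atender's real schema)."""
    target: str     # which setting the edit touches
    before: str     # current text
    after: str      # proposed text
    rationale: str  # why the change is suggested

    def diff(self) -> str:
        """Render the edit as a unified diff, one line per entry."""
        lines = difflib.unified_diff(
            self.before.splitlines(),
            self.after.splitlines(),
            fromfile=f"{self.target} (current)",
            tofile=f"{self.target} (proposed)",
            lineterm="",
        )
        return "\n".join(lines)


# Made-up sample text for illustration only.
proposal = Proposal(
    target="Billing specialist / Instructions",
    before="Help customers with refunds.\nBe polite.",
    after=(
        "Help customers with refunds.\n"
        "Ask for the order ID before promising a refund.\n"
        "Be polite."
    ),
    rationale="The flagged reply promised a refund without an order ID.",
)
print(proposal.diff())
```

In the printed diff, lines prefixed with `+` are additions and unprefixed context lines are unchanged, which is the same information the proposal view surfaces before you click Apply or Dismiss.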

Reverting a change

Every applied change is logged in the Session History at the top of the Tuning panel. Click the Undo action next to a change to revert it. The system prompt or routing rule snaps back to its prior state — no manual prompt editing required.
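
One way to picture Undo is as a log where each entry remembers the text it replaced. The Python sketch below is an assumption about how such a log could work, not a description of Atender's internals; `TuningSession`, `AppliedChange`, and the sample routing text are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AppliedChange:
    target: str            # e.g. "Orchestrator / Routing"
    before: str            # text in place before Apply
    after: str             # text written by Apply
    reverted: bool = False


@dataclass
class TuningSession:
    """Hypothetical model: named settings plus a log of applied changes."""
    settings: dict[str, str]
    history: list[AppliedChange] = field(default_factory=list)

    def apply(self, target: str, new_text: str) -> AppliedChange:
        # Record the prior text so the change can be undone later.
        change = AppliedChange(target, self.settings[target], new_text)
        self.settings[target] = new_text
        self.history.append(change)
        return change

    def undo(self, change: AppliedChange) -> None:
        # Restore the text captured at apply time and mark the entry.
        self.settings[change.target] = change.before
        change.reverted = True


session = TuningSession({"Orchestrator / Routing": "Send payment questions to Tech."})
change = session.apply("Orchestrator / Routing", "Send payment questions to Billing.")
session.undo(change)
print(session.settings["Orchestrator / Routing"])
```

Because the prior text is captured when the change is applied, reverting needs no manual editing: the stored `before` value is simply written back.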

Verify it worked

Send the same message that produced the bad reply originally. The new reply should match the corrected behavior.
Send a few variations to make sure you didn’t over-correct (for example, the AI is now too cautious about refunds).
Check the Tuning Session History to confirm the change persisted to the underlying specialist or routing rule.
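
If you re-test often, it can help to write the checks down. The Python sketch below assumes you paste replies from the Testing tab into a small checklist by hand; it calls no Atender API, and the `Case` class, the `failures` helper, and the sample messages and replies are all made up for illustration. Substring matching is a crude check, so treat it as a reminder list rather than a real evaluation.

```python
from dataclasses import dataclass


@dataclass
class Case:
    message: str                        # what you sent in the Testing tab
    reply: str                          # reply pasted in after the change
    must_include: tuple[str, ...] = ()  # phrases the fix should produce
    must_exclude: tuple[str, ...] = ()  # phrases that signal over-correction


def failures(cases: list[Case]) -> list[str]:
    """Return a human-readable problem for every violated expectation."""
    problems = []
    for case in cases:
        text = case.reply.lower()
        for phrase in case.must_include:
            if phrase.lower() not in text:
                problems.append(f"{case.message!r}: missing {phrase!r}")
        for phrase in case.must_exclude:
            if phrase.lower() in text:
                problems.append(f"{case.message!r}: unexpected {phrase!r}")
    return problems


cases = [
    # The original scenario: the reply should now ask for the order ID.
    Case(
        message="I want a refund for my order.",
        reply="Happy to help. Could you share your order ID first?",
        must_include=("order ID",),
    ),
    # A variation: a general question should not trigger the new rule.
    Case(
        message="Where can I read your refund policy?",
        reply="You can find the refund policy on our help page.",
        must_exclude=("order ID",),
    ),
]
print(failures(cases))
```

An empty list means every pasted reply met its expectations; any entry names the message and the phrase that was missing or unexpected.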

What feeds Self-Learning

The same flagging signal, in aggregated form, feeds Self-Learning, which proposes improvements based on patterns across many real conversations rather than single flagged messages. Tuning is for the immediate fix; Self-Learning is for the longer-term pattern.

Troubleshooting

Symptom: The Tuning panel doesn’t show any proposal — just keeps asking clarifying questions. Fix: your feedback is too abstract. Be specific: not “the reply was wrong” but “the reply confirmed a refund without asking for the order ID.”
Symptom: The proposal would change the wrong specialist. Fix: dismiss it. Re-flag the same message and clarify which specialist’s behavior is at fault.
Symptom: After applying, the bad behavior is back. Fix: the applied change might have been overwritten by a later edit (yours or someone else’s). Check the Change History tab on the stack to see the order of recent changes. Re-apply the tuning if needed.
Symptom: “Apply” button is greyed out. Fix: the proposal hasn’t fully generated yet, or the AI is waiting for a clarifying answer. Read the panel for what it’s asking.
