Flag and Tune Bad Responses
The Tuning flow is how you teach an Agent Stack to behave better without hand-editing prompts. Flag a bad reply, describe what was wrong, and the AI proposes a concrete edit (usually a tweak to a specialist’s instructions or a routing rule) that you apply with one click. If the change doesn’t help, revert it.
When to use this
- You spotted a reply in the Testing sandbox that’s wrong, off-tone, or missing info.
- A live conversation went sideways and you want to fix the underlying behavior so it doesn’t happen again.
- You’re iterating on a new specialist and want quick guided improvements rather than editing prompts blindly.
Before you start
- The stack must exist with at least one specialist.
- You need a bad reply to flag — either generate one in the Testing tab or open a real conversation that surfaced one. Flagging works the same way in both.
Steps
- Open Settings → Agent Stacks → [stack] → Testing.
- Generate a conversation that produces the bad reply (or pick an existing flagged message from the Tuning Session History at the top of the right panel).
- Click the flag icon on the bad reply. The message moves to the Tuning panel on the right.
- The Tuning panel asks: what went wrong? Type a short, plain-language description. Examples:
  - “Should have asked for the order ID before promising a refund.”
  - “Routed to Tech instead of Billing.”
  - “Tone is too casual for an enterprise customer.”
- The AI may ask clarifying questions if your feedback is ambiguous. Answer them in the same panel.
- Once the AI is confident in the diagnosis, it produces a proposal: a concrete edit. Common proposals:
  - Update a specialist’s Instructions with a new rule.
  - Add a routing rule to the orchestrator’s prompt.
  - Adjust the Scope text on a specialist (changing the responsibility).
  - Change a specialist’s temperature (rare; specialists are typically forced to 0 at runtime).
- Read the proposal. Each one shows the diff: what’s changing, where, and the rationale. Click Apply to commit it, or Dismiss to reject it.
- After applying, you can re-test the same scenario in the same panel to confirm the fix worked. The session history at the top of the panel keeps a record of every applied change.
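To make the idea of a proposal diff concrete, here is a minimal sketch of how a before/after change to a specialist’s Instructions reads as a diff. It uses Python’s standard `difflib` only for illustration; the instruction text, the `render_proposal` helper, and the "Billing / Instructions" label are all hypothetical, and the product’s own diff view may look different.

```python
import difflib

# Hypothetical "before" and "after" Instructions for a Billing specialist.
before = [
    "You handle billing questions.",
    "Be concise and polite.",
]
after = [
    "You handle billing questions.",
    "Be concise and polite.",
    "Ask for the order ID before promising a refund.",
]


def render_proposal(before, after, target):
    """Render a unified diff between the current and proposed text."""
    lines = difflib.unified_diff(
        before,
        after,
        fromfile=f"{target} (current)",
        tofile=f"{target} (proposed)",
        lineterm="",
    )
    return "\n".join(lines)


diff_text = render_proposal(before, after, "Billing / Instructions")
print(diff_text)
```

Lines prefixed with `+` are what the proposal would add and lines prefixed with `-` are what it would remove. A proposal that touches many unrelated lines is usually a sign that the feedback was too broad.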
Reverting a change
Every applied change is logged in the Session History at the top of the Tuning panel. Click the Undo action next to a change to revert it. The system prompt or routing rule snaps back to its prior state — no manual prompt editing required.
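One way to picture why Undo needs no manual editing: each applied change can carry the text it replaced, so reverting is just restoring that text. The sketch below is a hypothetical in-memory model, not the product’s implementation; `TuningSession`, `AppliedChange`, and the `billing.instructions` key are names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AppliedChange:
    """One applied proposal, with the prior text kept for reverting."""
    target: str       # hypothetical key, e.g. "billing.instructions"
    before: str
    after: str
    rationale: str
    reverted: bool = False


@dataclass
class TuningSession:
    """Hypothetical model: stack configuration plus a session history."""
    config: Dict[str, str]
    history: List[AppliedChange] = field(default_factory=list)

    def apply(self, target: str, new_text: str, rationale: str) -> AppliedChange:
        # Remember the prior value so the change can be reverted later.
        change = AppliedChange(target, self.config[target], new_text, rationale)
        self.config[target] = new_text
        self.history.append(change)
        return change

    def undo(self, change: AppliedChange) -> None:
        # Restore the prior text; keep the history entry, marked as reverted.
        self.config[change.target] = change.before
        change.reverted = True


session = TuningSession(config={"billing.instructions": "Be concise and polite."})
change = session.apply(
    "billing.instructions",
    "Be concise and polite. Ask for the order ID before promising a refund.",
    rationale="Reply confirmed a refund without an order ID.",
)
session.undo(change)
print(session.config["billing.instructions"])
```

In this simplified model, undoing an older change after a newer one has touched the same field would also discard the newer text. The article does not say how the product handles that case.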
Verify it worked
- Send the same message that produced the bad reply originally. The new reply should match the corrected behavior.
- Send a few variations to make sure you didn’t over-correct (for example, the AI is now too cautious about refunds).
- Check the Tuning Session History to confirm the change persisted to the underlying specialist or routing rule.
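The checks above can be written down as a small list of cases: the original message, a variation that should behave the same way, and an unrelated message that should not be affected. This is only a sketch of how to organize the checks. This article does not describe a programmatic API, so `send` is a stand-in you would have to supply (or replace by typing each message into the Testing sandbox by hand), and `fake_send` exists only to make the example runnable.

```python
from typing import Callable, List, NamedTuple, Optional


class Case(NamedTuple):
    """One message to send, plus simple expectations about the reply."""
    message: str
    must_contain: Optional[str] = None
    must_not_contain: Optional[str] = None


def run_cases(send: Callable[[str], str], cases: List[Case]) -> List[str]:
    """Return one failure line for every expectation that is not met."""
    failures = []
    for case in cases:
        reply = send(case.message).lower()
        if case.must_contain and case.must_contain.lower() not in reply:
            failures.append(f"{case.message!r}: expected {case.must_contain!r}")
        if case.must_not_contain and case.must_not_contain.lower() in reply:
            failures.append(f"{case.message!r}: unexpected {case.must_not_contain!r}")
    return failures


def fake_send(message: str) -> str:
    # Hypothetical stand-in for the stack's reply after the fix is applied.
    if "refund" in message.lower():
        return "Happy to help. Could you share your order ID first?"
    return "Your invoice is issued on the 1st of each month."


cases = [
    Case("I want a refund for my last order.", must_contain="order ID"),
    Case("Refund please, the item arrived broken.", must_contain="order ID"),
    # Over-correction check: an unrelated question should not ask for an order ID.
    Case("When is my invoice issued?", must_not_contain="order ID"),
]

failures = run_cases(fake_send, cases)
print(failures or "all cases passed")
```

Substring checks are deliberately crude. They catch obvious regressions but not tone problems, so read the replies as well.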
What feeds Self-Learning
The same flagging signal, in aggregated form, also feeds Self-Learning, which proposes improvements based on patterns across many real conversations rather than single flagged messages. Tuning is for the immediate fix; Self-Learning is for the longer-term pattern.
Troubleshooting
- Symptom: The Tuning panel doesn’t show any proposal — just keeps asking clarifying questions. Fix: your feedback is too abstract. Be specific: not “the reply was wrong” but “the reply confirmed a refund without asking for the order ID.”
- Symptom: The proposal would change the wrong specialist. Fix: dismiss it. Re-flag the same message and clarify which specialist’s behavior is at fault.
- Symptom: After applying, the bad behavior is back. Fix: the applied change might have been overwritten by a later edit (yours or someone else’s). Check the Change History tab on the stack to see the order of recent changes. Re-apply the tuning if needed.
- Symptom: The Apply button is greyed out. Fix: the proposal hasn’t fully generated yet, or the AI is waiting for a clarifying answer. Check the panel to see what it’s asking.