While every agent run is automatically evaluated, you can also provide manual thumbs up/down feedback to give additional signal about what’s working. This supplements the automatic evaluation system with human judgment.
Manual feedback is optional: automatic evaluations already drive improvement suggestions. Use manual feedback when you want to provide additional context or when the automatic score doesn’t capture the full picture.

How It Works

After each agent run, users can:
  • 👍 Thumbs up - Good execution, agent performed well
  • 👎 Thumbs down - Poor execution, something went wrong
  • 💬 Optional comment - Explain what was good or bad

Where to Provide Feedback

Feedback options appear in multiple places:
Agent Execution Detail Page:
  • Click into any run from agent history
  • Thumbs up/down buttons at the top
  • Comment field for additional context
Slack Bot Responses (if Slack trigger enabled):
  • React with 👍 or 👎 emoji
  • Reply in thread for comments
Email Responses (if email trigger enabled):
  • Reply with “+1” (positive) or “-1” (negative)
  • Include explanation in email body
API Responses:
  • Include feedback via API when invoking programmatically (see the sketch below)
  • See API documentation for details
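As a rough illustration of programmatic feedback, the sketch below posts a thumbs rating and optional comment for a completed run. The endpoint path, payload fields, and authentication header are placeholders rather than the product’s actual API; check the API documentation for the real contract.

```python
import requests

# Hypothetical endpoint and payload shape -- consult the API documentation
# referenced above for the actual route and field names.
API_BASE = "https://api.example.com/v1"
API_KEY = "your-api-key"

def submit_feedback(run_id: str, thumbs_up: bool, comment: str | None = None) -> None:
    """Attach thumbs up/down (and an optional comment) to a completed agent run."""
    payload = {"rating": "up" if thumbs_up else "down"}
    if comment:
        payload["comment"] = comment

    response = requests.post(
        f"{API_BASE}/runs/{run_id}/feedback",
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()

# Example: flag a run whose output was hard for the intended audience to use.
submit_feedback("run_2048", thumbs_up=False, comment="Explanation too technical for tier-1 analysts")
```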

How Feedback Complements Evaluations

Automatic Evaluations       | Manual Feedback
Every run, always           | Optional, when users have an opinion
Objective LLM assessment    | Subjective human judgment
Scores specific criteria    | Overall impression
Drives AI suggestions       | Adds context to patterns
Primary improvement method  | Supplementary signal
Think of manual feedback as a way to say “I agree/disagree with the automatic eval” or to highlight something the automatic system might have missed.

When to Provide Feedback

Provide feedback when:
  • A run scored 85/100 but you think it was actually poor (or vice versa) - this gives corrective signal.
  • You want to flag tone, professionalism, or style preferences that automatic evals might not capture well.
  • You notice the same issue across multiple runs - feedback comments help identify the pattern.
  • You have context the eval can’t measure - “This saved our analyst 2 hours” or “This missed a critical finding”.

Using Feedback Insights

Feedback is tracked alongside automatic evaluations.
On Agent Dashboard:
Last 30 Days
Avg Eval Score: 82/100
User Feedback: 87% thumbs up (45 total)
In Run History:
  • Filter by feedback (show only thumbs up/down runs)
  • Sort by feedback sentiment
  • See feedback comments
For Improvement Analysis:
  • Common themes in thumbs-down comments inform prompt improvements (a sketch for pulling these comments follows this list)
  • Feedback helps validate that eval-driven changes are working
  • Disagreement between eval scores and feedback indicates calibration needs
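For the improvement-analysis step, a small script can pull recent thumbs-down runs and collect their comments for review. The sketch below assumes a hypothetical run-listing endpoint, query parameters, and response fields; adapt it to the actual API.

```python
# Sketch: list recent runs for an agent, keep only those with thumbs-down
# feedback, and collect their comments for improvement analysis.
import requests

API_BASE = "https://api.example.com/v1"   # placeholder base URL
HEADERS = {"Authorization": "Bearer your-api-key"}

def thumbs_down_comments(agent_id: str, days: int = 30) -> list[str]:
    response = requests.get(
        f"{API_BASE}/agents/{agent_id}/runs",
        params={"feedback": "down", "since_days": days},  # assumed filters
        headers=HEADERS,
        timeout=10,
    )
    response.raise_for_status()
    runs = response.json()["runs"]
    return [run["feedback_comment"] for run in runs if run.get("feedback_comment")]

# Example: review these comments during prompt-improvement analysis.
for comment in thumbs_down_comments("agent_123"):
    print(comment)
```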

Feedback-Driven Improvements

While automatic evaluations drive most improvements, feedback helps identify issues to prioritize.
Example Pattern:
5 runs with thumbs down + comments like:
"Too slow"
"Takes forever"
"Can you make this faster?"

Average eval scores: 80/100 (objectively fine)
Action: Even though eval scores are good, clear user dissatisfaction about speed → investigate performance optimization.
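One lightweight way to spot a pattern like this is to bucket thumbs-down comments by keyword. The sketch below is deliberately crude: the theme names and keyword lists are invented for illustration, and real theme analysis might use an LLM or clustering instead.

```python
# Sketch: surface recurring themes in thumbs-down comments with keyword buckets.
from collections import Counter

THEME_KEYWORDS = {
    "speed": ["slow", "forever", "faster", "latency"],
    "accuracy": ["wrong", "missed", "incorrect", "false"],
    "clarity": ["confusing", "technical", "unclear", "verbose"],
}

def count_themes(comments: list[str]) -> Counter:
    themes = Counter()
    for comment in comments:
        lowered = comment.lower()
        for theme, keywords in THEME_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                themes[theme] += 1
    return themes

comments = ["Too slow", "Takes forever", "Can you make this faster?"]
print(count_themes(comments))  # Counter({'speed': 3})
```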

Best Practices

Good: “Agent didn’t check user history before concluding”
Less useful: “This was bad”
Feedback should take 5 seconds. If a run was clearly good or bad, just click thumbs up/down. Comments optional.
You don’t need to give feedback on every run. Focus on runs that surprise you (much better or worse than expected).
When you do comment, mention specific issues or wins: “Missed checking VirusTotal” or “Perfect triage, saved me time”.

Feedback Metrics

Track feedback trends over time:

Satisfaction Rate

% thumbs up out of total feedback given

Feedback Volume

How many runs receive feedback (higher = more engagement)

Sentiment Trend

Is satisfaction improving or declining?

Common Themes

Text analysis of comments to identify patterns
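If you export feedback records, these metrics are straightforward to compute yourself. The sketch below assumes a simple record shape (a thumbs flag and a date) rather than any real schema.

```python
# Sketch: computing the feedback metrics above from exported records.
from dataclasses import dataclass
from datetime import date

@dataclass
class FeedbackRecord:
    thumbs_up: bool
    created_at: date

def satisfaction_rate(records: list[FeedbackRecord]) -> float:
    """% thumbs up out of total feedback given."""
    if not records:
        return 0.0
    return 100 * sum(r.thumbs_up for r in records) / len(records)

def feedback_volume(records: list[FeedbackRecord], total_runs: int) -> float:
    """Share of runs that received any feedback (higher = more engagement)."""
    return 100 * len(records) / total_runs if total_runs else 0.0

def sentiment_trend(records: list[FeedbackRecord], cutoff: date) -> tuple[float, float]:
    """Satisfaction before vs. after a cutoff date, to see if it is improving."""
    before = [r for r in records if r.created_at < cutoff]
    after = [r for r in records if r.created_at >= cutoff]
    return satisfaction_rate(before), satisfaction_rate(after)
```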

Correlation with Evaluations

Monitor how manual feedback aligns with automatic evaluations.
Strong Agreement (good):
High eval scores (80+) → mostly thumbs up
Low eval scores (<70) → mostly thumbs down
Disagreement (investigate):
High eval scores → lots of thumbs down
OR low eval scores → lots of thumbs up
Disagreement suggests:
  • Evaluation criteria might need adjustment
  • There’s a subjective quality not being measured
  • Users value different things than the eval judges
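A simple way to surface these calibration cases is to flag runs where the eval band and the thumbs direction point in opposite directions, using the 80+/under-70 bands above. The run fields in this sketch are illustrative, not a real schema.

```python
# Sketch: flag runs where the automatic eval score and the human thumbs disagree.
from typing import TypedDict

class Run(TypedDict):
    run_id: str
    eval_score: int   # 0-100 automatic evaluation score
    thumbs_up: bool   # manual feedback

def find_disagreements(runs: list[Run]) -> list[Run]:
    flagged = []
    for run in runs:
        high_score_thumbs_down = run["eval_score"] >= 80 and not run["thumbs_up"]
        low_score_thumbs_up = run["eval_score"] < 70 and run["thumbs_up"]
        if high_score_thumbs_down or low_score_thumbs_up:
            flagged.append(run)
    return flagged

runs = [
    {"run_id": "run_2048", "eval_score": 78, "thumbs_up": False},
    {"run_id": "run_2049", "eval_score": 91, "thumbs_up": False},  # disagreement
    {"run_id": "run_2050", "eval_score": 65, "thumbs_up": True},   # disagreement
]
print([r["run_id"] for r in find_disagreements(runs)])  # ['run_2049', 'run_2050']
```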

Example: Feedback in Action

Run #2048 - Automatic Eval: 78/100
User Feedback: 👎 + Comment
"Agent correctly identified the alert as a false positive,
but the explanation was too technical for our tier-1
analysts to understand. Need simpler language."
Result:
  • Automatic eval scored it as “acceptable” (78/100)
  • User feedback highlights a gap: audience-appropriate language
  • This becomes a factor in future prompt improvements

When Feedback Overrides Evals

Manual feedback is particularly valuable when:
  1. Business context matters - “Technically correct but missed our policy”
  2. Audience matters - “Right answer, wrong communication style”
  3. Edge cases - “Eval couldn’t know this is a known false positive”
  4. Performance matters - “Correct but too slow for our SLA”