# Manual Feedback

> Supplement automatic evaluations with manual user feedback

While every agent run is automatically evaluated, you can also provide manual thumbs up/down feedback to give additional signal about what's working. This supplements the automatic evaluation system with human judgment.

<Note>
  **Manual feedback is optional** - Automatic evaluations already drive improvement suggestions. Use manual feedback when you want to provide additional context or when the automatic score doesn't capture the full picture.
</Note>

## How It Works

After each agent run, users can:

* 👍 **Thumbs up** - Good execution, agent performed well
* 👎 **Thumbs down** - Poor execution, something went wrong
* 💬 **Optional comment** - Explain what was good or bad

## Where to Provide Feedback

Feedback options appear in multiple places:

**Agent Execution Detail Page**:

* Click into any run from agent history
* Thumbs up/down buttons at the top
* Comment field for additional context

**Slack Bot Responses** (if Slack trigger enabled):

* React with 👍 or 👎 emoji
* Reply in thread for comments

**Email Responses** (if email trigger enabled):

* Reply with "+1" (positive) or "-1" (negative)
* Include explanation in email body

**API Responses**:

* Include feedback via API when invoking programmatically
* See API documentation for details

## How Feedback Complements Evaluations

| Automatic Evaluations      | Manual Feedback                      |
| -------------------------- | ------------------------------------ |
| Every run, always          | Optional, when users have an opinion |
| Objective LLM assessment   | Subjective human judgment            |
| Scores specific criteria   | Overall impression                   |
| Drives AI suggestions      | Adds context to patterns             |
| Primary improvement method | Supplementary signal                 |

<Note>
  Think of manual feedback as a way to say "I agree/disagree with the automatic eval" or to highlight something the automatic system might have missed.
</Note>

## When to Provide Feedback

<AccordionGroup>
  <Accordion title="When Auto-Eval Seems Wrong" icon="scale-unbalanced">
    If a run scored 85/100 but you think it was actually poor (or vice versa), provide feedback to give a corrective signal.
  </Accordion>

  <Accordion title="For Subjective Qualities" icon="palette">
    Tone, professionalism, or style preferences that automatic evals might not capture well.
  </Accordion>

  <Accordion title="To Highlight Patterns" icon="repeat">
    If you notice the same issue across multiple runs, feedback comments help identify the pattern.
  </Accordion>

  <Accordion title="For Business Impact" icon="briefcase">
    "This saved our analyst 2 hours" or "This missed a critical finding" - context the eval can't measure.
  </Accordion>
</AccordionGroup>

## Using Feedback Insights

Feedback is tracked alongside automatic evaluations:

**On Agent Dashboard**:

```
Last 30 Days
Avg Eval Score: 82/100
User Feedback: 87% thumbs up (45 total)
```

**In Run History**:

* Filter by feedback (show only thumbs up/down runs)
* Sort by feedback sentiment
* See feedback comments

**For Improvement Analysis**:

* Common themes in thumbs-down comments inform prompt improvements
* Feedback helps validate that eval-driven changes are working
* Disagreement between eval scores and feedback indicates that the evaluation criteria may need recalibration

## Feedback-Driven Improvements

While automatic evaluations drive most improvements, feedback helps identify issues to prioritize:

**Example Pattern**:

```
5 runs with thumbs down + comments like:
"Too slow"
"Takes forever"
"Can you make this faster?"

Average eval scores: 80/100 (objectively fine)
```

**Action**: Even though the eval scores are good, users are clearly dissatisfied with speed, so investigate performance optimizations.
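
To spot a pattern like this in your own feedback comments, a simple keyword tally can be enough. The following is a minimal sketch, not part of the product: the `Feedback` record, the `THEMES` keyword groups, and the sample data are all hypothetical and only illustrate the idea of grouping thumbs-down comments by theme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """Hypothetical record for one piece of manual feedback."""
    run_id: int
    thumbs_up: bool
    eval_score: int
    comment: Optional[str] = None


# Hypothetical keyword groups; tune these to the comments you actually see.
THEMES = {
    "speed": ("slow", "forever", "faster"),
    "clarity": ("too technical", "confusing", "unclear"),
}


def thumbs_down_themes(feedback):
    """Count how many thumbs-down comments mention each theme."""
    counts = Counter()
    for item in feedback:
        # Only thumbs-down feedback that includes a comment is considered.
        if item.thumbs_up or not item.comment:
            continue
        text = item.comment.lower()
        for theme, keywords in THEMES.items():
            if any(keyword in text for keyword in keywords):
                counts[theme] += 1
    return counts


# Illustrative data only, reusing the comments from the example pattern.
sample = [
    Feedback(1, False, 80, "Too slow"),
    Feedback(2, False, 82, "Takes forever"),
    Feedback(3, False, 78, "Can you make this faster?"),
    Feedback(4, True, 85, "Perfect triage, saved me time"),
]

print(thumbs_down_themes(sample))
```

With the sample data, all three thumbs-down comments fall under the `speed` theme even though their eval scores are acceptable, which is the situation the pattern describes.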

## Best Practices

<AccordionGroup>
  <Accordion title="Focus on Actionable Feedback">
    **Good**: "Agent didn't check user history before concluding"

    **Less useful**: "This was bad"
  </Accordion>

  <Accordion title="Don't Over-Think It">
    Feedback should take 5 seconds. If a run was clearly good or bad, just click thumbs up/down. Comments are optional.
  </Accordion>

  <Accordion title="Use for Exceptions">
    You don't need to give feedback on every run. Focus on runs that surprise you (much better or worse than expected).
  </Accordion>

  <Accordion title="Be Specific in Comments">
    When you do comment, mention specific issues or wins: "Missed checking VirusTotal" or "Perfect triage, saved me time".
  </Accordion>
</AccordionGroup>

## Feedback Metrics

Track feedback trends over time:

<CardGroup cols={2}>
  <Card title="Satisfaction Rate" icon="percentage">
    % thumbs up out of total feedback given
  </Card>

  <Card title="Feedback Volume" icon="chart-line">
    How many runs receive feedback (higher = more engagement)
  </Card>

  <Card title="Sentiment Trend" icon="arrow-trend-up">
    Is satisfaction improving or declining?
  </Card>

  <Card title="Common Themes" icon="tags">
    Text analysis of comments to identify patterns
  </Card>
</CardGroup>
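
The first two metrics are simple ratios. As a rough sketch of the arithmetic (the function names are hypothetical, and expressing feedback volume as a share of all runs is an assumption; the dashboard may report a raw count instead):

```python
def satisfaction_rate(thumbs):
    """Percent of thumbs up among all feedback given.

    `thumbs` is a list of booleans where True means thumbs up.
    Returns None when no feedback has been given yet.
    """
    if not thumbs:
        return None
    return 100.0 * sum(thumbs) / len(thumbs)


def feedback_volume(runs_with_feedback, total_runs):
    """Percent of runs that received any feedback (assumed definition)."""
    if total_runs == 0:
        return None
    return 100.0 * runs_with_feedback / total_runs


# Illustrative numbers only: 39 thumbs up out of 45 pieces of feedback,
# collected across a hypothetical 120 runs.
thumbs = [True] * 39 + [False] * 6
print(f"Satisfaction rate: {satisfaction_rate(thumbs):.0f}%")
print(f"Feedback volume: {feedback_volume(len(thumbs), 120):.1f}%")
```

Note that the satisfaction rate uses feedback given as its denominator, not total runs, so a high rate on very low volume deserves less weight.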

## Correlation with Evaluations

Monitor how manual feedback aligns with automatic evaluations:

**Strong Agreement** (good):

```
High eval scores (80+) → mostly thumbs up
Low eval scores (<70) → mostly thumbs down
```

**Disagreement** (investigate):

```
High eval scores → lots of thumbs down
OR
Low eval scores → lots of thumbs up
```

Disagreement suggests:

* Evaluation criteria might need adjustment
* There's a subjective quality not being measured
* Users value different things than the eval judges
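
One way to make this check concrete is to label each run that has feedback as agreeing or disagreeing with its eval score. This is a sketch under stated assumptions: the thresholds come from the agreement example above (80+ is high, below 70 is low), while the function names and the treatment of scores from 70 to 79 as inconclusive are assumptions, not product behavior.

```python
HIGH_SCORE = 80  # "High eval scores (80+)"
LOW_SCORE = 70   # "Low eval scores (<70)"


def classify(eval_score, thumbs_up):
    """Label one run as agreement, disagreement, or inconclusive."""
    if eval_score >= HIGH_SCORE:
        return "agreement" if thumbs_up else "disagreement"
    if eval_score < LOW_SCORE:
        return "disagreement" if thumbs_up else "agreement"
    # Scores between the two thresholds are not clearly high or low.
    return "inconclusive"


def disagreement_rate(runs):
    """Share of clearly scored runs where feedback contradicts the eval.

    `runs` is a list of (eval_score, thumbs_up) pairs.
    Returns None when no run has a clearly high or low score.
    """
    labels = [classify(score, thumbs_up) for score, thumbs_up in runs]
    decided = [label for label in labels if label != "inconclusive"]
    if not decided:
        return None
    return decided.count("disagreement") / len(decided)


# Illustrative data only.
runs = [(85, True), (90, False), (60, False), (65, True), (78, False)]
print(disagreement_rate(runs))
```

A rising disagreement rate is a prompt to review the evaluation criteria rather than proof that either signal is wrong.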

## Example: Feedback in Action

**Run #2048 - Automatic Eval: 78/100**

**User Feedback**: 👎 + Comment

```
"Agent correctly identified the alert as a false positive,
but the explanation was too technical for our tier-1
analysts to understand. Need simpler language."
```

**Result**:

* Automatic eval scored it as "acceptable" (78/100)
* User feedback highlights a gap: audience-appropriate language
* This becomes a factor in future prompt improvements

## When Feedback Overrides Evals

Manual feedback is particularly valuable when:

1. **Business context matters** - "Technically correct but missed our policy"
2. **Audience matters** - "Right answer, wrong communication style"
3. **Edge cases** - "Eval couldn't know this is a known false positive"
4. **Performance matters** - "Correct but too slow for our SLA"
