Acceptance criteria let you define specific conditions that every agent run must satisfy. After each run, the Cotool Eval Harness evaluates each criterion individually and reports which passed and which failed. When any criterion is not met, you can have notifications sent automatically to Slack, webhooks, or PagerDuty.

How It Works

Criteria you define are injected into the agent’s system prompt, so the agent actively tries to satisfy them. After every run, the Cotool Eval Harness acts as an independent judge: it grades each criterion as met or not met and explains why. Failed criteria can optionally trigger notifications to Slack, webhooks, or PagerDuty.
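The sketch below models the flow described above. It is not Cotool's implementation. The `CriterionResult` shape, the prompt wording in `inject_criteria`, and the sample grades are all hypothetical. It shows criteria being appended to a system prompt and per-criterion grades being rolled up into a pass/fail summary. A run counts as passing only when every criterion is met.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionResult:
    # Hypothetical shape for one graded criterion; not Cotool's actual schema.
    criterion: str
    met: bool
    explanation: str


def inject_criteria(system_prompt: str, criteria: list[str]) -> str:
    """Append criteria to a system prompt. The wording here is illustrative only."""
    if not criteria:
        return system_prompt
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return f"{system_prompt}\n\nThis run must satisfy these acceptance criteria:\n{bullet_list}"


def summarize(results: list[CriterionResult]) -> dict:
    """Count passes and failures; the run passes only if every criterion is met."""
    failed = [r for r in results if not r.met]
    return {
        "passed": len(results) - len(failed),
        "failed": len(failed),
        "all_met": not failed,
        "failed_criteria": [
            {"criterion": r.criterion, "explanation": r.explanation} for r in failed
        ],
    }


criteria = [
    "Provide a severity rating",
    "Check EDR telemetry before closing an alert as false positive",
]
prompt = inject_criteria("You are a SOC triage agent.", criteria)

# Illustrative grades a judge might return for one run.
results = [
    CriterionResult(criteria[0], False,
                    "The agent responded without including any severity classification"),
    CriterionResult(criteria[1], True,
                    "The agent queried EDR telemetry before closing the alert"),
]
summary = summarize(results)
print(summary["passed"], summary["failed"], summary["all_met"])  # 1 1 False
```

Keeping each criterion's explanation alongside its verdict mirrors what the results views and notifications surface: the failed criterion and the reason it failed.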

Adding Acceptance Criteria

  1. Navigate to your agent’s details page
  2. Click Edit
  3. Scroll to the Acceptance Criteria section
  4. Add each criterion as a clear, evaluable statement
  5. Click Save Changes
Write criteria as specific, observable conditions the Cotool Eval Harness can verify from the run transcript. Vague criteria like “be helpful” produce unreliable results.

Good vs. Poor Criteria

| Good | Poor |
|---|---|
| “Check EDR telemetry before closing an alert as false positive” | “Investigate alerts properly” |
| “Never disable a detection rule without documenting justification” | “Be careful with rule changes” |
| “Correlate across at least two log sources before determining scope” | “Use tools correctly” |
| “Escalate to Tier 2 if lateral movement indicators are present” | “Don’t miss threats” |

Viewing Results

After each run, acceptance criteria results appear in two places:
  • Eval score popover: click the evaluation score badge on any run to see which criteria passed or failed, along with the judge’s explanation for each
  • Run issues warning: a warning icon appears next to runs that have failed criteria; these failures are combined into a single warning with any critical issues the judge identified

Notifications

Acceptance criteria failures can trigger notifications through the same output destinations used for agent outputs (Slack, webhooks, PagerDuty). Notification configuration is independent of regular output delivery, so you can enable one without the other.

Setting Up Notifications

  1. Click Edit on the agent details page
  2. In the Acceptance Criteria section, toggle Notify on failure
  3. Select one or more destinations from the dropdown (or create a new one)
  4. Click Save Changes
Destinations are shared across your organization. A Slack channel or webhook created for output delivery can also be used for acceptance criteria notifications.
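The gating described in this section can be sketched as a small function. This is a minimal model under stated assumptions. `AgentNotificationSettings`, its fields, and the destination identifiers are hypothetical and do not come from Cotool. The sketch captures two documented rules. Failure notifications go out only when "Notify on failure" is enabled and at least one criterion failed. They are also configured separately from output delivery, even when both point at the same shared destination.

```python
from dataclasses import dataclass, field


@dataclass
class AgentNotificationSettings:
    # Hypothetical model: output delivery and failure notifications are separate
    # settings that may point at the same org-wide destinations.
    output_destinations: list[str] = field(default_factory=list)
    notify_on_failure: bool = False
    failure_destinations: list[str] = field(default_factory=list)


def failure_targets(settings: AgentNotificationSettings, failed_criteria: list[dict]) -> list[str]:
    """Destinations to alert for one run; empty if notifications are off or nothing failed."""
    if not settings.notify_on_failure or not failed_criteria:
        return []
    # output_destinations is deliberately ignored: it only controls regular output delivery.
    return list(settings.failure_destinations)


shared_slack = "slack:#soc-alerts"  # hypothetical destination identifiers
settings = AgentNotificationSettings(
    output_destinations=[shared_slack],
    notify_on_failure=True,
    failure_destinations=[shared_slack, "pagerduty:soc-oncall"],
)
failed = [{"criterion": "Provide a severity rating", "explanation": "No severity given"}]

print(failure_targets(settings, failed))  # ['slack:#soc-alerts', 'pagerduty:soc-oncall']
print(failure_targets(settings, []))      # []
```

An agent that delivers outputs to the shared Slack channel but leaves "Notify on failure" off would return no failure targets.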

Notification Format

Slack messages include the agent name (linked to the run), a timestamp, a pass/fail summary, and each failed criterion with the judge’s explanation. Webhooks receive a JSON payload:
```json
{
  "organizationId": "...",
  "agentId": "...",
  "agentName": "My Agent",
  "runId": "...",
  "runTimestamp": "2026-03-25T00:28:23.667Z",
  "failedCriteria": [
    {
      "criterion": "Provide a severity rating",
      "explanation": "The agent responded without including any severity classification"
    }
  ]
}
```
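On the receiving side, a webhook handler mostly needs to decode this body and act on `failedCriteria`. The sketch below uses only the Python standard library and could be called from whatever HTTP framework serves your endpoint. The function names and the summary format are hypothetical. The field names come from the payload above. The "must contain these fields" check is a defensive assumption, not a documented guarantee.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = (
    "organizationId", "agentId", "agentName", "runId", "runTimestamp", "failedCriteria",
)


def parse_failure_payload(body: bytes) -> dict:
    """Decode an acceptance-criteria webhook body and check the documented fields exist."""
    payload = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"payload is missing fields: {missing}")
    # Replace the trailing "Z" so fromisoformat also works on Python versions before 3.11.
    payload["runTimestamp"] = datetime.fromisoformat(
        payload["runTimestamp"].replace("Z", "+00:00")
    )
    return payload


def format_failures(payload: dict) -> str:
    """Render a short plain-text summary, e.g. for logging or a ticket comment."""
    failed = payload["failedCriteria"]
    lines = [f"{payload['agentName']} (run {payload['runId']}): {len(failed)} failed criterion(s)"]
    lines += [f"- {item['criterion']}: {item['explanation']}" for item in failed]
    return "\n".join(lines)


# The sample body from this page, with its placeholder "..." values kept as-is.
sample = b"""{
  "organizationId": "...",
  "agentId": "...",
  "agentName": "My Agent",
  "runId": "...",
  "runTimestamp": "2026-03-25T00:28:23.667Z",
  "failedCriteria": [
    {
      "criterion": "Provide a severity rating",
      "explanation": "The agent responded without including any severity classification"
    }
  ]
}"""

payload = parse_failure_payload(sample)
print(format_failures(payload))
```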
For PagerDuty, an alert is created whose summary includes the number of failed criteria, with each failed criterion’s details in custom fields.
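This page does not specify the exact PagerDuty event format. If you want to reproduce a comparable alert yourself, for example to test routing rules, the sketch below builds a PagerDuty Events API v2 "trigger" event shaped the same way. The failure count goes in `summary` and one entry per failed criterion goes in `custom_details`. The summary wording, the use of the agent name as `source`, the `criterion_N` keys, and the `YOUR_ROUTING_KEY` placeholder are illustrative assumptions, not the payload Cotool sends.

```python
import json


def build_pagerduty_event(routing_key: str, payload: dict, severity: str = "error") -> dict:
    """Build a PagerDuty Events API v2 "trigger" event from a failure payload.

    Field choices (summary text, source, custom_details keys) are illustrative,
    not the format Cotool actually sends.
    """
    failed = payload["failedCriteria"]
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            # Events API v2 caps summary at 1024 characters.
            "summary": f"{payload['agentName']}: {len(failed)} failed acceptance criterion(s)"[:1024],
            "source": payload["agentName"],
            "severity": severity,  # one of: critical, error, warning, info
            # One entry per failed criterion, mirroring the custom fields described above.
            "custom_details": {
                f"criterion_{i}": {"criterion": c["criterion"], "explanation": c["explanation"]}
                for i, c in enumerate(failed, start=1)
            },
        },
    }


event = build_pagerduty_event(
    "YOUR_ROUTING_KEY",  # placeholder integration key
    {
        "agentName": "My Agent",
        "failedCriteria": [
            {
                "criterion": "Provide a severity rating",
                "explanation": "The agent responded without including any severity classification",
            }
        ],
    },
)
# POST this JSON to https://events.pagerduty.com/v2/enqueue to raise the alert.
print(json.dumps(event, indent=2))
```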

Best Practices

Begin with the most important conditions and expand over time. Too many criteria can dilute signal and slow down the feedback loop.
Each criterion should be something the judge can clearly confirm or deny from the run transcript. “Always include a confidence score” is falsifiable; “produce good output” is not.
If a prompt change fixes a recurring issue, add a criterion that encodes the fix (e.g. “Never close an alert without checking log ingestion status”). This prevents the behavior from regressing silently.
When criteria consistently fail, check the Suggested Improvements panel. The system may already have a prompt fix ready for you.

Limits

  • Maximum 20 criteria per agent
  • Each criterion can be up to 500 characters
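If you manage criteria outside the UI, for example in version control before pasting them in, a quick pre-check against these limits can catch problems early. The sketch below is hypothetical. Python's `len()` counts Unicode code points, and whether Cotool counts characters the same way is an assumption. The empty-criterion check is an extra sanity check, not a documented limit.

```python
MAX_CRITERIA = 20
MAX_CRITERION_CHARS = 500


def validate_criteria(criteria: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the set is within the limits."""
    problems = []
    if len(criteria) > MAX_CRITERIA:
        problems.append(f"{len(criteria)} criteria exceeds the limit of {MAX_CRITERIA}")
    for i, text in enumerate(criteria, start=1):
        # len() counts Unicode code points; how Cotool counts characters is an assumption.
        if not text.strip():
            problems.append(f"criterion {i} is empty")  # extra check, not a documented limit
        elif len(text) > MAX_CRITERION_CHARS:
            problems.append(f"criterion {i} is {len(text)} characters (limit {MAX_CRITERION_CHARS})")
    return problems


print(validate_criteria(["Provide a severity rating"]))  # []
print(validate_criteria(["x" * 501]))  # ['criterion 1 is 501 characters (limit 500)']
print(validate_criteria(["Provide a severity rating"] * 21))  # ['21 criteria exceeds the limit of 20']
```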