When agent runs score poorly on automatic evaluations, Cotool analyzes the issues and suggests specific prompt improvements. You simply review the suggested changes and accept or reject them - no manual prompt writing required.

How It Works

1. Pattern Detection
   System identifies repeated issues across multiple low-scoring runs
2. Issue Clustering
   Similar problems are grouped together to identify common root causes (see the sketch below)
3. Suggested Improvements Panel
   You see a panel showing detected patterns and a “Generate Diff” button
4. Generate Diff
   Click the button - AI generates specific prompt changes to fix the issues
5. Review Changes
   See a side-by-side diff of the current vs. improved prompt
6. Accept or Reject
   Apply the changes, or reject them if they don’t fit your needs
You don’t write the improvements - the AI analyzes what went wrong and generates the fix. Your role is to review and approve, not write from scratch.
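
Conceptually, steps 1-2 amount to grouping low-scoring runs by what their detected issues have in common and keeping only the patterns that repeat. The Python sketch below is purely illustrative - the EvalRun shape, the 70-point threshold, and the keyword signature are assumptions, not Cotool's actual implementation.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class EvalRun:
    run_id: str
    score: int        # evaluation score out of 100
    issue_text: str   # issue description from the automatic evaluation

def find_repeated_issues(runs: list[EvalRun], threshold: int = 70) -> dict[str, list[EvalRun]]:
    """Group low-scoring runs by a crude issue signature and keep repeated patterns."""
    clusters: dict[str, list[EvalRun]] = defaultdict(list)
    for run in runs:
        if run.score >= threshold:
            continue  # only low-scoring runs feed the suggestion panel
        # A real system would use semantic similarity; a keyword signature keeps the sketch simple.
        signature = " ".join(sorted(set(run.issue_text.lower().split()))[:5])
        clusters[signature].append(run)
    # A pattern is only worth a suggestion when the same issue shows up in 2+ runs.
    return {sig: grouped for sig, grouped in clusters.items() if len(grouped) >= 2}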

Suggested Improvements Panel

When available, you’ll see a panel below the system prompt with suggested improvements. Here’s an example:
Issues Found
• Agent encountered empty log query results and immediately concluded the alert was noise without investigating why logs were missing
• Agent disabled the detection rule globally (enabled: false) rather than implementing targeted tuning or allow lists
• Agent marked the issue as 'Done' and took irreversible action without considering data ingestion issues or alternative explanations for missing logs

Suggested Change
• Require investigation before disabling rules
• Add validation for file hash existence
• Clarify severity thresholds

Citations
• Agent Execution: f9c63e37-dfa9-422f-bcc9-d27e36d255ca
• Eval Score: 60/100
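
For orientation, a suggestion card like the one above boils down to a small record of issues, proposed changes, and citations. The field names in this sketch are assumptions, not Cotool's actual schema:

from dataclasses import dataclass, field

@dataclass
class SuggestionCard:
    issues_found: list[str]        # observed problems from the low-scoring run
    suggested_changes: list[str]   # short titles for the proposed prompt edits
    cited_executions: list[str] = field(default_factory=list)  # agent execution IDs
    eval_score: int | None = None  # score of the run that triggered the card

# Populated with values from the example panel above:
card = SuggestionCard(
    issues_found=["Rule disabled globally instead of targeted tuning"],
    suggested_changes=["Require investigation before disabling rules"],
    cited_executions=["f9c63e37-dfa9-422f-bcc9-d27e36d255ca"],
    eval_score=60,
)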

Issue Categories

The system automatically detects and suggests fixes for:

Missing Steps

Agent skipped important investigation or validation steps
Fix: Add explicit steps to prompt

Poor Decision Logic

Agent made incorrect conclusions or didn’t follow criteria properly
Fix: Clarify decision criteria and edge cases

Tool Misuse

Wrong tools used, correct tools skipped, or inefficient patterns
Fix: Add tool usage guidance

Output Issues

Missing information, unclear formatting, or insufficient detail
Fix: Specify required output structure
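
Put another way, each detected category maps to a fix strategy. The lookup below just restates the four cards above in code form; the dictionary and the fallback string are illustrative, not part of the product:

FIX_STRATEGIES = {
    "missing_steps": "Add explicit steps to prompt",
    "poor_decision_logic": "Clarify decision criteria and edge cases",
    "tool_misuse": "Add tool usage guidance",
    "output_issues": "Specify required output structure",
}

def suggest_fix(category: str) -> str:
    """Return the documented fix strategy for a detected issue category."""
    # The fallback is an assumption for unknown categories.
    return FIX_STRATEGIES.get(category, "Review the run manually")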

Generating the Diff

When you click “Generate Diff”, the AI:
  1. Analyzes the specific issues identified
  2. Reviews your current system prompt
  3. Generates targeted changes to fix the problems
  4. Shows you a side-by-side comparison
Example Diff View: the panel shows two tabs, Before (Current) and After (Suggested). The Before (Current) tab contains the existing prompt, for example:

## Your Responsibilities
1. Analyze the alert details
2. Search logs for related activity
3. Determine if alert is valid
4. Update the ticket
Changes are highlighted in the diff view. Green shows additions, red shows removals, yellow shows modifications.
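
Mechanically, the comparison is a text diff between two prompt versions. Here is a minimal sketch using Python's standard difflib, with the Before prompt above and a hypothetical After (the added guidance is illustrative, not a generated suggestion):

import difflib

before = """## Your Responsibilities
1. Analyze the alert details
2. Search logs for related activity
3. Determine if alert is valid
4. Update the ticket"""

# Hypothetical improved prompt for the sake of the example
after = """## Your Responsibilities
1. Analyze the alert details
2. Search logs for related activity
   - If the search returns 0 results, verify data ingestion before concluding
3. Determine if alert is valid
4. Update the ticket"""

# '+' lines are additions and '-' lines are removals, matching the
# green/red highlighting described above.
for line in difflib.unified_diff(before.splitlines(), after.splitlines(),
                                 fromfile="current", tofile="suggested", lineterm=""):
    print(line)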

Multiple Improvements

You may see multiple suggestion cards if several low-scoring runs identified different issues:
Suggested Improvements (3)

1. Require investigation before disabling rules
   Eval: 60/100 | Run: f9c63e37...

2. Add validation for file hash existence  
   Eval: 55/100 | Run: a3d12f89...

3. Clarify severity thresholds
   Eval: 68/100 | Run: 7bd94c21...
You can:
  • Generate diffs for each individually
  • Address the lowest-scoring issue first (see the sketch below)
  • Reject suggestions that don’t apply to your use case
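
If you want to triage programmatically, sorting the cards by score puts the lowest-scoring issue first. The tuples below simply mirror the example list above:

# (title, eval score, run id) from the example cards above
suggestions = [
    ("Require investigation before disabling rules", 60, "f9c63e37..."),
    ("Add validation for file hash existence", 55, "a3d12f89..."),
    ("Clarify severity thresholds", 68, "7bd94c21..."),
]

# Lowest score first - the file-hash suggestion (55/100) comes out on top
for title, score, run_id in sorted(suggestions, key=lambda s: s[1]):
    print(f"{score}/100  {title}  (run {run_id})")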

What Gets Improved

Issue: Agent skipped checking user history before closing alert
Fix: “Before determining severity, always: 1) Check user’s recent activity…”

Issue: Agent failed when API returned no data
Fix: “If VirusTotal returns no data (404), classify as Medium and note that hash is unknown…”

Issue: Agent mishandled service account activity
Fix: “If user matches pattern ‘svc_’ or ‘service-’, verify activity against scheduled job list…”

Issue: Agent output was vague: “Alert looks suspicious”
Fix: “Always include: 1) Specific indicators found, 2) Confidence level (High/Medium/Low), 3) Recommended actions…”

Best Practices

• If you see the same issue across multiple runs, accept the suggestion. Repeated patterns indicate a real problem.
• Use Agent Builder to test the updated prompt with edge cases before relying on it in production.
• After accepting changes, watch evaluation scores for the next 10-20 runs to confirm improvement (a minimal monitoring sketch follows this list).
• Not every low score needs a prompt change. Sometimes the issue is data quality, API availability, or legitimate edge cases.
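
The "watch the next 10-20 runs" check reduces to comparing a post-change average against the pre-change baseline. This sketch is illustrative only; the function name and the 10-run minimum are assumptions:

def improvement_confirmed(baseline_avg: float, post_change_scores: list[float],
                          min_runs: int = 10) -> bool:
    """True once enough post-change runs exist and their average beats the baseline."""
    if len(post_change_scores) < min_runs:
        return False  # not enough data yet to judge the prompt change
    return sum(post_change_scores) / len(post_change_scores) > baseline_avg

# Example using the numbers from the flow below: baseline 65/100,
# fifteen post-change runs averaging 82/100
print(improvement_confirmed(65.0, [82.0] * 15))  # True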

Example: Real Improvement Flow

Run #1834 - Evaluation Score: 60/100

Issue Detected:
Agent encountered empty Splunk results and immediately marked alert as false positive without investigating why logs were missing. This could miss real threats if log ingestion is delayed.
Generated Diff:
## Investigation Steps
1. Search Splunk for related activity
+  - If query returns 0 results, verify:
+    a) Data ingestion status (check indexer health)
+    b) Time range (expand if near ingestion delay window)
+    c) Query syntax (test with broader search)
+  - Only conclude "no activity" after ruling out data issues

2. Assess findings
-  - No logs = likely false positive
+  - No logs = investigate further before concluding
+  - Document if missing logs prevent full assessment
You: Review diff, looks good ✅
Action: Click “Accept”
Result: Prompt v5 created, agent now handles empty query results properly
Validation: Next 15 runs average 82/100 (up from 65/100)

Manual Prompt Editing

You can always edit prompts manually instead of using suggestions:
  1. Go to Agent Settings → System Prompt
  2. Click “Edit”
  3. Make changes directly
  4. Save (creates new version)
But the AI-suggested flow is faster and often catches issues you might miss.