# AI-Suggested Improvements

> Automatic prompt improvements based on evaluation results

When agent runs score poorly on automatic evaluations, Cotool analyzes the issues and suggests specific prompt improvements. You review the suggested changes and accept or reject them; no manual prompt writing is required.

## How It Works

<Steps>
  <Step title="Pattern Detection">
    The system identifies issues that repeat across multiple low-scoring runs
  </Step>

  <Step title="Issue Clustering">
    Similar problems are grouped together to identify common root causes
  </Step>

  <Step title="Suggested Improvements Panel">
    You see a panel showing detected patterns and a "Generate Diff" button
  </Step>

  <Step title="Generate Diff">
    Click the button, and the AI generates specific prompt changes to fix the issues
  </Step>

  <Step title="Review Changes">
    See a side-by-side diff of the current vs. improved prompt
  </Step>

  <Step title="Accept or Reject">
    Apply the changes, or reject them if they don't fit your needs
  </Step>
</Steps>

<Note>
  **You don't write the improvements** - The AI analyzes what went wrong and generates the fix. Your role is to review and approve, not write from scratch.
</Note>
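
The first two steps can be pictured as a filter followed by a grouping. The sketch below is illustrative only: Cotool's actual detection and clustering logic is not described on this page, and the `Run` type, the issue labels, the score threshold of 70, and the minimum of two runs are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one evaluated agent run."""
    run_id: str
    score: int          # evaluation score out of 100
    issues: list[str]   # issue labels attached by the evaluation


def repeated_issues(runs, threshold=70, min_runs=2):
    """Group issue labels from low-scoring runs and keep the repeated ones."""
    runs_by_issue = defaultdict(set)
    for run in runs:
        if run.score >= threshold:
            continue  # only low-scoring runs contribute
        for issue in run.issues:
            runs_by_issue[issue].add(run.run_id)
    # An issue counts as a pattern only if it appears in several runs.
    return {
        issue: sorted(ids)
        for issue, ids in runs_by_issue.items()
        if len(ids) >= min_runs
    }


# Made-up sample data for illustration.
runs = [
    Run("run-a", 60, ["missing_steps", "poor_decision_logic"]),
    Run("run-b", 55, ["missing_steps"]),
    Run("run-c", 90, ["output_issues"]),
    Run("run-d", 68, ["tool_misuse"]),
]
patterns = repeated_issues(runs)
print(patterns)
```

Here only `missing_steps` qualifies as a pattern, because it is the only label shared by more than one low-scoring run.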

## Suggested Improvements Panel

When available, you'll see a panel below the system prompt with suggested improvements. Here's an example:

```
Issues Found
• Agent encountered empty log query results and immediately concluded the alert was noise without investigating why logs were missing
• Agent disabled the detection rule globally (enabled: false) rather than implementing targeted tuning or allow lists
• Agent marked the issue as 'Done' and took irreversible action without considering data ingestion issues or alternative explanations for missing logs

Suggested Changes
• Require investigation before disabling rules
• Add validation for file hash existence
• Clarify severity thresholds

Citations
• Agent Execution: f9c63e37-dfa9-422f-bcc9-d27e36d255ca
• Eval Score: 60/100
```

## Issue Categories

The system automatically detects and suggests fixes for:

<CardGroup cols={2}>
  <Card title="Missing Steps" icon="list-ul">
    Agent skipped important investigation or validation steps

    **Fix**: Add explicit steps to prompt
  </Card>

  <Card title="Poor Decision Logic" icon="scale-balanced">
    Agent made incorrect conclusions or didn't follow criteria properly

    **Fix**: Clarify decision criteria and edge cases
  </Card>

  <Card title="Tool Misuse" icon="wrench">
    Wrong tools used, correct tools skipped, or inefficient patterns

    **Fix**: Add tool usage guidance
  </Card>

  <Card title="Output Issues" icon="file-lines">
    Missing information, unclear formatting, or insufficient detail

    **Fix**: Specify required output structure
  </Card>
</CardGroup>

## Generating the Diff

When you click **"Generate Diff"**, the AI:

1. Analyzes the specific issues identified
2. Reviews your current system prompt
3. Generates targeted changes to fix the problems
4. Shows you a side-by-side comparison

**Example Diff View**:

<Tabs>
  <Tab title="Before (Current)">
    ```markdown
    ## Your Responsibilities
    1. Analyze the alert details
    2. Search logs for related activity
    3. Determine if alert is valid
    4. Update the ticket
    ```
  </Tab>

  <Tab title="After (Suggested)">
    ```markdown
    ## Your Responsibilities
    1. Analyze the alert details and extract key indicators
    2. Search logs for related activity
       - If logs are empty, investigate WHY before concluding
       - Check data ingestion status
       - Verify time range and query syntax
    3. Determine if alert is valid based on evidence
       - Do not dismiss alerts solely due to missing logs
       - Consider alternative data sources
    4. Update the ticket with findings and confidence level
    ```
  </Tab>
</Tabs>

<Note>
  Changes are **highlighted** in the diff view. Green shows additions, red shows removals, yellow shows modifications.
</Note>
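
To preview a comparable before/after comparison outside the UI, for example when drafting a manual prompt edit, you can produce a unified diff with Python's standard `difflib` module. This is a minimal sketch and not how Cotool renders its diff view; the two prompt strings are shortened versions of the example above.

```python
import difflib

current = """## Your Responsibilities
1. Analyze the alert details
2. Search logs for related activity
3. Determine if alert is valid
4. Update the ticket
"""

suggested = """## Your Responsibilities
1. Analyze the alert details and extract key indicators
2. Search logs for related activity
   - If logs are empty, investigate WHY before concluding
3. Determine if alert is valid based on evidence
4. Update the ticket with findings and confidence level
"""

# unified_diff yields header lines, hunk markers, and then each line
# prefixed with " " (unchanged), "-" (removed), or "+" (added).
diff_lines = list(
    difflib.unified_diff(
        current.splitlines(),
        suggested.splitlines(),
        fromfile="current",
        tofile="suggested",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Lines starting with `+` correspond to the additions shown in green in the diff view, and lines starting with `-` to the removals shown in red.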

## Multiple Improvements

You may see multiple suggestion cards when different low-scoring runs surface different issues:

```
Suggested Improvements (3)

1. Require investigation before disabling rules
   Eval: 60/100 | Run: f9c63e37...

2. Add validation for file hash existence
   Eval: 55/100 | Run: a3d12f89...

3. Clarify severity thresholds
   Eval: 68/100 | Run: 7bd94c21...
```

You can:

* Generate diffs for each individually
* Address the lowest-scoring issue first
* Reject suggestions that don't apply to your use case

## What Gets Improved

<AccordionGroup>
  <Accordion title="Investigation Completeness">
    **Issue**: Agent skipped checking user history before closing alert

    **Fix**: "Before determining severity, always: 1) Check user's recent activity..."
  </Accordion>

  <Accordion title="Error Handling">
    **Issue**: Agent failed when API returned no data

    **Fix**: "If VirusTotal returns no data (404), classify as Medium and note that hash is unknown..."
  </Accordion>

  <Accordion title="Edge Case Handling">
    **Issue**: Agent mishandled service account activity

    **Fix**: "If user matches pattern 'svc\_*' or 'service-*', verify activity against scheduled job list..."
  </Accordion>

  <Accordion title="Output Specificity">
    **Issue**: Agent output was vague: "Alert looks suspicious"

    **Fix**: "Always include: 1) Specific indicators found, 2) Confidence level (High/Medium/Low), 3) Recommended actions..."
  </Accordion>
</AccordionGroup>
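
Before putting name patterns such as those in the edge-case fix into a prompt, it can help to check which usernames they would actually cover. The sketch below assumes the patterns are read as shell-style wildcards and uses Python's standard `fnmatch` module; the usernames are made up for illustration.

```python
from fnmatch import fnmatchcase

# Patterns taken from the edge-case example; treating them as
# shell-style wildcards is an assumption.
SERVICE_ACCOUNT_PATTERNS = ["svc_*", "service-*"]


def is_service_account(username):
    """Return True if the username matches any service-account pattern."""
    return any(fnmatchcase(username, p) for p in SERVICE_ACCOUNT_PATTERNS)


usernames = ["svc_backup", "service-deploy", "alice", "services"]
matches = {name: is_service_account(name) for name in usernames}

for name, matched in matches.items():
    print(f"{name}: {'service account' if matched else 'regular user'}")
```

Note that `services` does not match `service-*`, since the pattern requires the hyphen. A check like this can reveal whether a pattern is narrower or broader than you intended.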

## Best Practices

<AccordionGroup>
  <Accordion title="Act on Repeated Issues">
    If you see the same issue across multiple runs, accept the suggestion. Repeated patterns indicate a real problem.
  </Accordion>

  <Accordion title="Test After Accepting">
    Use Agent Builder to test the updated prompt with edge cases before relying on it in production.
  </Accordion>

  <Accordion title="Monitor Impact">
    After accepting changes, watch evaluation scores for the next 10-20 runs to confirm improvement.
  </Accordion>

  <Accordion title="Don't Over-Optimize">
    Not every low score needs a prompt change. Sometimes the issue is data quality, API availability, or legitimate edge cases.
  </Accordion>
</AccordionGroup>
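
One way to make the "Monitor Impact" check concrete is to compare the mean evaluation score of the runs just before a prompt change with the mean of the runs just after it. The sketch below assumes you have the scores available as plain lists; the function name, the sample scores, and the default window of 15 runs are illustrative assumptions.

```python
from statistics import mean


def score_change(before, after, window=15):
    """Mean score of the first `window` runs after a change minus
    the mean of the last `window` runs before it."""
    recent_before = before[-window:]
    first_after = after[:window]
    if not recent_before or not first_after:
        raise ValueError("need at least one score on each side of the change")
    return mean(first_after) - mean(recent_before)


# Made-up evaluation scores (out of 100) around a prompt change.
before_scores = [60, 65, 70, 62, 68]
after_scores = [80, 84, 79, 85, 82]

delta = score_change(before_scores, after_scores)
print(f"Average score change: {delta:+.1f}")
```

A positive change over a reasonable number of runs suggests the accepted suggestion helped; a flat or negative change is a cue to revisit the prompt or look for causes outside it, such as data quality.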

## Example: Real Improvement Flow

**Run #1834 - Evaluation Score: 60/100**

**Issue Detected**:

```
Agent encountered empty Splunk results and immediately 
marked alert as false positive without investigating why 
logs were missing. This could miss real threats if log 
ingestion is delayed.
```

**Generated Diff**:

```diff
## Investigation Steps
1. Search Splunk for related activity
+  - If query returns 0 results, verify:
+    a) Data ingestion status (check indexer health)
+    b) Time range (expand if near ingestion delay window)
+    c) Query syntax (test with broader search)
+  - Only conclude "no activity" after ruling out data issues

2. Assess findings
-  - No logs = likely false positive
+  - No logs = investigate further before concluding
+  - Document if missing logs prevent full assessment
```

**You**: Review diff, looks good ✅

**Action**: Click "Accept"

**Result**: Prompt v5 created, agent now handles empty query results properly

**Validation**: The next 15 runs average 82/100 (up from a previous average of 65/100)

## Manual Prompt Editing

You can always edit prompts manually instead of using suggestions:

1. Go to Agent Settings → System Prompt
2. Click "Edit"
3. Make changes directly
4. Save (creates new version)

The AI-suggested flow is usually faster, though, and often catches issues you might miss.
