# AI-Suggested Improvements

> Automatic prompt improvements based on evaluation results

When agent runs score poorly on automatic evaluations, Cotool analyzes the issues and suggests specific prompt improvements. You review the suggested changes and accept or reject them. No manual prompt writing is required.

## How It Works

1. The system identifies repeated issues across multiple low-scoring runs.
2. Similar problems are grouped together to identify common root causes.
3. You see a panel showing the detected patterns and a "Generate Diff" button.
4. You click the button, and the AI generates specific prompt changes to fix the issues.
5. You see a side-by-side diff of the current and improved prompt.
6. You apply the changes, or reject them if they don't fit your needs.

**You don't write the improvements.** The AI analyzes what went wrong and generates the fix. Your role is to review and approve, not to write from scratch.

## Suggested Improvements Panel

When suggestions are available, you'll see a panel below the system prompt listing them.
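To make the grouping step from "How It Works" concrete, here is a minimal Python sketch of how repeated issues could be collected from low-scoring runs. It is illustrative only: the `EvalRun` record, the issue labels, the score threshold of 70, and the two-occurrence cutoff are all assumptions, not Cotool's actual data model or thresholds.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EvalRun:
    """Hypothetical record of one evaluated agent run (illustrative field names)."""
    run_id: str
    score: int  # automatic evaluation score out of 100
    issues: list[str] = field(default_factory=list)  # issue labels from the evaluator


def find_repeated_issues(runs, score_threshold=70, min_occurrences=2):
    """Group issues from low-scoring runs and keep only those that recur.

    Returns a dict mapping each repeated issue label to the IDs of the runs
    that showed it, most frequent issue first.
    """
    groups = defaultdict(list)
    for run in runs:
        if run.score >= score_threshold:
            continue  # only low-scoring runs feed the analysis
        # dict.fromkeys de-duplicates while preserving order,
        # so an issue counts at most once per run
        for issue in dict.fromkeys(run.issues):
            groups[issue].append(run.run_id)
    repeated = {label: ids for label, ids in groups.items() if len(ids) >= min_occurrences}
    # sorted() is stable (even with reverse=True), so ties keep first-seen order
    return dict(sorted(repeated.items(), key=lambda item: len(item[1]), reverse=True))


# Toy data (hypothetical run IDs and labels)
runs = [
    EvalRun("run-a", 60, ["disabled_rule_without_investigation", "ignored_missing_logs"]),
    EvalRun("run-b", 55, ["missing_hash_validation"]),
    EvalRun("run-c", 68, ["disabled_rule_without_investigation"]),
    EvalRun("run-d", 92, ["disabled_rule_without_investigation"]),
]
print(find_repeated_issues(runs))
# {'disabled_rule_without_investigation': ['run-a', 'run-c']}
```

With this toy data, only the issue seen in two low-scoring runs is reported. `run-d` is ignored because its score is above the threshold, and the one-off issues are dropped because they don't repeat.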
Here's an example panel:

```
Issues Found
• Agent encountered empty log query results and immediately concluded the alert was noise without investigating why logs were missing
• Agent disabled the detection rule globally (enabled: false) rather than implementing targeted tuning or allow lists
• Agent marked the issue as 'Done' and took irreversible action without considering data ingestion issues or alternative explanations for missing logs

Suggested Change
• Require investigation before disabling rules
• Add validation for file hash existence
• Clarify severity thresholds

Citations
• Agent Execution: f9c63e37-dfa9-422f-bcc9-d27e36d255ca
• Eval Score: 60/100
```

## Issue Categories

The system automatically detects and suggests fixes for:

* Agent skipped important investigation or validation steps.
  **Fix**: Add explicit steps to the prompt.
* Agent made incorrect conclusions or didn't follow criteria properly.
  **Fix**: Clarify decision criteria and edge cases.
* Wrong tools were used, correct tools were skipped, or tool usage followed inefficient patterns.
  **Fix**: Add tool usage guidance.
* Output had missing information, unclear formatting, or insufficient detail.
  **Fix**: Specify the required output structure.

## Generating the Diff

When you click **"Generate Diff"**, the AI:

1. Analyzes the specific issues identified
2. Reviews your current system prompt
3. Generates targeted changes to fix the problems
4. Shows you a side-by-side comparison

**Example Diff View**:

Current prompt:

```markdown
## Your Responsibilities
1. Analyze the alert details
2. Search logs for related activity
3. Determine if alert is valid
4. Update the ticket
```

Improved prompt:

```markdown
## Your Responsibilities
1. Analyze the alert details and extract key indicators
2. Search logs for related activity
   - If logs are empty, investigate WHY before concluding
   - Check data ingestion status
   - Verify time range and query syntax
3. Determine if alert is valid based on evidence
   - Do not dismiss alerts solely due to missing logs
   - Consider alternative data sources
4. Update the ticket with findings and confidence level
```

Changes are **highlighted** in the diff view: green shows additions, red shows removals, and yellow shows modifications.

## Multiple Improvements

You may see multiple suggestion cards if several low-scoring runs identified different issues:

```
Suggested Improvements (3)

1. Require investigation before disabling rules
   Eval: 60/100 | Run: f9c63e37...

2. Add validation for file hash existence
   Eval: 55/100 | Run: a3d12f89...

3. Clarify severity thresholds
   Eval: 68/100 | Run: 7bd94c21...
```

You can:

* Generate diffs for each suggestion individually
* Address the lowest-scoring issue first
* Reject suggestions that don't apply to your use case

## What Gets Improved

* **Issue**: Agent skipped checking user history before closing an alert.
  **Fix**: "Before determining severity, always: 1) Check user's recent activity..."
* **Issue**: Agent failed when an API returned no data.
  **Fix**: "If VirusTotal returns no data (404), classify as Medium and note that hash is unknown..."
* **Issue**: Agent mishandled service account activity.
  **Fix**: "If user matches pattern `svc_*` or `service-*`, verify activity against scheduled job list..."
* **Issue**: Agent output was vague: "Alert looks suspicious".
  **Fix**: "Always include: 1) Specific indicators found, 2) Confidence level (High/Medium/Low), 3) Recommended actions..."

## Best Practices

* If you see the same issue across multiple runs, accept the suggestion. Repeated patterns indicate a real problem.
* Use Agent Builder to test the updated prompt with edge cases before relying on it in production.
* After accepting changes, watch evaluation scores for the next 10 to 20 runs to confirm improvement.
* Not every low score needs a prompt change. Sometimes the cause is data quality, API availability, or a legitimate edge case.
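The score-watching step can be done with a simple before/after comparison. This is a minimal sketch, assuming you have per-run evaluation scores (out of 100) for the old and new prompt in chronological order. The `compare_scores` helper, its default window of 15 runs, and the 10-run minimum are hypothetical choices meant to sit inside the 10 to 20 run range recommended above.

```python
from statistics import mean


def compare_scores(before, after, window=15, min_runs=10):
    """Compare evaluation scores around a prompt change (hypothetical helper).

    before: scores (0-100) of runs on the old prompt, oldest first
    after:  scores (0-100) of runs on the new prompt, oldest first
    Returns (baseline_mean, new_mean, delta), or None if fewer than
    `min_runs` runs have completed since the change.
    """
    if len(after) < min_runs:
        return None  # too early to judge; keep watching
    if not before:
        raise ValueError("need baseline scores from before the change")
    baseline = mean(before[-window:])  # most recent runs on the old prompt
    updated = mean(after[:window])     # first runs on the new prompt
    return baseline, updated, updated - baseline


# Toy scores, not real Cotool data
old_scores = [62, 60, 71, 58, 66]
new_scores = [78, 85, 80, 83, 79, 88, 81, 84, 77, 86]
print(compare_scores(old_scores, new_scores))
```

Returning `None` until enough post-change runs exist keeps a couple of lucky (or unlucky) runs from being read as a trend. A positive delta over a full window is a reasonable signal that the change helped; a flat or negative one suggests the low scores had another cause, such as data quality.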
## Example: Real Improvement Flow

**Run #1834: Evaluation Score 60/100**

**Issue Detected**:

```
Agent encountered empty Splunk results and immediately marked alert as false positive without investigating why logs were missing. This could miss real threats if log ingestion is delayed.
```

**Generated Diff**:

```diff
 ## Investigation Steps
 1. Search Splunk for related activity
+   - If query returns 0 results, verify:
+     a) Data ingestion status (check indexer health)
+     b) Time range (expand if near ingestion delay window)
+     c) Query syntax (test with broader search)
+   - Only conclude "no activity" after ruling out data issues
 2. Assess findings
-   - No logs = likely false positive
+   - No logs = investigate further before concluding
+   - Document if missing logs prevent full assessment
```

**You**: Review the diff; it looks good ✅

**Action**: Click "Accept"

**Result**: Prompt v5 is created, and the agent now handles empty query results properly

**Validation**: The next 15 runs average 82/100 (up from 65/100)

## Manual Prompt Editing

You can always edit prompts manually instead of using suggestions:

1. Go to Agent Settings → System Prompt
2. Click "Edit"
3. Make changes directly
4. Save (this creates a new version)

The AI-suggested flow, however, is faster and often catches issues you might miss.
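Both paths end the same way: accepting a suggestion (as in the example, where Prompt v5 was created) and saving a manual edit each produce a new prompt version. The sketch below models that as an append-only history. It is a hypothetical illustration, not Cotool's API: the `PromptHistory` class, the `source` labels, and the assumption that earlier versions are kept unchanged are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    number: int  # 1-based, e.g. 5 for "Prompt v5"
    text: str    # full system prompt text
    source: str  # how it was created: "suggestion" or "manual" (illustrative labels)


@dataclass
class PromptHistory:
    """Hypothetical append-only store: saving never edits an earlier version."""
    versions: list[PromptVersion] = field(default_factory=list)

    def save(self, text: str, source: str) -> PromptVersion:
        # Each save (accepted diff or manual edit) appends a new numbered version
        version = PromptVersion(len(self.versions) + 1, text, source)
        self.versions.append(version)
        return version

    def current(self) -> PromptVersion:
        if not self.versions:
            raise LookupError("no prompt saved yet")
        return self.versions[-1]


history = PromptHistory()
history.save("1. Analyze the alert details", source="manual")
history.save("1. Analyze the alert details and extract key indicators", source="suggestion")
print(history.current().number, history.current().source)  # 2 suggestion
```

Keeping versions immutable means that an accepted suggestion can always be compared against, or replaced by, the prompt it superseded.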