How It Works
1. Pattern Detection: The system identifies repeated issues across multiple low-scoring runs.
2. Issue Clustering: Similar problems are grouped together to identify common root causes.
3. Suggested Improvements Panel: You see a panel showing the detected patterns and a “Generate Diff” button.
4. Generate Diff: Click the button and the AI generates specific prompt changes to fix the issues.
5. Review Changes: See a side-by-side diff of the current vs. improved prompt.
6. Accept or Reject: Apply the changes, or reject them if they don’t fit your needs.
You don’t write the improvements: the AI analyzes what went wrong and generates the fix. Your role is to review and approve, not to write from scratch.
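The detection and clustering steps are internal to the system, but conceptually they resemble the minimal sketch below. Everything in it is an illustrative assumption, not the product’s actual logic: the `Run` shape, the 70-point cutoff, and the three-run repeat threshold are all made up.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Run:
    run_id: int
    score: int          # evaluation score, 0-100
    issues: list[str]   # issue labels from the evaluator

LOW_SCORE = 70          # hypothetical cutoff for "low-scoring"
MIN_REPEATS = 3         # hypothetical: a pattern must recur in 3+ runs

def detect_patterns(runs: list[Run]) -> dict[str, list[int]]:
    """Cluster issues across low-scoring runs; keep only repeated ones."""
    clusters: dict[str, list[int]] = defaultdict(list)
    for run in runs:
        if run.score >= LOW_SCORE:
            continue  # only low-scoring runs feed pattern detection
        for issue in run.issues:
            clusters[issue].append(run.run_id)
    # Only repeated patterns are surfaced as suggestion cards
    return {issue: ids for issue, ids in clusters.items() if len(ids) >= MIN_REPEATS}

runs = [
    Run(1831, 55, ["missing_steps"]),
    Run(1832, 88, []),
    Run(1833, 62, ["missing_steps", "output_issues"]),
    Run(1834, 60, ["missing_steps"]),
]
print(detect_patterns(runs))  # {'missing_steps': [1831, 1833, 1834]}
```

A single low-scoring run with a one-off issue never produces a card in this sketch; only issues that repeat across runs do, which matches the “repeated issues across multiple low-scoring runs” behavior described above.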
Suggested Improvements Panel
When available, you’ll see a panel below the system prompt with suggested improvements.

Issue Categories
The system automatically detects and suggests fixes for:

Missing Steps
Agent skipped important investigation or validation steps.
Fix: Add explicit steps to the prompt.

Poor Decision Logic
Agent made incorrect conclusions or didn’t follow criteria properly.
Fix: Clarify decision criteria and edge cases.

Tool Misuse
Wrong tools used, correct tools skipped, or inefficient patterns.
Fix: Add tool usage guidance.

Output Issues
Missing information, unclear formatting, or insufficient detail.
Fix: Specify required output structure.
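One way to picture the category-to-fix mapping is as a simple lookup table. The keys below are assumed labels for illustration, not the product’s internal identifiers:

```python
# Illustrative mapping from detected issue categories to fix strategies;
# the category keys are assumptions, not real internal identifiers.
FIX_STRATEGIES = {
    "missing_steps": "Add explicit steps to the prompt",
    "poor_decision_logic": "Clarify decision criteria and edge cases",
    "tool_misuse": "Add tool usage guidance",
    "output_issues": "Specify required output structure",
}

for issue in ("missing_steps", "output_issues"):
    print(f"{issue}: {FIX_STRATEGIES[issue]}")
```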
Generating the Diff
When you click “Generate Diff”, the AI:
- Analyzes the specific issues identified
- Reviews your current system prompt
- Generates targeted changes to fix the problems
- Shows you a side-by-side comparison with two panels: Before (Current) and After (Suggested)
Changes are highlighted in the diff view. Green shows additions, red shows removals, yellow shows modifications.
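Conceptually, the generated change is a diff over prompt text. A minimal sketch using Python’s standard difflib, with made-up prompts (the real feature renders this graphically with the color coding described above):

```python
import difflib

# Hypothetical current prompt ("before") and AI-suggested revision ("after")
before = """You are a SOC analyst.
Classify each alert by severity.
""".splitlines(keepends=True)

after = """You are a SOC analyst.
Before determining severity, always:
1) Check the user's recent activity
2) Verify the file hash against VirusTotal
Classify each alert by severity.
""".splitlines(keepends=True)

# '+' lines correspond to green (additions) in the diff view,
# '-' lines to red (removals)
print("".join(difflib.unified_diff(before, after, "current", "suggested")))
```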
Multiple Improvements
You may see multiple suggestion cards if several low-scoring runs identified different issues:
- Generate diffs for each individually
- Address the lowest-scoring issue first (see the sketch after this list)
- Reject suggestions that don’t apply to your use case
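Ordering cards by severity is simple to picture. A hypothetical sketch, assuming each card carries the worst evaluation score among the runs behind it (`worst_score` is an assumed field, not part of the product):

```python
# Hypothetical suggestion cards; "worst_score" is an assumed field holding
# the lowest evaluation score among the runs behind each card
suggestions = [
    {"issue": "output_issues", "worst_score": 62},
    {"issue": "missing_steps", "worst_score": 55},
]

# Tackle the lowest-scoring issue first
for card in sorted(suggestions, key=lambda c: c["worst_score"]):
    print(card["issue"], card["worst_score"])
```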
What Gets Improved
Investigation Completeness
Issue: Agent skipped checking user history before closing an alert.
Fix: “Before determining severity, always: 1) Check user’s recent activity…”
Error Handling
Issue: Agent failed when an API returned no data.
Fix: “If VirusTotal returns no data (404), classify as Medium and note that hash is unknown…”
Edge Case Handling
Issue: Agent mishandled service account activity.
Fix: “If user matches pattern ‘svc_’ or ‘service-’, verify activity against scheduled job list…”
Output Specificity
Issue: Agent output was vague: “Alert looks suspicious”.
Fix: “Always include: 1) Specific indicators found, 2) Confidence level (High/Medium/Low), 3) Recommended actions…”
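To make the Edge Case Handling example concrete, here is the prose rule “matches pattern ‘svc_’ or ‘service-’” expressed as an explicit check. The regex and function name are illustrative, not anything the product exposes:

```python
import re

# Illustrative encoding of the Edge Case Handling fix above: the prose rule
# "if user matches pattern 'svc_' or 'service-'" as an explicit check
SERVICE_ACCOUNT = re.compile(r"^(svc_|service-)")

def is_service_account(username: str) -> bool:
    return SERVICE_ACCOUNT.match(username) is not None

print(is_service_account("svc_backup"))  # True
print(is_service_account("jdoe"))        # False
```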
Best Practices
Act on Repeated Issues
If you see the same issue across multiple runs, accept the suggestion. Repeated patterns indicate a real problem.
Test After Accepting
Use Agent Builder to test the updated prompt with edge cases before relying on it in production.
Monitor Impact
After accepting changes, watch evaluation scores for the next 10-20 runs to confirm improvement.
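A quick way to sanity-check impact is to compare the mean score of the runs after the change against the run that triggered the suggestion. A sketch with made-up numbers:

```python
from statistics import mean

baseline = 60  # score of the run that triggered the suggestion
# Hypothetical scores for the next 10 runs after accepting the change
scores_after = [78, 82, 75, 90, 85, 80, 88, 79, 84, 81]

avg = mean(scores_after)
print(f"mean after change: {avg:.1f} ({avg - baseline:+.1f} vs. baseline)")
```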
Don't Over-Optimize
Not every low score needs a prompt change. Sometimes the issue is data quality, API availability, or legitimate edge cases.
Example: Real Improvement Flow
Run #1834 - Evaluation Score: 60/100
Issue Detected:

Manual Prompt Editing
You can always edit prompts manually instead of using suggestions:
- Go to Agent Settings → System Prompt
- Click “Edit”
- Make changes directly
- Save (creates new version)
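“Save (creates new version)” implies that prompt versions are append-only rather than overwritten in place. A hypothetical sketch of that behavior; `PromptHistory` and its `save` method are illustrative, not a real API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical illustration of "Save (creates new version)": each save
# appends an immutable entry instead of overwriting the prompt in place
@dataclass
class PromptHistory:
    versions: list[tuple[datetime, str]] = field(default_factory=list)

    def save(self, prompt: str) -> int:
        self.versions.append((datetime.now(timezone.utc), prompt))
        return len(self.versions)  # 1-based version number

history = PromptHistory()
print(history.save("You are a SOC analyst."))                             # 1
print(history.save("You are a SOC analyst. Check user history first."))  # 2
```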