> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cotool.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Automatic Evaluations

> Every agent run is automatically scored to identify improvement opportunities

Every agent execution is automatically evaluated by an LLM judge that scores performance from 0-100. These scores drive automatic improvement suggestions and help you track agent quality over time.

## How It Works

1. Your agent completes a run (triggered by a user, schedule, or event).
2. Immediately after completion, an LLM judge analyzes the execution and assigns a score (0-100).
3. The score is logged and displayed in run history.
4. Low-scoring runs automatically trigger AI-generated improvement suggestions.

**No configuration needed** - Evaluations run automatically on every execution. The system learns what "good" looks like based on your agent's goals and past performance.

## What Gets Evaluated

The LLM judge assesses multiple dimensions:

* Did the agent correctly assess the situation and reach the right conclusion?
* Did it gather all necessary information before deciding?
* Were tools used appropriately and efficiently?
* Is the response clear, well-formatted, and actionable?

## Evaluation Scores

Scores range from 0-100:

* **90-100**: Excellent - Agent performed optimally
* **75-89**: Good - Minor improvements possible
* **60-74**: Acceptable - Some issues identified
* **Below 60**: Needs improvement

## Improvement Workflow

When multiple runs show similar low-scoring patterns, the system automatically:

1. **Analyzes the executions** to identify specific issues
2. **Generates improvement suggestions** for your system prompt
3. **Shows the "Suggested Improvements" panel** (see AI-Suggested Improvements)
4. **Lets you review and apply fixes** with one click

To learn how to review and apply automatic improvement suggestions, see AI-Suggested Improvements.

## Best Practices

* When you see low scores, click into the run to understand what went wrong.
  The evaluation breakdown shows exactly what needs improvement.
* Don't ignore the "Suggested Improvements" panel. Most issues can be fixed by accepting the AI-generated prompt changes.
* One low score isn't a problem. Look for patterns: if the average score is declining, or is consistently low for certain scenarios, take action.
* Evaluations are automatic, but you can also give thumbs up/down feedback as an additional signal about what's working.

## What Makes a Good Score?

**Don't aim for perfect 100s** - Agents don't need to be perfect; they need to be useful. A consistent 80-90 range usually indicates a well-tuned agent.

Focus on:

* **Consistency**: Is the agent reliable across different scenarios?
* **Improvement**: Are scores trending upward after applying suggestions?
* **User satisfaction**: Do evaluation scores align with user feedback?
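
If you copy scores out of run history and want to look for patterns yourself, the score bands and the "declining average" check described above can be sketched in a few lines of Python. This is only an illustration and not part of the product: the function names `band` and `is_declining`, the five-run comparison window, and the sample scores are all assumptions made for the example.

```python
from statistics import mean

# Score bands as listed under "Evaluation Scores": (lower bound, label).
BANDS = [(90, "Excellent"), (75, "Good"), (60, "Acceptable"), (0, "Needs improvement")]


def band(score: float) -> str:
    """Return the documented band label for a 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


def is_declining(scores: list[float], window: int = 5) -> bool:
    """Hypothetical trend check (not the product's logic): is the mean of the
    latest `window` runs lower than the mean of the `window` runs before them?"""
    if len(scores) < 2 * window:
        return False  # not enough history to compare two full windows
    recent = scores[-window:]
    previous = scores[-2 * window:-window]
    return mean(recent) < mean(previous)


# Made-up scores for illustration, oldest run first.
history = [88, 91, 85, 90, 87, 82, 79, 58, 80, 76]

print(band(history[-1]))        # Good
print(round(mean(history), 1))  # 81.6
print(is_declining(history))    # True
```

In this sample the single score of 58 is not the concern on its own; the drop in the five-run average from 88.2 to 75 is the kind of pattern worth acting on.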