agent-orchestration-improve-agent
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
Agent Performance Optimization Workflow
[Extended thinking: Agent optimization requires a data-driven approach combining performance metrics, user feedback analysis, and advanced prompt engineering techniques. Success depends on systematic evaluation, targeted improvements, and rigorous testing with rollback capabilities for production safety.]
Use this skill when
Do not use this skill when
Instructions
Safety
Phase 1: Performance Analysis and Baseline Metrics
Comprehensive analysis of agent performance using context-manager for historical data collection.
1.1 Gather Performance Data
```
Use: context-manager
Command: analyze-agent-performance $ARGUMENTS --days 30
```

Collect metrics including task success rate, average corrections per task, tool-call efficiency, user satisfaction, response latency, and token usage.
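If raw interaction logs are available, the same metrics can be computed directly. A minimal sketch, assuming each log record is a dict with `success`, `corrections`, `tool_calls`, `useful_tool_calls`, `latency_ms`, `tokens_in`, and `tokens_out` fields (the schema is an assumption, not part of this skill):

```python
# Hypothetical sketch: compute baseline metrics from raw interaction logs.
# Every field name below is an assumption about your logging schema.
from statistics import mean

def baseline_metrics(records: list[dict]) -> dict:
    return {
        "task_success_rate": mean(r["success"] for r in records),
        "avg_corrections_per_task": mean(r["corrections"] for r in records),
        # Fraction of tool calls that actually contributed to the answer.
        "tool_call_efficiency": (
            sum(r["useful_tool_calls"] for r in records)
            / max(1, sum(r["tool_calls"] for r in records))
        ),
        "avg_latency_ms": mean(r["latency_ms"] for r in records),
        # Output tokens produced per input token consumed.
        "token_efficiency": (
            sum(r["tokens_out"] for r in records)
            / max(1, sum(r["tokens_in"] for r in records))
        ),
    }
```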
1.2 User Feedback Pattern Analysis
Identify recurring patterns in user interactions, such as repeated clarification requests, frequent manual corrections, and recurring complaint themes.
1.3 Failure Mode Classification
Categorize each failure by root cause so fixes target the underlying problem rather than the symptom.
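A plausible starting taxonomy is sketched below; the category names are illustrative assumptions, not prescribed by this skill:

```python
# Illustrative failure taxonomy -- the categories are assumptions, not
# prescribed by this skill. Extend it to match the failures you observe.
from enum import Enum

class FailureMode(Enum):
    AMBIGUOUS_PROMPT = "instructions under-specified or contradictory"
    MISSING_CONTEXT = "required information never reached the agent"
    WRONG_TOOL_USE = "incorrect tool selected or malformed tool call"
    FORMAT_VIOLATION = "output did not match the required structure"
    HALLUCINATION = "fabricated facts, citations, or capabilities"
    CAPABILITY_GAP = "task genuinely outside the agent's abilities"
```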
1.4 Baseline Performance Report
Generate quantitative baseline metrics:
```
Performance Baseline:
Task Success Rate: [X%]
Average Corrections per Task: [Y]
Tool Call Efficiency: [Z%]
User Satisfaction Score: [1-10]
Average Response Latency: [Xms]
Token Efficiency Ratio: [X:Y]
```

Phase 2: Prompt Engineering Improvements
Apply advanced prompt optimization techniques using prompt-engineer agent.
2.1 Chain-of-Thought Enhancement
Implement structured reasoning patterns:
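For instance, a reasoning scaffold can be appended to the agent's prompt. The step names below are an illustrative sketch, not a fixed schema:

```python
# Illustrative chain-of-thought scaffold to append to an agent prompt.
# The step names are assumptions; adapt them to the agent's domain.
COT_SCAFFOLD = """Before answering, reason through these steps:
1. Restate the task in your own words.
2. List the facts and constraints you were given.
3. Note what is missing and state your assumptions explicitly.
4. Work through the solution step by step.
5. Check the result against the constraints from step 2.
Then give the final answer."""
```

The prompt-engineer agent can then apply and refine this pattern: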
```
Use: prompt-engineer
Technique: chain-of-thought-optimization
```

2.2 Few-Shot Example Optimization
Curate high-quality examples from successful interactions:
Example structure:
```
Good Example:
Input: [User request]
Reasoning: [Step-by-step thought process]
Output: [Successful response]
Why this works: [Key success factors]

Bad Example:
Input: [Similar request]
Output: [Failed response]
Why this fails: [Specific issues]
Correct approach: [Fixed version]
```
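Curation can be partly automated by filtering logged interactions. A hedged sketch, assuming each record carries `success`, `rating`, and `output` fields (schema assumptions, not a standard):

```python
# Sketch: shortlist few-shot candidates from logged interactions.
# `success`, `rating`, and `output` are assumed log fields.
def select_few_shot(records: list[dict], k: int = 5) -> list[dict]:
    winners = [r for r in records if r["success"] and r.get("rating", 0) >= 8]
    # Prefer shorter exemplars: they cost fewer prompt tokens per example.
    winners.sort(key=lambda r: len(r["output"]))
    return winners[:k]
```

Shortlisted candidates still need manual review before they become canonical examples.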
2.3 Role Definition Refinement
Strengthen the agent's identity by stating explicitly what it is, what it can do, and what it must refuse.
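A before/after sketch; the agent and its capability list are invented for illustration:

```python
# Illustrative only -- the agent name and capabilities are invented.
VAGUE_ROLE = "You are a helpful assistant that answers support questions."

REFINED_ROLE = """You are a tier-2 customer-support agent for [product].
You can: look up orders, explain billing, and escalate to a human.
You cannot: issue refunds, change credentials, or give legal advice.
When a request falls outside these capabilities, say so and escalate."""
```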
2.4 Constitutional AI Integration
Implement self-correction mechanisms:
```
Constitutional Principles:
- Verify factual accuracy before responding
- Self-check for potential biases or harmful content
- Validate output format matches requirements
- Ensure response completeness
- Maintain consistency with previous responses
```

Add critique-and-revise loops:
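A minimal sketch of such a loop, assuming a `generate` callable that wraps the underlying model call (`generate` is a placeholder, not a real API):

```python
# Sketch of a critique-and-revise loop. `generate` is a placeholder for
# whatever function produces model output; it is not a real library call.
from typing import Callable

PRINCIPLES = [
    "Is every factual claim accurate and verifiable?",
    "Is the content free of bias and potential harm?",
    "Does the output match the required format?",
    "Is the response complete?",
]

def critique_and_revise(generate: Callable[[str], str], task: str,
                        max_rounds: int = 2) -> str:
    principles_text = "\n".join(PRINCIPLES)
    draft = generate(task)
    for _ in range(max_rounds):
        critique = generate(
            "Critique this response against these principles:\n"
            + principles_text
            + "\n\nResponse:\n" + draft
            + "\n\nReply APPROVED if all principles are satisfied."
        )
        if "APPROVED" in critique:
            break
        draft = generate(
            "Revise the response to address this critique:\n"
            + critique + "\n\nOriginal response:\n" + draft
        )
    return draft
```

Capping the loop at a couple of rounds keeps token costs bounded; most of the gain comes from the first revision.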
2.5 Output Format Tuning
Optimize response structure so outputs are predictable and machine-checkable.
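For example, the expected structure can be pinned down as a JSON Schema; the field names here are illustrative:

```python
# Illustrative response contract; the field names are assumptions.
RESPONSE_SCHEMA = {
    "type": "object",
    "required": ["answer", "confidence", "sources"],
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "sources": {"type": "array", "items": {"type": "string"}},
    },
}
```

Validating every response against such a schema turns format drift into a measurable failure rate rather than an anecdote.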
Phase 3: Testing and Validation
Comprehensive testing framework with A/B comparison.
3.1 Test Suite Development
Create representative test scenarios:
```
Test Categories:
- Golden path scenarios (common successful cases)
- Previously failed tasks (regression testing)
- Edge cases and corner scenarios
- Stress tests (complex, multi-step tasks)
- Adversarial inputs (potential breaking points)
- Cross-domain tasks (combining capabilities)
```

3.2 A/B Testing Framework
Compare original vs improved agent:
```
Use: parallel-test-runner
Config:
- Agent A: Original version
- Agent B: Improved version
- Test set: 100 representative tasks
- Metrics: Success rate, speed, token usage
- Evaluation: Blind human review + automated scoring
```

Statistical significance testing:
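With roughly 100 tasks per arm, raw success-rate differences can easily be noise. A self-contained sketch using a two-proportion z-test (a common default for comparing success rates, not a method mandated by this skill):

```python
# Two-proportion z-test on A/B success counts (standard library only).
from statistics import NormalDist

def ab_significance(succ_a: int, n_a: int, succ_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both agents succeed equally often."""
    p_pool = (succ_a + succ_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (succ_b / n_b - succ_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# e.g. 71/100 vs 82/100 gives p ~= 0.067: suggestive, not yet conclusive.
```

At this sample size only fairly large gaps reach p < 0.05, so consider enlarging the test set before trusting small improvements.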
3.3 Evaluation Metrics
Comprehensive scoring framework:
```
Task-Level Metrics: completion success, corrections required, tool-call efficiency
Quality Metrics: factual accuracy, completeness, format compliance
Performance Metrics: response latency, token usage, cost per task
```
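Automated scoring can fold these into one comparable number per run. A sketch; the weights are placeholders to tune per agent, not recommended values:

```python
# Sketch: composite score per test run. The weights are placeholders --
# tune them to reflect what actually matters for your agent.
WEIGHTS = {"success": 0.5, "quality": 0.3, "efficiency": 0.2}

def composite_score(success: float, quality: float, efficiency: float) -> float:
    """All inputs normalized to [0, 1]; higher is better."""
    return (WEIGHTS["success"] * success
            + WEIGHTS["quality"] * quality
            + WEIGHTS["efficiency"] * efficiency)
```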
3.4 Human Evaluation Protocol
Structured human review process: reviewers score paired A/B outputs blind to which agent produced them, using a shared rubric.
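A minimal rubric sketch; the dimensions and scale are illustrative assumptions:

```python
# Illustrative review rubric; dimensions and scale are assumptions.
RUBRIC = {
    "correctness":  "1-5: is the answer factually right?",
    "completeness": "1-5: does it address the whole request?",
    "clarity":      "1-5: is it easy to follow and well formatted?",
    "preference":   "A / B / tie: which response is better overall?",
}
```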
Phase 4: Version Control and Deployment
Safe rollout with monitoring and rollback capabilities.
4.1 Version Management
Systematic versioning strategy:
```
Version Format: agent-name-v[MAJOR].[MINOR].[PATCH]
Example: customer-support-v2.3.1

MAJOR: Significant capability changes
MINOR: Prompt improvements, new examples
PATCH: Bug fixes, minor adjustments
```
Maintain version history:
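For example, as one changelog record per release; the fields and values below are illustrative, not a required shape:

```python
# One possible shape for a version-history record; all values are
# hypothetical examples, not real release data.
VERSION_HISTORY = [
    {
        "version": "customer-support-v2.3.1",
        "changes": "PATCH: fixed date-format handling in order lookups",
        "baseline_success_rate": 0.82,  # recorded for rollback comparison
        "status": "stable",
    },
]
```

Recording the baseline alongside each release gives the rollback triggers in 4.3 a concrete reference point.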
4.2 Staged Rollout
Progressive deployment strategy:
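A common canary pattern is sketched below; the traffic percentages and soak times are illustrative defaults, not values prescribed by this skill:

```python
# Illustrative canary stages -- percentages and durations are assumptions.
ROLLOUT_STAGES = [
    {"traffic_pct": 5,   "min_soak_hours": 24, "gate": "no rollback triggers"},
    {"traffic_pct": 25,  "min_soak_hours": 48, "gate": "metrics >= baseline"},
    {"traffic_pct": 50,  "min_soak_hours": 48, "gate": "metrics >= baseline"},
    {"traffic_pct": 100, "min_soak_hours": 0,  "gate": "post-deployment review"},
]
```

Each stage must pass its gate before traffic is promoted; any rollback trigger (see 4.3) aborts the rollout.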
4.3 Rollback Procedures
Quick recovery mechanism:
```
Rollback Triggers:
- Success rate drops >10% from baseline
- Critical errors increase >5%
- User complaints spike
- Cost per task increases >20%
- Safety violations detected

Rollback Process:
1. Detect issue via monitoring
2. Alert team immediately
3. Switch to previous stable version
4. Analyze root cause
5. Fix and re-test before retry
```

4.4 Continuous Monitoring
Real-time performance tracking:
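The 4.3 triggers translate directly into a monitoring check. A sketch using the thresholds listed above (the metric dict shape is an assumption about your monitoring pipeline):

```python
# Sketch: evaluate the 4.3 rollback triggers against live metrics.
# The dict keys are assumptions about your monitoring pipeline.
def should_rollback(live: dict, baseline: dict) -> list[str]:
    reasons = []
    if live["success_rate"] < baseline["success_rate"] * 0.90:
        reasons.append("success rate dropped >10% from baseline")
    # ">5%" is read here as percentage points; adjust if you mean relative.
    if live["critical_error_rate"] > baseline["critical_error_rate"] + 0.05:
        reasons.append("critical errors increased >5%")
    if live["cost_per_task"] > baseline["cost_per_task"] * 1.20:
        reasons.append("cost per task increased >20%")
    if live["safety_violations"] > 0:
        reasons.append("safety violations detected")
    # "User complaints spike" has no numeric threshold above, so it is
    # left to human judgment rather than automated here.
    return reasons  # any entry => switch to the previous stable version
```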
Success Criteria
Agent improvement is successful when the new version beats the baseline on task success rate with statistical significance, quality and latency do not regress, cost per task stays within budget, and no rollback triggers fire during rollout.
Post-Deployment Review
After 30 days of production use, compare production metrics against the Phase 1 baseline, review accumulated user feedback for new failure patterns, and feed the findings into the next improvement cycle.
Continuous Improvement Cycle
Establish a regular improvement cadence by repeating Phases 1-4 on a fixed schedule, with each cycle starting from the previous cycle's post-deployment review.
Remember: Agent optimization is an iterative process. Each cycle builds upon previous learnings, gradually improving performance while maintaining stability and safety.