---
name: agent-orchestration-improve-agent
description: Systematic performance improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
---
# Agent Performance Optimization Workflow
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
[Extended thinking: Agent optimization requires a data-driven approach combining performance metrics, user feedback analysis, and advanced prompt engineering techniques. Success depends on systematic evaluation, targeted improvements, and rigorous testing with rollback capabilities for production safety.]
## Use this skill when

## Do not use this skill when

## Instructions

## Safety
### Phase 1: Performance Analysis and Baseline Metrics

Comprehensive analysis of agent performance, using the `context-manager` agent for historical data collection.
#### 1.1 Gather Performance Data
```
Use: context-manager
Command: analyze-agent-performance $ARGUMENTS --days 30
```

Collect metrics covering task success rate, corrections per task, tool call efficiency, user satisfaction, response latency, and token usage.
#### 1.2 User Feedback Pattern Analysis

Identify recurring patterns in user interactions.
#### 1.3 Failure Mode Classification

Categorize failures by root cause; an illustrative taxonomy sketch follows.
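Since the concrete root-cause list is not specified here, the categories in this sketch are illustrative assumptions:

```python
from enum import Enum

class FailureMode(Enum):
    """Illustrative root-cause categories (assumed, not prescriptive)."""
    AMBIGUOUS_PROMPT = "user request was under-specified"
    TOOL_ERROR = "a tool call failed or returned bad data"
    CONTEXT_OVERFLOW = "relevant context fell outside the window"
    REASONING_ERROR = "the agent drew a wrong conclusion"
    FORMAT_VIOLATION = "output did not match the required format"

def classify(failure_log: str) -> FailureMode:
    # Placeholder heuristic; a real classifier would use labeled data.
    if "tool" in failure_log.lower():
        return FailureMode.TOOL_ERROR
    return FailureMode.REASONING_ERROR
```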
#### 1.4 Baseline Performance Report

Generate quantitative baseline metrics:
```
Performance Baseline:
  Task Success Rate: [X%]
  Average Corrections per Task: [Y]
  Tool Call Efficiency: [Z%]
  User Satisfaction Score: [1-10]
  Average Response Latency: [X ms]
  Token Efficiency Ratio: [X:Y]
```
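A minimal sketch of deriving these numbers from raw interaction logs; the log schema here is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    succeeded: bool
    corrections: int        # user corrections within the task
    tool_calls: int
    useful_tool_calls: int
    latency_ms: float

def baseline(logs: list[Interaction]) -> dict:
    """Aggregate a non-empty log set into baseline metrics."""
    n = len(logs)
    return {
        "task_success_rate": sum(i.succeeded for i in logs) / n,
        "avg_corrections": sum(i.corrections for i in logs) / n,
        "tool_call_efficiency": (
            sum(i.useful_tool_calls for i in logs)
            / max(1, sum(i.tool_calls for i in logs))
        ),
        "avg_latency_ms": sum(i.latency_ms for i in logs) / n,
    }
```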
### Phase 2: Prompt Engineering Improvements

Apply advanced prompt optimization techniques using the `prompt-engineer` agent.
#### 2.1 Chain-of-Thought Enhancement

Implement structured reasoning patterns:

```
Use: prompt-engineer
Technique: chain-of-thought-optimization
```
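What the resulting instruction block might look like; the wording below is an illustrative assumption, not `prompt-engineer` output:

```python
# Hypothetical base prompt and CoT block; wording is illustrative only.
BASE_SYSTEM_PROMPT = "You are a customer-support agent."

COT_INSTRUCTIONS = """\
Before answering, reason through the task step by step:
1. Restate what the user is asking for.
2. List the facts and constraints you must satisfy.
3. Work through the solution one step at a time.
4. Check the result against the constraints from step 2.
Only then produce the final answer.
"""

system_prompt = BASE_SYSTEM_PROMPT + "\n\n" + COT_INSTRUCTIONS
```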
#### 2.2 Few-Shot Example Optimization

Curate high-quality examples from successful interactions. Example structure:
```
Good Example:
  Input: [User request]
  Reasoning: [Step-by-step thought process]
  Output: [Successful response]
  Why this works: [Key success factors]

Bad Example:
  Input: [Similar request]
  Output: [Failed response]
  Why this fails: [Specific issues]
  Correct approach: [Fixed version]
```
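A minimal sketch of rendering curated examples into prompt text, assuming the structure shown above:

```python
from dataclasses import dataclass

@dataclass
class FewShotExample:
    input: str
    reasoning: str
    output: str

def render_examples(examples: list[FewShotExample]) -> str:
    """Format curated successes as few-shot blocks for the prompt."""
    blocks = []
    for ex in examples:
        blocks.append(
            f"Input: {ex.input}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Output: {ex.output}"
        )
    return "\n\n".join(blocks)
```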
#### 2.3 Role Definition Refinement

Strengthen the agent's identity and capability definitions.
#### 2.4 Constitutional AI Integration

Implement self-correction mechanisms:
```
Constitutional Principles:
1. Verify factual accuracy before responding
2. Self-check for potential biases or harmful content
3. Validate output format matches requirements
4. Ensure response completeness
5. Maintain consistency with previous responses
```

Add critique-and-revise loops to the response pipeline; a minimal sketch follows.
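A minimal sketch of such a loop, assuming hypothetical `generate`, `critique`, and `revise` model-call helpers:

```python
# The five constitutional principles from the block above.
PRINCIPLES = [
    "Verify factual accuracy",
    "Check for bias or harmful content",
    "Validate output format",
    "Ensure completeness",
    "Stay consistent with prior responses",
]

MAX_REVISIONS = 2

def respond_with_self_check(request: str) -> str:
    draft = generate(request)                 # hypothetical model call
    for _ in range(MAX_REVISIONS):
        issues = critique(draft, PRINCIPLES)  # hypothetical: returns violated principles
        if not issues:
            break
        draft = revise(draft, issues)         # hypothetical targeted revision pass
    return draft
```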
#### 2.5 Output Format Tuning

Optimize response structure. One common approach is to pin the agent to an explicit output schema and validate against it, as sketched below.
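A minimal validation sketch; the required keys are illustrative assumptions:

```python
import json

# Illustrative response contract; substitute the agent's real schema.
REQUIRED_KEYS = {"answer", "confidence", "sources"}

def validate_response(raw: str) -> dict:
    """Parse the agent's output and reject structural violations."""
    data = json.loads(raw)                    # raises ValueError on non-JSON output
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    return data
```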
### Phase 3: Testing and Validation

Comprehensive testing framework with A/B comparison.
#### 3.1 Test Suite Development

Create representative test scenarios; a minimal harness sketch follows the list:
```
Test Categories:
- Golden path scenarios (common successful cases)
- Previously failed tasks (regression testing)
- Edge cases and corner scenarios
- Stress tests (complex, multi-step tasks)
- Adversarial inputs (potential breaking points)
- Cross-domain tasks (combining capabilities)
```
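A minimal harness sketch; the `run_agent` callable and the pass predicates are assumptions:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    category: str                   # one of the categories above
    prompt: str
    check: Callable[[str], bool]    # pass/fail predicate on the output

CASES = [
    TestCase("golden path", "Summarize this refund policy: ...",
             lambda out: "refund" in out.lower()),
    TestCase("regression", "Previously failed task: ...",
             lambda out: len(out) > 0),
]

def run_suite(run_agent: Callable[[str], str]) -> float:
    """Return the fraction of cases that pass."""
    passed = sum(tc.check(run_agent(tc.prompt)) for tc in CASES)
    return passed / len(CASES)
```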
#### 3.2 A/B Testing Framework

Compare the original and improved agents:
```
Use: parallel-test-runner
Config:
- Agent A: Original version
- Agent B: Improved version
- Test set: 100 representative tasks
- Metrics: Success rate, speed, token usage
- Evaluation: Blind human review + automated scoring
```

Run statistical significance testing before declaring a winner; a minimal sketch follows.
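A minimal sketch of the significance check, using a two-proportion z-test on success counts (stdlib only):

```python
from statistics import NormalDist

def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in success rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Example: 72/100 successes (A) vs 84/100 successes (B)
p = two_proportion_z_test(72, 100, 84, 100)
print(f"p-value: {p:.4f}")  # ~0.04; treat p < 0.05 as significant
```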
#### 3.3 Evaluation Metrics

Comprehensive scoring framework:
- Task-Level Metrics
- Quality Metrics
- Performance Metrics
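One way to keep the three groups together per task is a single evaluation record; the fields are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class TaskEvaluation:
    # Task-level
    task_id: str
    succeeded: bool
    # Quality (e.g. from blind human review, 1-10)
    accuracy: int
    helpfulness: int
    # Performance
    latency_ms: float
    tokens_used: int
```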
#### 3.4 Human Evaluation Protocol

Structured human review process.
### Phase 4: Version Control and Deployment

Safe rollout with monitoring and rollback capabilities.
#### 4.1 Version Management

Systematic versioning strategy:
```
Version Format: agent-name-v[MAJOR].[MINOR].[PATCH]
Example: customer-support-v2.3.1

MAJOR: Significant capability changes
MINOR: Prompt improvements, new examples
PATCH: Bug fixes, minor adjustments
```
Maintain a version history for every agent; a minimal registry sketch follows.
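A minimal registry sketch; the storage format is an assumption:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AgentVersion:
    version: str               # e.g. "customer-support-v2.3.1"
    released: date
    changes: str
    baseline_success_rate: float

HISTORY: list[AgentVersion] = []  # newest last; earlier entries enable rollback

def record(v: AgentVersion) -> None:
    HISTORY.append(v)
```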
#### 4.2 Staged Rollout

Progressive deployment strategy; an illustrative stage plan follows.
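The traffic shares and soak times in this sketch are assumptions, not prescribed values:

```python
# Each stage widens traffic only if the previous stage held its metrics.
ROLLOUT_STAGES = [
    {"traffic_pct": 5,   "min_soak_hours": 24},   # canary
    {"traffic_pct": 25,  "min_soak_hours": 48},
    {"traffic_pct": 50,  "min_soak_hours": 48},
    {"traffic_pct": 100, "min_soak_hours": 0},    # full rollout
]
```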
#### 4.3 Rollback Procedures

Quick recovery mechanism (an automated trigger check is sketched after the lists):
```
Rollback Triggers:
- Success rate drops >10% from baseline
- Critical errors increase >5%
- User complaints spike
- Cost per task increases >20%
- Safety violations detected

Rollback Process:
1. Detect issue via monitoring
2. Alert team immediately
3. Switch to previous stable version
4. Analyze root cause
5. Fix and re-test before retry
```
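The numeric triggers above map directly onto an automated check; a minimal sketch with metric names assumed (complaint spikes are left to human judgment):

```python
def should_rollback(baseline: dict, live: dict) -> bool:
    """Apply the rollback triggers listed above (absolute deltas assumed)."""
    return (
        live["success_rate"] < baseline["success_rate"] - 0.10        # >10% drop
        or live["critical_error_rate"]
            > baseline["critical_error_rate"] + 0.05                  # >5% increase
        or live["cost_per_task"] > baseline["cost_per_task"] * 1.20   # >20% increase
        or live["safety_violations"] > 0
    )
```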
#### 4.4 Continuous Monitoring

Real-time performance tracking.
## Success Criteria

Agent improvement is successful when the improved version measurably outperforms the Phase 1 baseline on the A/B metrics without triggering any rollback conditions.
## Post-Deployment Review

After 30 days of production use, compare live metrics against the Phase 1 baseline and feed the findings into the next improvement cycle.
## Continuous Improvement Cycle

Establish a regular improvement cadence, repeating Phases 1-4 on each cycle.
**Remember:** Agent optimization is an iterative process. Each cycle builds on the learnings of the last, gradually improving performance while maintaining stability and safety.