Workflow Orchestration Patterns
Master workflow orchestration architecture with Temporal, covering fundamental design decisions, resilience patterns, and best practices for building reliable distributed systems.
Use this skill when
- Working on workflow orchestration patterns tasks or workflows
- Needing guidance, best practices, or checklists for workflow orchestration patterns
Do not use this skill when
- The task is unrelated to workflow orchestration patterns
- You need a different domain or tool outside this scope
Instructions
- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open resources/implementation-playbook.md.
When to Use Workflow Orchestration
Ideal Use Cases (Source: docs.temporal.io)
- Multi-step processes spanning machines/services/databases
- Distributed transactions requiring all-or-nothing semantics
- Long-running workflows (hours to years) with automatic state persistence
- Failure recovery that must resume from last successful step
- Business processes: bookings, orders, campaigns, approvals
- Entity lifecycle management: inventory tracking, account management, cart workflows
- Infrastructure automation: CI/CD pipelines, provisioning, deployments
- Human-in-the-loop systems requiring timeouts and escalations
When NOT to Use
- Simple CRUD operations (use direct API calls)
- Pure data processing pipelines (use Airflow, batch processing)
- Stateless request/response (use standard APIs)
- Real-time streaming (use Kafka, event processors)
Critical Design Decision: Workflows vs Activities
The Fundamental Rule (Source: temporal.io/blog/workflow-engine-principles):
- Workflows = Orchestration logic and decision-making
- Activities = External interactions (APIs, databases, network calls)
Workflows (Orchestration)
Characteristics:
- Contain business logic and coordination
- MUST be deterministic (same inputs → same outputs)
- Cannot perform direct external calls
- State automatically preserved across failures
- Can run for years despite infrastructure failures
Example workflow tasks:
- Decide which steps to execute
- Handle compensation logic
- Manage timeouts and retries
- Coordinate child workflows
Activities (External Interactions)
Characteristics:
- Handle all external system interactions
- Can be non-deterministic (API calls, DB writes)
- Include built-in timeouts and retry logic
- Must be idempotent (calling N times = calling once)
- Short-lived (seconds to minutes typically)
Example activity tasks:
- Call payment gateway API
- Write to database
- Send emails or notifications
- Query external services
Design Decision Framework
Does it touch external systems? → Activity
Is it orchestration/decision logic? → Workflow
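
The split can be illustrated with a small sketch in plain Python. It does not use the Temporal SDK: `Order`, `charge_card`, `send_receipt`, `order_workflow`, and `run_activity` are hypothetical names chosen for illustration, and `run_activity` merely stands in for whatever mechanism an SDK provides for invoking activities.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Order:
    order_id: str
    amount_cents: int


# "Activities": the only functions that would touch external systems.
def charge_card(order: Order) -> str:
    # A real activity would call a payment gateway here (assumption).
    return f"charge-{order.order_id}"


def send_receipt(order: Order, charge_id: str) -> str:
    # A real activity would send an email here (assumption).
    return f"receipt for {order.order_id} ({charge_id})"


# "Workflow": decisions and sequencing only, no direct I/O.
def order_workflow(order: Order, run_activity: Callable[..., str]) -> list[str]:
    results: list[str] = []
    if order.amount_cents > 0:  # orchestration decision
        charge_id = run_activity(charge_card, order)
        results.append(charge_id)
        results.append(run_activity(send_receipt, order, charge_id))
    return results


def run_activity(fn: Callable[..., str], *args) -> str:
    # Hypothetical stand-in for an SDK's activity invocation.
    return fn(*args)
```

Because `order_workflow` branches only on its inputs and on activity results, running it again with the same inputs produces the same sequence of activity calls, which is the property replay depends on.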
Core Workflow Patterns
1. Saga Pattern with Compensation
Purpose: Implement distributed transactions with rollback capability
Pattern (Source: temporal.io/blog/compensating-actions-part-of-a-complete-breakfast-with-sagas):
For each step:
1. Register compensation BEFORE executing
2. Execute the step (via activity)
3. On failure, run all compensations in reverse order (LIFO)
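
The three steps above can be sketched in plain Python. This is a minimal model rather than Temporal SDK code: the `Saga` class, `run_order`, the `fail_at` parameter, and the step names are all hypothetical, and a list called `log` stands in for real activity calls.

```python
from typing import Callable, Optional


class Saga:
    """Keeps compensations on a stack so they run in reverse (LIFO) order."""

    def __init__(self) -> None:
        self._compensations: list[Callable[[], None]] = []

    def add_compensation(self, undo: Callable[[], None]) -> None:
        self._compensations.append(undo)

    def compensate(self) -> None:
        while self._compensations:
            self._compensations.pop()()  # most recently registered first


def run_order(log: list[str], fail_at: Optional[str] = None) -> bool:
    saga = Saga()
    steps = [
        ("reserve_inventory", "release_inventory"),
        ("charge_payment", "refund_payment"),
        ("fulfill_order", "cancel_fulfillment"),
    ]
    try:
        for do, undo in steps:
            # 1. Register the compensation BEFORE executing the step.
            saga.add_compensation(lambda undo=undo: log.append(undo))
            # 2. Execute the step (a real workflow would call an activity).
            if do == fail_at:
                raise RuntimeError(f"{do} failed")
            log.append(do)
    except RuntimeError:
        # 3. On failure, run all compensations in reverse order.
        saga.compensate()
        return False
    return True
```

With `fail_at="charge_payment"`, the log ends as `reserve_inventory`, `refund_payment`, `release_inventory`. The refund runs even though the charge never completed, because its compensation was registered first; this is why a compensation has to be safe to run when its step did not take effect.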
Example: Payment Workflow
1. Reserve inventory (compensation: release inventory)
2. Charge payment (compensation: refund payment)
3. Fulfill order (compensation: cancel fulfillment)
Critical Requirements:
- Compensations must be idempotent
- Register compensation BEFORE executing step
- Run compensations in reverse order
- Handle partial failures gracefully
2. Entity Workflows (Actor Model)
Purpose: A long-lived workflow that represents a single entity instance
Pattern (Source: docs.temporal.io/evaluate/use-cases-design-patterns):
- One workflow execution = one entity (cart, account, inventory item)
- Workflow persists for entity lifetime
- Receives signals for state changes
- Supports queries for current state
Example Use Cases:
- Shopping cart (add items, checkout, expiration)
- Bank account (deposits, withdrawals, balance checks)
- Product inventory (stock updates, reservations)
Benefits:
- Encapsulates entity behavior
- Guarantees consistency per entity
- Natural event sourcing
3. Fan-Out/Fan-In (Parallel Execution)
Purpose: Execute multiple tasks in parallel, aggregate results
Pattern:
- Spawn child workflows or parallel activities
- Wait for all to complete
- Aggregate results
- Handle partial failures
Scaling Rule (Source: temporal.io/blog/workflow-engine-principles):
- Don't scale individual workflows
- For 1M tasks: spawn 1K child workflows × 1K tasks each
- Keep each workflow bounded
4. Async Callback Pattern
Purpose: Wait for external event or human approval
Pattern:
1. Workflow sends request and waits for signal
2. External system processes asynchronously
3. External system sends signal to resume workflow
4. Workflow continues with response
Use Cases:
- Human approval workflows
- Webhook callbacks
- Long-running external processes
State Management and Determinism
Automatic State Preservation
How Temporal Works (Source: docs.temporal.io/workflows):
- Complete program state preserved automatically
- Event History records every command and event
- Seamless recovery from crashes
- Applications restore pre-failure state
Determinism Constraints
Workflows Execute as State Machines:
- Replay behavior must be consistent
- Same inputs → identical outputs every time
Prohibited in Workflows (Source: docs.temporal.io/workflows):
- ❌ Threading, locks, synchronization primitives
- ❌ Random number generation (random())
- ❌ Global state or static variables
- ❌ System time (datetime.now())
- ❌ Direct file I/O or network calls
- ❌ Non-deterministic libraries
Allowed in Workflows:
- ✅ workflow.now() (deterministic time)
- ✅ workflow.random() (deterministic random)
- ✅ Pure functions and calculations
- ✅ Calling activities (non-deterministic operations)
Versioning Strategies
Challenge: Changing workflow code while old executions are still running
Solutions:
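- Versioning API: Use the SDK's versioning (patching) call for safe changes, for example workflow.patched() in the Python SDK or GetVersion in the Go and Java SDKs
- New Workflow Type: Create a new workflow and route new executions to it
- Backward Compatibility: Ensure old events replay correctly
Resilience and Error Handling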
Retry Policies
Default Behavior: Unless a retry policy or timeout limits it, Temporal retries failed activities indefinitely
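
To see how the retry settings interact, here is a plain-Python sketch of the delay schedule they produce. `retry_intervals` is a hypothetical helper, not a Temporal SDK function; it assumes the maximum attempt count includes the first try, so N attempts yield at most N - 1 retries.

```python
def retry_intervals(
    initial_interval: float,
    backoff_coefficient: float,
    maximum_interval: float,
    maximum_attempts: int,
) -> list[float]:
    """Return the delay in seconds before each retry (hypothetical helper)."""
    delays: list[float] = []
    # Assumption: the first attempt counts toward maximum_attempts.
    for retry in range(maximum_attempts - 1):
        # Exponential growth, capped at maximum_interval.
        delay = initial_interval * backoff_coefficient ** retry
        delays.append(min(delay, maximum_interval))
    return delays


print(retry_intervals(1.0, 2.0, 10.0, 6))  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

The cap matters for long outages: without a maximum interval, a coefficient of 2 would push the gap between attempts past an hour after a dozen retries.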
Configure Retry:
- Initial retry interval
- Backoff coefficient (exponential backoff)
- Maximum interval (cap retry delay)
- Maximum attempts (eventually fail)
Non-Retryable Errors:
- Invalid input (validation failures)
- Business rule violations
- Permanent failures (resource not found)
Idempotency Requirements
Why Critical (Source: docs.temporal.io/activities):
- Activities may execute multiple times
- Network failures trigger retries
- Duplicate execution must be safe
Implementation Strategies:
- Idempotency keys (deduplication)
- Check-then-act with unique constraints
- Upsert operations instead of insert
- Track processed request IDs
Activity Heartbeats
Purpose: Detect stalled long-running activities
Pattern:
- Activity sends periodic heartbeat
- Includes progress information
- Timeout if no heartbeat received
- Enables progress-based retry
Best Practices
Workflow Design
- Keep workflows focused: single responsibility per workflow
- Small workflows: use child workflows for scalability
- Clear boundaries: workflow orchestrates, activities execute
- Test locally: use time-skipping test environment
Activity Design
- Idempotent operations: safe to retry
- Short-lived: seconds to minutes, not hours
- Timeout configuration: always set timeouts
- Heartbeat for long tasks: report progress
- Error handling: distinguish retryable vs non-retryable
Common Pitfalls
Workflow Violations:
- Using datetime.now() instead of workflow.now()
- Threading or other concurrency not managed by the SDK in workflow code
- Calling external APIs directly from workflow
- Non-deterministic logic in workflows
Activity Mistakes:
- Non-idempotent operations (can't handle retries)
- Missing timeouts (activities run forever)
- No error classification (for example, retrying validation errors)
- Ignoring payload limits (2MB per argument)
Operational Considerations
Monitoring:
- Workflow execution duration
- Activity failure rates
- Retry attempts and backoff
- Pending workflow counts
Scalability:
- Horizontal scaling with workers
- Task queue partitioning
- Child workflow decomposition
- Activity batching when appropriate
Additional Resources
Official Documentation:
- Temporal Core Concepts: docs.temporal.io/workflows
- Workflow Patterns: docs.temporal.io/evaluate/use-cases-design-patterns
- Best Practices: docs.temporal.io/develop/best-practices
- Saga Pattern: temporal.io/blog/saga-pattern-made-easy
Key Principles:
- Workflows = orchestration, Activities = external calls
- Determinism is non-negotiable for workflows
- Idempotency is critical for activities
- State preservation is automatic
- Design for failure and recovery
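
As an illustration of the idempotency principle, the following plain-Python sketch deduplicates a retried charge with an idempotency key. `PaymentLedger`, its methods, and the key format are hypothetical, and an in-memory dict stands in for what would normally be a database table with a unique constraint.

```python
class PaymentLedger:
    """In-memory stand-in for a payment store keyed by idempotency key."""

    def __init__(self) -> None:
        self._charges: dict[str, dict] = {}

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retried call with the same key returns the stored result
        # instead of charging again.
        if idempotency_key in self._charges:
            return self._charges[idempotency_key]
        record = {
            "charge_id": f"ch_{len(self._charges) + 1}",
            "amount_cents": amount_cents,
        }
        self._charges[idempotency_key] = record
        return record

    def total_charged(self) -> int:
        return sum(r["amount_cents"] for r in self._charges.values())


ledger = PaymentLedger()
first = ledger.charge("order-42", 1999)
retry = ledger.charge("order-42", 1999)  # simulated activity retry
print(first == retry, ledger.total_charged())  # True 1999
```

Deriving the key from something stable across retries, such as an order ID, is what lets the second call be recognized as a duplicate.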