Workflow Orchestration Patterns

Master workflow orchestration architecture with Temporal, covering fundamental design decisions, resilience patterns, and best practices for building reliable distributed systems.

Use this skill when

Designing, implementing, or reviewing workflow orchestration with Temporal

Needing guidance, best practices, or checklists for workflow orchestration patterns

Do not use this skill when

The task is unrelated to workflow orchestration patterns

You need a different domain or tool outside this scope

Instructions

Clarify goals, constraints, and required inputs.

Apply relevant best practices and validate outcomes.

Provide actionable steps and verification.

If detailed examples are required, open resources/implementation-playbook.md.

When to Use Workflow Orchestration

Ideal Use Cases (Source: docs.temporal.io)

Multi-step processes spanning machines/services/databases

Distributed transactions requiring all-or-nothing semantics

Long-running workflows (hours to years) with automatic state persistence

Failure recovery that must resume from last successful step

Business processes: bookings, orders, campaigns, approvals

Entity lifecycle management: inventory tracking, account management, cart workflows

Infrastructure automation: CI/CD pipelines, provisioning, deployments

Human-in-the-loop systems requiring timeouts and escalations

When NOT to Use

Simple CRUD operations (use direct API calls)

Pure data processing pipelines (use Airflow, batch processing)

Stateless request/response (use standard APIs)

Real-time streaming (use Kafka, event processors)

Critical Design Decision: Workflows vs Activities

The Fundamental Rule (Source: temporal.io/blog/workflow-engine-principles):

Workflows = Orchestration logic and decision-making

Activities = External interactions (APIs, databases, network calls)

Workflows (Orchestration)

Characteristics:

Contain business logic and coordination

MUST be deterministic (replaying the same inputs and event history → same commands and outputs)

Cannot perform direct external calls

State automatically preserved across failures

Can run for years despite infrastructure failures

Example workflow tasks:

Decide which steps to execute

Handle compensation logic

Manage timeouts and retries

Coordinate child workflows

Activities (External Interactions)

Characteristics:

Handle all external system interactions

Can be non-deterministic (API calls, DB writes)

Include built-in timeouts and retry logic

Must be idempotent (calling N times = calling once)

Short-lived (seconds to minutes typically)

Example activity tasks:

Call payment gateway API

Write to database

Send emails or notifications

Query external services

Design Decision Framework

Does it touch external systems? → Activity
Is it orchestration/decision logic? → Workflow

Core Workflow Patterns

1. Saga Pattern with Compensation

Purpose: Implement distributed transactions with rollback capability

Pattern (Source: temporal.io/blog/compensating-actions-part-of-a-complete-breakfast-with-sagas):

For each step:
  1. Register compensation BEFORE executing
  2. Execute the step (via activity)
  3. On failure, run all compensations in reverse order (LIFO)

Example: Payment Workflow

Reserve inventory (compensation: release inventory)

Charge payment (compensation: refund payment)

Fulfill order (compensation: cancel fulfillment)

Critical Requirements:

Compensations must be idempotent

Run compensations in reverse order

Handle partial failures gracefully
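
The register-then-execute loop above can be sketched in plain Python. This is a minimal illustration using only asyncio, not the Temporal SDK; `run_saga`, `make_step`, and the step names are hypothetical, and in a real workflow each action and compensation would be an activity call.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Step = Callable[[], Awaitable[None]]


async def run_saga(steps: List[Tuple[str, Step, Step]], log: List[str]) -> bool:
    """Run (name, action, compensation) steps; undo in LIFO order on failure."""
    compensations: List[Tuple[str, Step]] = []
    try:
        for name, action, compensation in steps:
            # Register the compensation BEFORE executing, so a step that
            # fails after a partial side effect is still undone.
            compensations.append((name, compensation))
            await action()
            log.append(f"done:{name}")
        return True
    except Exception as exc:
        log.append(f"failed:{exc}")
        for name, compensation in reversed(compensations):
            try:
                await compensation()
                log.append(f"undone:{name}")
            except Exception:
                # A failed compensation must not block the remaining ones.
                log.append(f"undo-failed:{name}")
        return False


def make_step(name: str, fail: bool = False) -> Tuple[str, Step, Step]:
    """Hypothetical helper that builds a step for demonstration purposes."""
    async def action() -> None:
        if fail:
            raise RuntimeError(name)

    async def compensate() -> None:
        return None  # must be idempotent: it may run for a step that never completed

    return name, action, compensate
```

Because the compensation is registered before the step runs, it may be invoked for a step that never took effect, which is one reason compensations need to be idempotent.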

2. Entity Workflows (Actor Model)

Purpose: A long-lived workflow that represents a single entity instance

Pattern (Source: docs.temporal.io/evaluate/use-cases-design-patterns):

One workflow execution = one entity (cart, account, inventory item)

Workflow persists for entity lifetime

Receives signals for state changes

Supports queries for current state

Example Use Cases:

Shopping cart (add items, checkout, expiration)

Bank account (deposits, withdrawals, balance checks)

Product inventory (stock updates, reservations)

Benefits:

Encapsulates entity behavior

Guarantees consistency per entity

Natural event sourcing
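
As a rough illustration of the actor-style shape described above, the following plain-Python sketch models one cart as a single loop that consumes signals from an inbox and answers queries from its in-memory state. It uses asyncio only, not the Temporal SDK; `CartEntity`, its signal names, and `demo` are hypothetical, and a real entity workflow would get durability from Temporal rather than from process memory.

```python
import asyncio
from typing import Dict


class CartEntity:
    """One instance per cart; every mutation arrives as a signal."""

    def __init__(self, cart_id: str) -> None:
        self.cart_id = cart_id
        self.items: Dict[str, int] = {}
        self.checked_out = False
        # Construct inside a running event loop (see demo below).
        self._inbox: asyncio.Queue = asyncio.Queue()

    async def signal(self, name: str, **payload) -> None:
        await self._inbox.put((name, payload))

    def query_items(self) -> Dict[str, int]:
        # Queries are read-only: return a copy, never mutate state.
        return dict(self.items)

    async def run(self) -> Dict[str, int]:
        # Signals are handled one at a time, which keeps the entity consistent.
        while not self.checked_out:
            name, payload = await self._inbox.get()
            if name == "add_item":
                sku, qty = payload["sku"], payload["qty"]
                self.items[sku] = self.items.get(sku, 0) + qty
            elif name == "checkout":
                self.checked_out = True
        return self.query_items()


async def demo() -> Dict[str, int]:
    cart = CartEntity("cart-1")
    task = asyncio.create_task(cart.run())
    await cart.signal("add_item", sku="book", qty=2)
    await cart.signal("add_item", sku="book", qty=1)
    await cart.signal("checkout")
    return await task
```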

3. Fan-Out/Fan-In (Parallel Execution)

Purpose: Execute multiple tasks in parallel, aggregate results

Pattern:

Spawn child workflows or parallel activities

Wait for all to complete

Aggregate results

Handle partial failures

Scaling Rule (Source: temporal.io/blog/workflow-engine-principles):

Don't scale individual workflows

For 1M tasks: spawn 1K child workflows × 1K tasks each

Keep each workflow bounded
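
The batching rule can be sketched with asyncio as a stand-in for child workflows. This is a plain-Python illustration, not Temporal code; `process_task`, `child_batch`, and `parent` are hypothetical names, and the doubling in `process_task` is placeholder work.

```python
import asyncio
from typing import List


async def process_task(task_id: int) -> int:
    await asyncio.sleep(0)  # stands in for real work done by an activity
    return task_id * 2


async def child_batch(task_ids: List[int]) -> int:
    """One bounded unit of work, analogous to a child workflow."""
    results = await asyncio.gather(*(process_task(t) for t in task_ids))
    return sum(results)


async def parent(total_tasks: int, batch_size: int) -> int:
    # Fan out: split the work so that no single unit grows without bound.
    batches = [
        list(range(start, min(start + batch_size, total_tasks)))
        for start in range(0, total_tasks, batch_size)
    ]
    # Fan in: wait for every batch, collecting failures instead of aborting early.
    partials = await asyncio.gather(
        *(child_batch(b) for b in batches), return_exceptions=True
    )
    failures = [p for p in partials if isinstance(p, Exception)]
    if failures:
        raise RuntimeError(f"{len(failures)} of {len(batches)} batches failed")
    return sum(partials)
```

With `total_tasks=1_000_000` and `batch_size=1_000` this produces the 1K × 1K split mentioned above; smaller numbers are more practical for a local run.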

4. Async Callback Pattern

Purpose: Wait for external event or human approval

Pattern:

Workflow sends request and waits for signal

External system processes asynchronously

Sends signal to resume workflow

Workflow continues with response

Use Cases:

Human approval workflows

Webhook callbacks

Long-running external processes
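
A minimal sketch of wait-for-signal with a timeout, using an asyncio future in place of a Temporal signal. The names `approval_workflow` and `demo`, the delays, and the "escalated" outcome are assumptions made for illustration.

```python
import asyncio


async def approval_workflow(approval: "asyncio.Future[bool]", timeout_s: float) -> str:
    """Block until the external decision arrives or the deadline passes."""
    try:
        decision = await asyncio.wait_for(approval, timeout=timeout_s)
    except asyncio.TimeoutError:
        return "escalated"  # no response in time: take the escalation path
    return "approved" if decision else "rejected"


async def demo(send_signal: bool) -> str:
    loop = asyncio.get_running_loop()
    approval: "asyncio.Future[bool]" = loop.create_future()
    if send_signal:
        # The external system responds asynchronously a little later.
        loop.call_later(0.01, approval.set_result, True)
    return await approval_workflow(approval, timeout_s=0.2)
```

Pairing every wait with a timeout keeps a lost callback from parking the workflow indefinitely.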

State Management and Determinism

Automatic State Preservation

How Temporal Works (Source: docs.temporal.io/workflows):

Complete program state preserved automatically

Event History records every command and event

Seamless recovery from crashes

Applications restore pre-failure state

Determinism Constraints

Workflows Execute as State Machines:

Replay must take the same code path and issue the same commands as the original execution

Same inputs and event history → identical outputs every time

Prohibited in Workflows (Source: docs.temporal.io/workflows):

❌ Threading, locks, synchronization primitives

❌ Random number generation (random())

❌ Mutable global state or static variables

❌ System time (datetime.now())

❌ Direct file I/O or network calls

❌ Non-deterministic libraries

Allowed in Workflows:

✅ workflow.now() (deterministic time)

✅ workflow.random() (deterministic random)

✅ Pure functions and calculations

✅ Calling activities (non-deterministic operations)
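
To see why wall-clock time and raw randomness break replay, consider this toy model of an event history. It is a conceptual sketch in plain Python, not how the Temporal SDK is implemented; `History`, `side_effect`, and `discount_workflow` are hypothetical names.

```python
import random
import time
from typing import Any, Callable, List, Optional


class History:
    """Toy event history: records non-deterministic results, replays them later."""

    def __init__(self, events: Optional[List[Any]] = None) -> None:
        self.events: List[Any] = list(events or [])
        self._cursor = 0

    def side_effect(self, fn: Callable[[], Any]) -> Any:
        if self._cursor < len(self.events):
            value = self.events[self._cursor]  # replay: reuse the recorded value
        else:
            value = fn()  # first execution: run it and record the result
            self.events.append(value)
        self._cursor += 1
        return value


def discount_workflow(history: History) -> str:
    # Time and randomness go through the history, so a replay sees the
    # same values and therefore takes the same branch.
    started = history.side_effect(time.time)
    roll = history.side_effect(random.random)
    return f"{started:.3f}:{'discount' if roll < 0.5 else 'full-price'}"
```

Calling `time.time()` or `random.random()` directly inside `discount_workflow` would yield different values on replay, so the replayed run could diverge from the recorded one.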

Versioning Strategies

Challenge: Changing workflow code while old executions are still running

Solutions:

Versioning API: Use the SDK's versioning/patching API for safe changes (workflow.patched() in the Python SDK; GetVersion in the Go and Java SDKs)

New Workflow Type: Create new workflow, route new executions to it

Backward Compatibility: Ensure old events replay correctly
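
The idea behind a versioning marker can be shown with a toy model: new executions record a marker and take the new code path, while replays take whichever path their history recorded. This is a simplified plain-Python sketch, not SDK code; the `patched` function here is a hypothetical stand-in, and the `markers` set and `replaying` flag are assumptions standing in for state the SDK would manage itself.

```python
from typing import List, Set


def patched(markers: Set[str], replaying: bool, patch_id: str) -> bool:
    """Return True if this execution should take the new code path."""
    if replaying:
        # Replay must follow whatever the original run recorded.
        return patch_id in markers
    # First-time execution: record the marker and take the new path.
    markers.add(patch_id)
    return True


def order_workflow(markers: Set[str], replaying: bool) -> List[str]:
    steps = ["reserve_inventory"]
    if patched(markers, replaying, "add-fraud-check"):
        steps.append("fraud_check")  # step introduced by the new code version
    steps.append("charge_payment")
    return steps
```

An execution started before the change has no marker in its history, so replaying it skips `fraud_check` and stays consistent with what originally ran.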

Resilience and Error Handling

Retry Policies

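Default Behavior: Temporal retries failed activities with exponential backoff and no attempt limit, until the activity succeeds or one of its timeouts (such as schedule-to-close) expires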

Configure Retry:

Initial retry interval

Backoff coefficient (exponential backoff)

Maximum interval (cap retry delay)

Maximum attempts (eventually fail)

Non-Retryable Errors:

Invalid input (validation failures)

Business rule violations

Permanent failures (resource not found)
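
How the four retry settings interact can be worked through with a small calculation. The class below is a plain-Python sketch, not the Temporal SDK's retry policy type; the field names, the default values, and the choice of `ValueError` as the non-retryable error are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    initial_interval_s: float = 1.0
    backoff_coefficient: float = 2.0
    maximum_interval_s: float = 100.0
    maximum_attempts: int = 0  # 0 means "no attempt limit" in this sketch
    non_retryable: tuple = (ValueError,)  # e.g. validation failures

    def next_delay(self, attempt: int, error: BaseException) -> Optional[float]:
        """Delay before retrying after failed attempt `attempt` (1-based), or None to stop."""
        if isinstance(error, self.non_retryable):
            return None  # retrying cannot fix bad input or a broken business rule
        if self.maximum_attempts and attempt >= self.maximum_attempts:
            return None  # attempts exhausted
        delay = self.initial_interval_s * self.backoff_coefficient ** (attempt - 1)
        return min(delay, self.maximum_interval_s)  # cap the exponential growth
```

With `maximum_attempts=5` and `maximum_interval_s=5.0`, a transient error is retried after 1 s, 2 s, 4 s, and 5 s (capped), and the fifth failure is final.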

Idempotency Requirements

Why Critical (Source: docs.temporal.io/activities):

Activities may execute multiple times

Network failures trigger retries

Duplicate execution must be safe

Implementation Strategies:

Idempotency keys (deduplication)

Check-then-act with unique constraints

Upsert operations instead of insert

Track processed request IDs
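
The idempotency-key strategy can be sketched as follows. `PaymentGateway` is a hypothetical in-memory stand-in for an external service; in a real system the lookup and the write would need to be atomic, for example through a unique constraint on the key.

```python
from typing import Dict


class PaymentGateway:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # idempotency key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retried request with a known key returns the stored result
        # instead of charging the customer again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result
```

The key should be derived from something stable across retries, such as an order ID, and not generated fresh inside each attempt.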

Activity Heartbeats

Purpose: Detect stalled long-running activities

Pattern:

Activity sends periodic heartbeat

Includes progress information

Timeout if no heartbeat received

Enables progress-based retry (a retried attempt can resume from the last reported progress)
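
The heartbeat pattern can be sketched without the SDK: the worker reports progress after each item, a monitor declares the activity stalled when reports stop, and a retry resumes from the last reported position. `HeartbeatTracker`, `process_items`, and the injected `clock` are hypothetical names used only for this illustration.

```python
import time
from typing import Callable, List, Optional


class HeartbeatTracker:
    """Remembers the latest progress report and when it arrived."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()
        self.last_progress = 0  # number of items fully processed

    def heartbeat(self, progress: int) -> None:
        self.last_progress = progress
        self._last_seen = self._clock()

    def is_stalled(self) -> bool:
        # No heartbeat within the timeout window means the attempt is presumed dead.
        return self._clock() - self._last_seen > self._timeout_s


def process_items(
    items: List[str],
    tracker: HeartbeatTracker,
    done: List[str],
    start_at: int = 0,
    crash_at: Optional[int] = None,
) -> None:
    for index in range(start_at, len(items)):
        if index == crash_at:
            raise RuntimeError("worker crashed")  # simulated failure
        done.append(items[index])  # stands in for the real per-item work
        tracker.heartbeat(index + 1)  # report progress after each item
```

A retry that passes `start_at=tracker.last_progress` skips the items already completed instead of starting over.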

Best Practices

Workflow Design

Keep workflows focused - Single responsibility per workflow

Small workflows - Use child workflows for scalability

Clear boundaries - Workflow orchestrates, activities execute

Test locally - Use time-skipping test environment

Activity Design

Idempotent operations - Safe to retry

Short-lived - Seconds to minutes, not hours

Timeout configuration - Always set timeouts

Heartbeat for long tasks - Report progress

Error handling - Distinguish retryable vs non-retryable

Common Pitfalls

Workflow Violations:

Using datetime.now() instead of workflow.now()

Threading or unmanaged concurrency in workflow code (threads, or async I/O that bypasses the SDK's workflow APIs)

Calling external APIs directly from workflow

Non-deterministic logic in workflows

Activity Mistakes:

Non-idempotent operations (can't handle retries)

Missing or overly generous timeouts (a stuck activity goes undetected)

No error classification (retry validation errors)

Ignoring payload limits (2MB per argument)

Operational Considerations

Monitoring:

Workflow execution duration

Activity failure rates

Retry attempts and backoff

Pending workflow counts

Scalability:

Horizontal scaling with workers

Task queue partitioning

Child workflow decomposition

Activity batching when appropriate

Additional Resources

Official Documentation:

Temporal Core Concepts: docs.temporal.io/workflows

Workflow Patterns: docs.temporal.io/evaluate/use-cases-design-patterns

Best Practices: docs.temporal.io/develop/best-practices

Saga Pattern: temporal.io/blog/saga-pattern-made-easy

Key Principles:

Workflows = orchestration, Activities = external calls

Determinism is non-negotiable for workflows

Idempotency is critical for activities

State preservation is automatic

Design for failure and recovery