Terraform Skill for Claude

Comprehensive Terraform and OpenTofu guidance covering testing, modules, CI/CD, and production patterns. Based on terraform-best-practices.com and enterprise experience.

When to Use This Skill

Activate this skill when:

Creating new Terraform or OpenTofu configurations or modules

Setting up testing infrastructure for IaC code

Deciding between testing approaches (validate, plan, frameworks)

Structuring multi-environment deployments

Implementing CI/CD for infrastructure-as-code

Reviewing or refactoring existing Terraform/OpenTofu projects

Choosing between module patterns or state management approaches

Don't use this skill for:

Basic Terraform/OpenTofu syntax questions (Claude knows this)

Provider-specific API reference (link to docs instead)

Cloud platform questions unrelated to Terraform/OpenTofu

Core Principles

1. Code Structure Philosophy

Module Hierarchy:

Type	When to Use	Scope
Resource Module	Single logical group of connected resources	VPC + subnets, Security group + rules
Infrastructure Module	Collection of resource modules for a purpose	Multiple resource modules in one region/account
Composition	Complete infrastructure	Spans multiple regions/accounts

Hierarchy: Resource → Resource Module → Infrastructure Module → Composition

Directory Structure:

environments/        # Environment-specific configurations
├── prod/
├── staging/
└── dev/

modules/             # Reusable modules
├── networking/
├── compute/
└── data/

examples/            # Module usage examples (also serve as tests)
├── complete/
└── minimal/

Key principle from terraform-best-practices.com:

Separate environments (prod, staging) from modules (reusable components)

Use examples/ as both documentation and integration test fixtures

Keep modules small and focused (single responsibility)

For detailed module architecture, see: Code Patterns: Module Types & Hierarchy
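
As a rough sketch of how a composition under environments/ might wire the reusable modules together: the module inputs and outputs below (`name`, `cidr_block`, `vpc_id`, `private_subnet_ids`) are hypothetical and are not defined anywhere in this skill.

```hcl
# environments/prod/main.tf (illustrative composition; all input and
# output names are assumptions made for this sketch)

module "networking" {
  source = "../../modules/networking"

  name       = "prod"
  cidr_block = "10.0.0.0/16"
}

module "compute" {
  source = "../../modules/compute"

  name = "prod"

  # The environment only passes values between modules;
  # resources themselves live inside the modules.
  vpc_id     = module.networking.vpc_id
  subnet_ids = module.networking.private_subnet_ids
}
```

A staging or dev environment would reuse the same modules with different inputs, which is what keeps environments and modules separate.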

2. Naming Conventions

Resources:

# Good: Descriptive, contextual
resource "aws_instance" "web_server" { }
resource "aws_s3_bucket" "application_logs" { }

# Good: "this" for singleton resources (only one of that type)
resource "aws_vpc" "this" { }
resource "aws_security_group" "this" { }

# Avoid: Generic names for non-singletons
resource "aws_instance" "main" { }
resource "aws_s3_bucket" "bucket" { }

Singleton Resources:

Use "this" when your module creates only one resource of that type:

✅ DO:

resource "aws_vpc" "this" {}           # Module creates one VPC
resource "aws_security_group" "this" {}  # Module creates one SG

❌ DON'T use "this" for multiple resources:

resource "aws_subnet" "this" {}  # If creating multiple subnets

Use descriptive names when creating multiple resources of the same type.

Variables:

# Prefix with context when needed
var.vpc_cidr_block          # Not just "cidr"
var.database_instance_class # Not just "instance_class"

Files:

main.tf - Primary resources

variables.tf - Input variables

outputs.tf - Output values

versions.tf - Provider versions

data.tf - Data sources (optional)

Testing Strategy Framework

Decision Matrix: Which Testing Approach?

Your Situation	Recommended Approach	Tools	Cost
Quick syntax check	Static analysis	`terraform validate`, `fmt`	Free
Pre-commit validation	Static + lint	`validate`, `tflint`, `trivy`, `checkov`	Free
Terraform 1.6+, simple logic	Native test framework	Built-in `terraform test`	Free-Low
Pre-1.6, or Go expertise	Integration testing	Terratest	Low-Med
Security/compliance focus	Policy as code	OPA, Sentinel	Free
Cost-sensitive workflow	Mock providers (1.7+)	Native tests + mocking	Free
Multi-cloud, complex	Full integration	Terratest + real infra	Med-High

Testing Pyramid for Infrastructure

        /\
       /  \          End-to-End Tests (Expensive)
      /____\         - Full environment deployment
     /      \        - Production-like setup
    /________\
   /          \      Integration Tests (Moderate)
  /____________\     - Module testing in isolation
 /              \    - Real resources in test account
/________________\   Static Analysis (Cheap)
                     - validate, fmt, lint
                     - Security scanning

Native Test Best Practices (1.6+)

Before generating test code:

Validate schemas with Terraform MCP:

Search provider docs → Get resource schema → Identify block types

Choose correct command mode:

- command = plan - Fast, for input validation
- command = apply - Required for computed values and set-type blocks

Handle set-type blocks correctly:

- Cannot index with [0]
- Use for expressions to iterate
- Or use command = apply to materialize

Common patterns:

S3 encryption rules: set (use for expressions)

Lifecycle transitions: set (use for expressions)

IAM policy statements: set (use for expressions)
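
The set-type handling above might look like the following in a native test. This is a sketch only: it assumes the module under test declares `var.bucket_name` and a resource named `aws_s3_bucket_server_side_encryption_configuration.this`, and the file name is hypothetical.

```hcl
# tests/encryption.tftest.hcl (hypothetical file name)

run "default_encryption_uses_kms" {
  command = plan

  variables {
    bucket_name = "example-test-bucket" # assumes the module declares var.bucket_name
  }

  assert {
    # `rule` is a set-type block, so `rule[0]` cannot be used.
    # Nested for expressions iterate over every rule instead.
    condition = alltrue([
      for rule in aws_s3_bucket_server_side_encryption_configuration.this.rule : alltrue([
        for enc in rule.apply_server_side_encryption_by_default :
        enc.sse_algorithm == "aws:kms"
      ])
    ])
    error_message = "Every encryption rule should use aws:kms."
  }
}
```

Note that `alltrue([])` returns true, so add a `length()` check if at least one rule must exist. If the values being asserted are not known at plan time, switch the run to `command = apply`.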

For detailed testing guides, see:

Testing Frameworks Guide - Deep dive into static analysis, native tests, and Terratest

Quick Reference - Decision flowchart and command cheat sheet

Code Structure Standards

Resource Block Ordering

Strict ordering for consistency:

count or for_each FIRST (blank line after)

Other arguments

tags as last real argument

depends_on after tags (if needed)

lifecycle at the very end (if needed)

# ✅ GOOD - Correct ordering
resource "aws_nat_gateway" "this" {
  count = var.create_nat_gateway ? 1 : 0

  allocation_id = aws_eip.this[0].id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "${var.name}-nat"
  }

  depends_on = [aws_internet_gateway.this]

  lifecycle {
    create_before_destroy = true
  }
}

Variable Block Ordering

description (ALWAYS required)

type

default

validation

nullable (when setting to false)

variable "environment" {
  description = "Environment name for resource tagging"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }

  nullable = false
}

For complete structure guidelines, see: Code Patterns: Block Ordering & Structure

Count vs For_Each: When to Use Each

Quick Decision Guide

Scenario	Use	Why
Boolean condition (create or don't)	`count = condition ? 1 : 0`	Simple on/off toggle
Simple numeric replication	`count = 3`	Fixed number of identical resources
Items may be reordered/removed	`for_each = toset(list)`	Stable resource addresses
Reference by key	`for_each = map`	Named access to resources
Multiple named resources	`for_each`	Better maintainability

Common Patterns

Boolean conditions:

# ✅ GOOD - Boolean condition
resource "aws_nat_gateway" "this" {
  count = var.create_nat_gateway ? 1 : 0
  # ...
}

Stable addressing with for_each:

# ✅ GOOD - Removing "us-east-1b" only affects that subnet
resource "aws_subnet" "private" {
  for_each = toset(var.availability_zones)

  availability_zone = each.key
  # ...
}

# ❌ BAD - Removing middle AZ recreates all subsequent subnets
resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  availability_zone = var.availability_zones[count.index]
  # ...
}

For migration guides and detailed examples, see: Code Patterns: Count vs For_Each
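
For the "Reference by key" row in the decision guide, a minimal sketch could look like this. The variable name `buckets`, its keys, and the resource label are hypothetical choices for illustration.

```hcl
variable "buckets" {
  description = "Buckets to create, keyed by a short logical name"
  type = map(object({
    force_destroy = bool
  }))
  default = {
    logs    = { force_destroy = false }
    uploads = { force_destroy = true }
  }
}

resource "aws_s3_bucket" "application" {
  for_each = var.buckets

  # each.key is the map key ("logs", "uploads"); each.value is the object
  bucket_prefix = "${each.key}-"
  force_destroy = each.value.force_destroy
}

output "logs_bucket_arn" {
  description = "ARN of the bucket stored under the \"logs\" key"
  value       = aws_s3_bucket.application["logs"].arn
}
```

Because each instance is addressed by its key, removing `uploads` from the map leaves `aws_s3_bucket.application["logs"]` untouched.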

Locals for Dependency Management

Use locals to ensure correct resource deletion order:

# Problem: Subnets might be deleted after CIDR blocks, causing errors
# Solution: Use try() in locals to hint deletion order
locals {
  # References secondary CIDR first, falling back to VPC
  # Forces Terraform to delete subnets before CIDR association
  vpc_id = try(
    aws_vpc_ipv4_cidr_block_association.this[0].vpc_id,
    aws_vpc.this.id,
    ""
  )
}

resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_vpc_ipv4_cidr_block_association" "this" {
  count = var.add_secondary_cidr ? 1 : 0

  vpc_id     = aws_vpc.this.id
  cidr_block = "10.1.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = local.vpc_id  # Uses local, not direct reference
  cidr_block = "10.1.0.0/24"
}

Why this matters:

Prevents deletion errors when destroying infrastructure

Ensures correct dependency order without explicit depends_on

Particularly useful for VPC configurations with secondary CIDR blocks

For detailed examples, see: Code Patterns: Locals for Dependency Management

Module Development

Standard Module Structure

my-module/
├── README.md           # Usage documentation
├── main.tf             # Primary resources
├── variables.tf        # Input variables with descriptions
├── outputs.tf          # Output values
├── versions.tf         # Provider version constraints
├── examples/
│   ├── minimal/        # Minimal working example
│   └── complete/       # Full-featured example
└── tests/              # Test files
    └── module_test.tftest.hcl  # Or .go

Best Practices Summary

Variables:

✅ Always include description

✅ Use explicit type constraints

✅ Provide sensible default values where appropriate

✅ Add validation blocks for complex constraints

✅ Use sensitive = true for secrets

Outputs:

✅ Always include description

✅ Mark sensitive outputs with sensitive = true

✅ Consider returning objects for related values

✅ Document what consumers should do with each output
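
A short sketch of how these variable and output practices might combine in one module. The names `db_password`, `vpc`, and `database_url`, and the hostname `db.example.internal`, are placeholders invented for this example.

```hcl
variable "db_password" {
  description = "Master password for the database; supply it from a secret store at run time"
  type        = string
  sensitive   = true
}

resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

# Related values grouped into one object output
output "vpc" {
  description = "VPC attributes; pass `id` to modules that create subnets or security groups"
  value = {
    id         = aws_vpc.this.id
    arn        = aws_vpc.this.arn
    cidr_block = aws_vpc.this.cidr_block
  }
}

# Outputs derived from sensitive values must be marked sensitive as well
output "database_url" {
  description = "Connection URL for the application; inject it as a secret, never log it"
  value       = "postgres://app:${var.db_password}@db.example.internal:5432/app"
  sensitive   = true
}
```

Keep in mind that `sensitive = true` only redacts CLI output; the value is still written to state, so the state backend needs its own protection.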

For detailed module patterns, see:

Module Patterns Guide - Variable best practices, output design, ✅ DO vs ❌ DON'T patterns

Quick Reference - Resource naming, variable naming, file organization

CI/CD Integration

Recommended Workflow Stages

Validate - Format check + syntax validation + linting

Test - Run automated tests (native or Terratest)

Plan - Generate and review execution plan

Apply - Execute changes (with approvals for production)

Cost Optimization Strategy

Use mocking for PR validation (free)

Run integration tests only on main branch (controlled cost)

Implement auto-cleanup (prevent orphaned resources)

Tag all test resources (track spending)
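
One way to tag every test resource is the AWS provider's `default_tags` block, sketched below. The tag keys and the `test_run_id` variable are assumptions for illustration, not a convention this skill prescribes.

```hcl
variable "test_run_id" {
  description = "Identifier of the CI run that created these resources"
  type        = string
  default     = "local"
}

provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates,
  # so spending and orphaned resources can be traced to a run.
  default_tags {
    tags = {
      Environment = "test"
      ManagedBy   = "terraform"
      TestRunId   = var.test_run_id
    }
  }
}
```

A cleanup job can then look for resources whose `TestRunId` belongs to a finished run.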

For complete CI/CD templates, see:

CI/CD Workflows Guide - GitHub Actions, GitLab CI, Atlantis integration, cost optimization

Quick Reference - Common CI/CD issues and solutions

Security & Compliance

Essential Security Checks

# Static security scanning
trivy config .
checkov -d .

Common Issues to Avoid

❌ Don't:

Store secrets in variables

Use default VPC

Skip encryption

Open security groups to 0.0.0.0/0

✅ Do:

Use AWS Secrets Manager / Parameter Store

Create dedicated VPCs

Enable encryption at rest

Use least-privilege security groups
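
As an illustration of the least-privilege point, the sketch below rejects `0.0.0.0/0` at the variable level and opens only HTTPS to named CIDR ranges. The variable names and the choice of port 443 are assumptions.

```hcl
variable "vpc_id" {
  description = "ID of the dedicated VPC that hosts the application"
  type        = string
}

variable "allowed_cidr_blocks" {
  description = "CIDR ranges allowed to reach the application over HTTPS"
  type        = list(string)

  validation {
    condition     = !contains(var.allowed_cidr_blocks, "0.0.0.0/0")
    error_message = "0.0.0.0/0 is not allowed; list specific CIDR ranges instead."
  }
}

resource "aws_security_group" "this" {
  name_prefix = "app-"
  vpc_id      = var.vpc_id
}

# One ingress rule per allowed range, addressed by CIDR rather than by index
resource "aws_vpc_security_group_ingress_rule" "https" {
  for_each = toset(var.allowed_cidr_blocks)

  security_group_id = aws_security_group.this.id
  cidr_ipv4         = each.value
  from_port         = 443
  to_port           = 443
  ip_protocol       = "tcp"
}
```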

For detailed security guidance, see:

Security & Compliance Guide - Trivy/Checkov integration, secrets management, state file security, compliance testing

Version Management

Version Constraint Syntax

version = "5.0.0"      # Exact (avoid - inflexible)
version = "~> 5.0"     # Recommended: any 5.x release (>= 5.0, < 6.0)
version = "~> 5.0.0"   # Patch updates only: 5.0.x
version = ">= 5.0"     # Minimum (risky - breaking changes)

Strategy by Component

Component	Strategy	Example
Terraform	Pin minor version	`required_version = "~> 1.9.0"`
Providers	Pin major version	`version = "~> 5.0"`
Modules (prod)	Pin exact version	`version = "5.1.2"`
Modules (dev)	Allow patch updates	`version = "~> 5.1.0"`

Update Workflow

# Lock versions initially
terraform init              # Creates .terraform.lock.hcl

# Update to latest within constraints
terraform init -upgrade     # Updates providers

# Review and test
terraform plan

For detailed version management, see: Code Patterns: Version Management
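
Put together, the constraints might be declared as in the sketch below. The `~>` operator lets only the rightmost version component increase, which is what the comments spell out. The module registry address is hypothetical.

```hcl
# versions.tf
terraform {
  required_version = "~> 1.9.0" # >= 1.9.0, < 1.10.0

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # >= 5.0, < 6.0
    }
  }
}

# main.tf
module "network" {
  source  = "example-org/network/aws" # hypothetical registry address
  version = "5.1.2"                   # exact pin for production
}
```

Committing `.terraform.lock.hcl` alongside these files keeps provider selections reproducible between `terraform init -upgrade` runs.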

Modern Terraform Features (1.0+)

Feature Availability by Version

Feature	Version	Use Case
`try()` function	0.13+	Safe fallbacks, replaces `element(concat())`
`nullable = false`	1.1+	Prevent null values in variables
`moved` blocks	1.1+	Refactor without destroy/recreate
`optional()` with defaults	1.3+	Optional object attributes
Native testing	1.6+	Built-in test framework
Mock providers	1.7+	Cost-free unit testing
Provider functions	1.8+	Provider-specific data transformation
Cross-variable validation	1.9+	Validate relationships between variables
Write-only arguments	1.11+	Secrets never stored in state

Quick Examples

# try() - Safe fallbacks (0.13+)
output "sg_id" {
  value = try(aws_security_group.this[0].id, "")
}

# optional() - Optional attributes with defaults (1.3+)
variable "config" {
  type = object({
    name    = string
    timeout = optional(number, 300)  # Default: 300
  })
}

# Cross-variable validation (1.9+)
variable "environment" { type = string }
variable "backup_days" {
  type = number

  validation {
    condition     = var.environment == "prod" ? var.backup_days >= 7 : true
    error_message = "Production requires backup_days >= 7"
  }
}

For complete patterns and examples, see: Code Patterns: Modern Terraform Features
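
The `moved` block from the feature table has no quick example, so here is a minimal sketch of using it while migrating a resource from `count` to `for_each`. The variable names, availability zones, and CIDR ranges are assumptions.

```hcl
variable "vpc_id" {
  description = "ID of the VPC that contains the subnets"
  type        = string
}

variable "private_subnets" {
  description = "Private subnet CIDR blocks keyed by availability zone"
  type        = map(string)
  default = {
    "us-east-1a" = "10.0.1.0/24"
    "us-east-1b" = "10.0.2.0/24"
  }
}

# Previously declared with `count`; now keyed by availability zone
resource "aws_subnet" "private" {
  for_each = var.private_subnets

  vpc_id            = var.vpc_id
  availability_zone = each.key
  cidr_block        = each.value
}

# moved blocks (1.1+) map old addresses to new ones, so the plan
# shows a move rather than a destroy and recreate.
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["us-east-1a"]
}

moved {
  from = aws_subnet.private[1]
  to   = aws_subnet.private["us-east-1b"]
}
```

Run `terraform plan` after adding the blocks and confirm that it reports moves only before applying.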

Version-Specific Guidance

Terraform 1.0-1.5

Use Terratest for testing

No native testing framework available

Focus on static analysis and plan validation

Terraform 1.6+ / OpenTofu 1.6+

New: Native terraform test / tofu test command

Consider migrating from external frameworks for simple tests

Keep Terratest only for complex integration tests
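
A simple native test of the kind that could replace an external framework is sketched below. It assumes the module declares an `environment` variable with a validation block that accepts only dev, staging, and prod; the file name is hypothetical.

```hcl
# tests/validation.tftest.hcl (hypothetical file name)

run "accepts_known_environment" {
  command = plan

  variables {
    environment = "staging"
  }
}

run "rejects_unknown_environment" {
  command = plan

  variables {
    environment = "qa"
  }

  # The run passes only if validation of var.environment fails
  expect_failures = [
    var.environment,
  ]
}
```

Run it with `terraform test` or `tofu test` from the module directory.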

Terraform 1.7+ / OpenTofu 1.7+

New: Mock providers for unit testing

Reduce cost by mocking external dependencies

Use real integration tests for final validation
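
A mocked unit test might look like the following sketch. It assumes the module creates an `aws_s3_bucket`, declares `var.bucket_name`, and exposes a `bucket_arn` output; none of these names come from this skill.

```hcl
# tests/unit.tftest.hcl (hypothetical file name)

# Replaces the real AWS provider for this test file, so no
# credentials are needed and nothing is created in the cloud.
mock_provider "aws" {
  mock_resource "aws_s3_bucket" {
    defaults = {
      arn = "arn:aws:s3:::mocked-bucket"
    }
  }
}

run "bucket_arn_is_exported" {
  # apply is safe here because the provider is mocked
  command = apply

  variables {
    bucket_name = "example-logs"
  }

  assert {
    condition     = output.bucket_arn == "arn:aws:s3:::mocked-bucket"
    error_message = "The bucket_arn output should expose the bucket ARN."
  }
}
```

Mocked computed values are synthetic, so this checks the module's wiring and logic only; real provider behavior still needs an integration test.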

Terraform vs OpenTofu

Both are fully supported by this skill. For licensing, governance, and feature comparison, see Quick Reference: Terraform vs OpenTofu.

Detailed Guides

This skill uses progressive disclosure: essential information is in this main file, and detailed guides are available when needed.

📚 Reference Files:

Testing Frameworks - In-depth guide to static analysis, native tests, and Terratest

Module Patterns - Module structure, variable/output best practices, ✅ DO vs ❌ DON'T patterns

CI/CD Workflows - GitHub Actions, GitLab CI templates, cost optimization, automated cleanup

Security & Compliance - Trivy/Checkov integration, secrets management, compliance testing

Quick Reference - Command cheat sheets, decision flowcharts, troubleshooting guide

How to use: When you need detailed information on a topic, reference the appropriate guide. Claude will load it on demand to provide comprehensive guidance.

License

This skill is licensed under the Apache License 2.0. See the LICENSE file for full terms.