LAVI SINGODIYA

// Tech_Blog

Terraform Best Practices for Production Infrastructure

February 10, 2026 · 4 min read
Terraform · IaC · AWS · Azure · DevOps · Best Practices

Terraform can provision your entire cloud infrastructure in minutes — or it can be the source of your worst production incidents. After managing enterprise AWS and Azure environments with it, these are the practices that made the difference.

1. Remote State is Non-Negotiable

Never use local state in a team environment. The moment two engineers run terraform apply against local state files, you have a split-brain situation.

# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # prevents concurrent applies
  }
}

The DynamoDB lock table is critical — it prevents two engineers from running apply simultaneously. Without it, you will have a race condition at the worst possible time.

2. Workspace-per-Environment (with Caveats)

Terraform workspaces let you use the same code for multiple environments:

locals {
  env = terraform.workspace   # "staging" or "production"

  config = {
    staging = {
      instance_type  = "t3.medium"
      min_capacity   = 1
      max_capacity   = 3
    }
    production = {
      instance_type  = "r6i.xlarge"
      min_capacity   = 3
      max_capacity   = 20
    }
  }

  current = local.config[local.env]
}

The caveat: workspaces share a backend bucket. For true environment isolation (separate AWS accounts), use separate state files and Terragrunt. For a single-account setup, workspaces work fine.
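The day-to-day workflow this enables looks like the following (workspace names assumed to match the keys in local.config above):

terraform workspace new staging          # create the workspace once
terraform workspace select production    # switch context before planning
terraform workspace show                 # always verify which workspace you're in before apply

Making `terraform workspace show` a reflex before any apply is cheap insurance against applying production-sized capacity to staging, or vice versa.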

3. Module Design: Keep Them Small and Focused

The temptation is to build a giant "vpc" module that does everything. Resist it.

modules/
├── networking/
│   ├── vpc/           # just the VPC, subnets, IGW
│   ├── security-groups/
│   └── vpc-peering/
├── compute/
│   ├── ec2/
│   ├── asg/           # Auto Scaling Group
│   └── alb/
├── database/
│   ├── rds/
│   └── elasticache/
└── observability/
    ├── cloudwatch/
    └── sns-alarms/

Small modules are testable, reusable, and composable. A 2000-line "infrastructure" module is a maintenance nightmare.
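To illustrate the composability, a root configuration might wire two of the small modules above together like this (module paths match the tree above; the variable and output names are illustrative, not a fixed contract):

module "vpc" {
  source     = "./modules/networking/vpc"
  cidr_block = "10.0.0.0/16"
  azs        = ["ap-south-1a", "ap-south-1b"]
}

module "app_sg" {
  source = "./modules/networking/security-groups"
  vpc_id = module.vpc.vpc_id   # one module's output feeds the next
}

Each module stays small enough to test on its own, and the root configuration reads as a wiring diagram.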

4. Use locals to Reduce Repetition

Every resource tagging strategy ends up with the same handful of tags. Don't repeat them:

locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "terraform"
    Owner       = var.team_email
    CostCenter  = var.cost_center
    # note: avoid a CreatedAt = timestamp() tag here; timestamp() changes
    # on every run, so every plan would show a diff on every resource
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = local.current.instance_type

  tags = merge(local.common_tags, {
    Name = "${var.project_name}-${var.environment}-app"
    Role = "application"
  })
}

5. Secrets: Never in .tfvars

I've seen .tfvars files with database passwords committed to Git. Don't be that team.

Pattern 1: AWS Secrets Manager + data source

data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/rds/master-password"
}

resource "aws_db_instance" "main" {
  # remaining required arguments (engine, instance_class, allocated_storage, ...) omitted
  password = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)["password"]
  # note: the decoded value is still written to the state file,
  # so keep state encrypted and access-controlled (see practice 1)
}

Pattern 2: Environment variables for provider credentials

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

Never put AWS credentials in your Terraform files.

6. The Drift Detection Habit

Before every apply in production, run plan and review it carefully. We automated this into CI:

# GitHub Actions
- name: Terraform Plan
  run: |
    terraform plan -out=tfplan -detailed-exitcode
  # Exit code 2 = changes detected
  # Exit code 0 = no changes
  # Exit code 1 = error

A PR that only touches documentation shouldn't have Terraform changes. If your plan shows unexpected changes, stop and investigate before applying.
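One caveat with -detailed-exitcode: exit code 2 (changes detected) is non-zero, so a bare run step fails on any legitimate change. The step needs to handle it explicitly — a sketch, assuming a bash shell on the runner:

# GitHub Actions
- name: Terraform Plan
  id: plan
  run: |
    set +e
    terraform plan -out=tfplan -detailed-exitcode
    code=$?
    if [ "$code" -eq 1 ]; then exit 1; fi    # exit 1 = real error: fail the job
    # exit 0 = no changes, exit 2 = changes; expose as a step output
    echo "changes=$([ "$code" -eq 2 ] && echo true || echo false)" >> "$GITHUB_OUTPUT"

Downstream steps can then gate the apply on steps.plan.outputs.changes instead of re-running the plan.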

7. terraform_remote_state Over Hard-coded Values

Avoid hardcoding resource IDs between modules. Use remote state references:

# In the compute module, referencing networking outputs
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "ap-south-1"
  }
}

resource "aws_instance" "app" {
  subnet_id              = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
  vpc_security_group_ids = [data.terraform_remote_state.networking.outputs.app_sg_id]
}

The Result: Hours → 15 Minutes

Before Terraform, standing up a new client environment (VPC, subnets, security groups, EC2, RDS, ALB) took 3-4 hours of console clicking and was error-prone. With the module library established, a terraform apply runs in under 15 minutes with zero manual steps.

The investment in module design and remote state setup pays for itself on the third environment provisioned.

Quick Reference Checklist

  • ✅ Remote state with S3 + DynamoDB lock
  • ✅ Separate state files per environment/component
  • ✅ Small, focused modules
  • ✅ locals for common tags and computed values
  • ✅ Secrets from AWS Secrets Manager, never in tfvars
  • ✅ terraform plan reviewed in every CI pipeline
  • ✅ terraform_remote_state for cross-module references
  • ✅ Version constraints on providers and modules

// Written by Lavi Singodiya · February 10, 2026