Real-world Terraform scenarios to test and improve your Infrastructure as Code skills
15 min read
Introduction
Terraform is a powerful Infrastructure as Code tool that presents various challenges in real-world scenarios. These scenario-based questions cover common issues and advanced use cases that DevOps engineers encounter when managing cloud infrastructure with Terraform.
Scenario 1: Multi-Environment State Management
Situation
Your organization has development, staging, and production environments. Currently, all environments share the same Terraform state file, causing conflicts and accidental production changes.
Current Problem
# Current problematic structure
terraform/
├── main.tf
├── variables.tf
├── terraform.tfvars
└── terraform.tfstate   # Shared across all environments
Solution: Directory-per-Environment Approach
# Directory structure
terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── modules/
│   ├── vpc/
│   ├── ec2/
│   └── rds/
└── scripts/
Backend Configuration
# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# environments/production/backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
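If duplicating the backend block per environment feels repetitive, the settings can instead be supplied at `init` time via partial backend configuration; a sketch:

```hcl
# Shared backend.tf — settings intentionally left empty
terraform {
  backend "s3" {}
}

# Supplied per environment at init time, e.g.:
#   terraform init \
#     -backend-config="bucket=company-terraform-state" \
#     -backend-config="key=dev/terraform.tfstate" \
#     -backend-config="region=us-east-1"
```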
Environment-specific Variables
# environments/dev/terraform.tfvars
environment       = "dev"
instance_type     = "t3.micro"
instance_count    = 1
db_instance_class = "db.t3.micro"

# environments/production/terraform.tfvars
environment       = "production"
instance_type     = "m5.large"
instance_count    = 3
db_instance_class = "db.r5.large"
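Each environment's root module then wires these values into the shared modules; a sketch, assuming the module input names match the variables above:

```hcl
# environments/dev/main.tf (module inputs are illustrative)
module "vpc" {
  source      = "../../modules/vpc"
  environment = var.environment
}

module "ec2" {
  source         = "../../modules/ec2"
  environment    = var.environment
  instance_type  = var.instance_type
  instance_count = var.instance_count
}

module "rds" {
  source            = "../../modules/rds"
  environment       = var.environment
  db_instance_class = var.db_instance_class
}
```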
Scenario 2: Large Infrastructure Refactoring
Situation
You have a monolithic Terraform configuration managing 200+ resources. You need to refactor it into modules without causing downtime or state corruption.
Current Monolithic Structure
# Current problematic main.tf (simplified)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  count = 3
  # ... 50+ networking resources
}

resource "aws_instance" "web" {
  count = 5
  # ... 100+ compute resources
}

resource "aws_rds_cluster" "database" {
  # ... 50+ database resources
}
Modular Refactoring Strategy
# Step 1: Create modules

# modules/networking/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
  tags       = var.tags
}

resource "aws_subnet" "public" {
  count = length(var.public_subnets)
  # ... other networking resources
}

# modules/compute/main.tf
resource "aws_instance" "this" {
  count = var.instance_count
  # ... compute resources
}

# Step 2: Move resources in state
terraform state mv aws_vpc.main module.networking.aws_vpc.this
terraform state mv aws_subnet.public module.networking.aws_subnet.public
Safe Refactoring Process
# 1. Plan the refactoring in a feature branch
git checkout -b infrastructure-refactor

# 2. Create modules incrementally
#    Start with the least critical resources

# 3. Back up the state, then move resources
terraform state list > state-backup.txt
terraform state mv aws_security_group.web module.networking.aws_security_group.web

# 4. Verify with plan (exit code 0 means no changes)
terraform plan -detailed-exitcode

# 5. Test in non-production first
# 6. Create a rollback plan
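On Terraform 1.1+, the same renames can be declared in configuration with `moved` blocks, so each move shows up in `terraform plan` for review instead of being a manual state operation:

```hcl
# Declarative equivalent of the `terraform state mv` commands above
moved {
  from = aws_vpc.main
  to   = module.networking.aws_vpc.this
}

moved {
  from = aws_subnet.public
  to   = module.networking.aws_subnet.public
}
```

Once applied everywhere, the `moved` blocks can be deleted.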
Scenario 3: Secrets Management in Terraform
Situation
Your Terraform configurations contain database passwords and API keys in plain text. Security audit requires immediate remediation.
Problematic Code
# Insecure variables.tf
variable "db_password" {
  description = "Database password"
  type        = string
  default     = "SuperSecret123!" # ❌ Plain-text password in version control
}

resource "aws_db_instance" "database" {
  password = var.db_password # ❌ Also exposed in the state file
}
Solution: External Secrets Management
# Using AWS Secrets Manager
data "aws_secretsmanager_secret" "db_credentials" {
  name = "prod/database/credentials"
}

data "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id = data.aws_secretsmanager_secret.db_credentials.id
}

locals {
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db_credentials.secret_string)
}

resource "aws_db_instance" "database" {
  username = local.db_credentials.username
  password = local.db_credentials.password # ✅ Not in version control (but still recorded in state)
}
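One caveat: anything Terraform reads — including this secret — is persisted in the state file, so the backend must be encrypted and access-controlled. Wrapping the decoded value in `sensitive()` (Terraform 0.15+) at least keeps it out of plan and apply output:

```hcl
locals {
  # Redacted as "(sensitive value)" in plan/apply output;
  # still stored in state, so protect the backend
  db_credentials = sensitive(jsondecode(data.aws_secretsmanager_secret_version.db_credentials.secret_string))
}
```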
Using HashiCorp Vault
# terraform/vault.tf

# Initialize the Vault provider
provider "vault" {
  address = "https://vault.company.com:8200"
  token   = var.vault_token
}

data "vault_generic_secret" "db_credentials" {
  path = "secret/data/prod/database"
}

resource "aws_db_instance" "database" {
  username = data.vault_generic_secret.db_credentials.data["username"]
  password = data.vault_generic_secret.db_credentials.data["password"]
}
Scenario 4: Zero-Downtime Resource Updates
Situation
You need to update an AWS Launch Configuration that's used by an Auto Scaling Group without causing downtime to the application.
Problem: Force Replacement
# Launch configurations are deprecated by AWS — shown only to illustrate the problem
resource "aws_launch_configuration" "web" {
  name_prefix   = "web-lc-"
  image_id      = "ami-12345678"
  instance_type = "t3.micro" # Changing to t3.small forces replacement

  lifecycle {
    create_before_destroy = true # ✅ Helps, but does not fully prevent downtime
  }
}

resource "aws_autoscaling_group" "web" {
  launch_configuration = aws_launch_configuration.web.name
  min_size             = 2
  max_size             = 6
}
Solution: Launch Template with Blue-Green
# Create a new launch template
resource "aws_launch_template" "web_new" {
  name_prefix   = "web-template-"
  image_id      = var.ami_id
  instance_type = "t3.small"

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "web-instance-new"
    }
  }
}

# Create a new ASG using the new template
resource "aws_autoscaling_group" "web_new" {
  name_prefix      = "web-asg-new-"
  min_size         = 2
  max_size         = 6
  desired_capacity = 2

  launch_template {
    id      = aws_launch_template.web_new.id
    version = "$Latest"
  }

  tag {
    key                 = "Environment"
    value               = "new"
    propagate_at_launch = true
  }

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}

# Gradually shift traffic
resource "aws_autoscaling_attachment" "web_new" {
  autoscaling_group_name = aws_autoscaling_group.web_new.id
  lb_target_group_arn    = aws_lb_target_group.web_new.arn # `alb_target_group_arn` before AWS provider v5
}
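The "gradual" part of the shift can be expressed with a weighted forward action on the load balancer listener; a sketch, assuming `aws_lb.web` and an old target group `aws_lb_target_group.web_old` already exist:

```hcl
# Send 20% of traffic to the new target group; raise the weight over time
resource "aws_lb_listener" "web" {
  load_balancer_arn = aws_lb.web.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.web_old.arn
        weight = 80
      }
      target_group {
        arn    = aws_lb_target_group.web_new.arn
        weight = 20
      }
    }
  }
}
```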
Scenario 5: Cross-Account Resource Management
Situation
Your organization uses multiple AWS accounts (dev, staging, prod) and you need to create resources that span across these accounts.
Cross-Account IAM Roles
# In the production account — the role Terraform will assume
resource "aws_iam_role" "cross_account_access" {
  name = "CrossAccountTerraformRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::DEV_ACCOUNT_ID:root"
        }
      }
    ]
  })
}

# In the dev account — Terraform configuration
provider "aws" {
  alias = "production"
  assume_role {
    role_arn = "arn:aws:iam::PROD_ACCOUNT_ID:role/CrossAccountTerraformRole"
  }
}

resource "aws_s3_bucket" "cross_account" {
  provider = aws.production # Created in the production account
  bucket   = "company-cross-account-bucket"
}
Cross-Account VPC Peering
# In account A (requester)
resource "aws_vpc_peering_connection" "peer" {
  vpc_id        = aws_vpc.this.id
  peer_vpc_id   = var.peer_vpc_id
  peer_owner_id = var.peer_account_id
  auto_accept   = false

  tags = {
    Side = "Requester"
  }
}

# In account B (accepter)
provider "aws" {
  alias = "accepter"
  # Assume-role configuration for account B
}

resource "aws_vpc_peering_connection_accepter" "peer" {
  provider                  = aws.accepter
  vpc_peering_connection_id = aws_vpc_peering_connection.peer.id
  auto_accept               = true

  tags = {
    Side = "Accepter"
  }
}
Scenario 6: Conditional Resource Creation
Situation
You need to create certain resources only in specific environments or based on feature flags.
Conditional Resources with Count
# Create a NAT gateway only in production
resource "aws_nat_gateway" "this" {
  count         = var.environment == "production" ? 1 : 0
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "nat-${var.environment}"
  }
}

# Conditional EIP for the NAT gateway
resource "aws_eip" "nat" {
  count  = var.environment == "production" ? 1 : 0
  domain = "vpc" # `vpc = true` in AWS provider v4 and earlier
}
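An alternative worth noting: `for_each` with an empty map gives the same on/off behaviour while keeping a named key instead of a numeric index. A sketch (not meant to coexist with the `count` version above):

```hcl
# for_each variant — the instance is addressed as aws_nat_gateway.per_env["this"]
resource "aws_nat_gateway" "per_env" {
  for_each      = var.environment == "production" ? { this = true } : {}
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "nat-${var.environment}"
  }
}
```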
Feature Flag Based Resources
# variables.tf
variable "enable_advanced_monitoring" {
  description = "Enable advanced CloudWatch monitoring"
  type        = bool
  default     = false
}

variable "enable_backup" {
  description = "Enable AWS Backup"
  type        = bool
  default     = true
}

# Conditional resources
resource "aws_cloudwatch_dashboard" "advanced" {
  count          = var.enable_advanced_monitoring ? 1 : 0
  dashboard_name = "advanced-monitoring"
  dashboard_body = jsonencode({
    # Dashboard configuration
  })
}

resource "aws_backup_plan" "this" {
  count = var.enable_backup ? 1 : 0
  name  = "backup-plan"

  rule {
    rule_name         = "daily-backup"
    target_vault_name = aws_backup_vault.this[0].name
    schedule          = "cron(0 2 * * ? *)"
  }
}
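Referencing a resource that may have `count = 0` is a common source of index errors; `one()` (Terraform 0.15+) returns the single instance or `null`:

```hcl
# Safe output whether or not the backup plan was created
output "backup_plan_id" {
  value = one(aws_backup_plan.this[*].id)
}
```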
Scenario 7: Terraform State Recovery
Situation
Your Terraform state file has been corrupted or accidentally deleted. You need to recover and import existing infrastructure back into Terraform management.
State Recovery Process
# 1. Identify existing resources with the AWS CLI
aws ec2 describe-instances --filters "Name=tag:Project,Values=my-project"
aws rds describe-db-instances
aws s3api list-buckets

# 2. Create a minimal Terraform configuration
# recovery.tf
resource "aws_instance" "web" {
  # Placeholder values — updated to match reality after import
  ami           = "placeholder"
  instance_type = "t3.micro"
}

# 3. Import the resource into state
terraform import aws_instance.web i-1234567890abcdef0

# 4. Update the configuration to match the imported resource
#    (a data source can read back the real attributes; then edit the
#    placeholder resource block — do not keep two blocks with the same address)
data "aws_instance" "web" {
  instance_id = "i-1234567890abcdef0"
}

resource "aws_instance" "web" {
  ami           = data.aws_instance.web.ami
  instance_type = data.aws_instance.web.instance_type
  # ... other attributes from the data source
}
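On Terraform 1.5+, step 3 can be done declaratively with an `import` block, which also lets Terraform generate the matching configuration for you:

```hcl
# import.tf — plannable, reviewable import (Terraform 1.5+)
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}

# Then: terraform plan -generate-config-out=generated.tf
```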
Bulk Import Script
#!/bin/bash
# bulk-import.sh

# Map of resource addresses to their cloud IDs
declare -A resources=(
  ["aws_instance.web"]="i-1234567890abcdef0"
  ["aws_lb.main"]="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-lb/1234567890123456"
  ["aws_rds_cluster.database"]="my-database-cluster"
)

for resource in "${!resources[@]}"; do
  echo "Importing $resource with ID ${resources[$resource]}"
  if terraform import "$resource" "${resources[$resource]}"; then
    echo "✅ Successfully imported $resource"
  else
    echo "❌ Failed to import $resource"
  fi
done
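When the mapping grows large, it can help to generate the commands from a script and review them before running anything. A minimal Python sketch (resource addresses and IDs are placeholders) that only builds the commands:

```python
# build_imports.py — assemble `terraform import` commands from a resource map.
# Nothing is executed here; the output is meant to be reviewed first.
resources = {
    "aws_instance.web": "i-1234567890abcdef0",
    "aws_rds_cluster.database": "my-database-cluster",
}

def build_import_commands(resource_map):
    """Return one `terraform import` argv list per resource address."""
    return [
        ["terraform", "import", address, resource_id]
        for address, resource_id in sorted(resource_map.items())
    ]

for cmd in build_import_commands(resources):
    print(" ".join(cmd))
```

Feeding the reviewed list to `subprocess.run` (or pasting it into a shell) keeps the import batch auditable.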
Scenario 8: Complex Variable Validation
Situation
You need to enforce strict validation on input variables to prevent misconfiguration in production environments.
Advanced Variable Validation
# variables.tf with complex validation
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_count" {
  description = "Number of instances"
  type        = number

  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 10
    error_message = "Instance count must be between 1 and 10."
  }
}

variable "cidr_blocks" {
  description = "List of CIDR blocks"
  type        = list(string)

  validation {
    condition = alltrue([
      for cidr in var.cidr_blocks : can(regex("^([0-9]{1,3}\\.){3}[0-9]{1,3}/([0-9]|[1-2][0-9]|3[0-2])$", cidr))
    ])
    error_message = "All CIDR blocks must be valid IPv4 CIDR notation."
  }
}
variable "database_config" {
  description = "Database configuration"
  type = object({
    instance_class          = string
    allocated_storage       = number
    backup_retention_period = number
    multi_az                = bool
  })

  validation {
    condition     = contains(["db.t3.micro", "db.t3.small", "db.r5.large"], var.database_config.instance_class)
    error_message = "Invalid database instance class."
  }

  validation {
    condition     = var.database_config.allocated_storage >= 20 && var.database_config.allocated_storage <= 1000
    error_message = "Allocated storage must be between 20 and 1000 GB."
  }
}
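The CIDR regex above can be sanity-checked outside Terraform; the same pattern in Python (`can(regex(...))` with `^`/`$` anchors becomes `re.fullmatch`). Note the pattern accepts octets above 255 (e.g. `999.0.0.0/8`), so tighten it if that matters:

```python
import re

# Same pattern as the HCL validation (anchoring handled by fullmatch)
CIDR_RE = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[1-2][0-9]|3[0-2])")

def is_valid_cidr(cidr: str) -> bool:
    """Mirror of the Terraform condition for a single CIDR block."""
    return CIDR_RE.fullmatch(cidr) is not None

print(is_valid_cidr("10.0.0.0/16"))  # True
print(is_valid_cidr("10.0.0.0/33"))  # False
```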
Scenario 9: Terraform in CI/CD Pipeline
Situation
You need to integrate Terraform into your CI/CD pipeline with proper planning, approval gates, and security controls.
GitHub Actions Workflow
# .github/workflows/terraform.yml
name: 'Terraform'

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  TF_VERSION: '1.5.0'
  AWS_REGION: 'us-east-1'

jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest
    environment: production

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Terraform Format
        id: fmt
        run: terraform fmt -check

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

      - name: Terraform Plan
        id: plan
        run: terraform plan -input=false -no-color -out=terraform.tfplan
        continue-on-error: true

      - name: Terraform Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Upload Terraform Plan
        uses: actions/upload-artifact@v3
        with:
          name: terraform-plan
          path: terraform.tfplan

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve -input=false
Approval Gates with Slack
# Add this step before apply
- name: Notify for Approval
  if: github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged == true
  uses: 8398a7/action-slack@v3
  with:
    status: custom
    custom_payload: |
      {
        "text": "Terraform changes ready for production deployment",
        "attachments": [
          {
            "title": "Terraform Plan",
            "title_link": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}",
            "text": "Review and approve the Terraform plan",
            "actions": [
              {
                "type": "button",
                "text": "Review Workflow Run",
                "url": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
              }
            ]
          }
        ]
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
Scenario 10: Cost Estimation and Optimization
Situation
Your cloud costs are escalating unexpectedly. You need to implement cost estimation and optimization in your Terraform workflow.
Infracost Integration
# .github/workflows/infracost.yml
name: Infracost

on:
  pull_request:
    branches: [ main ]

jobs:
  infracost:
    name: Infracost
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Generate Terraform plan
        run: |
          terraform plan -out=plan.tfplan
          terraform show -json plan.tfplan > plan.json

      - name: Install Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Run Infracost
        run: |
          infracost breakdown --path plan.json \
            --format json \
            --out-file infracost.json
          infracost comment github --path infracost.json \
            --repo $GITHUB_REPOSITORY \
            --pull-request ${{ github.event.pull_request.number }} \
            --github-token ${{ secrets.GITHUB_TOKEN }}
Cost-Optimized Resource Configuration
# Cost-optimized compute
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.environment == "production" ? "m5.large" : "t3.micro"

  # Use spot instances for non-production (on-demand is the default,
  # so the block is simply omitted in production)
  dynamic "instance_market_options" {
    for_each = var.environment == "production" ? [] : [1]
    content {
      market_type = "spot"
      spot_options {
        max_price = "0.05" # Maximum bid price
      }
    }
  }

  # Enable termination protection in production
  disable_api_termination = var.environment == "production"
}

# Cost-optimized storage
resource "aws_ebs_volume" "data" {
  availability_zone = aws_instance.web.availability_zone
  size              = var.disk_size
  type              = var.environment == "production" ? "gp3" : "gp2"
  iops              = var.environment == "production" ? 3000 : null

  tags = {
    CostCenter = "infrastructure"
  }
}
Best Practices Summary
Key Recommendations
- Use remote state with locking
- Implement proper state isolation between environments
- Use modules for reusable components
- Implement strict variable validation
- Use cost estimation tools
- Implement proper secret management
- Use CI/CD with approval gates
- Regularly update Terraform and providers
Security Considerations
# Secure Terraform configuration
1. Never commit secrets to version control
2. Use least-privilege IAM roles
3. Enable state file encryption
4. Use private modules and providers
5. Implement network security controls
6. Regularly scan Terraform code for security issues
Conclusion
These Terraform scenarios demonstrate the complex challenges of managing infrastructure as code at scale. The key to successful Terraform implementation is planning, security, and maintainability. By addressing these common scenarios proactively, you can build robust, efficient, and secure infrastructure that scales with your organization's needs.
Remember: Continuously monitor your Terraform configurations, keep state files secure, and regularly review and optimize your infrastructure based on usage patterns and cost analysis.