Real-world Terraform scenarios to test and improve your Infrastructure as Code skills
15 min read
Introduction
Terraform is a powerful Infrastructure as Code tool that presents various challenges in real-world scenarios. These scenario-based questions cover common issues and advanced use cases that DevOps engineers encounter when managing cloud infrastructure with Terraform.
Scenario 1: Multi-Environment State Management
Situation
Your organization has development, staging, and production environments. Currently, all environments share the same Terraform state file, causing conflicts and accidental production changes.
Current Problem
# Current problematic structure
terraform/
├── main.tf
├── variables.tf
├── terraform.tfvars
└── terraform.tfstate   # Shared across all environments
Solution: Directory-per-Environment Approach
# Directory structure
terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── modules/
│   ├── vpc/
│   ├── ec2/
│   └── rds/
└── scripts/
Backend Configuration
# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# environments/production/backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
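If duplicating the backend block per environment feels repetitive, the settings can instead be supplied at `init` time via partial backend configuration; a sketch:

```hcl
# Shared backend.tf — settings intentionally left empty
terraform {
  backend "s3" {}
}

# Supplied per environment at init time, e.g.:
#   terraform init \
#     -backend-config="bucket=company-terraform-state" \
#     -backend-config="key=dev/terraform.tfstate" \
#     -backend-config="region=us-east-1"
```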
Environment-specific Variables
# environments/dev/terraform.tfvars
environment       = "dev"
instance_type     = "t3.micro"
instance_count    = 1
db_instance_class = "db.t3.micro"

# environments/production/terraform.tfvars
environment       = "production"
instance_type     = "m5.large"
instance_count    = 3
db_instance_class = "db.r5.large"
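Each environment's root module then wires these values into the shared modules; a sketch, assuming the module input names match the variables above:

```hcl
# environments/dev/main.tf (module inputs are illustrative)
module "vpc" {
  source      = "../../modules/vpc"
  environment = var.environment
}

module "ec2" {
  source         = "../../modules/ec2"
  environment    = var.environment
  instance_type  = var.instance_type
  instance_count = var.instance_count
}

module "rds" {
  source            = "../../modules/rds"
  environment       = var.environment
  db_instance_class = var.db_instance_class
}
```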
Scenario 2: Large Infrastructure Refactoring
Situation
You have a monolithic Terraform configuration managing 200+ resources. You need to refactor it into modules without causing downtime or state corruption.
Current Monolithic Structure
# Current problematic main.tf (simplified)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  count = 3
  # ... 50+ networking resources
}

resource "aws_instance" "web" {
  count = 5
  # ... 100+ compute resources
}

resource "aws_rds_cluster" "database" {
  # ... 50+ database resources
}
Modular Refactoring Strategy
# Step 1: Create modules

# modules/networking/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
  tags       = var.tags
}

resource "aws_subnet" "public" {
  count = length(var.public_subnets)
  # ... other networking resources
}

# modules/compute/main.tf
resource "aws_instance" "this" {
  count = var.instance_count
  # ... compute resources
}

# Step 2: Move resources in state
terraform state mv aws_vpc.main module.networking.aws_vpc.this
terraform state mv aws_subnet.public module.networking.aws_subnet.public
Safe Refactoring Process
# 1. Plan the refactoring in a feature branch
git checkout -b infrastructure-refactor

# 2. Create modules incrementally
#    Start with the least critical resources

# 3. Back up the state, then move resources
terraform state list > state-backup.txt
terraform state mv aws_security_group.web module.networking.aws_security_group.web

# 4. Verify with plan (exit code 0 means no changes)
terraform plan -detailed-exitcode

# 5. Test in non-production first
# 6. Create a rollback plan
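On Terraform 1.1+, the same renames can be declared in configuration with `moved` blocks, so each move shows up in `terraform plan` for review instead of being a manual state operation:

```hcl
# Declarative equivalent of the `terraform state mv` commands above
moved {
  from = aws_vpc.main
  to   = module.networking.aws_vpc.this
}

moved {
  from = aws_subnet.public
  to   = module.networking.aws_subnet.public
}
```

Once applied everywhere, the `moved` blocks can be deleted.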
Scenario 3: Secrets Management in Terraform
Situation
Your Terraform configurations contain database passwords and API keys in plain text. Security audit requires immediate remediation.
Problematic Code
# Insecure variables.tf
variable "db_password" {
  description = "Database password"
  type        = string
  default     = "SuperSecret123!" # ❌ Plain-text password in version control
}

resource "aws_db_instance" "database" {
  password = var.db_password # ❌ Also exposed in the state file
}
Solution: External Secrets Management
# Using AWS Secrets Manager
data "aws_secretsmanager_secret" "db_credentials" {
  name = "prod/database/credentials"
}

data "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id = data.aws_secretsmanager_secret.db_credentials.id
}

locals {
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db_credentials.secret_string)
}

resource "aws_db_instance" "database" {
  username = local.db_credentials.username
  password = local.db_credentials.password # ✅ Not in version control (but still recorded in state)
}
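One caveat: anything Terraform reads — including this secret — is persisted in the state file, so the backend must be encrypted and access-controlled. Wrapping the decoded value in `sensitive()` (Terraform 0.15+) at least keeps it out of plan and apply output:

```hcl
locals {
  # Redacted as "(sensitive value)" in plan/apply output;
  # still stored in state, so protect the backend
  db_credentials = sensitive(jsondecode(data.aws_secretsmanager_secret_version.db_credentials.secret_string))
}
```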
Using HashiCorp Vault
# terraform/vault.tf

# Initialize the Vault provider
provider "vault" {
  address = "https://vault.company.com:8200"
  token   = var.vault_token
}

data "vault_generic_secret" "db_credentials" {
  path = "secret/data/prod/database"
}

resource "aws_db_instance" "database" {
  username = data.vault_generic_secret.db_credentials.data["username"]
  password = data.vault_generic_secret.db_credentials.data["password"]
}
Scenario 4: Zero-Downtime Resource Updates
Situation
You need to update an AWS Launch Configuration that's used by an Auto Scaling Group without causing downtime to the application.
Problem: Force Replacement
# Launch configurations are deprecated by AWS — shown only to illustrate the problem
resource "aws_launch_configuration" "web" {
  name_prefix   = "web-lc-"
  image_id      = "ami-12345678"
  instance_type = "t3.micro" # Changing to t3.small forces replacement

  lifecycle {
    create_before_destroy = true # ✅ Helps, but does not fully prevent downtime
  }
}

resource "aws_autoscaling_group" "web" {
  launch_configuration = aws_launch_configuration.web.name
  min_size             = 2
  max_size             = 6
}
Solution: Launch Template with Blue-Green
# Create a new launch template
resource "aws_launch_template" "web_new" {
  name_prefix   = "web-template-"
  image_id      = var.ami_id
  instance_type = "t3.small"

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "web-instance-new"
    }
  }
}

# Create a new ASG using the new template
resource "aws_autoscaling_group" "web_new" {
  name_prefix      = "web-asg-new-"
  min_size         = 2
  max_size         = 6
  desired_capacity = 2

  launch_template {
    id      = aws_launch_template.web_new.id
    version = "$Latest"
  }

  tag {
    key                 = "Environment"
    value               = "new"
    propagate_at_launch = true
  }

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}

# Gradually shift traffic
resource "aws_autoscaling_attachment" "web_new" {
  autoscaling_group_name = aws_autoscaling_group.web_new.id
  lb_target_group_arn    = aws_lb_target_group.web_new.arn # `alb_target_group_arn` before AWS provider v5
}
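The "gradual" part of the shift can be expressed with a weighted forward action on the load balancer listener; a sketch, assuming `aws_lb.web` and an old target group `aws_lb_target_group.web_old` already exist:

```hcl
# Send 20% of traffic to the new target group; raise the weight over time
resource "aws_lb_listener" "web" {
  load_balancer_arn = aws_lb.web.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.web_old.arn
        weight = 80
      }
      target_group {
        arn    = aws_lb_target_group.web_new.arn
        weight = 20
      }
    }
  }
}
```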
Scenario 5: Cross-Account Resource Management
Situation
Your organization uses multiple AWS accounts (dev, staging, prod) and you need to create resources that span across these accounts.
Cross-Account IAM Roles
# In the production account — the role Terraform will assume
resource "aws_iam_role" "cross_account_access" {
  name = "CrossAccountTerraformRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::DEV_ACCOUNT_ID:root"
        }
      }
    ]
  })
}

# In the dev account — Terraform configuration
provider "aws" {
  alias = "production"
  assume_role {
    role_arn = "arn:aws:iam::PROD_ACCOUNT_ID:role/CrossAccountTerraformRole"
  }
}

resource "aws_s3_bucket" "cross_account" {
  provider = aws.production # Created in the production account
  bucket   = "company-cross-account-bucket"
}
Cross-Account VPC Peering
# In account A (requester)
resource "aws_vpc_peering_connection" "peer" {
  vpc_id        = aws_vpc.this.id
  peer_vpc_id   = var.peer_vpc_id
  peer_owner_id = var.peer_account_id
  auto_accept   = false

  tags = {
    Side = "Requester"
  }
}

# In account B (accepter)
provider "aws" {
  alias = "accepter"
  # Assume-role configuration for account B
}

resource "aws_vpc_peering_connection_accepter" "peer" {
  provider                  = aws.accepter
  vpc_peering_connection_id = aws_vpc_peering_connection.peer.id
  auto_accept               = true

  tags = {
    Side = "Accepter"
  }
}
Scenario 6: Conditional Resource Creation
Situation
You need to create certain resources only in specific environments or based on feature flags.
Conditional Resources with Count
# Create a NAT gateway only in production
resource "aws_nat_gateway" "this" {
  count         = var.environment == "production" ? 1 : 0
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "nat-${var.environment}"
  }
}

# Conditional EIP for the NAT gateway
resource "aws_eip" "nat" {
  count  = var.environment == "production" ? 1 : 0
  domain = "vpc" # `vpc = true` in AWS provider v4 and earlier
}
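An alternative worth noting: `for_each` with an empty map gives the same on/off behaviour while keeping a named key instead of a numeric index. A sketch (not meant to coexist with the `count` version above):

```hcl
# for_each variant — the instance is addressed as aws_nat_gateway.per_env["this"]
resource "aws_nat_gateway" "per_env" {
  for_each      = var.environment == "production" ? { this = true } : {}
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "nat-${var.environment}"
  }
}
```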
Feature Flag Based Resources
# variables.tf
variable "enable_advanced_monitoring" {
  description = "Enable advanced CloudWatch monitoring"
  type        = bool
  default     = false
}

variable "enable_backup" {
  description = "Enable AWS Backup"
  type        = bool
  default     = true
}

# Conditional resources
resource "aws_cloudwatch_dashboard" "advanced" {
  count          = var.enable_advanced_monitoring ? 1 : 0
  dashboard_name = "advanced-monitoring"
  dashboard_body = jsonencode({
    # Dashboard configuration
  })
}

resource "aws_backup_plan" "this" {
  count = var.enable_backup ? 1 : 0
  name  = "backup-plan"

  rule {
    rule_name         = "daily-backup"
    target_vault_name = aws_backup_vault.this[0].name
    schedule          = "cron(0 2 * * ? *)"
  }
}
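Referencing a resource that may have `count = 0` is a common source of index errors; `one()` (Terraform 0.15+) returns the single instance or `null`:

```hcl
# Safe output whether or not the backup plan was created
output "backup_plan_id" {
  value = one(aws_backup_plan.this[*].id)
}
```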
Scenario 7: Terraform State Recovery
Situation
Your Terraform state file has been corrupted or accidentally deleted. You need to recover and import existing infrastructure back into Terraform management.
State Recovery Process
# 1. Identify existing resources with the AWS CLI
aws ec2 describe-instances --filters "Name=tag:Project,Values=my-project"
aws rds describe-db-instances
aws s3api list-buckets

# 2. Create a minimal Terraform configuration
# recovery.tf
resource "aws_instance" "web" {
  # Placeholder values — updated to match reality after import
  ami           = "placeholder"
  instance_type = "t3.micro"
}

# 3. Import the resource into state
terraform import aws_instance.web i-1234567890abcdef0

# 4. Update the configuration to match the imported resource
#    (a data source can read back the real attributes; then edit the
#    placeholder resource block — do not keep two blocks with the same address)
data "aws_instance" "web" {
  instance_id = "i-1234567890abcdef0"
}

resource "aws_instance" "web" {
  ami           = data.aws_instance.web.ami
  instance_type = data.aws_instance.web.instance_type
  # ... other attributes from the data source
}
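On Terraform 1.5+, step 3 can be done declaratively with an `import` block, which also lets Terraform generate the matching configuration for you:

```hcl
# import.tf — plannable, reviewable import (Terraform 1.5+)
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}

# Then: terraform plan -generate-config-out=generated.tf
```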
Bulk Import Script
#!/bin/bash
# bulk-import.sh

# Map of resource addresses to their cloud IDs
declare -A resources=(
  ["aws_instance.web"]="i-1234567890abcdef0"
  ["aws_lb.main"]="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-lb/1234567890123456"
  ["aws_rds_cluster.database"]="my-database-cluster"
)

for resource in "${!resources[@]}"; do
  echo "Importing $resource with ID ${resources[$resource]}"
  if terraform import "$resource" "${resources[$resource]}"; then
    echo "✅ Successfully imported $resource"
  else
    echo "❌ Failed to import $resource"
  fi
done
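When the mapping grows large, it can help to generate the commands from a script and review them before running anything. A minimal Python sketch (resource addresses and IDs are placeholders) that only builds the commands:

```python
# build_imports.py — assemble `terraform import` commands from a resource map.
# Nothing is executed here; the output is meant to be reviewed first.
resources = {
    "aws_instance.web": "i-1234567890abcdef0",
    "aws_rds_cluster.database": "my-database-cluster",
}

def build_import_commands(resource_map):
    """Return one `terraform import` argv list per resource address."""
    return [
        ["terraform", "import", address, resource_id]
        for address, resource_id in sorted(resource_map.items())
    ]

for cmd in build_import_commands(resources):
    print(" ".join(cmd))
```

Feeding the reviewed list to `subprocess.run` (or pasting it into a shell) keeps the import batch auditable.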
Scenario 8: Complex Variable Validation
Situation
You need to enforce strict validation on input variables to prevent misconfiguration in production environments.
Advanced Variable Validation
# variables.tf with complex validation
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_count" {
  description = "Number of instances"
  type        = number

  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 10
    error_message = "Instance count must be between 1 and 10."
  }
}

variable "cidr_blocks" {
  description = "List of CIDR blocks"
  type        = list(string)

  validation {
    condition = alltrue([
      for cidr in var.cidr_blocks : can(regex("^([0-9]{1,3}\\.){3}[0-9]{1,3}/([0-9]|[1-2][0-9]|3[0-2])$", cidr))
    ])
    error_message = "All CIDR blocks must be valid IPv4 CIDR notation."
  }
}
variable "database_config" {
  description = "Database configuration"
  type = object({
    instance_class          = string
    allocated_storage       = number
    backup_retention_period = number
    multi_az                = bool
  })

  validation {
    condition     = contains(["db.t3.micro", "db.t3.small", "db.r5.large"], var.database_config.instance_class)
    error_message = "Invalid database instance class."
  }

  validation {
    condition     = var.database_config.allocated_storage >= 20 && var.database_config.allocated_storage <= 1000
    error_message = "Allocated storage must be between 20 and 1000 GB."
  }
}
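The CIDR regex above can be sanity-checked outside Terraform; the same pattern in Python (`can(regex(...))` with `^`/`$` anchors becomes `re.fullmatch`). Note the pattern accepts octets above 255 (e.g. `999.0.0.0/8`), so tighten it if that matters:

```python
import re

# Same pattern as the HCL validation (anchoring handled by fullmatch)
CIDR_RE = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[1-2][0-9]|3[0-2])")

def is_valid_cidr(cidr: str) -> bool:
    """Mirror of the Terraform condition for a single CIDR block."""
    return CIDR_RE.fullmatch(cidr) is not None

print(is_valid_cidr("10.0.0.0/16"))  # True
print(is_valid_cidr("10.0.0.0/33"))  # False
```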
Scenario 9: Terraform in CI/CD Pipeline
Situation
You need to integrate Terraform into your CI/CD pipeline with proper planning, approval gates, and security controls.
GitHub Actions Workflow
# .github/workflows/terraform.yml
name: 'Terraform'

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  TF_VERSION: '1.5.0'
  AWS_REGION: 'us-east-1'

jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest
    environment: production

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Terraform Format
        id: fmt
        run: terraform fmt -check

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

      - name: Terraform Plan
        id: plan
        run: terraform plan -input=false -no-color -out=terraform.tfplan
        continue-on-error: true

      - name: Terraform Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Upload Terraform Plan
        uses: actions/upload-artifact@v3
        with:
          name: terraform-plan
          path: terraform.tfplan

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve -input=false
Approval Gates with Slack
# Add this step before apply
- name: Notify for Approval
  if: github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged == true
  uses: 8398a7/action-slack@v3
  with:
    status: custom
    custom_payload: |
      {
        "text": "Terraform changes ready for production deployment",
        "attachments": [
          {
            "title": "Terraform Plan",
            "title_link": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}",
            "text": "Review and approve the Terraform plan",
            "actions": [
              {
                "type": "button",
                "text": "Review Workflow Run",
                "url": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
              }
            ]
          }
        ]
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
Scenario 10: Cost Estimation and Optimization
Situation
Your cloud costs are escalating unexpectedly. You need to implement cost estimation and optimization in your Terraform workflow.
Infracost Integration
# .github/workflows/infracost.yml
name: Infracost

on:
  pull_request:
    branches: [ main ]

jobs:
  infracost:
    name: Infracost
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Generate Terraform plan
        run: |
          terraform plan -out=plan.tfplan
          terraform show -json plan.tfplan > plan.json

      - name: Install Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Run Infracost
        run: |
          infracost breakdown --path plan.json \
            --format json \
            --out-file infracost.json
          infracost comment github --path infracost.json \
            --repo $GITHUB_REPOSITORY \
            --pull-request ${{ github.event.pull_request.number }} \
            --github-token ${{ secrets.GITHUB_TOKEN }}
Cost-Optimized Resource Configuration
# Cost-optimized compute
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.environment == "production" ? "m5.large" : "t3.micro"

  # Use spot instances for non-production (on-demand is the default,
  # so the block is simply omitted in production)
  dynamic "instance_market_options" {
    for_each = var.environment == "production" ? [] : [1]
    content {
      market_type = "spot"
      spot_options {
        max_price = "0.05" # Maximum bid price
      }
    }
  }

  # Enable termination protection in production
  disable_api_termination = var.environment == "production"
}

# Cost-optimized storage
resource "aws_ebs_volume" "data" {
  availability_zone = aws_instance.web.availability_zone
  size              = var.disk_size
  type              = var.environment == "production" ? "gp3" : "gp2"
  iops              = var.environment == "production" ? 3000 : null

  tags = {
    CostCenter = "infrastructure"
  }
}
Best Practices Summary
Key Recommendations
- Use remote state with locking
- Implement proper state isolation between environments
- Use modules for reusable components
- Implement strict variable validation
- Use cost estimation tools
- Implement proper secret management
- Use CI/CD with approval gates
- Regularly update Terraform and providers
Security Considerations
# Secure Terraform configuration
1. Never commit secrets to version control
2. Use least-privilege IAM roles
3. Enable state file encryption
4. Use private modules and providers
5. Implement network security controls
6. Regularly scan Terraform code for security issues
Conclusion
These Terraform scenarios demonstrate the complex challenges of managing infrastructure as code at scale. The key to successful Terraform implementation is planning, security, and maintainability. By addressing these common scenarios proactively, you can build robust, efficient, and secure infrastructure that scales with your organization's needs.
Remember: Continuously monitor your Terraform configurations, keep state files secure, and regularly review and optimize your infrastructure based on usage patterns and cost analysis.