Infrastructure as Code: Terraform, Pulumi & IaC Best Practices
Modern cloud systems demand reproducibility. Manual clicking through consoles doesn't scale, doesn't version, and doesn't survive an audit. Infrastructure as code solves this by treating your cloud resources the same way you treat application code — written, reviewed, tested, and deployed through pipelines.
This guide covers the IaC landscape end-to-end: paradigms, tooling, state management, GitOps integration, testing, and multi-cloud patterns.
What Is Infrastructure as Code?
Infrastructure as code (IaC) is the practice of defining and managing cloud resources — servers, networks, databases, IAM policies — through machine-readable configuration files rather than manual processes.
Key benefits:
- Reproducibility — spin up identical environments on demand
- Version control — every change is a commit with history and rollback
- Collaboration — pull requests for infrastructure changes
- Drift detection — compare desired state with actual state
- Compliance — policy-as-code enforces guardrails automatically
Declarative vs Imperative
| Aspect | Declarative | Imperative |
|---|---|---|
| Philosophy | Describe what you want | Describe how to get there |
| Examples | Terraform, CloudFormation | Pulumi, AWS CDK, scripts |
| State | Tool manages convergence | You manage sequencing |
| Learning curve | DSL-specific | General-purpose languages |
Many production teams lean declarative for predictability, but imperative approaches shine when you need conditionals, loops, or dynamic composition that a DSL makes awkward.
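To make "the tool manages convergence" concrete, here is a minimal sketch of the diff a declarative tool computes between desired and actual state. The `Plan` type, the `plan` function, and the resource names are hypothetical illustrations, not any real tool's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    create: list[str]
    update: list[str]
    delete: list[str]


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> Plan:
    """Diff desired state against actual state by resource name."""
    create = sorted(name for name in desired if name not in actual)
    delete = sorted(name for name in actual if name not in desired)
    update = sorted(
        name
        for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return Plan(create=create, update=update, delete=delete)


# Hypothetical states: one subnet has changed, one is no longer declared.
desired = {
    "vpc.main": {"cidr_block": "10.0.0.0/16"},
    "subnet.public": {"cidr_block": "10.0.1.0/24"},
}
actual = {
    "vpc.main": {"cidr_block": "10.0.0.0/16"},
    "subnet.public": {"cidr_block": "10.0.2.0/24"},
    "subnet.legacy": {"cidr_block": "10.0.9.0/24"},
}

result = plan(desired, actual)
print(result)
```

You only edit `desired`; the steps needed to get there fall out of the diff. In an imperative script you would write those create, update, and delete calls yourself, in the right order.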
Terraform
Terraform by HashiCorp is one of the most widely adopted IaC tools. It uses HCL (HashiCorp Configuration Language), a declarative DSL.
HCL Basics
provider "aws" {
  region = "us-east-1"
}
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name        = "production-vpc"
    Environment = "prod"
  }
}
resource "aws_subnet" "public" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}
output "vpc_id" {
  value = aws_vpc.main.id
}
Modules
Modules are reusable packages of Terraform configuration:
module "networking" {
  source      = "./modules/networking"
  vpc_cidr    = "10.0.0.0/16"
  environment = "staging"
}
module "database" {
  source     = "./modules/rds"
  subnet_ids = module.networking.private_subnet_ids
  engine     = "postgres"
  # "version" is reserved in module blocks for registry version constraints,
  # so the engine version goes in its own input (name assumed here).
  engine_version = "15.4"
}
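Because `module.database` reads an output of `module.networking`, the tool has to create networking first. A rough sketch of that ordering, using Python's standard `graphlib`, is below. The `dependencies` map and the `apply_order` helper are hypothetical; Terraform builds its own graph at the resource level rather than per module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: each module lists the modules whose outputs it reads.
dependencies = {
    "networking": set(),
    "database": {"networking"},
}


def apply_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every module comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = apply_order(dependencies)
print(order)

# A circular reference cannot be ordered and is reported as an error.
try:
    apply_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```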
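Workspaces
Workspaces let you manage multiple environments (dev, staging, prod) from a single configuration. Each workspace maintains its own state file. All workspaces share the same configuration and backend, so environments that need separate credentials or access controls are often split into separate directories instead. Creating and using a workspace looks like this: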
terraform workspace new staging
terraform workspace select staging
terraform apply -var-file=staging.tfvars
State Management
Terraform tracks every resource in a state file. Remote backends are essential for teams:
terraform {
  backend "s3" {
    bucket         = "my-tf-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-locks"
    encrypt        = true
  }
}
State locking (via DynamoDB, Consul, etc.) prevents concurrent applies from corrupting state.
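The idea behind state locking is a conditional write: a run may record itself as the lock holder only if no record exists yet. The sketch below simulates that in memory. `StateLock`, `LockHeldError`, and the owner names are hypothetical; a real backend performs the same check atomically on the remote store.

```python
import threading


class LockHeldError(Exception):
    """Raised when another run already holds the state lock."""


class StateLock:
    """In-memory stand-in for a lock table keyed by state path."""

    def __init__(self) -> None:
        self._holders: dict[str, str] = {}
        self._mutex = threading.Lock()

    def acquire(self, state_key: str, owner: str) -> None:
        # Conditional write: succeed only if no lock record exists for this key.
        with self._mutex:
            holder = self._holders.get(state_key)
            if holder is not None:
                raise LockHeldError(f"{state_key} is locked by {holder}")
            self._holders[state_key] = owner

    def release(self, state_key: str, owner: str) -> None:
        # Only the current holder may remove the lock record.
        with self._mutex:
            if self._holders.get(state_key) == owner:
                del self._holders[state_key]


locks = StateLock()
locks.acquire("prod/terraform.tfstate", owner="ci-run-1")
try:
    locks.acquire("prod/terraform.tfstate", owner="ci-run-2")
    second_run_blocked = False
except LockHeldError:
    second_run_blocked = True
locks.release("prod/terraform.tfstate", owner="ci-run-1")
```

The second run fails fast instead of writing state concurrently, which is the behavior you want from two pipelines that target the same state key.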
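Pulumi
Pulumi lets you define infrastructure in general-purpose languages such as TypeScript, Python, Go, or C#, so there is no DSL to learn. The program itself is imperative code, but running it produces a desired-state resource graph that the Pulumi engine reconciles against what exists, so convergence is still handled by the tool.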
TypeScript Example
import * as aws from "@pulumi/aws";
const vpc = new aws.ec2.Vpc("main", {
  cidrBlock: "10.0.0.0/16",
  tags: { Name: "production-vpc", Environment: "prod" },
});
const subnet = new aws.ec2.Subnet("public", {
  vpcId: vpc.id,
  cidrBlock: "10.0.1.0/24",
  availabilityZone: "us-east-1a",
});
export const vpcId = vpc.id;
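Python Example
import pulumi
import pulumi_aws as aws
# Same VPC and subnet as the TypeScript example, with snake_case arguments.
vpc = aws.ec2.Vpc("main",
    cidr_block="10.0.0.0/16",
    tags={"Name": "production-vpc"})
subnet = aws.ec2.Subnet("public",
    vpc_id=vpc.id,
    cidr_block="10.0.1.0/24")
# Expose the VPC ID as a stack output.
pulumi.export("vpc_id", vpc.id)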
The advantage: full language features — conditionals, loops, abstractions, type checking, unit tests — with no DSL limitations.
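As a small illustration of what loops buy you, the sketch below carves one subnet CIDR per availability zone out of the VPC range using only the standard library. The `subnet_plan` helper and the zone list are hypothetical. In a Pulumi program, each entry would feed a resource constructor such as `aws.ec2.Subnet`.

```python
import ipaddress


def subnet_plan(vpc_cidr: str, zones: list[str], new_prefix: int = 24) -> dict[str, str]:
    """Assign one subnet CIDR per availability zone, carved from the VPC range."""
    network = ipaddress.ip_network(vpc_cidr)
    if new_prefix < network.prefixlen:
        raise ValueError("subnet prefix must not be shorter than the VPC prefix")
    # zip() stops silently at the shorter input, so check capacity up front.
    capacity = 2 ** (new_prefix - network.prefixlen)
    if len(zones) > capacity:
        raise ValueError(f"{vpc_cidr} only fits {capacity} /{new_prefix} subnets")
    blocks = network.subnets(new_prefix=new_prefix)
    return {zone: str(block) for zone, block in zip(zones, blocks)}


subnets_by_zone = subnet_plan(
    "10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"]
)
for zone, cidr in subnets_by_zone.items():
    print(zone, cidr)
```

Adding a zone is a one-line change to the list, and the capacity check turns an impossible layout into an error before anything is provisioned.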
CloudFormation
AWS CloudFormation is the native IaC tool for AWS. It uses JSON or YAML templates and is deeply integrated with the AWS ecosystem. It handles rollbacks natively but is limited to AWS and can be verbose.
Tool Comparison
| Feature | Terraform | Pulumi | CloudFormation |
|---|---|---|---|
| Language | HCL | TS, Python, Go, C# | YAML/JSON |
| Multi-cloud | Yes | Yes | AWS only |
| State | Self-managed or HCP Terraform (formerly Terraform Cloud) | Pulumi Cloud or self-managed | AWS-managed |
| Ecosystem | Largest provider library | Growing | AWS-native |
| Learning curve | Moderate (HCL) | Low (familiar languages) | Moderate (verbose templates) |
| Drift detection | `terraform plan` | `pulumi preview --refresh` | Drift detection API |
| Cost | Source-available (BUSL) + paid tiers | Open source + paid tiers | Free (AWS charges apply) |
State Management & Drift Detection
State drift occurs when actual infrastructure diverges from your declared configuration — someone manually changed a security group, an auto-scaling event added instances, or a colleague applied without committing.
Detection strategies:
- Scheduled plans: run `terraform plan` on a cron and alert on drift
- Cloud provider APIs: CloudFormation has built-in drift detection
- Policy enforcement: tools like Sentinel or OPA block manual changes
- Reconciliation loops: GitOps controllers continuously converge state
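A scheduled plan can be sketched as a small wrapper around `terraform plan -detailed-exitcode`, which exits with 0 when there is nothing to change, 1 on error, and 2 when changes are pending. The function names and status labels below are hypothetical, and the alerting step is left out. Note that exit code 2 only means "changes pending": it signals drift when the checked-out configuration has already been applied.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status label."""
    if code == 0:
        return "in_sync"
    if code == 2:
        return "drift"
    return "error"


def check_drift(workdir: str) -> str:
    """Run a plan in workdir and classify it (requires terraform on PATH)."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(proc.returncode)


# The classification can be exercised without running terraform.
statuses = [classify_plan_exit(code) for code in (0, 1, 2)]
print(statuses)
```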
GitOps Infrastructure Workflow
GitOps applies Git-based workflows to infrastructure management:
1. Developer opens PR with Terraform changes
2. CI runs `terraform plan` and posts diff as PR comment
3. Team reviews the plan (adds, changes, destroys)
4. On merge to main, CD pipeline runs `terraform apply`
5. State file updates, drift detection monitors continuously
Tools like Atlantis, Spacelift, and env0 automate this loop. The key principle: Git is the single source of truth for both application and infrastructure code.
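For the "post the diff as a PR comment" step, one option is to save the plan to a file, render it with `terraform show -json`, and summarize its `resource_changes` entries. The sketch below assumes that JSON shape and uses a hand-written sample trimmed to the fields it reads. `classify`, `summarize_plan`, and the comment format are hypothetical, and replacements are counted separately so reviewers can spot them.

```python
import json
from collections import Counter
from typing import Optional


def classify(actions: list[str]) -> Optional[str]:
    """Collapse a change's action list into one label, or None for no-op/read."""
    kinds = set(actions)
    if {"create", "delete"} <= kinds:
        return "replace"
    if "create" in kinds:
        return "add"
    if "update" in kinds:
        return "change"
    if "delete" in kinds:
        return "destroy"
    return None


def summarize_plan(plan_json: str) -> Counter:
    """Count planned adds, changes, replacements, and destroys."""
    counts: Counter = Counter()
    for rc in json.loads(plan_json).get("resource_changes", []):
        label = classify(rc["change"]["actions"])
        if label is not None:
            counts[label] += 1
    return counts


# Hand-written sample, not real tool output.
sample = json.dumps({"resource_changes": [
    {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    {"address": "aws_subnet.public", "change": {"actions": ["update"]}},
    {"address": "aws_subnet.private", "change": {"actions": ["create"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_subnet.legacy", "change": {"actions": ["delete"]}},
]})

summary = summarize_plan(sample)
comment = (
    f"Plan: {summary['add']} to add, {summary['change']} to change, "
    f"{summary['replace']} to replace, {summary['destroy']} to destroy"
)
print(comment)
```

A pipeline could fail the check, or require an extra approval, whenever `replace` or `destroy` is non-zero.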
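Testing IaC
Terratest
Terratest (by Gruntwork) lets you write Go tests that deploy real infrastructure, validate it, then tear it down:
package test
import (
	"testing"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)
func TestVpc(t *testing.T) {
	opts := &terraform.Options{
		TerraformDir: "../modules/networking",
		Vars: map[string]interface{}{
			"vpc_cidr": "10.0.0.0/16",
		},
	}
	// Tear down whatever was created, even if an assertion fails.
	defer terraform.Destroy(t, opts)
	terraform.InitAndApply(t, opts)
	// The module must expose a "vpc_id" output for this check.
	vpcId := terraform.Output(t, opts, "vpc_id")
	assert.NotEmpty(t, vpcId)
}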
Checkov
Checkov is a static analysis tool that scans Terraform, CloudFormation, and Kubernetes configs for security misconfigurations:
checkov -d ./terraform --framework terraform
# Checks: 800+ built-in policies
# Covers: encryption, public access, IAM, networking
Additional Testing Layers
- `terraform validate`: syntax and internal consistency
- `tflint`: linting and provider-specific rules
- OPA / Sentinel: policy-as-code gating in CI
- `infracost`: cost estimation before apply
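Policy-as-code gating boils down to evaluating rules against planned resources and failing the pipeline on violations. Here is a minimal sketch of one such rule, a required-tags check, written in plain Python rather than Rego or Sentinel. The `REQUIRED_TAGS` policy and the `missing_tags` helper are hypothetical; the sample resources mirror the VPC and subnet from the HCL example earlier in this guide.

```python
# Hypothetical policy: every resource must carry these tags.
REQUIRED_TAGS = {"Name", "Environment"}


def missing_tags(resources: dict[str, dict]) -> dict[str, list[str]]:
    """Return the required tags each resource lacks, keyed by resource address."""
    violations: dict[str, list[str]] = {}
    for address, attrs in resources.items():
        tags = attrs.get("tags") or {}
        missing = REQUIRED_TAGS - set(tags)
        if missing:
            violations[address] = sorted(missing)
    return violations


resources = {
    "aws_vpc.main": {
        "cidr_block": "10.0.0.0/16",
        "tags": {"Name": "production-vpc", "Environment": "prod"},
    },
    "aws_subnet.public": {"cidr_block": "10.0.1.0/24"},
}

violations = missing_tags(resources)
for address, tags in violations.items():
    print(f"{address} is missing tags: {', '.join(tags)}")
```

In CI, a non-empty `violations` result would translate into a failed check on the pull request.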
Multi-Cloud Patterns
For organizations targeting multiple cloud providers:
- Abstraction modules: wrap provider-specific resources behind a common interface
- Workload placement: use Terraform providers selectively per service
- Shared state: reference outputs across configurations with `terraform_remote_state`
- Provider-agnostic layers: Kubernetes, databases, and DNS often abstract well; IAM and networking rarely do
The pragmatic approach: don't abstract prematurely. Use multi-cloud where it provides genuine resilience or vendor leverage, not as a default.
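To show what an abstraction module looks like in miniature, the sketch below maps one provider-neutral bucket spec onto two provider-specific resource shapes. `BucketSpec` and `render_bucket` are hypothetical. The resource types and argument names follow the AWS and Google Terraform providers, and everything else a real bucket needs is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketSpec:
    """Provider-neutral description of an object storage bucket."""
    name: str
    region: str  # passed through as-is; region names differ per provider


def render_bucket(spec: BucketSpec, provider: str) -> dict:
    """Translate the common spec into a provider-specific resource shape."""
    if provider == "aws":
        # On AWS the region usually comes from the provider configuration.
        return {"type": "aws_s3_bucket", "args": {"bucket": spec.name}}
    if provider == "gcp":
        return {
            "type": "google_storage_bucket",
            "args": {"name": spec.name, "location": spec.region},
        }
    raise ValueError(f"unsupported provider: {provider}")


aws_bucket = render_bucket(BucketSpec("app-assets", "us-east-1"), "aws")
gcp_bucket = render_bucket(BucketSpec("app-assets", "US-EAST1"), "gcp")
print(aws_bucket)
print(gcp_bucket)
```

Even in this small case the abstraction leaks: region handling differs between the two providers, which is one reason to be cautious about abstracting early.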
IaC Best Practices
- Pin provider versions: avoid surprise breaking changes
- Use remote state with locking: never commit `.tfstate` files
- Modularize early: small, composable modules over monolithic configs
- Separate environments: workspaces or directory-based isolation
- Automate plans in CI: every PR gets a plan output
- Enforce policies: Sentinel, OPA, or Checkov in the pipeline
- Tag everything: cost allocation, ownership, environment
- Limit blast radius: split state files by domain (networking, compute, data)
- Document variables: descriptions and validation blocks in every variable
- Review destroy operations: treat resource deletion with extra scrutiny
Wrapping Up
Infrastructure as code has moved from best practice to baseline expectation. Whether you choose Terraform for its ecosystem, Pulumi for language familiarity, or CloudFormation for AWS-native integration, the principles remain the same: version everything, automate deployment, test before apply, and detect drift continuously.
The next frontier — GitOps infrastructure with policy-as-code guardrails — makes IaC not just a provisioning tool but a full governance framework.