Terraform for Enterprise Platforms
This page covers how I structure Terraform in enterprise environments — not the basics of writing resources, but the decisions around organisation, state, modules, and the newer Stacks feature that change how everything fits together at scale.
Modules vs Stacks — picking the right model
One of the questions that came up repeatedly as HCP Terraform's Stacks feature matured: if modules do the job, why would you need Stacks? The answer took me a while to articulate clearly.
Modules are reusable configurations. They're the right unit for composing infrastructure — a vnet module, a storage_account module, a landing_zone module that assembles several smaller ones. Modules reduce duplication and enforce consistency within a single Terraform working directory. They share a backend and a state file.
Stacks are an orchestration layer above modules. Each component in a Stack has its own state, and HCP Terraform manages the dependency graph between states — including cross-stack outputs and deployment ordering. When you have infrastructure split across multiple state files that need to deploy in the right sequence, Stacks move that orchestration logic out of your CI/CD scripts and into the platform.
The practical difference:
| | Modules | Stacks |
|---|---|---|
| State | Shared single state | Each component has its own state |
| Dependencies | Internal to one working directory | Cross-state, managed by HCP Terraform |
| Orchestration | Your CI/CD pipeline | HCP Terraform handles sequencing |
| Best for | Composable, reusable infrastructure within a deployment | Multi-layer infrastructure (networking → identity → workloads) |
I still use modules everywhere. Stacks are the right choice when orchestrating the kind of layered landing zone deployments where networking must exist before identity can be configured, and identity must exist before workloads can deploy — and I don't want to manage that sequencing in YAML.
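A minimal sketch of what that looks like in Stacks configuration. The file naming and exact syntax follow the HCP Terraform Stacks preview and may differ between releases; the component names and module paths are illustrative, not from a real deployment:

```hcl
# components.tfcomponent.hcl — illustrative sketch of a layered Stack
component "connectivity" {
  source = "./modules/connectivity" # hypothetical module path
  inputs = {
    location = var.location
  }
}

component "identity" {
  source = "./modules/identity" # hypothetical module path
  inputs = {
    # Referencing another component's output is what gives HCP Terraform
    # the cross-state dependency edge: identity deploys after connectivity
    hub_vnet_id = component.connectivity.hub_vnet_id
  }
}
```

Each `component` gets its own state, and the reference between them replaces the sequencing logic that would otherwise live in pipeline YAML.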
Module design principles
When I'm building a module that will be reused across teams, these are the things I get right upfront:
Explicit over implicit. Every input that controls behaviour should be a variable. Hardcoded values in modules become hard-to-trace bugs when someone uses the module in a different context.
Minimal interface. Don't expose every possible property as a variable. Expose what callers actually need to customise. Everything else gets a sensible default.
Outputs for every resource. Always output the IDs and names of created resources so callers can reference them.
Version pinning. Pin the azurerm provider version from the module definition, not just the root configuration.
```hcl
# Good module interface — VNet module example
variable "name" { type = string }
variable "address_space" { type = list(string) }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "dns_servers" {
  type    = list(string)
  default = []
}
variable "tags" {
  type    = map(string)
  default = {}
}

output "id" { value = azurerm_virtual_network.this.id }
output "name" { value = azurerm_virtual_network.this.name }
```
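The version-pinning principle above means each module declares its own provider constraint, roughly like this (the `~> 4.0` constraint is an example, not a recommendation — pin to whatever the module is actually tested against):

```hcl
# Inside the module itself, not just the root configuration
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0" # example constraint
    }
  }
}
```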
Conditional resource provisioning
I use for_each with a conditional expression to optionally provision resources based on input. A common pattern: conditionally add a resource to a service group when a service group ID is provided.
```hcl
variable "service_group_id" {
  type        = string
  default     = ""
  description = "Optional. When set, registers this resource with the specified service group."
}

resource "azurerm_resource_group_template_deployment" "service_group_membership" {
  # Only create this resource if a service group ID was provided
  for_each = var.service_group_id != "" ? { register = var.service_group_id } : {}

  name                = "sg-membership-${each.value}"
  resource_group_name = azurerm_resource_group.this.name
  deployment_mode     = "Incremental"
  template_content = jsonencode({
    "$schema"      = "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#"
    contentVersion = "1.0.0.0"
    resources      = []
    # Service group membership registration...
  })
}
```
The `for_each = condition ? { key = value } : {}` pattern is the standard Terraform idiom for optional single-resource provisioning. I prefer this over `count` because it produces a named instance rather than an index.
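For contrast, here is the same guard written with `count` — functionally equivalent, shown only to illustrate the addressing difference, not meant to coexist with the `for_each` version:

```hcl
# count equivalent — the instance is addressed by index
# (...service_group_membership[0]) rather than a stable key (...["register"])
resource "azurerm_resource_group_template_deployment" "service_group_membership" {
  count = var.service_group_id != "" ? 1 : 0

  name                = "sg-membership-${var.service_group_id}"
  resource_group_name = azurerm_resource_group.this.name
  deployment_mode     = "Incremental"
  template_content    = jsonencode({ resources = [] })
}
```

The index-based address is what makes `count` fragile: inserting or reordering optional resources shifts indices and forces replacements, which the keyed `for_each` form avoids.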
Dynamic subnet allocation
When I need to provision subnets dynamically without hardcoding prefixes, I use Terraform's actions capability (preview feature on HCP Terraform) to calculate the next available address range before the plan runs.
For the common case of just needing non-overlapping subnets within a known VNet address space, I pre-calculate using cidrsubnet:
```hcl
locals {
  # Define subnets relative to the VNet address space
  # VNet: 10.1.0.0/16 — the first three subnets are sequential /24s
  subnets = {
    "snet-app"           = { prefix = cidrsubnet("10.1.0.0/16", 8, 0) }   # 10.1.0.0/24
    "snet-data"          = { prefix = cidrsubnet("10.1.0.0/16", 8, 1) }   # 10.1.1.0/24
    "snet-mgmt"          = { prefix = cidrsubnet("10.1.0.0/16", 8, 2) }   # 10.1.2.0/24
    "AzureBastionSubnet" = { prefix = cidrsubnet("10.1.0.0/16", 10, 12) } # 10.1.3.0/26
  }
}

resource "azurerm_subnet" "this" {
  for_each             = local.subnets
  name                 = each.key
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = [each.value.prefix]
}
```
State management at scale
For multi-team enterprise use, state organisation matters as much as the Terraform code itself. The model I use:
One state file per deployment boundary. A deployment boundary is the thing that should deploy and roll back together. Platform networking is one boundary. An application workload is another. Don't mix them.
State files in separate containers by environment:
```hcl
# Storage account: sttfstate<org><env>
# Containers: platform-connectivity, platform-identity, platform-management, app-<name>

# Platform connectivity (hub VNet, firewall, ExpressRoute)
key = "platform/connectivity/terraform.tfstate"

# Application landing zone
key = "apps/myapp/prod/terraform.tfstate"
```
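Wired into a configuration, that layout becomes a standard `azurerm` backend block — storage account and resource group names here reuse the examples on this page rather than real values:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstateorgprod"
    container_name       = "platform-connectivity"
    key                  = "platform/connectivity/terraform.tfstate"
    use_azuread_auth     = true # Entra ID auth instead of storage access keys
  }
}
```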
Lock every state file during apply. Azure Blob Storage does this automatically via lease mechanisms. I've never needed to explicitly configure locking — but I do need to handle the failure mode when a lock is left behind. See GitHub Actions for Terraform for how I handle that in pipelines.
Remote state lookups
When an application Terraform configuration needs to reference platform infrastructure (the hub VNet ID, a subnet ID), I use data sources rather than hardcoded values:
```hcl
# Reference platform networking state from application configuration
data "terraform_remote_state" "connectivity" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstateorgprod"
    container_name       = "platform-connectivity"
    key                  = "platform/connectivity/terraform.tfstate"
  }
}

# Use the hub VNet ID from the platform state
resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  remote_virtual_network_id = data.terraform_remote_state.connectivity.outputs.hub_vnet_id
  # ...
}
```
This explicit coupling makes the dependency between states visible and auditable.
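The other half of that contract lives in the platform configuration: `terraform_remote_state` can only read root-level outputs, so the connectivity state has to publish the value explicitly. The output name matches the lookup above; the resource name is illustrative:

```hcl
# In the platform connectivity configuration — outputs are the contract
output "hub_vnet_id" {
  description = "Hub VNet resource ID, consumed by spokes via terraform_remote_state"
  value       = azurerm_virtual_network.hub.id # resource name illustrative
}
```

Renaming or removing one of these outputs is a breaking change for every downstream configuration, which is exactly why keeping them explicit and auditable matters.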