Terraform Check Blocks for Azure Validation

For a long time, the way I validated Azure infrastructure in Terraform was a combination of lifecycle conditions, external scripts run post-apply in CI, and occasionally null resources that called out to Azure CLI. All of these worked, but they were all held together with duct tape. The check block landed in Terraform 1.5 and I've been steadily replacing those patterns ever since.

The thing that makes check blocks different from every other validation mechanism in Terraform is that a failed check does not block the operation. It emits a warning and keeps going. That sounds like a limitation, but it's actually the feature. It means I can validate real-world infrastructure state — things that exist outside the Terraform resource graph, things that might legitimately take time to stabilise — without turning those assertions into hard blockers that cause apply failures.

What check blocks replace

Post-apply shell scripts in CI

The most common pattern I'd inherited: after terraform apply, the CI pipeline runs a Bash or PowerShell script that pings endpoints, queries Azure APIs, checks resource tags, and exits non-zero if something looks wrong. The problems are predictable — the script isn't colocated with the infrastructure code, it's not version-controlled as part of the module, and when it fails it's completely disconnected from the Terraform output.

A check block moves that validation into the Terraform run itself:

check "app_service_health" {
  data "http" "health" {
    url = "https://${azurerm_linux_web_app.api.default_hostname}/health"
  }

  assert {
    condition     = data.http.health.status_code == 200
    error_message = "App Service health endpoint returned ${data.http.health.status_code} after deployment"
  }
}

One advantage that's easy to miss: the nested data source is fetched as the very last step of the apply — after all resources are provisioned. That timing is intentional and makes health checks actually useful rather than racing against provisioning.

Null resources for validation

The null resource / local-exec combo for post-deploy checks is painful to debug and produces no useful output in the Terraform plan. Check blocks produce structured warnings that show up in the plan and apply output with proper context.
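For reference, the pattern being replaced usually looked something like this. A sketch with illustrative names, not code from any real module:

resource "null_resource" "post_deploy_check" {
  triggers = {
    app_id = azurerm_linux_web_app.api.id
  }

  provisioner "local-exec" {
    # The only signal is the exit code; any detail is buried in the apply log
    command = "curl -sf https://${azurerm_linux_web_app.api.default_hostname}/health"
  }
}

Everything a check block gives you for free — a structured warning, a named assertion, a message with interpolated context — has to be hand-rolled here, and usually isn't.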

External lifecycle validations scattered across modules

I'd see modules with precondition and postcondition blocks inside resource lifecycle blocks to validate state. These are the right tool for "this resource cannot be created unless X is true" — hard blockers. But when I want "warn me if this resource drifts into a bad state," check blocks are a better fit. They don't fail the apply, and they run on every plan, so drift shows up continuously.
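For contrast, a hard blocker is a lifecycle.precondition on the resource itself. A minimal sketch, with illustrative resource and variable names:

resource "azurerm_linux_web_app" "api" {
  # ... app configuration elided ...

  lifecycle {
    precondition {
      # Hypothetical policy: no free-tier SKUs in production
      condition     = var.environment != "prod" || var.sku_name != "F1"
      error_message = "Free-tier App Service SKUs are not allowed in production."
    }
  }
}

If that condition is false, the plan errors out. A check block in the same situation would only warn and let the apply proceed.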

Azure-specific use cases

Continuous validation on HCP Terraform

In HCP Terraform, check blocks can be used for continuous health validation — not just during apply, but on a schedule. I use this for:

  • Verifying that Key Vault certificates are not approaching expiry
  • Checking that NSG rules haven't drifted from approved state
  • Confirming that private endpoints are still returning healthy status
  • Validating that APIM management APIs are reachable after config changes

The setup is the same as a regular check block. HCP Terraform runs the checks on a schedule against live infrastructure, separate from any Terraform apply. This replaced half of what I had in Azure Monitor alerts for infrastructure config drift.
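As an example of the NSG drift case, a scheduled check might look like the following. The resource names and the specific rule being asserted are illustrative; the shape of the security_rule attribute comes from the azurerm_network_security_group data source:

check "nsg_no_open_ssh" {
  data "azurerm_network_security_group" "app" {
    name                = azurerm_network_security_group.app.name
    resource_group_name = azurerm_resource_group.workload.name
  }

  assert {
    # Warn if any rule allows SSH from anywhere
    condition = alltrue([
      for rule in data.azurerm_network_security_group.app.security_rule :
      !(rule.access == "Allow" && rule.destination_port_range == "22" && rule.source_address_prefix == "*")
    ])
    error_message = "NSG ${azurerm_network_security_group.app.name} has drifted: SSH is open to the internet."
  }
}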

Validating Azure resource tags post-deployment

Tag compliance is one of those things that's easy to enforce with lifecycle.precondition at apply time within a module you control, but harder when you're validating tags on resources managed by other teams or by other Terraform roots. A check block with a data source can reach out and verify:

check "cost_centre_tag" {
  data "azurerm_resource_group" "workload" {
    name = azurerm_resource_group.workload.name
  }

  assert {
    condition     = contains(keys(data.azurerm_resource_group.workload.tags), "cost-centre")
    error_message = "Resource group ${azurerm_resource_group.workload.name} is missing required 'cost-centre' tag"
  }

  assert {
    condition     = contains(keys(data.azurerm_resource_group.workload.tags), "environment")
    error_message = "Resource group ${azurerm_resource_group.workload.name} is missing required 'environment' tag"
  }
}

Both assertions run inside the same check block. One block, multiple failures surfaced in a single warning.

Post-deploy endpoint validation

After deploying an Azure Front Door or Application Gateway configuration, I want to know the origin is actually reachable and returning the right status. This is the use case the nested data block was built for:

check "frontdoor_origin_health" {
  data "http" "origin" {
    url = "https://${azurerm_cdn_frontdoor_endpoint.main.host_name}/health"

    request_headers = {
      "X-Forwarded-Host" = var.primary_domain
    }
  }

  assert {
    condition     = data.http.origin.status_code == 200
    error_message = <<-EOT
      Front Door origin health check failed.
      Endpoint: ${azurerm_cdn_frontdoor_endpoint.main.host_name}
      Status: ${data.http.origin.status_code}
    EOT
  }
}

I've had this catch Front Door routing rule misconfigurations that the resource itself applied successfully — the resource was valid, the routing was wrong. The check block surfaces that before the next PR merges.

Key Vault certificate and secret expiry

I can't enforce certificate expiry in a precondition without making every apply fail. A check block is the right instrument:

check "acme_certificate_expiry" {
  data "azurerm_key_vault_certificate" "acme" {
    name         = "acme-tls"
    key_vault_id = azurerm_key_vault.shared.id
  }

  assert {
    condition     = timecmp(data.azurerm_key_vault_certificate.acme.expires, timeadd(timestamp(), "720h")) > 0
    error_message = "ACME TLS certificate expires within 30 days. Renew before ${data.azurerm_key_vault_certificate.acme.expires}."
  }
}

Combined with regular HCP Terraform health check runs, this means the expiry warning shows up in Terraform output weeks before there's a problem rather than in a 2am incident.

Validating RBAC state after assignment

Role assignment propagation in Azure is eventually consistent. I use a check block to validate that the managed identity I just assigned a role to can actually access the target resource, rather than assuming the AzureRM provider's success means propagation is complete:

check "storage_rbac_propagated" {
  data "azurerm_role_assignment" "app_identity" {
    scope              = azurerm_storage_account.data.id
    role_definition_id = data.azurerm_role_definition.storage_contributor.role_definition_id
    principal_id       = azurerm_user_assigned_identity.app.principal_id
  }

  assert {
    condition     = data.azurerm_role_assignment.app_identity.id != null
    error_message = "Role assignment for ${azurerm_user_assigned_identity.app.name} on storage account not yet visible. RBAC propagation may still be in progress."
  }
}

This doesn't block the apply — the resource was created successfully. It flags when the assignment isn't readable yet, which saves time debugging "why can't my app auth to storage" issues where the real answer is "wait 60 seconds."
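When that warning fires often enough, a common companion pattern (not a substitute for the check, and the resource names here are illustrative) is an explicit delay using the hashicorp/time provider, so that dependents wait out propagation before first use:

resource "time_sleep" "rbac_propagation" {
  # Give Azure RBAC time to propagate before anything depends on the assignment
  depends_on      = [azurerm_role_assignment.app_storage]
  create_duration = "60s"
}

The check block then acts as the verification that the wait was actually long enough.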

DNS resolution after private endpoint creation

After deploying a private endpoint and the associated private DNS zone link, I want to confirm the hostname resolves correctly inside the VNet. I can't run DNS lookups in Terraform natively, but I can call an Azure-hosted function or use a canary resource:

check "private_endpoint_dns" {
  data "http" "dns_check" {
    url = "https://${var.network_validation_function_url}/dns-check?hostname=${local.storage_private_hostname}"

    request_headers = {
      "x-functions-key" = var.validation_function_key
    }
  }

  assert {
    condition     = jsondecode(data.http.dns_check.response_body).resolves == true
    error_message = "Private DNS for ${local.storage_private_hostname} did not resolve. Check private DNS zone links on ${azurerm_virtual_network.hub.name}."
  }
}

This pattern involves maintaining a thin validation function, which has overhead — but for critical networking changes, the early detection is worth it.

How it sits alongside other validation mechanisms

I think of Terraform's validation tools as three tiers with different purposes:

Mechanism                 Blocks on failure    When it runs         Best for
Variable validation       Yes                  Before plan          Invalid input values
lifecycle.precondition    Yes                  During plan          Pre-conditions a resource requires
lifecycle.postcondition   Yes                  After apply          Guarantees a resource must satisfy
check block               No (warning only)    End of plan/apply    Ambient health, drift detection, real-world state

The non-blocking nature is the key differentiator. precondition and postcondition are assertions about the Terraform graph. check blocks are assertions about the real world — they can observe things that Terraform doesn't own, or observe the results of things that take time to propagate.
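For completeness, the first tier looks like this — plain input validation that rejects a bad value before any plan is produced (variable name and allowed values are illustrative):

variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    condition     = contains(["dev", "test", "prod"], var.environment)
    error_message = "environment must be one of: dev, test, prod."
  }
}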

Things I've gotten wrong

Using check blocks where I should use preconditions. If something must be true for the apply to be meaningful, it should block. A check block that fires a warning nobody looks at is no better than no check at all. I now ask: "if this fails, should we stop?" If yes, precondition. If no, check.

Not scoping assertions tightly enough. Early on I wrote check blocks that pulled large data sources and wrote generic error messages. The error messages are the output — they need to be specific and actionable. I now treat the error_message like I treat a runbook alert: it should tell you exactly what failed and what to look at.

Forgetting that nested data source errors become warnings. If the provider fails to fetch the nested data source — authentication error, resource doesn't exist yet, transient API issue — Terraform masks it as a warning. That means a silent non-result. I always explicitly assert on things that would be falsy if the data source returned empty, so a failure to fetch surfaces as a failed assertion rather than a silent pass.
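A sketch of that defensive pattern, with illustrative names. The assertion is written so that an empty result reads as a failure rather than a pass:

check "tags_fetch_guard" {
  data "azurerm_resource_group" "workload" {
    name = azurerm_resource_group.workload.name
  }

  assert {
    # length(...) > 0 fails on an empty or null tags map, so a fetch that
    # quietly returned nothing surfaces as a failed assertion, not a pass
    condition     = length(try(data.azurerm_resource_group.workload.tags, {})) > 0
    error_message = "Could not read tags for ${azurerm_resource_group.workload.name}; the data source may have failed to fetch."
  }
}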

Putting check blocks in modules. Check blocks in modules run every time the module is used, across every workspace. For ambient health checks I want to run continuously, I put them in the root module where I can control when they're active. Module-level check blocks are fine for contract tests, but not for infrastructure health polling.