Databricks Private Network Setup with Terraform
Every time I've deployed a VNet-injected Databricks workspace from scratch, something unexpected bites me. The Terraform docs are fine for individual resources, but they don't tell you the order things have to happen, the NSG trap, or why the AVM module's private endpoint support silently breaks. This page documents the exact setup I've landed on for a fully private workspace with no public IPs.
What I'm deploying
The architecture is a hub/spoke layout:
- Hub VNet (10.50.0.0/16) holds the private endpoint subnet and private DNS zones
- Spoke VNet (10.51.0.0/16) holds the Databricks compute plane (public and private subnets), plus an Azure Bastion and jump VM for management access
- Bidirectional VNet peering connects the two
- All Databricks traffic flows through private endpoints — the workspace URL never resolves to a public IP
The configuration is driven from a single config variable:
config = {
  business_prefix     = "techanalytics"
  location            = "australiaeast"
  location_code       = "ae"
  naming_number       = "001"
  subscription_id     = "870d576c-7cb8-4488-b3b5-73a200077ac2"
  address_space_hub   = ["10.50.0.0/16"]
  address_space_spoke = ["10.51.0.0/16"]
  subnets = {
    private_endpoint = "10.50.3.0/24"
    public           = "10.51.3.0/24"
    private          = "10.51.4.0/24"
    bastion          = "10.51.1.0/27"
    vm               = "10.51.2.0/24"
  }
  vm_admin_username = "azureadmin"
  vm_size           = "Standard_B2s"
}
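For reference, a matching variable declaration could look like the sketch below. The attribute names and types are taken from the values above. The description text and the validation rule are assumptions added for illustration, not part of the original setup.

```hcl
# Sketch of a type declaration for the single config object.
variable "config" {
  description = "Single object that drives the whole deployment."

  type = object({
    business_prefix     = string
    location            = string
    location_code       = string
    naming_number       = string
    subscription_id     = string
    address_space_hub   = list(string)
    address_space_spoke = list(string)
    subnets = object({
      private_endpoint = string
      public           = string
      private          = string
      bastion          = string
      vm               = string
    })
    vm_admin_username = string
    vm_size           = string
  })

  # Assumed extra: reject malformed CIDR strings at plan time.
  # cidrhost() errors on an invalid prefix, and can() turns that into false.
  validation {
    condition     = alltrue([for cidr in values(var.config.subnets) : can(cidrhost(cidr, 0))])
    error_message = "Every entry in config.subnets must be a valid CIDR block."
  }
}
```

Typing the object means a misspelled or missing key fails at plan time instead of surfacing as a null somewhere deep in a resource.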
Subnet requirements for VNet injection
Databricks VNet injection requires two subnets in the spoke: a public (host) subnet and a private (container) subnet. Both need Microsoft.Databricks/workspaces service delegation. There's a catch I've hit: once a subnet has delegation and network policies are applied, you cannot remove the delegation without force-replacing the subnet. Azure rejects it with SubnetDelegationsCannotBeRemovedWhenSubnetHasNetworkPolicies. So get the delegation right the first time.
resource "azurerm_subnet" "public" {
  name                 = local.subnet_names.public
  resource_group_name  = azurerm_resource_group.spoke.name
  virtual_network_name = azurerm_virtual_network.spoke.name
  address_prefixes     = [var.config.subnets.public]

  delegation {
    name = "databricks-del-public"

    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
      ]
    }
  }
}
The private subnet has identical delegation. Service endpoints on these subnets are not required — the private endpoint on the DFS subresource handles ADLS Gen2 access from cluster nodes without them.
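For completeness, the private subnet might look like this. It mirrors the public subnet; the `local.subnet_names.private` key and the delegation name `databricks-del-private` are assumed by analogy and may differ from the real configuration.

```hcl
resource "azurerm_subnet" "private" {
  name                 = local.subnet_names.private # assumed local key
  resource_group_name  = azurerm_resource_group.spoke.name
  virtual_network_name = azurerm_virtual_network.spoke.name
  address_prefixes     = [var.config.subnets.private]

  delegation {
    name = "databricks-del-private" # assumed name

    # Same delegation and actions as the public subnet.
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
      ]
    }
  }
}
```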
The NSG requirement
A VNet-injected workspace requires an NSG attached to both Databricks subnets. The NSG doesn't need any rules, but it must exist and be associated with both subnets before the workspace can be created. Without it, workspace creation fails.
I set network_security_group_rules_required = "NoAzureDatabricksRules" on the workspace, which tells the platform not to inject its default security group rules. This gives full control over traffic policy. The NSG is essentially just a placeholder to satisfy the injection requirement.
resource "azurerm_network_security_group" "databricks" {
  name                = local.nsg_databricks_name
  location            = azurerm_resource_group.spoke.location
  resource_group_name = azurerm_resource_group.spoke.name
  tags                = local.tags
}

resource "azurerm_subnet_network_security_group_association" "public" {
  subnet_id                 = azurerm_subnet.public.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}

resource "azurerm_subnet_network_security_group_association" "private" {
  subnet_id                 = azurerm_subnet.private.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}
Access connector and UAMI
The access connector is what wires a managed identity to the workspace. When default_storage_firewall_enabled = true is set on the workspace, every storage access goes through the connector identity rather than keys or tokens. I use a UAMI (User Assigned Managed Identity) rather than a system-assigned identity because the lifecycle is independent. A system-assigned identity is tied to the access connector resource — if you ever need to replace the connector, you lose the identity and have to re-assign all the roles.
resource "azurerm_user_assigned_identity" "databricks" {
  name                = local.uami_databricks_name
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
  tags                = local.tags
}

resource "azurerm_databricks_access_connector" "databricks" {
  name                = local.access_connector_name
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name

  # User-assigned only: the identity outlives the connector, so role
  # assignments survive a connector replacement.
  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.databricks.id]
  }

  tags = local.tags
}
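Because roles are granted to the UAMI rather than to the connector, a role assignment can reference the identity directly. The sketch below assumes a storage account resource named `azurerm_storage_account.lake` and the `Storage Blob Data Contributor` role; both are hypothetical choices for illustration, so substitute whatever scope and role the workload actually needs.

```hcl
resource "azurerm_role_assignment" "databricks_storage" {
  # Hypothetical scope: an ADLS Gen2 account defined elsewhere.
  scope                = azurerm_storage_account.lake.id
  role_definition_name = "Storage Blob Data Contributor" # assumed role

  # The principal belongs to the UAMI, not the access connector, so this
  # assignment is unaffected if the connector is destroyed and recreated.
  principal_id = azurerm_user_assigned_identity.databricks.principal_id
}
```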
Workspace via the AVM module
I'm using the Azure Verified Module Azure/avm-res-databricks-workspace/azurerm at version 0.5.0. It handles the workspace resource and the diagnostic settings, but there's a critical issue with its private endpoint support as of this version.
The module uses azapi_resource internally rather than azurerm_databricks_workspace. When you pass a private_endpoints block to the module, its internal PE creation logic can't correctly resolve the workspace resource ID from the azapi_resource before attempting to attach the private service connection. The apply fails with an empty or partially-known resource ID on the PE. The fix is simple: don't pass the private_endpoints block to the module. Instead, create the private endpoints as standalone resources that explicitly depend on the module completing.
module "databricks" {
  source  = "Azure/avm-res-databricks-workspace/azurerm"
  version = "0.5.0"

  name                = local.databricks_workspace_name
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
  sku                 = "premium"
  access_connector_id = azurerm_databricks_access_connector.databricks.id

  custom_parameters = {
    no_public_ip                                         = true
    public_subnet_name                                   = azurerm_subnet.public.name
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    private_subnet_name                                  = azurerm_subnet.private.name
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
    virtual_network_id                                   = azurerm_virtual_network.spoke.id
  }

  default_storage_firewall_enabled      = true
  network_security_group_rules_required = "NoAzureDatabricksRules"
  public_network_access_enabled         = false
  tags                                  = local.tags

  depends_on = [
    azurerm_virtual_network_peering.hub_to_spoke,
    azurerm_virtual_network_peering.spoke_to_hub,
    azurerm_private_dns_zone_virtual_network_link.databricks_hub,
    azurerm_private_dns_zone_virtual_network_link.databricks_spoke,
  ]
}
The depends_on here is important. Terraform won't automatically know the workspace needs the peerings and DNS links in place before it starts creating — they're not referenced in the workspace resource attributes. Without the explicit dependency, Terraform may try to create the workspace before the DNS zone is linked to the VNet, which means the workspace's initial health checks fail to resolve DNS internally.
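The DNS zone and the two links named in that depends_on are not shown elsewhere on this page. A minimal sketch of what they might look like follows. The resource addresses match the references above and the zone name is the one used for the private endpoints; the link names and the choice of the hub resource group are assumptions.

```hcl
resource "azurerm_private_dns_zone" "databricks" {
  name                = "privatelink.azuredatabricks.net"
  resource_group_name = azurerm_resource_group.hub.name # assumed: zones live in the hub
}

# Link the zone to the hub VNet, where the private endpoints sit.
resource "azurerm_private_dns_zone_virtual_network_link" "databricks_hub" {
  name                  = "pdnslink-dbx-hub" # assumed name
  resource_group_name   = azurerm_resource_group.hub.name
  private_dns_zone_name = azurerm_private_dns_zone.databricks.name
  virtual_network_id    = azurerm_virtual_network.hub.id
}

# Link the zone to the spoke VNet so the jump VM and cluster nodes
# resolve the workspace URL to the private IP.
resource "azurerm_private_dns_zone_virtual_network_link" "databricks_spoke" {
  name                  = "pdnslink-dbx-spoke" # assumed name
  resource_group_name   = azurerm_resource_group.hub.name
  private_dns_zone_name = azurerm_private_dns_zone.databricks.name
  virtual_network_id    = azurerm_virtual_network.spoke.id
}
```

Auto-registration is left at its default (off), since the A records come from the private endpoints' DNS zone groups rather than from VMs registering themselves.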
Private endpoints — do these separately
Two private endpoints are needed: databricks_ui_api (for the workspace URL) and browser_authentication (for the login flow). The auth endpoint depends on the UI API endpoint existing first.
Both go on the private endpoint subnet in the hub VNet, with DNS records in the privatelink.azuredatabricks.net zone:
resource "azurerm_private_endpoint" "databricks_ui_api" {
  name                = local.pep_ui_api_name
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
  subnet_id           = azurerm_subnet.private_endpoint.id

  private_service_connection {
    name                           = local.psc_ui_api_name
    private_connection_resource_id = module.databricks.resource_id
    is_manual_connection           = false
    subresource_names              = ["databricks_ui_api"]
  }

  private_dns_zone_group {
    name                 = "pdnszg-dbx-ui"
    private_dns_zone_ids = [azurerm_private_dns_zone.databricks.id]
  }
}

resource "azurerm_private_endpoint" "databricks_auth" {
  name                = local.pep_auth_name
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
  subnet_id           = azurerm_subnet.private_endpoint.id

  private_service_connection {
    name                           = local.psc_auth_name
    private_connection_resource_id = module.databricks.resource_id
    is_manual_connection           = false
    subresource_names              = ["browser_authentication"]
  }

  private_dns_zone_group {
    name                 = "pdnszg-dbx-auth"
    private_dns_zone_ids = [azurerm_private_dns_zone.databricks.id]
  }

  depends_on = [azurerm_private_endpoint.databricks_ui_api]
}
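Both endpoints reference `azurerm_subnet.private_endpoint`, which is not defined elsewhere on this page. A minimal sketch of that subnet, assuming it sits in the hub VNet and takes its range from `var.config.subnets.private_endpoint`; the `local.subnet_names.private_endpoint` key is a guess by analogy with the public subnet.

```hcl
resource "azurerm_subnet" "private_endpoint" {
  name                 = local.subnet_names.private_endpoint # assumed local key
  resource_group_name  = azurerm_resource_group.hub.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = [var.config.subnets.private_endpoint]

  # No delegation here: unlike the Databricks subnets, a private endpoint
  # subnet is a plain subnet.
}
```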
VNet peering
Both directions of the peering must be created. Terraform won't infer the reverse peering — it's two separate resources:
resource "azurerm_virtual_network_peering" "hub_to_spoke" {
  name                      = local.peering_hub_to_spoke_name
  resource_group_name       = azurerm_resource_group.hub.name
  virtual_network_name      = azurerm_virtual_network.hub.name
  remote_virtual_network_id = azurerm_virtual_network.spoke.id
  allow_forwarded_traffic   = true
  allow_gateway_transit     = false
}

resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                      = local.peering_spoke_to_hub_name
  resource_group_name       = azurerm_resource_group.spoke.name
  virtual_network_name      = azurerm_virtual_network.spoke.name
  remote_virtual_network_id = azurerm_virtual_network.hub.id
  allow_forwarded_traffic   = true
  use_remote_gateways       = false
}
Things I've gotten wrong
Forgetting to create both peering directions. Traffic just doesn't flow and the workspace either times out or shows nodes stuck in pending. Terraform creates whichever direction is defined — it won't warn you the reverse is missing.
Setting public_network_access_enabled = false without DNS zone links in place. The workspace creates successfully, but opening the URL returns a connection reset. The browser hits the public IP, which is blocked, and the private IP can't be resolved because the DNS link isn't registered yet. The DNS zone must be linked to both VNets before the workspace is usable.
Passing private_endpoints to the AVM module. The apply completes the workspace creation then immediately fails on the PE attachment with a resource ID error. The standalone resource approach is the only reliable path until the module resolves this internally.
Using system-assigned identity on the access connector then replacing the connector. Replacing the resource (not just updating it) destroys and recreates the system identity, which means all downstream role assignments are invalid and need to be redone. UAMI avoids this entirely.