How virtual network service endpoints work
Posted on: November 20, 2023
The topic of Virtual Network service endpoints has come up a few times in the last month, so I decided to write a blog post about it. It’s not a new thing, but it’s still something that people have missed or haven’t quite understood in practice. Therefore, I’ll go through it with a very concrete example.
From the Virtual Network service endpoints documentation:
Today, any routes in your virtual network that force internet traffic to your on-premises and/or virtual appliances also force Azure service traffic to take the same route as the internet traffic. Service endpoints provide optimal routing for Azure traffic. … Endpoints always take service traffic directly from your virtual network to the service on the Microsoft Azure backbone network.
Test setup
To show this in practice, I’m going to build the following test setup:
The application runs in a virtual machine and sends data to Azure Table Storage. The virtual machine runs in a subnet that has a User-Defined Route (UDR), which forces all traffic to a Network Virtual Appliance (NVA).
Here are the Azure resources that are needed for this test setup:
The application is a simple PowerShell script that sends data to Azure Table Storage. I’m using an Ubuntu virtual machine, so I installed PowerShell on it.
Here is a simplified vanilla PowerShell version of the script, using managed identity and the Azure Table Storage REST API:
# Storage account and table used in this demo
$storageName = "stvnetstorageendpoints"
$operationsTableName = "operations"
$ticksPerDay = [timespan]::FromDays(1).Ticks
$messageNumber = 1

$url = "https://$storageName.table.core.windows.net/$operationsTableName"
$headers = @{
    "x-ms-version" = "2023-11-03"
    "Accept"       = "application/json;odata=nometadata"
    "Prefer"       = "return-no-content"
}

# Get an access token for the managed identity from the instance metadata endpoint
$token = Invoke-RestMethod `
    -Headers @{ Metadata = "true" } `
    -Uri "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://$storageName.table.core.windows.net/"
$secureAccessToken = ConvertTo-SecureString -AsPlainText -String $token.access_token

# Create the table; -ErrorAction SilentlyContinue is meant to hide the error if it already exists
Invoke-RestMethod `
    -Body (ConvertTo-Json @{ "TableName" = "$operationsTableName" }) `
    -ContentType "application/json" `
    -Method "POST" `
    -Authentication Bearer `
    -Headers $headers `
    -Token $secureAccessToken `
    -Uri "https://$storageName.table.core.windows.net/Tables" `
    -ErrorAction SilentlyContinue

# Send one message per second
while ($true) {
    $body = ConvertTo-Json @{
        "PartitionKey"  = Get-Date -AsUtc -Format "yyyy-MM-dd"
        "RowKey"        = [string]($ticksPerDay - (Get-Date -AsUtc).TimeOfDay.Ticks)
        "MessageTime"   = Get-Date -AsUtc
        "Message"       = "OK"
        "MessageNumber" = $messageNumber++
    }

    Invoke-RestMethod `
        -Body $body `
        -ContentType "application/json" `
        -Method "POST" `
        -Authentication Bearer `
        -Headers $headers `
        -Token $secureAccessToken `
        -TimeoutSec 5 `
        -Uri $url | Out-Null

    Start-Sleep -Seconds 1
}
And as always, the full source code is available on GitHub:
Now we have all the pieces in place, and we can start testing.
Test 1: No UDRs and no service endpoints
We haven’t yet enabled any service endpoints for that subnet:
No big surprises in this test. Everything works as expected. Traffic flows using the default system routes. Public IP is used for outbound communication to the internet.
We can see data in the table storage exactly as we expected:
Okay, this is slightly off-topic, but you might be wondering: is the outbound traffic really using the public IP address?
If you run curl https://myip.jannemattila.com, the response will show your public IP.
So yes, in that regard.
Therefore, you might be tempted to think that you can use that IP in the storage account firewall:
But after you enable that, you see that our application is no longer able to communicate with the storage account:
The error message is:
{
  "odata.error": {
    "code": "AuthorizationFailure",
    "message": {
      "lang": "en-US",
      "value": "This request is not authorized to perform this operation\nRequestId:5f73e099-a002-006e-5230-17757c000000\nTime:2023-11-14T19:30:02.6638006Z"
    }
  }
}
The reason is simple and documented in Restrictions for IP network rules.
You can’t use IP network rules in the following cases:
… IP network rules have no effect on requests that originate from the same Azure region as the storage account. … Services deployed in the same region as the storage account use private Azure IP addresses for communication. So, you can’t restrict access to specific Azure services based on their public outbound IP address range.
And no, you cannot use a private IP address in the storage account firewall either:
Test 2: UDR and no service endpoints
In this test setup we will use UDR to force all traffic to NVA.
But since this is a test setup, we don’t have an NVA running, so those packets will be sent to /dev/null (=dropped).
With this script we can add a 0.0.0.0/0 route to the NVA, let it take effect for, e.g., 120 seconds, and then remove that route:
# Add route to NVA
$routeTable = Get-AzRouteTable -ResourceGroupName "rg-vnet-service-endpoints-demo" -Name "rt-app"
Add-AzRouteConfig -Name "to-nva" -AddressPrefix 0.0.0.0/0 -NextHopType "VirtualAppliance" -NextHopIpAddress 10.10.10.10 -RouteTable $routeTable
$routeTable | Set-AzRouteTable
Start-Sleep -Seconds 120
# Remove route
$routeTable | Remove-AzRouteConfig -Name "to-nva" | Set-AzRouteTable
When the to-nva route is in use, you should see this in the route table:
Any connections, like SSH, will get stuck during this period.
This is also visible in our data since data upload was blocked for 2 minutes and 13 seconds:
That was due to the default value of the -TimeoutSec parameter in Invoke-RestMethod, which is an indefinite timeout.
If I change that to, e.g., 5 seconds, then I’ll see packet drops:
If I now improve the upload logic by introducing a simple queue for failed messages and re-processing them, I can see those errors in the table:
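The queue idea can be sketched roughly like this. This is a minimal sketch and not the exact code from my repository: the function name Send-WithRetryQueue, the $Send script block (anything that uploads one message and throws on failure), and the choice to store the error text in the Message field are all assumptions made for illustration.

```powershell
# Hypothetical helper: $Send is any script block that uploads one message
# and throws when the upload fails (e.g., a wrapper around Invoke-RestMethod).
function Send-WithRetryQueue {
    param(
        [System.Collections.Generic.Queue[object]] $FailedQueue,
        [hashtable] $Message,
        [scriptblock] $Send
    )

    # Try the new message first; on failure, record the error text and park it.
    try {
        & $Send $Message | Out-Null
    }
    catch {
        $Message["Message"] = $_.Exception.Message
        $FailedQueue.Enqueue($Message)
        return
    }

    # The upload worked, so try to drain the messages that failed earlier.
    while ($FailedQueue.Count -gt 0) {
        try {
            & $Send $FailedQueue.Peek() | Out-Null
            [void] $FailedQueue.Dequeue()
        }
        catch {
            break   # still failing; keep the rest for the next round
        }
    }
}
```

In the upload loop you would create the queue once with [System.Collections.Generic.Queue[object]]::new() and call this helper instead of calling Invoke-RestMethod directly. Failed messages are re-sent only after a later upload succeeds, so they land in the table with the error text instead of "OK".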
The error message is: The request was canceled due to the configured HttpClient.Timeout of 5 seconds elapsing.
Now we have seen that UDR works as expected. It forces all traffic to NVA and our application is not able to send data to the table storage in this test setup.
Since our UDR forced all the traffic to NVA, it means that no other outbound traffic to the internet actually worked during the test period.
Test 3: UDR and Storage service endpoint
Virtual network service endpoints are enabled on a per-subnet basis within a VNet.
In this test setup we will enable the Microsoft.Storage service endpoint for our test subnet:
Above setting changes the routing behavior in the network. The best way to see the effective routes is to go to:
Virtual Machine > Networking > Network interface > Effective routes
The virtual machine must be running in order to see the effective routes. And unfortunately, no, you cannot test this with just a network interface.
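If you prefer a script over the portal, the same information can be pulled with the Get-AzEffectiveRouteTable cmdlet and then filtered. The sketch below is only an illustration: Get-ServiceEndpointPrefix is a hypothetical helper name, and it assumes that each route object exposes NextHopType and AddressPrefix properties, as the effective routes view does.

```powershell
# Hypothetical helper: lists the unique address prefixes of all effective
# routes whose next hop type is VirtualNetworkServiceEndpoint.
function Get-ServiceEndpointPrefix {
    param([object[]] $EffectiveRoutes)

    $EffectiveRoutes |
        Where-Object { $_.NextHopType -eq "VirtualNetworkServiceEndpoint" } |
        ForEach-Object { $_.AddressPrefix } |
        Sort-Object -Unique
}

# Sample data shaped like effective routes (values are for illustration only)
$sampleRoutes = @(
    [pscustomobject]@{ NextHopType = "VirtualAppliance"; AddressPrefix = @("0.0.0.0/0") }
    [pscustomobject]@{ NextHopType = "VirtualNetworkServiceEndpoint"; AddressPrefix = @("191.239.203.0/28") }
)
Get-ServiceEndpointPrefix -EffectiveRoutes $sampleRoutes

# Against a real, running VM (the network interface name is a placeholder):
# $routes = Get-AzEffectiveRouteTable -ResourceGroupName "rg-vnet-service-endpoints-demo" -NetworkInterfaceName "<nic-name>"
# Get-ServiceEndpointPrefix -EffectiveRoutes $routes
```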
Here are the effective routes from our test setup:
There are two VirtualNetworkServiceEndpoint routes with many many many address prefixes.
Yes, it would be nice to have some additional metadata in the view to make it easier to understand which service endpoint is responsible for which route.
You can pick any of the IP addresses and validate that it’s indeed from Storage account IP address range:
Get-AzureDatacenterIPOrNo -IP 191.239.203.0 | Format-Table
Here is the output:
IpRange          Source                      SystemService Ip            Region
-------          ------                      ------------- --            ------
191.239.203.0/28 ServiceTags_Public_20231106 AzureStorage  191.239.203.0
191.239.203.0/28 ServiceTags_Public_20231106 AzureStorage  191.239.203.0 westeurope
191.239.200.0/22 ServiceTags_Public_20231106               191.239.203.0 westeurope
191.239.200.0/22 ServiceTags_Public_20231106               191.239.203.0
The above routing definitions are more specific than 0.0.0.0/0 and thus override our to-nva route, because Azure picks the route with the longest matching prefix.
Traffic is therefore directly routed to the Storage account.
It’s also important to understand how Azure selects a route:
If multiple routes contain the same address prefix, Azure selects the route type, based on the following priority:
- User-defined route
- BGP route
- System route
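To make the selection logic concrete, here is a small sketch that mimics it: the longest matching prefix wins first, and only when prefixes are equal does the route type priority decide. This is my simplified model for IPv4 and not Azure's implementation; the function names and the Source labels "User", "BGP" and "System" are assumptions that mirror the list above.

```powershell
# Convert a dotted IPv4 address to a number so prefixes can be compared.
function ConvertTo-IPv4Number {
    param([string] $IpAddress)
    [long] $number = 0
    foreach ($octet in [System.Net.IPAddress]::Parse($IpAddress).GetAddressBytes()) {
        $number = $number * 256 + $octet
    }
    $number
}

# True when the address falls inside the CIDR prefix (e.g., 191.239.203.0/28).
function Test-PrefixMatch {
    param([string] $IpAddress, [string] $Prefix)
    $network, $length = $Prefix.Split('/')
    $blockSize = [math]::Pow(2, 32 - [int] $length)   # addresses in the prefix
    $addressBlock = [math]::Floor((ConvertTo-IPv4Number $IpAddress) / $blockSize)
    $networkBlock = [math]::Floor((ConvertTo-IPv4Number $network) / $blockSize)
    $addressBlock -eq $networkBlock
}

# Longest prefix first; on a tie: user-defined, then BGP, then system route.
function Select-Route {
    param([object[]] $Routes, [string] $Destination)
    $priority = @{ "User" = 0; "BGP" = 1; "System" = 2 }
    $byLongestPrefix = @{ Expression = { [int] $_.AddressPrefix.Split('/')[1] }; Descending = $true }
    $byRouteType = @{ Expression = { $priority[$_.Source] }; Descending = $false }

    $Routes |
        Where-Object { Test-PrefixMatch $Destination $_.AddressPrefix } |
        Sort-Object -Property $byLongestPrefix, $byRouteType |
        Select-Object -First 1
}

# Simplified version of the routes in this test setup
$routes = @(
    [pscustomobject]@{ Source = "System"; AddressPrefix = "0.0.0.0/0"; NextHopType = "Internet" }
    [pscustomobject]@{ Source = "User"; AddressPrefix = "0.0.0.0/0"; NextHopType = "VirtualAppliance" }
    [pscustomobject]@{ Source = "System"; AddressPrefix = "191.239.203.0/28"; NextHopType = "VirtualNetworkServiceEndpoint" }
)

(Select-Route -Routes $routes -Destination "191.239.203.5").NextHopType   # VirtualNetworkServiceEndpoint
(Select-Route -Routes $routes -Destination "8.8.8.8").NextHopType         # VirtualAppliance
```

The first lookup matches the /28 service endpoint route even though a user-defined 0.0.0.0/0 route exists. The second lookup only matches the two 0.0.0.0/0 routes, so the user-defined route wins over the system route.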
Let’s test our scenario now in practice. I will enable the to-nva route for 120 seconds and then remove it, as in the previous test.
SSH gets stuck as previously:
However, our application continuously pumps data into the table storage without any issues:
So, it works as expected. Traffic to the Storage account is not impacted by the UDR. This also explains why you don’t see this traffic in your NVA logs.
Summary
I hope I have managed to explain how Virtual Network service endpoints work in practice. I think it’s valuable to give tools for people to understand how things work.
I can’t recommend this video enough to get better understanding about routing in Azure:
“Does my traffic stay on the Microsoft Network?” – Adam Stuart
I’m planning to write more about this topic in the future and especially about SNAT port exhaustion which every now and then causes issues for people.
I hope you find this useful!