Janne Mattila

From programmer to programmer -- Programming just for the fun of it

How virtual network service endpoints work

Posted on: November 20, 2023

The topic of virtual network service endpoints has come up a few times in the last month, so I decided to write a blog post about it. It’s not a new feature, but it’s still something that people have missed or haven’t quite understood in practice. Therefore, I’ll walk through it with a very concrete example.

From the Virtual Network service endpoints documentation:

Today, any routes in your virtual network that force internet traffic to your on-premises and/or virtual appliances also force Azure service traffic to take the same route as the internet traffic. Service endpoints provide optimal routing for Azure traffic. … Endpoints always take service traffic directly from your virtual network to the service on the Microsoft Azure backbone network.

Test setup

To show this in practice, I’m going to build the following test setup:

An application runs in a virtual machine and sends data to Azure Table Storage. The virtual machine runs in a subnet that has a User-Defined Route (UDR), which forces all traffic to a Network Virtual Appliance (NVA).

Here are the Azure resources that are needed for this test setup:

The application is a simple PowerShell script that sends data to Azure Table Storage. I’m using an Ubuntu virtual machine, so I installed PowerShell on it.

Here is a simplified vanilla PowerShell version of the script, using managed identity and the Azure Table Storage REST API:

$storageName = "stvnetstorageendpoints"
$operationsTableName = "operations"
$ticksPerDay = [timespan]::FromDays(1).Ticks
$messageNumber = 1

$url = "https://$storageName.table.core.windows.net/$operationsTableName"
$headers = @{
    "x-ms-version" = "2023-11-03"
    "Accept"       = "application/json;odata=nometadata"
    "Prefer"       = "return-no-content"
}

$token = Invoke-RestMethod `
    -Headers @{ Metadata = "true" } `
    -Uri "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://$storageName.table.core.windows.net/"
$secureAccessToken = ConvertTo-SecureString -AsPlainText -String $token.access_token

Invoke-RestMethod `
    -Body (ConvertTo-Json @{ "TableName" = "$operationsTableName" }) `
    -ContentType "application/json" `
    -Method "POST" `
    -Authentication Bearer `
    -Headers $headers `
    -Token $secureAccessToken `
    -Uri "https://$storageName.table.core.windows.net/Tables" `
    -ErrorAction SilentlyContinue

while ($true) {
    $body = ConvertTo-Json @{ 
        "PartitionKey"  = Get-Date -AsUtc -Format "yyyy-MM-dd"
        "RowKey"        = [string]($ticksPerDay - (Get-Date -AsUtc).TimeOfDay.Ticks)
        "MessageTime"   = Get-Date -AsUtc
        "Message"       = "OK"
        "MessageNumber" = $messageNumber++
    }

    Invoke-RestMethod `
        -Body $body `
        -ContentType "application/json" `
        -Method "POST" `
        -Authentication Bearer `
        -Headers $headers `
        -Token $secureAccessToken `
        -TimeoutSec 5 `
        -Uri $url | Out-Null

    Start-Sleep -Seconds 1
}
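The RowKey arithmetic in the script is easy to skim past, so here is the same calculation as a small standalone sketch. It is in Python rather than PowerShell only so that it runs without the VM or a storage account. The function name `keys_for` is made up for this illustration. The point it shows is that the number gets smaller as the day goes on, so later rows get smaller keys within each day's partition.

```python
from datetime import datetime, timezone

TICKS_PER_SECOND = 10_000_000  # one .NET tick is 100 nanoseconds
TICKS_PER_DAY = 24 * 60 * 60 * TICKS_PER_SECOND  # 864_000_000_000


def keys_for(moment):
    """Return (PartitionKey, RowKey) for a timezone-aware datetime.

    Mirrors the script: the partition is the UTC date, and the row key is
    the number of ticks left in the UTC day, formatted as a string.
    """
    utc = moment.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = utc - midnight
    elapsed_ticks = elapsed.seconds * TICKS_PER_SECOND + elapsed.microseconds * 10
    return utc.strftime("%Y-%m-%d"), str(TICKS_PER_DAY - elapsed_ticks)
```

For 19:30:02 UTC this gives the RowKey 161980000000, and one second later 161970000000. Table Storage compares keys as strings, so if strict ordering matters for the whole day, zero-padding the number to a fixed width (for example with `zfill(12)`) may be worth considering, because the value drops below 12 digits during the last hours of the UTC day.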

And as always, the full source code is available on GitHub:

Now we have all the pieces in place, and we can start testing.

Test 1: No UDRs and no service endpoints

We haven’t yet enabled any service endpoints for that subnet:

No big surprises in this test. Everything works as expected. Traffic flows using the default system routes, and the public IP is used for outbound communication to the internet.

We can see data in the table storage exactly as we expected:

Okay, this is slightly off-topic, but you might be wondering: is the outbound traffic really using the public IP address?

If you run curl https://myip.jannemattila.com, the response shows your public IP. So yes, in that regard.

Therefore, you might be tempted to think that you can use that IP in the storage account firewall:

But after you enable that, you’ll see that our application can no longer communicate with the storage account:

The error message is:

{
  "odata.error": {
    "code": "AuthorizationFailure",
    "message": {
      "lang": "en-US",
      "value": "This request is not authorized to perform this operation\nRequestId:5f73e099-a002-006e-5230-17757c000000\nTime:2023-11-14T19:30:02.6638006Z"
    }
  }
}

The reason is simple and is documented in Restrictions for IP network rules.

You can’t use IP network rules in the following cases:
IP network rules have no effect on requests that originate from the same Azure region as the storage account. … Services deployed in the same region as the storage account use private Azure IP addresses for communication. So, you can’t restrict access to specific Azure services based on their public outbound IP address range.

And no, you cannot use a private IP address in the storage account firewall either:

Test 2: UDR and no service endpoints

In this test setup, we will use a UDR to force all traffic to the NVA. But since this is a test setup, we don’t have an NVA running, so those packets will be sent to /dev/null (=dropped).

With this script we can add a 0.0.0.0/0 route to the NVA, keep it in effect for, e.g., 120 seconds, and then remove that route:

# Add route to NVA
$routeTable = Get-AzRouteTable -ResourceGroupName "rg-vnet-service-endpoints-demo" -Name "rt-app"
Add-AzRouteConfig -Name "to-nva" -AddressPrefix 0.0.0.0/0 -NextHopType "VirtualAppliance" -NextHopIpAddress 10.10.10.10 -RouteTable $routeTable 
$routeTable | Set-AzRouteTable

Start-Sleep -Seconds 120

# Remove route
$routeTable | Remove-AzRouteConfig -Name "to-nva" | Set-AzRouteTable

When the to-nva route is in use, you should see this in the route table:

Any connections, like SSH, will get stuck during this period.

This is also visible in our data, since the data upload was blocked for 2 minutes and 13 seconds:

That was because the -TimeoutSec parameter of Invoke-RestMethod defaults to an indefinite timeout.

If I change that to, e.g., 5 seconds, then I’ll see packet drops:

If I now improve the upload logic by introducing a simple queue for failed messages and reprocessing them, I can see those errors in the table:

The error message is: The request was canceled due to the configured HttpClient.Timeout of 5 seconds elapsing.
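The queue idea can be sketched roughly as follows. This is not the code from the repository, only a minimal illustration in Python (chosen so that it runs without Azure). `upload_with_retry` and the `send` callback are hypothetical names, and writing the error text into the `Message` field is just one possible way to make the failures visible in the table later.

```python
from collections import deque


def upload_with_retry(messages, send):
    """Send messages in order, keeping failed ones in a queue for retry.

    `send` is any callable that uploads one entity and raises on failure
    (for example on a timeout). Returns the entities still unsent at the end.
    """
    pending = deque()
    for message in messages:
        pending.append(dict(message))  # copy so the caller's data is untouched
        # Drain the backlog oldest-first for as long as uploads succeed.
        while pending:
            try:
                send(pending[0])
            except Exception as error:
                # Remember why it failed; the entity is retried on the next round.
                pending[0]["Message"] = str(error)
                break
            pending.popleft()
    return list(pending)
```

With this shape, a message that failed during the outage is uploaded once connectivity returns, ahead of newer messages, and carries the last error text instead of "OK".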

Now we have seen that the UDR works as expected. It forces all traffic to the NVA, and our application is not able to send data to the table storage in this test setup.

Since our UDR forced all traffic to the NVA, no other outbound traffic to the internet worked during the test period either.

Test 3: UDR and Storage service endpoint

Virtual network service endpoints are enabled on a per-subnet basis within a VNet. In this test setup, we will enable the Microsoft.Storage service endpoint for our test subnet:

The above setting changes the routing behavior in the network. The best way to see the effective routes is to go to:

Virtual Machine > Networking > Network interface > Effective routes

The virtual machine must be running in order to see the effective routes. And unfortunately, no, you cannot test this with just a network interface.

Here are the effective routes from our test setup:

There are two VirtualNetworkServiceEndpoint routes with many, many, many address prefixes. Yes, it would be nice to have some additional metadata in the view to make it easier to understand which service endpoint is responsible for which route. You can pick any of the IP addresses and validate that it’s indeed from the Storage account IP address range:

Get-AzureDatacenterIPOrNo -IP 191.239.203.0 | Format-Table

Here is the output:

IpRange          Source                      SystemService Ip            Region
-------          ------                      ------------- --            ------
191.239.203.0/28 ServiceTags_Public_20231106 AzureStorage  191.239.203.0 
191.239.203.0/28 ServiceTags_Public_20231106 AzureStorage  191.239.203.0 westeurope
191.239.200.0/22 ServiceTags_Public_20231106               191.239.203.0 westeurope
191.239.200.0/22 ServiceTags_Public_20231106               191.239.203.0

The above routing definitions are more specific than 0.0.0.0/0. Azure selects routes using longest prefix match, so they override our to-nva route. Traffic is therefore routed directly to the Storage account.

It’s also important to understand how Azure selects a route:

If multiple routes contain the same address prefix, Azure selects the route type, based on the following priority:

  1. User-defined route
  2. BGP route
  3. System route
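Both rules, the more specific prefix winning first and the priority list deciding between equal prefixes, can be modeled in a few lines. The sketch below is a simplified Python model and not how Azure implements routing. `select_route`, the dictionary layout, the source labels, and the three example routes are illustrative assumptions, with the /28 prefix borrowed from the lookup output above.

```python
import ipaddress

# Tie-breaker for routes that share the same prefix (lower value wins).
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "Default": 2}

# Illustrative routes only; the real effective routes list many more prefixes.
ROUTES = [
    {"prefix": "0.0.0.0/0", "source": "Default", "next_hop": "Internet"},
    {"prefix": "0.0.0.0/0", "source": "User", "next_hop": "VirtualAppliance"},
    {"prefix": "191.239.203.0/28", "source": "Default",
     "next_hop": "VirtualNetworkServiceEndpoint"},
]


def select_route(destination, routes=ROUTES):
    """Pick the route for a destination IP: longest prefix, then source priority."""
    ip = ipaddress.ip_address(destination)
    candidates = [r for r in routes if ip in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,  # longer prefix first
            SOURCE_PRIORITY[r["source"]],                  # then UDR > BGP > system
        ),
    )
```

In this model, an address such as 191.239.203.5 resolves to the VirtualNetworkServiceEndpoint route because /28 is longer than /0, while any other internet address falls back to 0.0.0.0/0, where the user-defined route to the virtual appliance beats the system route.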

Let’s now test our scenario in practice. I will enable the to-nva route for 120 seconds and then remove it, as in the previous test.

SSH gets stuck as previously:

However, our application continuously pumps data into the table storage without any issues:

So, it works as expected. Traffic to the Storage account is not impacted by the UDR. This also explains why you don’t see this traffic in your NVA logs.

Summary

I hope I have managed to explain how Virtual Network service endpoints work in practice. I think it’s valuable to give people tools to understand how things work.

I can’t recommend this video enough for getting a better understanding of routing in Azure:

“Does my traffic stay on the Microsoft Network?” – Adam Stuart

I’m planning to write more about this topic in the future, especially about SNAT port exhaustion, which every now and then causes issues for people.

I hope you find this useful!