AWS: Egress Traffic and Using AWS Services via IPv6

2024-05-20

Introduction

In February 2024, AWS started charging customers for public IPv4 addresses. I recommend reading my ~~rant~~ analysis of the AWS announcement first: Cannot Escape IPv4. The second part of this series explored the situation for ingress traffic.

This final part will explore options for egress traffic from VPCs, with a focus on using IPv6 to avoid public IPv4 addresses entirely (spoiler: not feasible).

Egress connectivity “to the internet” is required not only for downloading software and container images, but also for accessing other services like GitHub or payment providers, sending telemetry, etc. – and for using AWS services: EC2 instances might access objects on S3, containers might send messages to SQS queues, and so on.

And as connecting to AWS service endpoints via IPv6 is surprisingly involved, I will take a look at the different kinds of endpoints and the additional configuration required for the AWS CLI and SDKs.

Egress from VPC Resources

Options for Egress

With IPv4, the common patterns for enabling egress traffic from VPCs are:

Public subnet: Route to an Internet Gateway; direct assignment of public IP addresses to VPC resources like EC2 instances and containers
Private subnet: Using only private addresses on resources, but adding a Managed NAT Gateway (or a cheaper self-managed NAT Instance¹) that translates all egress connections to a single public IPv4 source address
PrivateLink: Even when a subnet does not have any internet connectivity (internal subnet), it’s possible to enable egress to specific AWS services by adding Interface Endpoints (PrivateLink) or Gateway Endpoints² to the VPC

This is mostly the same with IPv6, but there is one significant difference.

No More NAT Gateway

With IPv6, there is no NAT³. Every IPv6 address used in a VPC is from a public address block. The difference between a public and a private subnet would no longer exist.

To mimic the familiar security characteristics of private subnets with NAT, AWS offers the Egress-only Internet Gateway for IPv6: It works like the regular Internet Gateway, but it only allows egress connections, i.e. it will never allow connections from the internet to VPC resources.

And just like Internet Gateway, it doesn’t cost anything. It has no fixed hourly price, it doesn’t have additional traffic-based charges, and it doesn’t require a public IP address that adds to its price – all of which a NAT Gateway does. IPv6 can be a cost optimization measure!
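As a rough sketch of what this setup involves, the helper below creates an Egress-only Internet Gateway and points the IPv6 default route at it. It assumes a boto3 EC2 client is passed in; the function name and the IDs in the usage note are made up for illustration.

```python
def add_ipv6_egress(ec2, vpc_id, route_table_id):
    """Create an Egress-only Internet Gateway and route ::/0 through it.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    response = ec2.create_egress_only_internet_gateway(VpcId=vpc_id)
    eigw_id = response["EgressOnlyInternetGateway"]["EgressOnlyInternetGatewayId"]

    # IPv6 default route: all IPv6 egress leaves via the new gateway.
    ec2.create_route(
        RouteTableId=route_table_id,
        DestinationIpv6CidrBlock="::/0",
        EgressOnlyInternetGatewayId=eigw_id,
    )
    return eigw_id
```

It could be called as add_ipv6_egress(boto3.client("ec2"), "vpc-0123example", "rtb-0123example"), with both IDs being placeholders. The subnet's route table keeps its IPv4 routes untouched, so this can be added to an existing dual-stack VPC.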

Dual-Stack and IPv6-Only Subnets

IPv6-enabled VPC subnets will usually be dual-stack, meaning IPv6 will be configured in addition to IPv4 (running two network protocol stacks). Any resource running in dual-stack subnets will pick up an IPv4 address as always, and can additionally pick up an IPv6 address.

It’s possible to create IPv6-only subnets that have no IPv4 configuration at all, avoiding the complexity of running two network protocols. I love that this is possible, and I hope this will be the default mode of operation at some point. And if the VPC also has public IPv4 and a Managed NAT Gateway, DNS64 and NAT64 support enables IPv6-only resources to seamlessly access IPv4-only targets.
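To illustrate what DNS64 does for an IPv4-only target, here is a minimal sketch of the address synthesis. It assumes the well-known NAT64 prefix 64:ff9b::/96 from RFC 6052; the function name is my own.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052 (assumed to be the one in use).
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def synthesize_nat64(ipv4):
    """Embed an IPv4 address into the NAT64 prefix, as a DNS64 resolver would."""
    v4 = ipaddress.IPv4Address(ipv4)
    # The IPv4 address occupies the lowest 32 bits of the IPv6 address.
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))

print(synthesize_nat64("192.0.2.1"))  # 64:ff9b::c000:201
```

The IPv6-only client connects to the synthesized address, and the NAT Gateway translates the connection back to the embedded IPv4 address.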

IPv6 Egress Support by Service

So far, IPv6-only subnets are only supported by EC2 instances built on Nitro. All other AWS services will reject deployment into an IPv6-only subnet, so it’s not possible to add a Lambda function or a Load Balancer into an IPv6-only subnet, for example.

But dual-stack subnets are supported by the major AWS compute services, so IPv6 egress connections are possible from EC2 instances, containers running on ECS⁴ and EKS (both via Fargate and EC2), and VPC-enabled Lambda functions.

Most other VPC-enabled services only support IPv4 – they simply ignore IPv6 when deployed into a dual-stack subnet; examples include App Runner, API Gateway, AppSync, and CodeBuild. In a VPC without any public IPv4 connectivity (or PrivateLink), they will not be able to connect to external resources, including AWS services.

Accessing AWS Services via IPv6

But even when a VPC resource can initiate connections via IPv6, some public IPv4 egress (or PrivateLink) is still required for many workloads, as most AWS services do not support requests from IPv6 clients. Here are some example requirements that cannot work otherwise:

Pulling container images from the Elastic Container Registry
Sending e-mails via SES (neither via API nor via SMTP)
Sending metrics or X-Ray traces to CloudWatch
Managing EC2 instances via Systems Manager (SSM), including Session Manager, Patch Manager, etc.
Dynamic SSH keys for EC2 instance access via EC2 Instance Connect
Connecting to any (public) API Gateway endpoint
Using the ECS Exec feature to log into ECS tasks
Using any AWS service that does not support IPv6 on its endpoints, like SQS, SNS, EventBridge, Bedrock, and many others

This also applies to third-party services without IPv6 support like GitHub⁵.

Many AWS services deploy resources into the customer’s VPC, like RDS or ElastiCache. Connecting to those resources happens entirely within the VPC, so this works fine without public IPv4 connectivity (using the VPC’s private IPv4 addresses). That’s also how PrivateLink Interface Endpoints work.

CloudWatch Logs

Many applications need to send log output to CloudWatch Logs. This has been supported via IPv6 since February 2024, but not by default – so sending logs to CloudWatch via IPv6 requires configuration of each application. For example, for the CloudWatch Agent (EC2), the endpoint can be specified via the endpoint_override option in the configuration file.
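A minimal sketch of such an agent configuration, generated from Python so it can be templated per region. The dual-stack endpoint name logs.<region>.api.aws is an assumption that follows the pattern of other dual-stack endpoints; the file path and log group name are placeholders.

```python
import json

def agent_logs_config(region, log_file, log_group):
    """Build a minimal CloudWatch Agent config that overrides the Logs endpoint."""
    return {
        "logs": {
            # Assumed dual-stack endpoint name for CloudWatch Logs.
            "endpoint_override": f"logs.{region}.api.aws",
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {"file_path": log_file, "log_group_name": log_group}
                    ]
                }
            },
        }
    }

print(json.dumps(agent_logs_config("eu-west-1", "/var/log/app.log", "my-app"), indent=2))
```

The printed JSON would go into the agent's configuration file; everything except the endpoint_override line is ordinary log collection setup.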

Elastic Container Service

While ECS supports IPv6 for workloads, it often needs access to other AWS services as part of starting the task.

AWS’s Elastic Container Registry does not support IPv6, so ECS tasks cannot pull container images from ECR via IPv6. Docker Hub supports IPv6 though.

ECS tasks are commonly configured to send the containers’ log output to CloudWatch, using the awslogs driver in the logging configuration. As mentioned above, extra configuration is required – but awslogs does not support endpoint configuration (GitHub issue #73, open since 2018). Tasks will fail to start unless configured without logging or with a different log driver. There was a suggested workaround using the FireLens driver to send logs to CloudWatch.

ECS supports retrieving secret values from either Secrets Manager or the SSM Parameter Store, but the latter does not support IPv6.

Apropos ECS: The ECS service team has published a pattern: Dual-stack IPv6 networking for Amazon ECS and AWS Fargate. It describes an ECS setup without public IPv4 in great technical detail and has some example code. The issues I mention here are solved with PrivateLink Interface Endpoints for ECR, CloudWatch, etc. (for about $160/month). A great lab setup to gain experience with IPv6 on AWS!

I don’t know too much about EKS, but I assume the situation is similar there. AWS offers an IPv6 workshop for EKS, by the way.

Hard Truth

Clearly, it’s not feasible to fully avoid public IPv4 for egress on AWS, except for very specific use cases.

If some public IPv4 is required, the general advice for costs is this: A small self-managed NAT Instance is usually the cheapest option by far (in fact, the required public IPv4 address can cost more than the EC2 instance). For maximum quality of life, Managed NAT Gateway is the best choice – and for any enterprise environment, this is a no-brainer, despite its price tag.

But there are grey areas for small setups: Using public IPv4 addresses directly is cheaper than Managed NAT Gateway until around ten IPv4 addresses (supported for EC2 instances and ECS Fargate), and using PrivateLink is cheaper than Managed NAT Gateway until five Interface Endpoints. This is per Availability Zone for both, if AZ redundancy is required. In large environments, it’s common to use Resource Sharing and/or Transit Gateway to centralize egress – but this is out of scope for this post.
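The break-even points above can be reproduced with a few lines of arithmetic. The hourly prices below are assumptions (roughly the us-east-1 list prices); they differ by region and may change, and traffic-based charges are ignored here.

```python
# Assumed hourly list prices in USD; check the AWS pricing pages for your region.
PUBLIC_IPV4 = 0.005         # per public IPv4 address
NAT_GATEWAY = 0.045         # per NAT Gateway, excluding its public IPv4 address
INTERFACE_ENDPOINT = 0.01   # per Interface Endpoint per Availability Zone

# A NAT Gateway always needs one public IPv4 address of its own.
nat_total = NAT_GATEWAY + PUBLIC_IPV4

def break_even(unit_price):
    """Number of units whose fixed cost equals one NAT Gateway."""
    return round(nat_total / unit_price)

print(break_even(PUBLIC_IPV4))         # 10 public IPv4 addresses
print(break_even(INTERFACE_ENDPOINT))  # 5 Interface Endpoints
```

Since both NAT Gateway and Interface Endpoints also charge per gigabyte, the real break-even shifts with traffic volume.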

So if some public IPv4 is required anyway, it may seem pointless to spend time on IPv6 for egress at all, especially as it adds complexity. I recommend doing it anyway, as it’s important to gain operational experience with IPv6 – it’s coming slowly, but it’s coming. And every bit of egress traffic that doesn’t go through a NAT Gateway decreases costs.

AWS Service Endpoints and Client Applications (AWS CLI and SDK)

The remainder of this post shows how to make the AWS SDK use IPv6, so it’s only relevant when running entirely without public IPv4 connectivity (and PrivateLink).

On AWS Service Endpoints

When an AWS application – anything using the AWS SDK, including the AWS CLI – needs to send requests to an AWS service, it must know which service endpoint to connect to.

Endpoints are usually region-specific DNS names, like logs.eu-central-1.amazonaws.com for CloudWatch Logs in Frankfurt. Some services are global⁶ instead, so they don’t contain a region name, like iam.amazonaws.com or cloudfront.amazonaws.com.

The SDK doesn’t actually care much about IPv4 vs. IPv6; it just connects to the service endpoint using the standard behavior of the operating system (usually: try IPv6 first, then try IPv4).

The few services that support IPv6 usually do so only on separate dual-stack endpoints. This is an important and surprising detail – virtually everyone else in the world implements IPv6 on websites and APIs by allowing requests via IPv6 and IPv4 on the same endpoint, so IPv6 “just works” when available on both ends of the connection.

But on AWS, IPv6 is usually not supported on the default service endpoint. For example, in Ireland (eu-west-1), EC2 supports IPv6 clients, but on a separate endpoint:

default: ec2.eu-west-1.amazonaws.com (IPv4 only)
dual-stack: ec2.eu-west-1.api.aws (IPv4 and IPv6)
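Whether an endpoint is reachable via IPv6 can be checked with a DNS lookup. The following is a small sketch using only the standard library; the function name and the injectable resolver parameter (handy for testing without network access) are my own additions.

```python
import socket

def address_families(hostname, resolver=socket.getaddrinfo):
    """Return the IP versions ("IPv4", "IPv6") a hostname resolves to."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()  # the name does not resolve at all
    # Each result is a tuple whose first element is the address family.
    return {names[family] for family, *_ in results if family in names}

# Example calls (require network access):
#   address_families("ec2.eu-west-1.amazonaws.com")
#   address_families("ec2.eu-west-1.api.aws")
```

Going by the list above, the first call should report only IPv4 and the second both protocols, but results depend on the resolver and may change as AWS updates its endpoints.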

Some AWS services need to discover endpoints dynamically – like Timestream, where the SDK needs to connect to the correct cell that handles the requested Timestream database. This mechanism isn’t aware of different endpoint flavors, so it could not support separate dual-stack endpoints, as far as I can tell.

Unfortunately, the AWS SDK will never attempt to use a service’s dual-stack endpoint by default. It must be configured to do so, or it will never use IPv6 for most services.

Service Endpoint Configuration

Using Dual-Stack Endpoints

By default, when no endpoint has been given explicitly, the SDK will look up the service endpoint using an internal ruleset. This supports a configuration option to make it select a dual-stack endpoint.

Using this option is preferred over explicitly specifying an endpoint, if possible.

The effect can be verified easily by using AWS CLI with --debug, as it will log the result of the rule-based lookup.

Standard behavior:

$ aws --debug ec2 describe-instances 2>&1 | grep "Endpoint provider result"
[...] Endpoint provider result: https://ec2.eu-west-1.amazonaws.com

Requesting the dual-stack endpoint via environment variable:

$ export AWS_USE_DUALSTACK_ENDPOINT=true
$ aws --debug ec2 describe-instances 2>&1 | grep "Endpoint provider result"
[...] Endpoint provider result: https://ec2.eu-west-1.api.aws

Configuring this via environment variables also works for other SDK applications, not just the CLI. It can also be enabled in code when initializing the SDK client. Python example:

import boto3
from botocore.config import Config

ec2 = boto3.client(
    service_name="ec2",
    config=Config(
        use_dualstack_endpoint=True,
    ),
)

But if the SDK is configured to select a dual-stack endpoint, it will generate a dual-stack endpoint name even if that is not supported for the service. There is no fallback to the default endpoint. Requesting a dual-stack endpoint for EC2 in Frankfurt (eu-central-1), for example, would abort with a connection error because ec2.eu-central-1.api.aws does not exist. EC2 supports dual-stack endpoints only in some regions.

Therefore, it’s important to carefully check which service supports which endpoints in which regions. I have built a map of AWS Service Endpoints by Region and IPv6 Support to navigate the chaos.
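Since the SDK has no fallback, an application could do its own pre-flight check. The sketch below prefers the dual-stack endpoint if its DNS name exists and otherwise returns the default endpoint. The two naming patterns are assumptions based on the EC2 examples in this post; some services use different names, and the function names are hypothetical.

```python
import socket

def resolves(hostname, resolver=socket.getaddrinfo):
    """True if the hostname has at least one address record."""
    try:
        return bool(resolver(hostname, 443))
    except socket.gaierror:
        return False

def pick_endpoint(service, region, resolver=socket.getaddrinfo):
    """Prefer the dual-stack endpoint, fall back to the default one."""
    dualstack = f"{service}.{region}.api.aws"      # assumed naming pattern
    default = f"{service}.{region}.amazonaws.com"  # assumed naming pattern
    host = dualstack if resolves(dualstack, resolver) else default
    return f"https://{host}"
```

The result can be passed as endpoint_url when creating a client. Note that falling back to an IPv4-only endpoint only helps if the host has some form of IPv4 connectivity (or NAT64).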

Explicitly Specifying Endpoints

A service endpoint can also be specified explicitly. Python example:

import boto3

ec2 = boto3.client(
    service_name="ec2",
    endpoint_url="https://ec2.eu-west-1.api.aws",
)

AWS CLI example:

$ aws --endpoint-url https://ec2.eu-west-1.api.aws ec2 describe-instances

Or using environment variables:

$ export AWS_ENDPOINT_URL=https://ec2.eu-west-1.api.aws
$ aws ec2 describe-instances

Additionally, it’s possible to configure service-specific endpoints like this:

$ export AWS_ENDPOINT_URL_LAMBDA=https://lambda.eu-central-1.api.aws

This makes it easier to configure an application that needs to access several services, without modifying the code.
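When several services are involved, the variables can be generated from a single mapping. This is a sketch under the assumption that the variable suffix is the service identifier in upper case with spaces replaced by underscores; the helper name is hypothetical.

```python
import os

def endpoint_env(endpoints):
    """Map {service identifier: URL} to AWS_ENDPOINT_URL_<SERVICE> variables."""
    env = {}
    for service_id, url in endpoints.items():
        # Assumed rule: upper case, spaces become underscores.
        suffix = service_id.upper().replace(" ", "_")
        env[f"AWS_ENDPOINT_URL_{suffix}"] = url
    return env

env = endpoint_env({
    "ec2": "https://ec2.eu-west-1.api.aws",
    "lambda": "https://lambda.eu-central-1.api.aws",
})

# Environment for a child process: current variables plus the overrides.
child_env = {**os.environ, **env}
```

The child_env dictionary can then be handed to subprocess.run(..., env=child_env) to start the unmodified application.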

Both endpoint configuration options are also supported via the configuration files. The full documentation for specifying endpoints is here, but some SDK languages lack support for this (e.g. Java and Rust).

IPv6-Enabled Default Endpoints

A few services (like Secrets Manager) actually support IPv6 on their default service endpoint.

It’s indeed possible to set up an IPv6-only subnet, launch a new IPv6-only EC2 instance, and run this command:

$ aws secretsmanager list-secrets

This “just works”, without any configuration and in all supported regions. As it should be.

(But if the “use dual-stack endpoint” option is enabled, this command will assume a wrong service endpoint and fail.)

Conclusion

The basic building blocks are there. AWS has done all the hard work – the IPv6 support in VPC, EC2, Lambda, and ECS/EKS is good.

But the obstructive SDK behavior and the frugal IPv6 support of service endpoints make IPv6 complex and error-prone to implement on AWS. And that’s in addition to all the little things that crop up. It all feels like it has never been seriously used without public IPv4 outside a lab environment.

It’s pretty clear that IPv6 on AWS will not see widespread adoption this way. Just ignoring IPv6 for egress is so much easier than running without public IPv4 addresses. This way we will never escape IPv4. So I can only hope that AWS can rally service teams to add IPv6 support to their endpoints. Preferably to the default endpoints, as the rest of the world does – this would alleviate the SDK issues significantly.

In other words – referring to Jeff Barr’s initial blog post about the IPv4 charges – I’d like to encourage AWS to think about accelerating adoption of IPv6 as a modernization and conservation measure.