neveragain.de teletype

There is No Way It’s DNS

If you’ve been around in networking for a while, you probably know this haiku:

It's not DNS.
There's no way it's DNS.
It was DNS.

Recently I’ve had one more experience like that, where enabling “VPC Hostnames” on AWS caused an outage. I’d have sworn on my love of beer that it could not be related to DNS.

Time for a closer look at DNS support in VPCs.

Overview: The VPC DNS Resolver

Amazon provides standard DNS resolvers in each VPC, free of charge. You can find the resolver service at several IP addresses:

  • IPv6: At the local address fd00:ec2::253
  • Legacy IP:
    • Local address 169.254.169.253
    • VPC’s base network address plus 2 (e.g. 172.31.0.2 or 10.0.0.2) – therefore sometimes called the “dot-2 resolver”
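The "base network address plus 2" rule can be worked out mechanically. The following is a minimal sketch, not anything AWS ships: the function name vpc_resolver_addresses is made up for illustration, and it assumes you pass in the VPC's primary IPv4 CIDR block.

```python
import ipaddress


def vpc_resolver_addresses(vpc_cidr: str) -> dict:
    """Return the resolver addresses listed above for a given VPC CIDR.

    Hypothetical helper: assumes vpc_cidr is the VPC's primary IPv4 block.
    """
    network = ipaddress.ip_network(vpc_cidr)
    return {
        "ipv6_local": "fd00:ec2::253",
        "ipv4_local": "169.254.169.253",
        # The "dot-2 resolver": base network address plus 2
        "ipv4_dot2": str(network.network_address + 2),
    }


print(vpc_resolver_addresses("172.31.0.0/16")["ipv4_dot2"])
print(vpc_resolver_addresses("10.0.0.0/16")["ipv4_dot2"])
```

The two calls correspond to the example networks mentioned in the list.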

This resolver is announced as part of the default DHCP options.

It can resolve both public internet DNS names (like neveragain.de) and VPC-specific hostnames.

The documentation of DNS support for VPCs fits on a single page.

Overview: VPC-Specific DNS Hostnames

Each Elastic network interface (ENI) in a VPC can have DNS hostnames pointing to it.

For EC2 instances, this would be ec2-$IPADDRESS.$REGION.compute.amazonaws.com (legacy IP only).
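To make the naming pattern concrete, here is a small sketch that builds such a hostname from an address and a region. The helper name ec2_public_hostname and the example address (from the 203.0.113.0/24 documentation range) are assumptions for illustration. It follows the pattern exactly as written above and does not handle us-east-1, which uses a different suffix.

```python
import ipaddress


def ec2_public_hostname(public_ip: str, region: str) -> str:
    """Build the public hostname pattern described above (hypothetical helper)."""
    # IPv4Address raises ValueError for anything that is not a valid IPv4 address
    ip = ipaddress.IPv4Address(public_ip)
    dashed = str(ip).replace(".", "-")
    return f"ec2-{dashed}.{region}.compute.amazonaws.com"


print(ec2_public_hostname("203.0.113.25", "eu-central-1"))
```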

It works the same way for other in-VPC AWS services: They create ENIs in the VPC and some corresponding DNS records. With RDS, for example, this is something like mysql1234.cluster-oisdhosdahsj.eu-central-1.rds.amazonaws.com.

RDS demonstrates an important distinction:

  • For private-only databases, they always (globally!) resolve to the private IP address
  • For databases with public IPs:
    • queries from “outside” resolve to the public IP address
    • queries from “inside” resolve to the private IP address

Here, “inside” usually means “from within the same VPC”, but this line is blurred by VPC Peerings and other connections like Transit Gateway.

VPC-Specific DNS settings

There are two settings that control the DNS resolver’s behavior. In the documentation’s own words:

  • enableDnsHostnames: Indicates whether instances with public IP addresses get corresponding public DNS hostnames [1]

  • enableDnsSupport: Indicates whether the DNS resolution is supported

Confusingly, the default settings vary a bit: “By default, both attributes are set to true in a default VPC or a VPC created by the VPC wizard” – but any other VPC will have enableDnsHostnames disabled by default.

Naïvely skimming the VPC DNS documentation, the following assumption seemed sound: enableDnsHostnames just creates additional DNS entries for IP addresses (authoritative DNS), and enableDnsSupport controls whether the VPC’s resolver is enabled at all.

But, as we all know: Assumption is the mother of all fuck-ups.

enableDnsSupport

This works pretty much as expected: If this is enabled, the VPC resolver IP addresses (see above) respond to queries.

Disabling it immediately causes those resolvers to go dark, i.e. any DNS query will not be answered (you’ll see a timeout).

It is also required for the next setting to have any effect.

enableDnsHostnames

As expected, this creates additional DNS records. That’s why this option is also required if you want to have public IPs for RDS databases, for example.

Entirely unexpected – for me, at least – is that it synthesizes forward and reverse responses for any private IP address. That’s right: Not just the private IP addresses within your VPC CIDR range, but any private IP address. Like, you know, those private IP addresses that you’re using on-premises.

This seems reasonable for VPC addresses:

[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans -x 10.0.0.34
34.0.0.10.in-addr.arpa.	285	IN	PTR	ip-10-0-0-34.eu-central-1.compute.internal.

But now your on-premises servers have a reverse lookup, too! From the VPC’s perspective, at least:

[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans -x 192.168.123.234
234.123.168.192.in-addr.arpa. 600 IN	PTR	ip-192-168-123-234.eu-central-1.compute.internal.

[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans -x 172.16.0.3
3.0.16.172.in-addr.arpa. 600	IN	PTR	ip-172-16-0-3.eu-central-1.compute.internal.

If this option is disabled, those queries return NXDOMAIN (entry does not exist), as they should.
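The behavior observed above can be summarized as a small model. This is only a sketch of what the transcripts show, not a description of how the resolver is implemented. It assumes that "private" means the three RFC 1918 ranges (the only ones demonstrated here), that the region suffix is always $REGION.compute.internal as in the eu-central-1 output, and that the NXDOMAIN case applies to every private address. The names predict_ptr and reverse_name are made up.

```python
import ipaddress

# Assumption: "private" means the RFC 1918 ranges, as in the examples above
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def reverse_name(ip: str) -> str:
    """Name that 'dig -x' queries for, with the trailing dot."""
    return ipaddress.IPv4Address(ip).reverse_pointer + "."


def predict_ptr(ip: str, region: str,
                enable_dns_support: bool = True,
                enable_dns_hostnames: bool = True) -> str:
    """Predict the VPC resolver's answer to a reverse lookup (toy model)."""
    addr = ipaddress.IPv4Address(ip)
    if not any(addr in net for net in RFC1918):
        raise ValueError("this sketch only covers RFC 1918 addresses")
    if not enable_dns_support:
        return "TIMEOUT"    # resolver goes dark
    if not enable_dns_hostnames:
        return "NXDOMAIN"   # nothing is synthesized
    dashed = str(addr).replace(".", "-")
    return f"ip-{dashed}.{region}.compute.internal."


print(reverse_name("192.168.123.234"), predict_ptr("192.168.123.234", "eu-central-1"))
```

Note that the model never checks whether the address belongs to the VPC, which is exactly the surprise.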

This is documented, but in the fine print rather than prominently alongside the option descriptions.

Resolution (Pun Intended)

So, after all: It was DNS. The client in question performs a reverse lookup of the server’s IP address and then uses that information to see which credentials it should present. After enabling VPC DNS hostnames, that lookup was no longer answered with “no such entry” but with the synthesized ip-192-168-0-123.eu-central-1.compute.internal. Subsequently, the credentials lookup failed to find a match.
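The failure mode can be reproduced in a few lines. The client's actual code is not shown here, so everything below is a hypothetical reconstruction: the function names, the credentials table keyed by IP address, and the fallback to the IP when no PTR record exists are all assumptions chosen to mirror the description above.

```python
import socket


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if there is no entry."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def pick_credentials(server_ip, credentials, lookup=reverse_lookup):
    """Choose credentials by reverse-resolved name, falling back to the IP.

    Hypothetical client logic; 'lookup' is injectable so the example
    below runs without touching the network.
    """
    name = lookup(server_ip)
    key = name.rstrip(".") if name else server_ip
    return credentials.get(key)


credentials = {"192.168.0.123": "onprem-db-secret"}  # made-up entry

# Before: the reverse lookup yields "no such entry", so the IP is the key
before = pick_credentials("192.168.0.123", credentials, lookup=lambda ip: None)

# After: the resolver synthesizes a name that nobody ever configured
after = pick_credentials(
    "192.168.0.123",
    credentials,
    lookup=lambda ip: "ip-192-168-0-123.eu-central-1.compute.internal.",
)

print(before, after)
```

Nothing on the client or the server changed between the two calls, only the resolver's answer.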

Additional Trivia

Version Query

This is something I’ve observed often on AWS: They seem to like easter eggs and/or proper implementations.

From ancient times, when ISC BIND was the de-facto authoritative DNS server software, you’d query a nameserver’s version information from the version.bind hostname in the Chaos (CH) class (as opposed to the default Internet class IN). The nameserver would answer with the software version, e.g. "9.11.0".

Most modern nameservers simply don’t answer this query. The VPC resolver, however, does:

[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans version.bind txt ch @10.0.0.2
version.bind.           0       CH      TXT     "EC2 DNS"
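For the curious, this is roughly what such a query looks like on the wire. The sketch builds a bare-bones DNS question for version.bind with type TXT (16) in class CH (3), following the RFC 1035 message layout. It is not a byte-for-byte copy of what dig sends (dig typically adds EDNS options), and the function names and the fixed query ID are arbitrary choices.

```python
import socket
import struct


def build_version_query(query_id: int = 0x1234) -> bytes:
    """Build a DNS query for 'version.bind. CH TXT' (minimal, no EDNS)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero-length label
    labels = (b"version", b"bind")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE=16 (TXT), QCLASS=3 (CH) instead of the usual 1 (IN)
    question = qname + struct.pack("!HH", 16, 3)
    return header + question


def send_query(server: str, timeout: float = 2.0) -> bytes:
    """Send the query over UDP port 53 and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_version_query(), (server, 53))
        return sock.recvfrom(512)[0]


print(build_version_query().hex())
```

Calling send_query("10.0.0.2") from inside a VPC would return the raw response; parsing it is left out to keep the sketch short.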

Caching

The resolver IP address seems to employ some caching of its own, apparently for a few minutes per record.

This is important when changing the VPC Hostnames setting: It will not take effect immediately, and not consistently!

Fleet

I can observe fluctuating time-to-live values returned by the resolver IP address, which leads me to believe that there’s actually a handful of different resolvers answering the queries:

[ec2-user@ip-10-0-0-34 ~]$ while true; do dig +noall +ans neveragain.de soa @10.0.0.2; sleep 1; done
neveragain.de.          243     IN      SOA     squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de.          242     IN      SOA     squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de.          241     IN      SOA     squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de.          247     IN      SOA     squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de.          245     IN      SOA     squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de.          246     IN      SOA     squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
[ ... and so on ...]
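The reasoning behind that guess: a single cache would hand out a TTL that drops by one every second, so "TTL plus elapsed seconds" stays constant. Every distinct value of that sum hints at a separate cache entry. Here is a rough sketch of that arithmetic, assuming the samples are exactly one second apart (the real loop drifts a little, which could inflate the count); the function name is made up.

```python
def estimate_cache_count(ttls, interval=1):
    """Estimate the number of distinct cache entries behind a TTL series.

    Assumes one sample every 'interval' seconds with no drift.
    """
    # For one cache, ttl + elapsed time is constant (its expiry time)
    expiries = {ttl + i * interval for i, ttl in enumerate(ttls)}
    return len(expiries)


# TTL values from the output above
samples = [243, 242, 241, 247, 245, 246]
print(estimate_cache_count(samples))
```

For the six samples shown, this suggests at least four different caches, which fits the "handful of resolvers" theory.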

Additional Reading

For advanced DNS resolution topics, see also:

  • Route 53 Resolver Endpoints to forward DNS queries between external (e.g. on-premises) DNS and AWS (beware: while this sounds like three lines of BIND configuration, it actually starts at almost $190/month!)
  • DNSSEC validation (not enabled by default, unfortunately)
  • DNS Firewall, which is a corny name for hostname-specific override rules in the resolver (you can fake NODATA, NXDOMAIN and arbitrary CNAME responses)
  • For VPC Peerings, there are separate settings for DNS

Conclusion

It was DNS.




  1. Minor gripe: What does that even mean, an instance gets a hostname? What exactly does it get, and how? Does it affect the DHCP-assigned hostname as well? A rather weak choice of words, I’d say, used throughout the document. Spoiler: It does not affect DHCP responses.