There is No Way It’s DNS
If you’ve been around in networking for a while, you probably know this haiku:
It’s not DNS
There’s no way it’s DNS
It was DNS
Recently I had one more experience like that: enabling “VPC Hostnames” on AWS caused an outage. I’d have sworn on my love of beer that it could not be related to DNS.
Time for a closer look at DNS support in VPCs.
Overview: The VPC DNS Resolver
Amazon provides standard DNS resolvers in each VPC, free of charge. You can find the resolver service at several IP addresses:
- IPv6: At the local address fd00:ec2::253
- Legacy IP:
  - Local address 169.254.169.253
  - VPC’s base network address plus 2 (e.g. 172.31.0.2 or 10.0.0.2) – therefore sometimes called the “dot-2 resolver”
This resolver is announced as part of the default DHCP options.
It can resolve both public internet DNS names (like neveragain.de) and VPC-specific hostnames.
The documentation of DNS support for VPCs fits on a single page.
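To make the “dot-2” rule concrete, here is a minimal sketch that derives the resolver address from a VPC CIDR. The function name `dot2_resolver` is my own, and I’m assuming the VPC’s primary IPv4 CIDR block is what you pass in.

```python
import ipaddress


def dot2_resolver(vpc_cidr):
    """Return the VPC's base network address plus 2 (the "dot-2 resolver").

    Assumption: vpc_cidr is the VPC's primary IPv4 CIDR block.
    """
    network = ipaddress.ip_network(vpc_cidr, strict=True)
    if network.version != 4:
        raise ValueError("the dot-2 rule only applies to the IPv4 CIDR block")
    # IPv4Address supports integer addition, so "+ 2" is literal here.
    return network.network_address + 2


if __name__ == "__main__":
    for cidr in ("172.31.0.0/16", "10.0.0.0/16"):
        print(cidr, "->", dot2_resolver(cidr))
```

Passing `strict=True` makes the function reject input like `10.0.0.34/16`, where host bits are set, instead of silently guessing the base address.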
Overview: VPC-Specific DNS Hostnames
Each Elastic network interface (ENI) in a VPC can have DNS hostnames pointing to it.
For EC2 instances, this would be ec2-$IPADDRESS.$REGION.compute.amazonaws.com (legacy IP only).
It works the same way for other in-VPC AWS services: They create ENIs in the VPC and some corresponding DNS records.
With RDS, for example, this is something like
RDS demonstrates an important distinction:
- Private-only databases always (globally!) resolve to the private IP address
- For databases with public IPs:
  - queries from “outside” resolve to the public IP address
  - queries from “inside” resolve to the private IP address
VPC-Specific DNS settings
There are two settings to control the DNS resolver’s behavior. In the documentation’s own words:
enableDnsHostnames: Indicates whether instances with public IP addresses get corresponding public DNS hostnames [1]
enableDnsSupport: Indicates whether the DNS resolution is supported
Confusingly, the default settings vary a bit: “By default, both attributes are set to true in a default VPC or a VPC created by the VPC wizard” – but any other VPC will have enableDnsHostnames disabled by default.
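Since the defaults differ between VPCs, it’s worth checking what a given VPC actually has set. A minimal sketch, assuming boto3 and its EC2 `describe_vpc_attribute` call; the helper name `vpc_dns_attributes` and the VPC ID in the usage comment are hypothetical.

```python
def vpc_dns_attributes(ec2_client, vpc_id):
    """Return both DNS attributes of a VPC as a dict of booleans.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    result = {}
    for attribute in ("enableDnsSupport", "enableDnsHostnames"):
        response = ec2_client.describe_vpc_attribute(
            VpcId=vpc_id, Attribute=attribute
        )
        # The response key is the attribute name with an upper-case first
        # letter, e.g. "EnableDnsSupport": {"Value": True}.
        response_key = attribute[0].upper() + attribute[1:]
        result[attribute] = response[response_key]["Value"]
    return result


# Usage (needs boto3 and AWS credentials; the VPC ID is a placeholder):
#   import boto3
#   print(vpc_dns_attributes(boto3.client("ec2"), "vpc-0123456789abcdef0"))
```

Taking the client as a parameter keeps the function free of credentials handling and makes it easy to try against a stub.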
Naïvely skimming the VPC DNS documentation, the following assumption seemed sound:
enableDnsHostnames just creates additional DNS entries for IP addresses (authoritative DNS), and enableDnsSupport controls whether the VPC’s resolver is enabled at all.
But, as we all know: Assumption is the mother of all fuck-ups.
enableDnsSupport
This works pretty much as expected: If this is enabled, the VPC resolver IP addresses (see above) respond to queries.
Disabling it immediately causes those resolvers to go dark, i.e. DNS queries are no longer answered (you’ll see a timeout).
It is also required for the next setting to have any effect.
enableDnsHostnames
As expected, this creates additional DNS records. That’s why this option is also required if you want to have public IPs for RDS databases, for example.
Entirely unexpected – for me, at least – is that it synthesizes forward and reverse responses for any private IP address. That’s right: Not just the private IP addresses within your VPC CIDR range, but any private IP address. Like, you know, those private IP addresses that you’re using on-premises.
This seems reasonable for VPC addresses:
[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans -x 10.0.0.34
34.0.0.10.in-addr.arpa. 285 IN PTR ip-10-0-0-34.eu-central-1.compute.internal.
But now your on-premises servers have a reverse lookup, too! From the VPC’s perspective, at least:
[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans -x 192.168.123.234
234.123.168.192.in-addr.arpa. 600 IN PTR ip-192-168-123-234.eu-central-1.compute.internal.
[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans -x 172.16.0.3
3.0.16.172.in-addr.arpa. 600 IN PTR ip-172-16-0-3.eu-central-1.compute.internal.
If this option is disabled, those queries return NXDOMAIN (entry does not exist), as they should.
This is documented, but only in the fine print rather than prominently alongside the options’ descriptions.
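The synthesized answers follow a predictable pattern, so you can work out what the resolver will claim about an address without asking it. A minimal sketch, with two assumptions of mine: that “private” means the RFC 1918 ranges, and that the name pattern is the one visible in the eu-central-1 outputs above (other regions may use a different suffix; us-east-1 uses ec2.internal, for one). The function name `synthesized_ptr` is made up.

```python
import ipaddress

# Assumption: "any private IP address" means the RFC 1918 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def synthesized_ptr(ip, region):
    """Return (reverse query name, PTR target) or None for non-private IPs.

    The target pattern ip-A-B-C-D.REGION.compute.internal. is copied from
    the dig outputs in this post; it is not taken from an AWS specification.
    """
    address = ipaddress.ip_address(ip)
    if not any(address in network for network in RFC1918_NETWORKS):
        return None
    query_name = address.reverse_pointer + "."
    dashed = str(address).replace(".", "-")
    target = "ip-{}.{}.compute.internal.".format(dashed, region)
    return query_name, target


if __name__ == "__main__":
    print(synthesized_ptr("192.168.123.234", "eu-central-1"))
    print(synthesized_ptr("8.8.8.8", "eu-central-1"))
```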
Resolution (Pun Intended)
So, after all: It was DNS. The client in question performs a reverse lookup of the server’s IP address and then uses that information to see which credentials it should present. After enabling VPC DNS hostnames, that lookup was no longer answered with “no such entry” but with the synthesized hostname. Subsequently, the credentials lookup failed to find a match.
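To illustrate the failure mode, here is a toy model of such a client. It is not the actual client’s code: the fallback to the bare IP address when there is no PTR record, the function name `pick_credentials`, and the credential store are all assumptions for the sake of the example.

```python
def pick_credentials(server_ip, reverse_lookup, credentials_by_key):
    """Choose credentials keyed by the server's reverse DNS name.

    reverse_lookup(ip) returns a hostname, or None to stand in for NXDOMAIN.
    Assumption: without a PTR record, the client falls back to the IP itself.
    """
    hostname = reverse_lookup(server_ip)
    key = hostname if hostname is not None else server_ip
    return credentials_by_key.get(key)


if __name__ == "__main__":
    credentials = {"192.168.123.234": "onprem-secret"}  # hypothetical store

    # Before: the reverse lookup yields NXDOMAIN, the IP is used as the key.
    print(pick_credentials("192.168.123.234", lambda ip: None, credentials))

    # After: the resolver synthesizes a PTR record, and nothing matches.
    synthesized = lambda ip: "ip-192-168-123-234.eu-central-1.compute.internal"
    print(pick_credentials("192.168.123.234", synthesized, credentials))
```

Nothing in the client or its configuration changed between the two calls; only the resolver’s answer did.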
This is something I’ve observed often on AWS: They seem to like easter eggs and/or proper implementations.
Back in ancient times, when ISC BIND was the de facto authoritative DNS server software, you’d query a nameserver’s version information from the version.bind hostname in the Chaos (CH) class (as opposed to the default Internet class, IN). The nameserver would answer with its software version.
Most modern nameservers simply don’t answer this query. The VPC resolver, however, does:
[ec2-user@ip-10-0-0-34 ~]$ dig +noall +ans version.bind txt ch @10.0.0.2
version.bind. 0 CH TXT "EC2 DNS"
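If you’re curious what the Chaos class looks like on the wire: it’s just a different class number in the question section. A minimal sketch that builds the same query dig sends, using only the standard library; the function names are mine, and the response is returned raw rather than parsed.

```python
import socket
import struct

QTYPE_TXT = 16  # TXT record type
QCLASS_CH = 3   # Chaos class (the usual Internet class IN is 1)


def build_version_bind_query(query_id=0x1234):
    """Build a DNS query packet for version.bind, type TXT, class CH."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no RRs.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: length-prefixed labels, terminated by a zero-length root label.
    labels = (b"version", b"bind")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_TXT, QCLASS_CH)


def query_version_bind(resolver_ip, timeout=2.0):
    """Send the query over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_version_bind_query(), (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return response


# Usage from inside a VPC, with the dot-2 address from the example above:
#   print(query_version_bind("10.0.0.2"))
```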
The resolver IP address seems to employ some caching of its own, apparently for a few minutes per record.
This is important when changing the VPC Hostnames setting: It will not take effect immediately, and not consistently!
I can observe fluctuating time-to-live values returned by the resolver IP address, which leads me to believe that there are actually a handful of different resolvers answering the queries:
[ec2-user@ip-10-0-0-34 ~]$ while true; do dig +noall +ans neveragain.de soa @10.0.0.2; sleep 1; done
neveragain.de. 243 IN SOA squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de. 242 IN SOA squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de. 241 IN SOA squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de. 247 IN SOA squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de. 245 IN SOA squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
neveragain.de. 246 IN SOA squigley.hq.neveragain.de. hostmaster.neveragain.de. 2021080901 86400 3600 2419200 10800
[ ... and so on ...]
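One way to read those numbers: a single cache counting a record down should report a TTL that drops by one per second, so “seconds elapsed plus TTL” stays constant. A minimal sketch of that arithmetic, assuming the samples really are one second apart (the loop’s `sleep 1` plus dig’s own runtime makes that approximate); the function name is mine.

```python
def estimated_expiries(ttl_samples, interval_seconds=1):
    """Turn TTLs sampled at a fixed interval into estimated expiry times.

    A single cache would give roughly the same expiry for every sample;
    several distinct values hint at several caches behind one address.
    """
    return [
        index * interval_seconds + ttl
        for index, ttl in enumerate(ttl_samples)
    ]


if __name__ == "__main__":
    ttls = [243, 242, 241, 247, 245, 246]  # TTL column from the output above
    expiries = estimated_expiries(ttls)
    print(expiries)
    print(len(set(expiries)), "distinct expiry estimates")
```

For the six samples above, the first three agree on one expiry time while the later three each land elsewhere. Given the timing jitter, I would only read that as “more than one cache”, not as an exact count.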
For advanced DNS resolution topics, see also:
- Route53 Resolver Endpoints to forward DNS queries between external (e.g. on-premises) DNS and AWS (beware: while this sounds like three lines of BIND configuration, it actually starts at almost $190/month!)
- DNSSEC validation (not enabled by default, unfortunately)
- DNS Firewall, which is a corny name for hostname-specific override rules in the resolver (you can fake
- For VPC Peerings, there are separate settings for DNS
It was DNS.
[1] Minor gripe: What does that even mean, an instance gets a hostname? What exactly does it get, and how? Does it affect the DHCP-assigned hostname as well? A rather weak choice of words, I’d say, used throughout the document. Spoiler: It does not affect DHCP responses.