AWS: Managing SSH Access Like Grown-Ups

2020-08-25

We Can Do Better

SSH access management on AWS is in a weird place. It’s hard to find clear guidance. And the available guidance is often tailored to someone’s lonely developer account. There are some good recommendations, but they only cover half of the picture. Most of the internet seems to assume that you’re fine with supplying that single keypair when creating your instances.

As a result, many organizations have gotten used to doing things in weird ways. Like provisioning SSH keys with configuration management tools. Or sharing private keys and using group accounts on jump hosts [1], like wild animals in the woods. Or baking allowed keys into their machine images, only to update every single machine when personnel changes – some machines manually, because that super-holy database machine cannot be restarted now, so it won’t get the new image anytime soon…

I get it, checking all the boxes is hard: one keypair per person, quick revocation when people come and go, multi-factor authentication, a proper audit trail, no publicly exposed SSH ports or jump hosts – and all of that without giving up any SSH features.

Wouldn’t it be nice if we had all that? And we could simply do this?

$ ssh i-01d4213f65f6db8e4

Yes we can!

Conventional SSH

Early humans used telnet and rsh to make computers do what they wanted. Unfortunately, those lacked confidentiality (no encryption), integrity (data could be manipulated in transit) and authenticity (no way of knowing we were really sending our password to the correct server). Those protocols had to die. Today we use ssh to connect to servers, because it has all these properties – when used correctly.

But managing SSH keypairs becomes a challenge once there are more than a few people and/or more than a few servers. Much of this can be automated, but it usually remains painful for various reasons. And even if you are among the very few organizations that actually hooked up all AWS SSH usage to your directory / LDAP, that brings a whole host of new issues to the table.

And besides key management, some other issues remain with conventional SSH.

For example, because that super-holy database machine is not directly connected to the public internet, access goes through at least one jump host, making many tasks really annoying – like copying large files to the intermediate machine first, then copying to the destination.

Advanced requirements like multi-factor authentication and proper audit trails are complicated to integrate and are therefore, in most cases, simply ignored and never talked about.

AWS Systems Manager Session Manager for Shell Access to EC2 Instances [2]

That’s a nice name, isn’t it? This feature of AWS Systems Manager was released mid-2018 and is something entirely different from SSH.

It relies on the SSM Agent, which needs to be running on our EC2 instances. The agent connects to the AWS SSM API, which acts as a Command & Control [3] system.
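To check that this back-channel is up, we can look at the agent on the instance and at the SSM inventory from the outside. A quick sketch – the systemd unit name is what Amazon Linux 2 uses:

# On the instance: is the agent running?
$ sudo systemctl status amazon-ssm-agent

# On our local machine: which instances have registered with SSM?
$ aws ssm describe-instance-information \
>     --query 'InstanceInformationList[].[InstanceId,PingStatus]' \
>     --output table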

So, thanks to this back-channel from SSM Agent to SSM API, we can ask the SSM API to do lots of things. One of those things is to give us a shell session. We’ll get an unprivileged shell, but we can sudo our way out of it (at least with Amazon Linux’s default settings).

Opening that shell directly via the AWS SSM API has several interesting properties: authentication and authorization are handled by IAM, every session is recorded in CloudTrail, and the instance needs neither a public IP address nor any open inbound ports.

The AWS SSM agent is installed & enabled by default on Amazon Linux.

Our instances need permission to register themselves with the SSM API. It’s enough to attach the AWS-managed policy AmazonSSMManagedInstanceCore to our instance profile (see docs).
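If the instance’s role doesn’t have that policy yet, attaching it is a one-liner – the role name my-instance-role is just a placeholder here:

$ aws iam attach-role-policy \
>     --role-name my-instance-role \
>     --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore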

On our local machine, we need to install the Session Manager plugin for the AWS CLI.
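Installation depends on the platform. Roughly, following the AWS docs as of this writing:

# macOS, via Homebrew
$ brew install --cask session-manager-plugin

# Debian/Ubuntu, via the official .deb package
$ curl -o session-manager-plugin.deb \
>     "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/ubuntu_64bit/session-manager-plugin.deb"
$ sudo dpkg -i session-manager-plugin.deb

# Running the plugin directly verifies the installation
$ session-manager-plugin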

Then we can do something like this:

$ aws ssm start-session --target i-01d4213f65f6db8e4
Starting session with SessionId: botocore-session-1598384360-04ec7c738e5b95488

sh-4.2$ hostname
ip-172-31-36-76.eu-central-1.compute.internal

sh-4.2$ id
uid=1001(ssm-user) gid=1001(ssm-user) groups=1001(ssm-user)

sh-4.2$ sudo -i
[root@ip-172-31-36-76 ~]# 

This is also what happens when we use the Web Console (EC2 > Instance > Connect > Session Manager).

The major drawback of an SSM Session is that we’re not using SSH, which means that all advanced functionality of SSH is lost. Most dearly we’ll miss the option to copy files with scp, but we might also need other features like port forwarding or multiplexed channels or, on very dark days, agent-forwarding. Or simply free choice of SSH client.

EC2 Instance Connect [4]

EC2 Instance Connect was added about one year later, in mid-2019.

This is a pretty nifty addition to the SSH server configuration: in addition to the usual local check for a matching SSH key, it also checks the EC2 instance’s metadata for additional keys! The magic is that /etc/ssh/sshd_config uses a custom AuthorizedKeysCommand [5]. This is installed and enabled by default on Amazon Linux.
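On Amazon Linux 2, that wiring looks roughly like this – paths may differ on other distributions and package versions:

# /etc/ssh/sshd_config (excerpt)
AuthorizedKeysCommand /opt/aws/bin/eic_run_authorized_keys %u %f
AuthorizedKeysCommandUser ec2-instance-connect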

Using an AWS API call, we can upload any SSH public key, so the SSH server will find and accept it. It’s valid for only one minute; then it disappears from the instance metadata.

Just like the SSM Session Manager approach, we get all the API benefits, like IAM and CloudTrail.

It’s cumbersome, but we can use the naked API, e.g. via the AWS CLI, and then connect as usual:

$ aws ec2-instance-connect send-ssh-public-key \
> --instance-id i-01d4213f65f6db8e4 \
> --instance-os-user ec2-user \
> --availability-zone eu-central-1b \
> --ssh-public-key file://~/.ssh/id_rsa_aws.pub
{
    "RequestId": "b2df7fad-9a9c-4723-ba9c-de4c69b75e2a",
    "Success": true
}

$ ssh ec2-user@18.194.208.55
[...]
[ec2-user@ip-172-31-36-76 ~]$ 

To make this easier, AWS provides the EC2 Instance Connect CLI, which gives us the mssh command:

$ mssh i-01d4213f65f6db8e4
[...]
[ec2-user@ip-172-31-36-76 ~]$ 

We don’t even need to have a keypair for this; mssh will generate a temporary keypair.
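The EC2 Instance Connect CLI is a Python package, so – assuming a working pip – installing it is straightforward:

$ pip install ec2instanceconnectcli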

mssh isn’t perfect – it doesn’t give us our advanced SSH features back. We could work around that by sending our SSH key, as above, and then using ssh as usual. But we’d still be using the same conventional SSH channels: requiring public access or a jump host, looking up that darn IP address, blindly trusting that fingerprint, and so on. And we’d need to re-send the key for every new ssh connection.

Putting it together: SSM Session Manager + EC2 Instance Connect + SSH configuration

Here’s the cool thing: since mid-2019, SSM Session Manager supports port forwarding! Honestly, I don’t get why that blog entry doesn’t say a single word about EC2 Instance Connect.
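On its own, that port forwarding looks like this – a sketch that tunnels the instance’s sshd to a local port (2222 is an arbitrary choice):

$ aws ssm start-session \
>     --target i-01d4213f65f6db8e4 \
>     --document-name AWS-StartPortForwardingSession \
>     --parameters 'portNumber=22,localPortNumber=2222'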

Let’s fix that and plug them together:

  1. Given the instance ID, figure out necessary data (availability zone etc.)
  2. Send our public key to EC2 Instance Connect
  3. Open a connection to the instance’s sshd, via Session Manager tunneling
  4. Use ssh (and scp and sftp and …) just as we’re used to

We’ll have a little script and some custom ssh_config to make that work.

In a local file, say ~/bin/ssmssh.sh, write a few lines of shell script to glue everything together:

#!/bin/sh

# Usage: ssmssh.sh <instance-id> <ssh-port> <ssh-username> <path-to-public-key>
# Intended as an ssh ProxyCommand; see the ~/.ssh/config snippet below.

INSTANCE_ID="$1"
SSH_PORT="$2"
SSH_USERNAME="$3"
SSH_PUBKEY="$4"

# Step 1: look up the instance's availability zone,
# which EC2 Instance Connect requires.
AZ=$(aws ec2 \
        describe-instances \
        --instance-ids "$INSTANCE_ID" \
        --query 'Reservations[].Instances[].Placement.AvailabilityZone' \
        --output text
)

# Step 2: push our public key; it stays valid for one minute.
aws ec2-instance-connect send-ssh-public-key \
        --instance-id "$INSTANCE_ID" \
        --instance-os-user "$SSH_USERNAME" \
        --availability-zone "$AZ" \
        --ssh-public-key "file://$SSH_PUBKEY" \
        > /dev/null

# Step 3: open a tunnel to the instance's sshd via Session Manager.
# ssh itself (step 4) then talks through our stdin/stdout.
aws ssm start-session \
        --target "$INSTANCE_ID" \
        --document-name AWS-StartSSHSession \
        --parameters "portNumber=$SSH_PORT"

Don’t forget to chmod 755 it.
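We can give the script a quick smoke test on its own – if everything is wired up correctly, the session starts and the remote sshd’s protocol banner appears on stdout (Ctrl-C gets us out again):

$ ~/bin/ssmssh.sh i-01d4213f65f6db8e4 22 ec2-user ~/.ssh/id_rsa.pub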

Then in our ~/.ssh/config, add some magic so it knows how to handle the i- notation. The idea is from the SSM docs, but we’re taking it one step further with our script:

Host i-* mi-*
	ProxyCommand sh -c "~/bin/ssmssh.sh %h %p %r ~/.ssh/id_rsa.pub"
	StrictHostKeyChecking no
	UserKnownHostsFile /dev/null
	User ec2-user

Note that this does require us to have an id_rsa keypair.
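If that keypair doesn’t exist yet, generating one is quick:

$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa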

And then we can enjoy safe and easy SSH with all features – and even SFTP, if we were into that:

$ ssh i-01d4213f65f6db8e4 hostname
Warning: Permanently added 'i-01d4213f65f6db8e4' (ECDSA) to the list of known hosts.
ip-172-31-36-76.eu-central-1.compute.internal

$ scp tmp/foo i-01d4213f65f6db8e4:/tmp/
Warning: Permanently added 'i-01d4213f65f6db8e4' (ECDSA) to the list of known hosts.
foo                                                        100% 3502KB 285.5KB/s   00:12    

$ sftp i-01d4213f65f6db8e4
Warning: Permanently added 'i-01d4213f65f6db8e4' (ECDSA) to the list of known hosts.
Connected to i-01d4213f65f6db8e4.
sftp> 

Conclusion

By wrapping our common SSH connection over an SSM Session Manager tunnel, we get all the features that make the AWS API so great.

IAM enables us to use groups, roles, external identity providers, MFA, very fine-grained access policies and much more, while everything is safely recorded in CloudTrail.
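For example, a policy along these lines restricts who may open sessions and push keys. This is only a sketch: the policy name, the tag key team and its value are made up, and a real policy needs a few more permissions (e.g. ec2:DescribeInstances for our lookup script):

$ cat > ssh-via-ssm.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ssm:StartSession",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringEquals": {"ssm:resourceTag/team": "backend"}
            }
        },
        {
            "Effect": "Allow",
            "Action": "ec2-instance-connect:SendSSHPublicKey",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringEquals": {"ec2:osuser": "ec2-user"}
            }
        }
    ]
}
EOF
$ aws iam create-policy \
>     --policy-name ssh-via-ssm \
>     --policy-document file://ssh-via-ssm.json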

All client-side authentication is channeled through the AWS CLI, so we can use different profiles, roles, accounts, environment variables, even instance profiles.

And because we’re relying on the secured and trusted [6] connection to the AWS API, we have confidentiality, integrity and authenticity – without comparing fingerprints.

We get all that and maintain our full set of SSH features.

And: Both SSM Session Manager and EC2 Instance Connect are free of charge!


  1. I keep hearing that in the US, the term bastion host is common and that the term jump host is used in Australia – I always knew this as jump host and had never heard bastion host before, so, well… who knows!

  2. SSM Shell Access original blog post 

  3. The term Command & Control is usually reserved for malicious servers that control botnets and such, but I want to point out that SSM effectively gives AWS very comfortable root access. AWS SSM is indeed designed to C&C our servers. I mean, we’re on AWS, so we already have to trust their execution environment and a lot of other things, but this certainly is a good time to take a deep breath. 

  4. EC2 Instance Connect original blog post 

  5. There is one little caveat though: As of 2020-08, that AuthorizedKeysCommand used by AWS has a hard timeout of five seconds to retrieve the key from metadata. Usually that’s more than enough, but when our instance is burning CPU like there’s no tomorrow, five seconds is easily exceeded – and then we cannot SSH to our instance when we need it the most. We can still use a plain SSM Session, if we manage to remember in that situation! 

  6. Well, as far as trust goes in today’s TLS ecosystem.