An Under-Appreciated Service
Overview-style blog posts are not my thing, but I got trapped between my own laziness and the Duckbill Group [1].
So let’s get an overview of the AWS Transfer Family service. It’s actually a nifty solution to a problem that nobody wants to talk about, so it certainly qualifies as under-appreciated.
In a few words: Transfer Family bolts an SFTP / FTP / FTPS endpoint onto S3 and gives you a nice web interface to manage users.
Why Did It Have to Be SFTP?
Just like every program expands until it can read mail, any organization expands until it needs automated file exchange with some third party. Some start-ups may be lucky, but sooner or later, some weird project will demand an SFTP connection.
This is a dated but well-established mechanism for exchanging data with vendors and suppliers – think catalog data, parts lists, price lists, all kinds of reporting, and other machine-readable files of similar nature. Usually the data isn’t very time-critical; daily or weekly exchanges are common.
Those files are often part of some ETL processing – on either end. On the supplying side, those files might be generated by a database export job. On the receiving side, some process might load the list into one or more databases. A common example is to enrich local product data with a supplier’s component prices and availability.
Back in the day, FTP was the obvious choice for such a service. When security got more attention over the years, the technically superior choice would have been FTPS (that’s FTP-over-SSL/TLS) – but X.509 certificates generally cause headaches for everyone, and available client support was very limited. So the world converged towards SFTP (that’s an FTP-like extension to SSH), which was much less painful to deploy [2].
Transfer Family makes migrations easy: You can assign a custom DNS name, so your clients won’t need to be reconfigured. You can even import an SSH Host Private Key for SFTP, so your service will have the same SSH fingerprint – this is important, because all those automated clients would stop and complain otherwise… probably… maybe.
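To check that the fingerprint really survives such a migration, you can compare the public host keys of the old and the new server. Here is a minimal sketch using only the standard library; the function names are my own invention, and it assumes you have both public keys as regular OpenSSH one-liners (for example from `ssh-keyscan`).

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Return the SHA256 fingerprint of an OpenSSH public key line.

    Expects the usual "<key-type> <base64-blob> [comment]" format.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 without the trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def same_host_identity(old_key_line: str, new_key_line: str) -> bool:
    """True if both public keys yield the same fingerprint."""
    return ssh_fingerprint(old_key_line) == ssh_fingerprint(new_key_line)
```

If `same_host_identity` returns `False`, expect those automated clients to complain about a changed host key.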
It’s called Transfer Family because it offers the unholy trinity of protocols: SFTP, FTPS, and yes, even plain FTP like it’s 1990 [4]. Because AWS enjoys taking care of every last dinosaur out there.
The Backend is S3!
I could repeat this several times: The backend is S3! And this bucket can be an existing one – it can be used by other services as usual.
This is so simple and obvious, but it’s also quite brilliant: Given that you’re actually using AWS services beyond EC2, your data will gravitate heavily towards S3 anyway.
Transfer Family makes serving that data via SFTP just too easy. And if you’re receiving data via SFTP, it’s just too easy to continue processing the uploaded data in a “cloud native” manner: for example by querying it with Amazon Athena, by using S3 Event Notifications to trigger processing by Lambda, or maybe by loading it directly into Redshift. Or use Redshift Spectrum, I always mix those two up.
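To make the S3 Event Notification route a bit more concrete, here is a minimal sketch of a Lambda handler that picks the freshly uploaded files out of the event. It assumes the bucket is configured to notify the function on object creation; the function names are hypothetical and the actual processing is left out.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs for newly created objects from an S3 event."""
    found = []
    for record in event.get("Records", []):
        # Only react to uploads, not to deletions or other event types.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded, with spaces turned into '+'.
        key = unquote_plus(s3["object"]["key"])
        found.append((s3["bucket"]["name"], key))
    return found


def handler(event, context):
    """Lambda entry point: log every upload and report how many were seen."""
    objects = uploaded_objects(event)
    for bucket, key in objects:
        print(f"new upload: s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

From there, the handler could kick off whatever ETL step used to poll the SFTP directory.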
For the SFTP-trapped customers that I encountered, this would have been the best part: Transfer Family comes with a nice web UI via the AWS Console where you can manage users and their access. It’s a separate user management, with no ties to IAM or Active Directory entities. And for most customers, this is just fine.
But if there is a more complex use-case, you can achieve any authentication backend with some glue code: Transfer Family can issue calls to API Gateway. There are templates for using AWS Secrets Manager and Okta. Just be aware that there’s no single-click integration of Active Directory or Amazon Cognito.
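The glue code is less scary than it sounds. Below is a minimal sketch of the Lambda that could sit behind that API Gateway. As far as I understand the AWS templates, the function receives the `username` and `password` and answers with the IAM role and home directory, or with an empty object to reject the login. The user table, the role ARN, and the bucket name are made up for illustration; a real setup would look the credentials up in Secrets Manager or similar instead of hard-coding them.

```python
import hashlib
import hmac

# Hypothetical in-memory user table, for illustration only.
USERS = {
    "acme-supplier": {
        "password_sha256": hashlib.sha256(b"correct horse").hexdigest(),
        "role": "arn:aws:iam::123456789012:role/sftp-acme",  # placeholder ARN
        "home": "/example-bucket/acme",  # placeholder bucket and path
    }
}


def handler(event, context):
    """Return Role and HomeDirectory on success, an empty dict on failure."""
    user = USERS.get(event.get("username", ""))
    password = event.get("password", "")
    if user is None or not password:
        return {}  # empty response: login is rejected
    offered = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    if not hmac.compare_digest(offered, user["password_sha256"]):
        return {}
    return {"Role": user["role"], "HomeDirectory": user["home"]}
```

Swapping the dictionary lookup for a call to your directory of choice is where the actual integration work happens.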
Permissions management is joyfully easy: Users are mapped to an IAM Role, which manages their access to S3 buckets as usual. It’s even possible to access multiple S3 buckets from a single connection! If you don’t want that, there’s a mode to restrict users to some specific S3 bucket and path – what traditionally would have been called “restrict to home directory”.
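If you prefer scripting over clicking, the restricted setup can be expressed as parameters for the `create_user` API call. This is a sketch under assumptions: the helper name is mine, the IDs and names in the usage example are placeholders, and it uses the logical home directory mapping, which is one way (as I understand it) to pin a user to a bucket path.

```python
def restricted_user_params(server_id: str, user_name: str, role_arn: str,
                           bucket: str, prefix: str = "") -> dict:
    """Build create_user parameters that map the user's "/" to one S3 path."""
    parts = [bucket]
    cleaned = prefix.strip("/")
    if cleaned:
        parts.append(cleaned)
    target = "/" + "/".join(parts)
    return {
        "ServerId": server_id,
        "UserName": user_name,
        "Role": role_arn,
        # LOGICAL means the user only sees the mapped entries, not the real paths.
        "HomeDirectoryType": "LOGICAL",
        "HomeDirectoryMappings": [{"Entry": "/", "Target": target}],
    }


# Placeholder values, for illustration only.
params = restricted_user_params(
    server_id="s-0123456789abcdef0",
    user_name="acme-supplier",
    role_arn="arn:aws:iam::123456789012:role/sftp-acme",
    bucket="example-bucket",
    prefix="suppliers/acme/",
)
```

With boto3 installed and credentials in place, the result could be passed along as `boto3.client("transfer").create_user(**params)`. The IAM role still has to allow access to that bucket path.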
This deserves an extra section: Pricing can be obscene, depending on your point of view. It’s priced per hour and per endpoint. Running a full month of SFTP will place you somewhere around $225. Running a full month of SFTP and FTPS puts you well over $400.
Data transfer comes on top of that, at $0.04/GB ($40/TB) in either direction. That is in addition to the usual egress transfer pricing, of course.
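For a quick back-of-the-envelope estimate, the arithmetic fits in a few lines. Note the assumptions: the $0.30 hourly rate per protocol is what the monthly figures above roughly imply, a month is taken as 744 hours, and regular egress charges are not included. Check the current price list before trusting any of this.

```python
def monthly_cost(protocols: int, gb_transferred: float, hours: float = 744,
                 hourly_rate: float = 0.30, per_gb: float = 0.04) -> float:
    """Estimate the monthly Transfer Family bill in USD.

    hourly_rate is an assumed price per enabled protocol and hour;
    per_gb is the $0.04/GB data transfer fee mentioned in the text.
    """
    endpoint = protocols * hours * hourly_rate
    transfer = gb_transferred * per_gb
    return round(endpoint + transfer, 2)


sftp_only = monthly_cost(protocols=1, gb_transferred=0)
sftp_and_ftps = monthly_cost(protocols=2, gb_transferred=0)
sftp_with_one_tb = monthly_cost(protocols=1, gb_transferred=1000)
```

The takeaway is that the endpoint hours dominate the bill unless you move a lot of data: with these assumptions, a terabyte of transfer adds $40 to an SFTP-only month.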
With that being said: If you store much more data than you transfer, the price advantage of S3 can make this worthwhile, especially when using S3 Infrequent Access or Intelligent Tiering storage (compared to a simple EC2 instance with EBS). And even if this isn’t the case – I’d take “not having to deal with SFTP myself” over a few more bucks on the bill anytime.
Pros
- Elegant mapping of legacy FTP/SFTP data flows to Amazon S3
- Using S3 makes the “surrounding” data processing really easy
- Access multiple buckets from one connection
- Web UI for user management
- Flexible adding of other authentication methods
- Easy drop-in for existing solution – custom hostname, custom SSH Host Private Key
- Highly available and scales as required
- Optional deployment within a VPC (useful for offering an internal/private service)
Cons
- Pricing is obscene (depending on your use-case)
- No easy way to add Directory Services (other than providing some API glue)
- Not supported in all regions yet (but most of them – only very new regions like Milan are missing, I think)
- SFTP and friends should just die, instead of being made easy to tolerate
If you have that particular SFTP itch and can swallow the pricing, then this is a really nifty solution to a really annoying problem.
UPDATE 2021-01-07 Just two days after this post, they added EFS backend support. It makes perfect sense, but it doesn’t get me excited, as EFS does not pave a way out of the legacy tech swamp.
P.S.: I’m aware that this came out at more than 1,000 words. I swear I can write more tersely if I want to!
[1] Corey Quinn’s Duckbill Group was looking to hire a Content Writer for LastWeekInAWS.com as a side-job. I was intrigued by the idea, though I certainly am not the ideal candidate – being neither an experienced writer nor a native speaker and all that. The application asked for a small piece of 250-300 words about the most under-appreciated AWS service. I didn’t have a good idea immediately (besides EC2-Instance-Connect, which isn’t really a full-blown service), so I decided to throw it on my low-prio pile and ask Corey for a deadline on this. He never replied. Soon after the New Year started, this item bubbled up on my to-do pile. The position isn’t open anymore, but I didn’t want to lazily remove an item from my to-do list just a few days after New Year’s Resolution Meditation. So I figured I’d write it anyway, and just publish it on my blog. Good for you!
[2] Not only because SFTP clients are easy to find and use – also because a) SSH already was everywhere at the time and b) people had already learned to blindly type “yes” to the fingerprint prompt, further reducing friction, while improving security (over plain FTP, that is).
[3] Usually a single bare-naked VM in some dark and dusty folder of your vSphere client, running nothing but OpenSSH.
[4] Don’t get me wrong where many people want to get people wrong, because they want to look smart: FTP still has valid use-cases! For example, software distribution packages that are signed with (and verified against) a well-known PGP key anyway. Encrypting and authenticating (signing) such high-traffic content distribution is simply wasting energy for no good reason. But it feels 1990.