Problem:
I am storing number of HLS streams in S3 with given file structure:
Video1
├──hls3
├──hlsv3-master.m3u8
├──media-1
├──media-2
├──media-3
├──media-4
├──media-5
├──hls4
├──hlsv4-master.m3u8
├──media-1
├──media-2
├──media-3
├──media-4
├──media-5
In my user API I know which exactly user has access to which video content
but I also need to ensure that video links are not sharable and only accessible
by users with right permissions.
Solutions:
1) Use signed / temp S3 urls for private S3 content. Whenever client wants to play some specific video it is
sending request to my API. If user has right permissions the API is generating signed url
and returning it back to client which is passing it to player.
The problem I see here is that real video content is stored in dozen of segments files in media-* directories
and I do not really see how can I protect all of them - would I need to sign each of the segment file urls separately?
2) S3 content is private. Video stream requests made by players are going through my API or separate reverse-proxy.
So whenever client decides to play specific video, API / reverse-proxy is getting the request, doing authentication & authorization
and passing the right content (master play list files & segments).
In this case I still need to make S3 content private and accessible only by my API / reverse-proxy. What should be the recommended way here?
S3 rest authentication via tokens?
3) Use encryption with protected key. In this case all of video segments are encrypted and publicly available. The key is also stored in S3
but is not publicly available. Every key request made by player is authenticated & authorized by my API / reverse-proxy.
These are 3 solutions I have in my mind right now. Not convinced on all of them. I am looking for something simple and bullet proof secure. Any recommendations / suggestions?
Used technology:
would I need to sign each of the segment file urls separately?
If the player is requesting directly from S3, then yes. So that's probably not going to be the ideal approach.
One option is CloudFront in front of the bucket. CloudFront can be configured with an Origin Access Identity, which allows it to sign requests and send them to S3 so that it can fetch private S3 objects on behalf of an authorized user, and CloudFront supports both signed URLs (using a different algorithm than S3, with two important differences that I will explain below) or with signed cookies. Signed requests and cookies in CloudFront work very similarly to each other, with the important difference being that a cookie can be set once, then automatically used by the browser for each subsequent request, avoiding the need to sign individual URLs. (Aha.)
For both signed URLs and signed cookies in CloudFront, you get two additional features not easily done with S3 if you use a custom policy:
The policy associated with a CloudFront signature can allow a wildcard in the path, so you could authorize access to any file in, say /media/Video1/*
until the time the signature expires. S3 signed URLs do not support wildcards in any form -- an S3 URL can only be valid for a single object.
As long as the CloudFront distribution is configured for IPv4 only, you can tie a signature to a specific client IP address, allowing only access with that signature from a single IP address (CloudFront now supports IPv6 as an optional feature, but it isn't currently compatible with this option). This is a bit aggressive and probably not desirable with a mobile user base, which will switch source addresses as they switch from provider network to Wi-Fi and back.
Signed URLs must still all be generated for all of the content links, but you can generate and sign a URL only once and then reuse the signature, just string-rewriting the URL for each file making that option computationally less expensive... but still cumbersome. Signed cookies, on the other hand, should "just work" for any matching object.
Of course, adding CloudFront should also improve performance through caching and Internet path shortening, since the request hops onto the managed AWS network closer to the browser than it typically will for requests direct to S3. When using CloudFront, requests from the browser are sent to whichever of 60+ global "edge locations" is assumed to be nearest the browser making the request. CloudFront can serve the same cached object to different users with different URLs or cookies, as long as the sigs or cookies are valid, of course.
To use CloudFront signed cookies, at least part of your application -- the part that sets the cookie -- needs to be "behind" the same CloudFront distribution that points to the bucket. This is done by declaring your application as an additional Origin for the distribution, and creating a Cache Behavior for a specific path pattern which, when requested, is forwarded by CloudFront to your application, which can then respond with the appropriate Set-Cookie:
headers.
I am not affiliated with AWS, so don't mistake the following as a "pitch" -- just anticipating your next question: CloudFront + S3 is priced such that the cost difference compared to using S3 alone is usually negligible -- S3 doesn't charge you for bandwidth when objects are requested through CloudFront, and CloudFront's bandwidth charges are in some cases slightly lower than the charge for using S3 directly. While this seems counterintuitive, it makes sense that AWS would structure pricing in such a way as to encourage distribution of requests across its network rather than to focus them all against a single S3 region.
Note that no mechanism, either the one above or the one below is completely immune to unauthorized "sharing," since the authentication information is necessarily available to the browser, and thus to the user, depending on the user's expertise... but both approaches seem more than sufficient to keep honest users honest, which is all you can ever hope to do. Since signatures on signed URLs and cookies have expiration times, the duration of the share-ability is limited, and you can identify such patterns through CloudFront log analysis, and react accordingly. No matter what approach you take, don't forget the importance of staying on top of your logs.
The reverse proxy is also a good idea, probably easily implemented, and should perform quite acceptably with no additional data transport charges or throughput issues, if the EC2 machines running the proxy are in the same AWS region as the bucket, and the proxy is based on solid, efficient code like that found in Nginx or HAProxy.
You don't need to sign anything in this environment, because you can configure the bucket to allow the reverse proxy to access the private objects because it has a fixed IP address.
In the bucket policy, you do this by granting "anonymous" users the s3:getObject
privilege, only if their source IPv4 address matches the IP address of one of the proxies. The proxy requests objects anonymously (no signing needed) from S3 on behalf of authorized users. This requires that you not be using an S3 VPC endpoint, but instead give the proxy an Elastic IP address or put it behind a NAT Gateway or NAT instance and have S3 trust the source IP of the NAT device. If you do use an S3 VPC endpoint, it should be possible to allow S3 to trust the request simply because it traversed the S3 VPC Endpoint, though I have not tested this. (S3 VPC Endpoints are optional; if you didn't explicitly configure one, then you don't have one, and probably don't need one).
Your third option seems weakest, if I understand it correctly. An authorized but malicious user gets the key an can share it all day long.
You need to use CloudFront on top of S3, use Signed-URL's from CloudFront, then use MediaConvert to convert your videos into an Encrypted HLS Video Stream, then use a player like VideoJS which can play that encrypted stream.
And even after all that, someone could still launch a screen-recorder program like Camtasia or Screenflow and capture the whole thing. But nothing can stop that from happening.
But the stack above can help you get pretty darn close to 99.99% secure.
-- Ravi Jayagopal
S3MediaVault.com