I have an app where users' photos are private. I store the photos (and their thumbnails) in AWS S3. There is a page on the site where a user can view his photos (i.e. the thumbnails). My problem is how to serve these files. The options I have evaluated are:
- Serving files from CloudFront (or S3) using signed URLs. The problem is that every time the user refreshes the page, I have to generate all those signed URLs again and load them, so I can't cache the images in the browser, which would have been a good choice. Is there any way to still do that in JavaScript? I can't make the URLs valid for longer because of security concerns. Secondly, if someone gets hold of a URL within that time frame, they can view the file without going through the app's authentication.
- The other option is to serve the file from my Express app itself, streaming it from S3. This lets me set HTTP cache headers and therefore enable browser caching. It also makes sure no one can view a file without being authenticated. Ideally I would like to stream the file and, since I am hosting behind an NGINX proxy, have the stream relayed through NGINX. But as far as I can see, that is only possible if the file exists on the same machine's filesystem. Here I have to stream it and return when the stream is complete. I don't want to store the files locally.
I can't decide which of the two options is better. I want to offload as much work as possible to S3 or CloudFront, but even with signed URLs the request goes to my servers first. I also want caching.
So what would be the ideal approach, and what are the answers to the specific questions about each method?
I would just stream it from S3. It's very easy, and signed URLs are much more difficult. Just make sure you set the Content-Type and Content-Length headers when you upload the images to S3.
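```javascript
var express = require('express')
var knox = require('knox')

var app = express()

// Fill in your own credentials and bucket name.
var aws = knox.createClient({
  key: '',
  secret: '',
  bucket: ''
})

app.get('/image/:id', function (req, res, next) {
  // Assumes an auth middleware (e.g. Passport) has set req.user;
  // replace with whatever check your app uses.
  if (!req.user) {
    var err = new Error('Forbidden')
    err.status = 403
    return next(err)
  }

  aws.get('/image/' + req.params.id)
    .on('error', next)
    .on('response', function (resp) {
      if (resp.statusCode !== 200) {
        resp.resume() // drain S3's error body
        var err = new Error('Not Found')
        err.status = 404
        return next(err)
      }

      // Forward S3's validators so req.fresh can compare them with the
      // browser's If-None-Match / If-Modified-Since headers.
      res.setHeader('ETag', resp.headers['etag'])
      res.setHeader('Last-Modified', resp.headers['last-modified'])
      // "private": only the user's browser may cache the photo.
      // The one-day max-age is an arbitrary example value.
      res.setHeader('Cache-Control', 'private, max-age=86400')

      if (req.fresh) {
        resp.resume() // browser copy is current; discard the S3 body
        res.statusCode = 304
        return res.end()
      }

      res.setHeader('Content-Length', resp.headers['content-length'])
      res.setHeader('Content-Type', resp.headers['content-type'])

      if (req.method === 'HEAD') {
        resp.resume()
        return res.end()
      }

      resp.pipe(res)
    })
    .end() // knox returns an unsent request; end() actually sends it
})
```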
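The streaming route only forwards what S3 sends back, so Content-Type and Content-Length have to be stored when the image is uploaded. A minimal sketch of building those headers, assuming the upload arrives as a Buffer; `uploadHeaders` and the extension map are hypothetical names, not part of knox:

```javascript
const path = require('path')

// Hypothetical extension -> MIME type map; extend as needed.
const IMAGE_TYPES = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.webp': 'image/webp'
}

// Headers to send with the S3 PUT. S3 stores them as object metadata
// and returns them on every GET, which the route above forwards.
function uploadHeaders (filename, body) {
  const type = IMAGE_TYPES[path.extname(filename).toLowerCase()]
  if (!type) throw new Error('Unsupported image type: ' + filename)
  return {
    'Content-Type': type,
    'Content-Length': String(body.length) // byte length of the Buffer
  }
}
```

With knox, these headers would be passed to `client.put(path, headers)`, and the returned request finished with `.end(buffer)`.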
If you redirect the user to a signed URL using 302 Found, the browser will cache the resulting image according to its Cache-Control header and won't request it a second time.
To prevent the browser from caching the signed URL itself, you should send a proper Cache-Control header along with the redirect:
Cache-Control: private, no-cache, no-store, must-revalidate
So the next time, it will send a request to the original URL and be redirected to a new signed URL.
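A minimal sketch of that redirect step, using only methods that Node's `http.ServerResponse` (and therefore Express's `res`) provides. It assumes the authorization check has already passed and a signed URL has been generated; `redirectToSignedUrl` is a hypothetical helper name:

```javascript
// Answer a request for the app's own image URL with a 302 to a freshly
// signed URL, and tell the browser not to cache the redirect itself.
function redirectToSignedUrl (res, signedUrl) {
  res.statusCode = 302
  res.setHeader('Location', signedUrl)
  res.setHeader('Cache-Control', 'private, no-cache, no-store, must-revalidate')
  // Optional fallbacks for HTTP/1.0 caches that ignore Cache-Control.
  res.setHeader('Pragma', 'no-cache')
  res.setHeader('Expires', '0') // an invalid date counts as already expired
  res.end()
}
```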
You can generate a signed URL with knox using its signedUrl method.
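To show why signing is cheap (one HMAC per URL, no call to AWS), here is a sketch of S3's legacy query-string authentication (Signature Version 2), the scheme knox's `signedUrl` is built on. Assumptions: the object key contains only URL-safe characters, and the bucket still accepts SigV2 (newer regions require Signature Version 4, which a current AWS SDK handles for you). `signedS3Url` is a hypothetical name:

```javascript
const crypto = require('crypto')

// Legacy SigV2 query-string signing, for illustration only.
// Assumes `key` needs no URL encoding (e.g. a hex filename).
function signedS3Url ({ accessKeyId, secret, bucket, key, expiresAt }) {
  // Expires is an absolute Unix timestamp in seconds.
  const expires = Math.floor(expiresAt.getTime() / 1000)
  const resource = '/' + bucket + '/' + key
  // Verb, Content-MD5 (empty), Content-Type (empty), Expires, resource.
  const stringToSign = 'GET\n\n\n' + expires + '\n' + resource
  const signature = crypto
    .createHmac('sha1', secret)
    .update(stringToSign, 'utf8')
    .digest('base64')
  return 'https://' + bucket + '.s3.amazonaws.com/' + key +
    '?AWSAccessKeyId=' + encodeURIComponent(accessKeyId) +
    '&Expires=' + expires +
    '&Signature=' + encodeURIComponent(signature)
}
```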
But don't forget to set proper headers on every uploaded image. I'd recommend using both Cache-Control and Expires headers, because some browsers don't support Cache-Control, and Expires on its own only lets you set an absolute expiration time.
With the second option (streaming images through your app), you'll have better control over the situation. For example, you'll be able to generate the Expires header for each response from the current date and time.
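A sketch of computing both headers per response, as the streaming option allows. `cacheHeaders` is a hypothetical helper, and the `private` directive is an assumption suited to private photos:

```javascript
// Cache-Control max-age is relative; Expires must be an absolute HTTP
// date, so it is derived from "now" for each response.
function cacheHeaders (maxAgeSeconds, now = new Date()) {
  const expires = new Date(now.getTime() + maxAgeSeconds * 1000)
  return {
    // "private": the user's browser may cache it, shared proxies may not.
    'Cache-Control': 'private, max-age=' + maxAgeSeconds,
    'Expires': expires.toUTCString()
  }
}
```

In an Express route, `res.set(cacheHeaders(3600))` would apply both headers before piping the S3 stream.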
But what about speed? Using signed URLs has two advantages that may affect page load speed.
First, you won't overload your server. Generating signed URLs is fast because you're just hashing with your AWS credentials, whereas streaming images through your server means maintaining many extra connections during page load. Either way, it won't make any real difference unless your server is under heavy load.
Second, browsers keep only a few parallel connections per hostname during page load (two in older browsers). Because the images are downloaded from S3's host, the browser can keep resolving image URLs on your server in parallel while downloading them, and the image downloads won't block other resources from your server.
Anyway, to be absolutely sure, you should run some benchmarks. My answer is based on my knowledge of the HTTP specification and my experience in web development, but I have never tried serving images this way myself. Serving public images with a long cache lifetime directly from S3 increases page speed, and I believe that won't change if you do it through redirects.
Keep in mind that streaming images through your server negates all the benefits of Amazon CloudFront. But as long as you're serving content directly from S3, both options will work fine.
Thus, there are two cases where using signed URLs should speed up your page:
- If you have a lot of images on a single page.
- If you serve images using CloudFront.
If you have only a few images on each page and serve them directly from S3, you probably won't see any difference at all.
Important Update
I ran some tests and found that I was wrong about caching. It's true that browsers cache images they were redirected to, but they associate the cached image with the URL they were redirected to, not with the original one. So when the browser loads the page a second time, it requests the image from the server again instead of fetching it from the cache. Of course, if the server responds with the same redirect URL as the first time, the browser will use its cache, but that's not the case with signed URLs.
I found that forcing the browser to cache the signed URL as well as the data it receives solves the problem. But I don't like the idea of caching a redirect URL that will become invalid: if the browser somehow misses the image, it will try to request it again using the expired signed URL from the cache. So I don't think it's an option.
And it doesn't matter whether CloudFront serves images faster or browsers limit the number of parallel downloads per hostname: the advantage of using the browser cache outweighs all the disadvantages of piping images through your server.
It also looks like most social networks solve the problem of private images by hiding their actual URLs behind private proxies. They store all their content on public servers, but there is no way to get the URL of a private image without authorization. Of course, if you open a private image in a new tab and send the URL to a friend, they'll be able to see the image too. If that's not acceptable for you, then it's best to use Jonathan Ong's solution.
I would be concerned about using the CloudFront option if the photos really do need to remain private. It seems you'll have much more flexibility administering your own security policy. I think the NGINX setup may be more complex than necessary. Express should give you very good performance as a proxy that uses the request module to fetch items from S3 and streams them through to authorized users. I highly recommend taking a look at Asset Rack, which uses hash signatures to enable permanent caching in the browser. You won't be able to use the default Racks because you need to calculate the MD5 of each file (perhaps on upload?), which you can't do while streaming. But depending on your application, never having browsers refetch the images could save you a lot of effort.
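The idea behind those hash signatures is content fingerprinting. A minimal sketch of it (this is not Asset Rack's API): compute the MD5 once at upload time, store it with the photo record, and embed it in the image URL. Since the URL changes whenever the bytes change, the response for a given URL can be cached for a very long time. The URL scheme and helper names are hypothetical:

```javascript
const crypto = require('crypto')

// MD5 of the uploaded bytes, computed once when the photo is stored.
function fingerprint (buffer) {
  return crypto.createHash('md5').update(buffer).digest('hex')
}

// Hypothetical URL scheme: /image/<photoId>-<md5>.<ext>
function imageUrl (photoId, md5, ext) {
  return '/image/' + photoId + '-' + md5 + '.' + ext
}

// Header for fingerprinted URLs: one year, browser-only, never revalidated.
const PERMANENT_CACHE = 'private, max-age=31536000, immutable'
```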
Regarding your second option, you should be able to set Cache-Control headers directly in S3.
Regarding your first option: have you considered securing your images a different way?
When you store an image in S3, couldn't you use a hashed, randomised filename? It would be quite straightforward to make the filename hard to guess, and this way you'll have no performance issues viewing the images.
This is the technique Facebook uses. You can still view an image while logged out, as long as you know the URL.
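A minimal sketch of generating such a filename in Node, assuming the key is created at upload time and recorded against the photo in your database; `randomPhotoKey` and the `photos/` prefix are hypothetical. As noted above, anyone who obtains the URL can still view the image:

```javascript
const crypto = require('crypto')

// 16 random bytes = 128 bits, so keys can't be guessed or enumerated.
// Nothing user-identifying goes into the key.
function randomPhotoKey (ext) {
  return 'photos/' + crypto.randomBytes(16).toString('hex') + '.' + ext
}
```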