Scalable Image Storage

Posted 2019-01-20 21:24

I'm currently designing the architecture for a web-based application that also needs to provide image storage. Uploading photos will be one of the key features of the service, and viewing those images on the web will be one of its primary uses.

However, I'm not sure how to build a scalable image storage component for this application. I have thought about several solutions, but since I lack experience in this area, I look forward to hearing your suggestions. Besides the images themselves, metadata must also be saved. Here are my initial thoughts:

  1. Use a (distributed) filesystem like HDFS and set up dedicated web servers as "filesystem clients" that save uploaded images and serve requests. Image metadata is saved in an additional database, including the file path of each image.

  2. Use a BigTable-oriented system like HBase on top of HDFS and save images and metadata together. Again, web servers bridge image uploads and requests.

  3. Use a completely schemaless database like CouchDB to store both images and metadata. Additionally, use the database itself for upload and delivery via its HTTP-based RESTful API. (Additional question: CouchDB saves blobs via Base64. Can it nevertheless return data as image/jpeg, etc.?)
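
To make option 1 more concrete, here is a rough sketch of the "files on disk, metadata plus file path in a database" idea. It rests on loud assumptions: a local directory stands in for HDFS, SQLite stands in for the metadata database, and the names save_image, load_image, open_metadata_db and the table layout are invented for this illustration.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Tuple


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory metadata database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE images ("
        "id TEXT PRIMARY KEY, path TEXT, content_type TEXT, size INTEGER)"
    )
    return db


def save_image(root: Path, db: sqlite3.Connection, data: bytes, content_type: str) -> str:
    """Write the bytes to the filesystem and record path and metadata in the database."""
    image_id = hashlib.sha256(data).hexdigest()
    # Fan out into subdirectories so that no single directory grows too large.
    path = root / image_id[:2] / image_id[2:4] / image_id
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    db.execute(
        "INSERT OR REPLACE INTO images (id, path, content_type, size) VALUES (?, ?, ?, ?)",
        (image_id, str(path), content_type, len(data)),
    )
    db.commit()
    return image_id


def load_image(db: sqlite3.Connection, image_id: str) -> Tuple[bytes, str]:
    """Look up the file path in the database, then read the bytes from the filesystem."""
    row = db.execute(
        "SELECT path, content_type FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    path, content_type = row
    return Path(path).read_bytes(), content_type
```

A web server acting as the "filesystem client" would call save_image on upload and load_image on each view request, for example with `root = Path(tempfile.mkdtemp())` during local experiments.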

11 answers
贼婆χ
#2 · 2019-01-20 22:17

Ok, if all that AWS stuff isn't going to work, here are a couple of thoughts.

As for (3): if you put binary data into a database, the same data comes back out. What makes it a JPEG is the format of the data, not what the database thinks it is. What makes the client (web browser) treat it as a JPEG is the Content-Type header being set to image/jpeg. You could also set it to something else, such as text (not recommended), and the browser would then try to interpret it that way.
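
A minimal sketch of that point, written as a plain WSGI application. The bytes in STORED_BYTES are a hypothetical stand-in for whatever comes back from CouchDB, HDFS, or any other store, and image_app is a name made up for this example.

```python
# Hypothetical stand-in for bytes fetched from the storage layer.
# It starts with the JPEG magic bytes but is not a complete image.
STORED_BYTES = b"\xff\xd8\xff\xe0" + b"\x00" * 16


def image_app(environ, start_response):
    """WSGI app that returns the stored bytes unchanged and labels them as JPEG."""
    headers = [
        # This header, not the storage layer, tells the browser how to interpret the body.
        ("Content-Type", "image/jpeg"),
        ("Content-Length", str(len(STORED_BYTES))),
    ]
    start_response("200 OK", headers)
    return [STORED_BYTES]
```

To try it in a browser, the app can be handed to `wsgiref.simple_server.make_server` from the standard library. Changing the header value to text/plain makes the browser show the same bytes as garbage text.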

For on-disk storage, I like CouchDB for its simplicity, but HDFS would certainly work. Here's a link to a post about serving image content from CouchDB: http://japhr.blogspot.com/2009/04/render-couchdb-images-via-sinatra.html

Edit: here's a link to a useful discussion about caching images in memcached vs. serving them from disk under Linux/Apache.

SAY GOODBYE
#3 · 2019-01-20 22:17

I've been experimenting with some of the _update functionality available to CouchDB view servers in my Python view server.

One really cool thing I did was an update function for image uploads so that I could use PIL to create thumbnails and other related images and attach them to the document when they get pushed to CouchDB.

This might be useful if you need image manipulation and want to cut down on the amount of code and infrastructure you need to keep up.

在下西门庆
#4 · 2019-01-20 22:18

As part of Cloudant, I don't want to push product, but BigCouch solves this problem in my science application stack (physics -- nothing to do with Cloudant, and certainly nothing to do with profit!). It marries the simplicity of the CouchDB design with the auto-sharding and scalability that are missing in single-server CouchDB. I generally use it to store a smaller number of big files (multi-GB) and a large number of small files (100 MB or less). I was using S3, but the GET costs actually start to add up for small files that are accessed repeatedly.

地球回转人心会变
#5 · 2019-01-20 22:21

Use Seaweed-FS (formerly called Weed-FS), an implementation of Facebook's Haystack paper.

Seaweed-FS is very flexible and pared down to the basics. It was created to store billions of images and serve them fast.
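
The core idea behind the Haystack design, as I understand it, is to pack many small images into one large file and keep an in-memory index of where each one lives, so a read is a single seek instead of a per-file metadata lookup. The sketch below only illustrates that idea. It is not Seaweed-FS's on-disk format or API, and Volume, put, and get are hypothetical names.

```python
import io
from typing import BinaryIO, Dict, Tuple


class Volume:
    """Append-only blob file plus an in-memory index of key -> (offset, size)."""

    def __init__(self, storage: BinaryIO) -> None:
        self._storage = storage
        self._index: Dict[str, Tuple[int, int]] = {}

    def put(self, key: str, data: bytes) -> None:
        # Always append; an overwritten key simply points at the newest copy.
        offset = self._storage.seek(0, io.SEEK_END)
        self._storage.write(data)
        self._index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        # One index lookup in memory, then one seek and one read.
        offset, size = self._index[key]
        self._storage.seek(offset)
        return self._storage.read(size)
```

For a quick experiment, `Volume(io.BytesIO())` works in memory; a real volume would wrap a file opened in binary read/write mode.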

来,给爷笑一个
#6 · 2019-01-20 22:22

"Additional question: CouchDB does save blobs via Base64."

CouchDB does not save blobs as Base64; they are stored as straight binary. When a JSON document is retrieved with ?attachments=true, we do convert the on-disk binary to Base64 so that it can be embedded safely in JSON, but that's just a presentation-level thing.
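
The round trip is easy to check for yourself. The snippet below is a generic illustration rather than CouchDB code: the document shape with a single "data" field is a simplification assumed for this example, and raw stands in for an image.

```python
import base64
import json

raw = bytes(range(256)) * 3  # 768 arbitrary bytes standing in for an image

# Encode only so that the bytes can travel inside a JSON document.
encoded = base64.b64encode(raw).decode("ascii")
doc = json.dumps({"data": encoded})

# Decoding recovers exactly the original binary.
decoded = base64.b64decode(json.loads(doc)["data"])

print(len(raw), len(encoded), decoded == raw)
```

Base64 turns every 3 bytes into 4 characters, which is one more reason to fetch attachments directly as binary when you don't need them inlined.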

See Standalone Attachments.

CouchDB serves attachments with the content type they were stored with. It is possible, and in fact common, to serve HTML, CSS, and GIF/PNG/JPEG attachments directly to browsers.

Attachments can be streamed and, in CouchDB 1.1, even support the Range header (for media streaming and/or resumption of an interrupted download).
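
From the client side, resuming a download just means sending a standard HTTP Range header. A small sketch follows; resume_request is a name made up for this example, and any URL passed to it is a placeholder rather than a real endpoint.

```python
import urllib.request


def resume_request(url: str, bytes_already_received: int) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes that are still missing."""
    request = urllib.request.Request(url)
    # "bytes=N-" means: everything from offset N to the end of the resource.
    request.add_header("Range", f"bytes={bytes_already_received}-")
    return request
```

A server that honors the header answers with status 206 Partial Content and a Content-Range header; a server that ignores it sends the whole body with a plain 200, so a client should check the status before appending to a partial file.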
