Scalable Image Storage

Posted 2019-01-20 21:24

I'm currently designing an architecture for a web-based application that should also provide some kind of image storage. Uploading photos will be one of the key features of the service, and viewing these images (via the web) will be one of its primary uses.

However, I'm not sure how to build such a scalable image storage component into my application. I have already thought about different solutions, but since I lack experience, I look forward to hearing your suggestions. Besides the images themselves, metadata must also be saved. Here are my initial thoughts:

  1. Use a (distributed) filesystem like HDFS and set up dedicated web servers as "filesystem clients" that save uploaded images and serve requests. Image metadata is saved in an additional database, including the file path for each image.

  2. Use a BigTable-oriented system like HBase on top of HDFS and save images and meta data together. Again, webservers bridge image uploads and requests.

  3. Use a completely schemaless database like CouchDB for storing both images and metadata. Additionally, use the database itself for upload and delivery via its HTTP-based RESTful API. (Additional question: CouchDB saves blobs as Base64. Can it nevertheless return data as image/jpeg, etc.?)
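
To make option 1 concrete, here is a minimal sketch of the "files on a filesystem, metadata plus file path in a database" split. It uses a local directory and SQLite purely as stand-ins for HDFS and the metadata database; the table layout, the hash-based sharding scheme, and all function names are assumptions for illustration, not part of the original question.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Tuple


def open_metadata_db(location: str = ":memory:") -> sqlite3.Connection:
    """Open the metadata database and create the (hypothetical) images table."""
    db = sqlite3.connect(location)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id TEXT PRIMARY KEY, path TEXT NOT NULL, "
        "content_type TEXT NOT NULL, owner TEXT NOT NULL, size INTEGER NOT NULL)"
    )
    return db


def store_image(root: Path, db: sqlite3.Connection, data: bytes,
                content_type: str, owner: str) -> str:
    """Write the bytes under `root` and record metadata; return the image id."""
    image_id = hashlib.sha256(data).hexdigest()
    # Shard by leading hex characters so no single directory grows too large.
    path = root / image_id[:2] / image_id[2:4] / image_id
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    db.execute(
        "INSERT OR REPLACE INTO images (id, path, content_type, owner, size) "
        "VALUES (?, ?, ?, ?, ?)",
        (image_id, str(path), content_type, owner, len(data)),
    )
    db.commit()
    return image_id


def load_image(db: sqlite3.Connection, image_id: str) -> Tuple[bytes, str]:
    """Look up the file path in the database, then read the bytes from disk."""
    row = db.execute(
        "SELECT path, content_type FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return Path(row[0]).read_bytes(), row[1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_metadata_db()
        image_id = store_image(Path(tmp), db, b"fake image bytes", "image/png", "alice")
        data, content_type = load_image(db, image_id)
        print(image_id[:8], content_type, len(data))
        db.close()
```

The web servers in option 1 would sit in front of these two functions; swapping the directory for a distributed filesystem changes only the read and write calls, not the metadata schema.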

11 answers
手持菜刀,她持情操
Reply #2 · 2019-01-20 21:56

We use MogileFS. We're small scale users with less than 8TB and some 50 million files. We switched from storing in Amazon S3 some years ago to get better control of file names and performance.

It's not the prettiest software, but it's very "field tested" and basically all users are using it the same way you will be.

一夜七次
Reply #3 · 2019-01-20 21:59

Here is an example of storing images as blobs in CouchDB using PHP with Laravel. In this example, I am storing three images based on user requirements.

Establish the connection to CouchDB:

$connection = DB::connection('your database name');

/* region: fetch the user's uploaded images and Base64-encode them */

$FirstImage  = base64_encode(file_get_contents(Input::file('FirstImageInput')));
$SecondImage = base64_encode(file_get_contents(Input::file('SecondImageInput')));
$ThirdImage  = base64_encode(file_get_contents(Input::file('ThirdImageInput')));

list($id, $rev) = $connection->putDocument(array(
    'name' => $name,
    'location' => $location,
    'phone' => $phone,
    'website' => $website,
    "_attachments" =>[
        'FirstImage.png' => [
            'content_type' => "image/png",
            'data' => $FirstImage
        ],
        'SecondImage.png' => [
            'content_type' => "image/png",
            'data' => $SecondImage
        ],
        'ThirdImage.png' => [
            'content_type' => "image/png",
            'data' => $ThirdImage
        ]
    ],
), $id, $rev);

...

You can store a single image the same way.
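
For readers not using PHP, the following Python sketch shows the shape of the document body that such a call sends: CouchDB accepts inline attachments under the `_attachments` key, each with a `content_type` and Base64-encoded `data`. An attachment can later be fetched at `/{db}/{docid}/{attname}`, and CouchDB serves it with the stored content type, which also addresses the question about returning image/jpeg. The helper names, database name, and document id here are hypothetical, and no request is actually sent.

```python
import base64
import json
from typing import Dict, Tuple
from urllib.parse import quote


def build_document(fields: Dict[str, str],
                   images: Dict[str, Tuple[str, bytes]]) -> str:
    """Return a JSON document body with images embedded as inline attachments.

    `images` maps attachment name -> (content_type, raw bytes).
    """
    attachments = {
        name: {
            "content_type": content_type,
            # Inline attachments carry their payload as Base64 text.
            "data": base64.b64encode(raw).decode("ascii"),
        }
        for name, (content_type, raw) in images.items()
    }
    return json.dumps({**fields, "_attachments": attachments})


def attachment_url(base_url: str, db: str, doc_id: str, name: str) -> str:
    """Build the URL from which a stored attachment can be fetched."""
    parts = (quote(part, safe="") for part in (db, doc_id, name))
    return base_url.rstrip("/") + "/" + "/".join(parts)


if __name__ == "__main__":
    body = build_document(
        {"name": "Example shop"},                       # hypothetical fields
        {"FirstImage.png": ("image/png", b"\x89PNG")},  # placeholder bytes
    )
    print(body)
    print(attachment_url("http://localhost:5984", "shops", "shop-1", "FirstImage.png"))
```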

老娘就宠你
Reply #4 · 2019-01-20 22:00

We have been using CouchDB for that, saving images as attachments. But after a year, the multi-dozen-GB CouchDB database files turned out to be a headache. For example, CouchDB replication still has issues with very large documents.

So we just rewrote our software to use CouchDB for image information and Amazon S3 for the actual image storage. The code is available at http://github.com/hudora/huImages

You might want to set up an Amazon S3-compatible storage service on-site for your project. This keeps you flexible and leaves the Amazon option open without requiring external services for now. Walrus seems to be becoming the most popular and scalable S3 clone.
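
One way to keep that flexibility in application code is to hide the storage backend behind a small interface, so an on-site store can later be swapped for S3 (or the reverse) without touching upload logic. This is only a sketch: the `BlobStore` protocol, both backends, and the `images/` key prefix are assumptions for illustration and are not taken from huImages.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Minimal put/get interface that any storage backend can satisfy."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Backend for tests: keeps blobs in a dictionary."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class DirectoryStore:
    """On-site backend: one file per key under a root directory.

    Keys are trusted here; a real service would validate them first.
    """

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def save_upload(store: BlobStore, image_id: str, data: bytes) -> str:
    """Application code depends only on the interface, not on the backend."""
    key = f"images/{image_id}"
    store.put(key, data)
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (InMemoryStore(), DirectoryStore(Path(tmp))):
            key = save_upload(store, "abc123", b"fake image bytes")
            print(type(store).__name__, key, len(store.get(key)))
```

An S3-backed class would implement the same two methods with whichever client library the project uses.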

I also urge you to look into the design of LiveJournal, with their excellent open-source MogileFS and Perlbal offerings. This combination is probably the most famous image-serving setup.

The Flickr architecture can also be an inspiration, although they don't offer open-source software to the public like LiveJournal does.

"骚年 ilove
Reply #5 · 2019-01-20 22:01

I've written an image store on top of Cassandra. We have a lot of writes and random reads, so our read/write ratio is low. For a high read/write ratio, I suggest MongoDB (GridFS).
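
Storing large blobs in a database usually means splitting them into fixed-size chunks keyed by file id and chunk number; GridFS does this by default with 255 KiB chunks. Whether the Cassandra-based store above works this way is not stated, so treat the sketch below as an illustration of the general technique only. The row layout and function names are assumptions.

```python
from typing import Iterable, List, Tuple

# GridFS's default chunk size (255 KiB); any other store would tune this.
CHUNK_SIZE = 255 * 1024

ChunkRow = Tuple[str, int, bytes]  # (file_id, chunk_number, chunk_bytes)


def split_into_chunks(file_id: str, data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> List[ChunkRow]:
    """Split a blob into rows that a database could store individually."""
    return [
        (file_id, n, data[offset:offset + chunk_size])
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(rows: Iterable[ChunkRow]) -> bytes:
    """Rebuild the blob; rows may arrive in any order, so sort by chunk number."""
    return b"".join(chunk for _, _, chunk in sorted(rows, key=lambda row: row[1]))


if __name__ == "__main__":
    image = bytes(range(256)) * 3000  # 768,000 placeholder bytes
    rows = split_into_chunks("img-1", image)
    print(len(rows), [len(chunk) for _, _, chunk in rows])
    print(reassemble(rows) == image)
```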

劳资没心,怎么记你
Reply #6 · 2019-01-20 22:03

Maybe have a look at the description of Facebook's Haystack:

Needle in a haystack: efficient storage of billions of photos
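
The central idea described there is to append many photos to one large file and keep an in-memory index from photo id to offset and size, so a read costs a single seek instead of several filesystem metadata lookups. The sketch below is a heavily simplified illustration of that idea; the class name and layout are assumptions, and it omits what a real system needs (index persistence, checksums, deletion, replication).

```python
import os
import tempfile
from typing import Dict, Tuple


class AppendOnlyStore:
    """Toy volume file: photos are appended, and an index maps id -> (offset, size)."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._index: Dict[str, Tuple[int, int]] = {}  # in memory only
        open(path, "ab").close()  # create the volume file if it is missing

    def put(self, photo_id: str, data: bytes) -> None:
        with open(self._path, "ab") as volume:
            volume.seek(0, os.SEEK_END)  # make tell() report the true end
            offset = volume.tell()
            volume.write(data)
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id: str) -> bytes:
        offset, size = self._index[photo_id]
        with open(self._path, "rb") as volume:
            volume.seek(offset)
            return volume.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = AppendOnlyStore(os.path.join(tmp, "volume.dat"))
        store.put("p1", b"first")
        store.put("p2", b"second!")
        print(store.get("p2"), store.get("p1"))
```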

Summer. ? 凉城
Reply #7 · 2019-01-20 22:08

Have you considered Amazon Web Services? S3 is web-based file storage, and SimpleDB is a key->attribute store. Both are performant and highly scalable. It's more expensive than maintaining your own servers and setups (assuming you are going to do it yourself and not hire people), but you get up and running much more quickly.

Edit: I take that back. It's more expensive in the long run at high volumes, but at low volume it beats the initial cost of buying hardware.

S3: http://aws.amazon.com/s3/ (you could store your image files here, and for performance maybe have an image cache on your server, or maybe not)

SimpleDB: http://aws.amazon.com/simpledb/ (metadata could go here: image id mapping to whatever data you want to store)

Edit 2: I didn't even know about this, but there is a new web service called Amazon CloudFront (http://aws.amazon.com/cloudfront/). It is for fast web content delivery, and it integrates well with S3. Kind of like Akamai for your images. You could use this instead of the image cache.
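
If you do keep an image cache on your own server in front of S3, a read-through cache with least-recently-used eviction is a common shape for it. The following is a minimal sketch under that assumption; `ImageCache` and its `fetch` callback are hypothetical names, and the callback stands in for whatever call actually downloads the object from storage.

```python
from collections import OrderedDict
from typing import Callable


class ImageCache:
    """Read-through LRU cache bounded by total bytes held."""

    def __init__(self, fetch: Callable[[str], bytes], max_bytes: int) -> None:
        self._fetch = fetch          # called on a miss, e.g. a storage download
        self._max_bytes = max_bytes
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)
        if len(data) <= self._max_bytes:  # never cache oversized objects
            self._items[key] = data
            self._size += len(data)
            while self._size > self._max_bytes:
                _, evicted = self._items.popitem(last=False)  # drop oldest
                self._size -= len(evicted)
        return data


if __name__ == "__main__":
    backend = {"a.jpg": b"x" * 4, "b.jpg": b"y" * 4}  # stand-in for remote storage
    cache = ImageCache(backend.__getitem__, max_bytes=8)
    cache.get("a.jpg")
    cache.get("a.jpg")
    print(cache.misses)
```

A CDN such as CloudFront plays the same role outside your servers, which is why the answer treats the two as alternatives.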
