Proper NoSQL data schema for web photo gallery

2019-03-09 10:56发布

问题:

I'm looking to build an appropriate data structure for NoSQL storage of a photo gallery. In my web application, a photo can be part of 1 or more albums. I have experience with MySQL, but almost none with key-value storage.

With MySQL, I would have set up (3) tables as follows:

photos (photo_id, title, date_uploaded, filename)
albums (album_id, title, photo_id)
album_photo_map (photo_id, album_id)

And then, to retrieve a list of the 5 latest photos (with album data), a query like this:

SELECT *
FROM albums, photos, album_photo_map
WHERE albums.album_id = album_photo_map.album_id AND
                photos.photo_id = album_photo_map.photo_id
ORDER BY photos.date_uploaded DESC LIMIT 5;

How would I accomplish a similar query using a NoSQL key-value pair database? (Specifically, Amazon's DynamoDB.) What would the storage look like? How would the indexing work?

回答1:

Using mongodb lingo, your collections could look like this:

photos = [
    {
        _id: ObjectId(...),
        title: "...",
        date_uploaded: Date(...),
        albums: [
            ObjectId(...),
            ...
        ]
    },
    ...
]

albums = [
    {
        _id: ObjectId(...),
        title: "..."
    }
]

Finding the 5 newest photos would be done like this:

> var latest = db.photos.find({}).sort({date_uploaded:1}).limit(5);

There's no server-side joins in mongo, so you'd have to fetch all the latest albums like this:

> var latest_albums = latest.find({}, {albums: 1});

Of course, then you have to boil this down into a set.

It's actually easier if you just embed the album inside the photo documents, since they're small:

photos = [
    {
        _id: ObjectId(...),
        title: "...",
        date_uploaded: Date(...),
        albums: [
            {name: "family-vacation-2011", title: "My family vacation in 2010"},
            ...
        ]
    },
    ...
]

Then querying is the same, but you don't have to join. Finding all photos in an album looks like:

> db.photos.find({albums:{$elemMatch:{name: "family-vacation-2011"}}});


回答2:

Redis can handle this. For the RMDBS table you mentioned above:

SET photos:photo_id:title "some photos title words"
SET photos:photo_id:date_uploaded "some uploaded time (2011-02-09 HH:MM:SS for example)"
SET photos:photo_id:filename "some filename words"

SET albums:album_id:title "some album title words"

SADD album_photo_map:photo_id album_id

Use a List(Redis suports list) to store last uploaded photos and update the list when a new photo is uploaded:

ret = r.lpush("upload:last_upload_times", photo_id) // update list
ret = r.ltrim("upload:last_upload_times", 0, N-1) // control list length

then, if we want to get the last uploaded N photos with album data:

last_uploaded_photo_list = r.lrange("upload:last_upload_times", 0, N-1) last_uploaded_photo_with_album_list = [(photo_id, album_id) for photo_id in last_uploaded_photo_list for album_id in r.smembers(photo_id)]



回答3:

Using DynamoDB, the "schema" for the photos table could be:

Album_Photo

  • Album ID(String, Primary key)
  • Photo ID(Number, Range key)
  • ... Others fields

Now, where i wrote Other Fields you could save all Photo data and save another request for the proper data table, but this would create redundant data if photo exists in many albuns.

You could save all photo data in this table for the "main" album, and in the others albuns use an column to specify is the main album ID. As NoSQL databas dont need an strict schema, no column is necessary in an table.

If photoID has somehow an auto-increment behaviour you can easily get the last X pictures of an album. if not, you could use the Date as Range Key, and photo ID as an column. Is also an good idea use your range key reversed, to query the last row easily.