I'm looking to build an appropriate data structure for NoSQL storage of a photo gallery. In my web application, a photo can be part of 1 or more albums. I have experience with MySQL, but almost none with key-value storage.
With MySQL, I would have set up (3) tables as follows:
photos (photo_id, title, date_uploaded, filename)
albums (album_id, title, photo_id)
album_photo_map (photo_id, album_id)
And then, to retrieve a list of the 5 latest photos (with album data), a query like this:
SELECT *
FROM albums, photos, album_photo_map
WHERE albums.album_id = album_photo_map.album_id AND
photos.photo_id = album_photo_map.photo_id
ORDER BY photos.date_uploaded DESC LIMIT 5;
How would I accomplish a similar query using a NoSQL key-value pair database? (Specifically, Amazon's DynamoDB.) What would the storage look like? How would the indexing work?
Using mongodb lingo, your collections could look like this:
photos = [
{
_id: ObjectId(...),
title: "...",
date_uploaded: Date(...),
albums: [
ObjectId(...),
...
]
},
...
]
albums = [
{
_id: ObjectId(...),
title: "..."
}
]
Finding the 5 newest photos would be done like this:
> var latest = db.photos.find({}).sort({date_uploaded:1}).limit(5);
There's no server-side joins in mongo, so you'd have to fetch all the latest albums like this:
> var latest_albums = latest.find({}, {albums: 1});
Of course, then you have to boil this down into a set.
It's actually easier if you just embed the album inside the photo documents, since they're small:
photos = [
{
_id: ObjectId(...),
title: "...",
date_uploaded: Date(...),
albums: [
{name: "family-vacation-2011", title: "My family vacation in 2010"},
...
]
},
...
]
Then querying is the same, but you don't have to join. Finding all photos in an album looks like:
> db.photos.find({albums:{$elemMatch:{name: "family-vacation-2011"}}});
Redis can handle this. For the RMDBS table you mentioned above:
SET photos:photo_id:title "some photos title words"
SET photos:photo_id:date_uploaded "some uploaded time (2011-02-09 HH:MM:SS for example)"
SET photos:photo_id:filename "some filename words"
SET albums:album_id:title "some album title words"
SADD album_photo_map:photo_id album_id
Use a List(Redis suports list) to store last uploaded photos and update the list when a new photo is uploaded:
ret = r.lpush("upload:last_upload_times", photo_id) // update list
ret = r.ltrim("upload:last_upload_times", 0, N-1) // control list length
then, if we want to get the last uploaded N photos with album data:
last_uploaded_photo_list = r.lrange("upload:last_upload_times", 0, N-1)
last_uploaded_photo_with_album_list = [(photo_id, album_id) for photo_id in last_uploaded_photo_list for album_id in r.smembers(photo_id)]
Using DynamoDB, the "schema" for the photos table could be:
Album_Photo
- Album ID(String, Primary key)
- Photo ID(Number, Range key)
- ... Others fields
Now, where i wrote Other Fields you could save all Photo data and save another request for the proper data table, but this would create redundant data if photo exists in many albuns.
You could save all photo data in this table for the "main" album, and in the others albuns use an column to specify is the main album ID. As NoSQL databas dont need an strict schema, no column is necessary in an table.
If photoID has somehow an auto-increment behaviour you can easily get the last X pictures of an album. if not, you could use the Date as Range Key, and photo ID as an column. Is also an good idea use your range key reversed, to query the last row easily.