Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I am considering MongoDB to hold metadata of images, recorded from 100 cameras, and the records will be kept for 30 days for each camera. If one camera gives 100,000 images in a day then i am going to save (100 x 30 x 100000) images (documents) at max in MongoDB. My web application will query this data as:
Select a Camera > Select a Date > Select an Hour > Fetch all images in that hour.
I plan to design schema with following three options, and need your expert opinion/suggestion for the best way out;
1) Hour-wise Collections: Create 72000 MongoDB Collections, i.e. 1 Collection per Hour for each Camera (100 cameras X 30 days X 24 hours) (using --nssize 500 command to exceed 24000 limit). I am afraid if MongoDB will allow me to create these much collections and secondly what are expected performance benefits and losses while reading and writing to these collection. Though, for reading per hour images looks tremendously easy with this schema, because i can fetch data in a single query to any Collection.
2) Day-wise Collections: Create 3000 MongoDB Collections, i.e. 1 Collection per Day for each Camera (100 cameras X 30 days). Though this is allowable and seems good number of collection but my concern is reading images from a particular hour inside particular day collection.
3) Camera-wise Collections: Create 100 MongoDB Collections, i.e. 1 Collection for each Camera (100 cameras/collections). Then saving snapshots with unique 'id' in format like (20141122061055000) that is a rephrasing of full date timestamp (2014-11-22 06:10:55.000).
I wish if ideally i could do (1), (2) or (3) but any other option is welcomed.
Please suggest about my selection for MongoDB as well, considering my case.
Regards.
This continues from: Pros and Cons of using MongoDB instead of MS SQL Server.
I am unsure why you are trying to take the advise of using many collections.
Using many collections in this way in MongoDB is considered a bad idea (and you would have to increase ns size for this most likely after your index overhead), you should instead scale a single collection of common docs way out horizontally. It seems the other answerers agree.
I would use a single collection with a document structure maybe of (quick off the top of my head):
{
_id: {},
camera_id: ObjectId(),
image: {},
hour: ts_of_hour,
day: ts_of_day
}
That way you got all the data you need to select images based on whatever denomination you want.
NB: Consider as well that MongoDBs lock is database level, not collection level. You won't gain anything useful here only making your querying harder and more complex and maybe making your data harder to maintain.
Edit
To answer some of your concerns:
NB: I have not designed your app and this is a late answer (late at night too) so basically this is me fleshing out basic concepts that immediately come to mind.
1 collection for each camera, i.e. 100 collections almost.
Again I don't really see the point, if you were to do this for optimisation reasons then you would do it as one camera per DB, but that is officially overkill. Honestly 30m records is nothing, I will resolve that concern right now. Whether you are talking about SQL or MongoDB a 30m record collection is normally considered small, minute even, in terms of the databases potential (with MS SQL saying they can store perabytes per table).
- Select All images of between FromDate and ToDate 2
You can use the answer above to accomplish that using a BSON date field on your document.
- Select Top(COUNT) images between FromDate and ToDate
You can just count()
.
top()
is not implemented in all DB systems so this is MS SQL specific here however in this particular query it does nothing useful since that query will always return one row.
You can aggregate this particular data to another collection. That is fine, so in another collection you would have a set of days:
{
count: 3,
day: (date|ts)
}
And then you can just some up over the days since count()
can get slow on a large working set. So the aim of the collection to summarise your data to make your working set for queries more manageable.
So other collections are fine to use to hold "cache" of aggregation functions which would be slow, or of course to hold other entities within your app (like a relational DB would).
Basically, like in SQL, common schemas or documents get grouped in collections. So really I would design your app in SQL with only one table: images
and maybe camera
as well.
All others except for 5 have been covered loosely here so:
- Select previous/next images from/to an Image with an ID
You can use the _id
here like so:
db.images.find({_id: {$gt: last_id}}).limit(1)
And that should work pretty well.
As for the comment you posted here as well:
Do you mean that in MongoDB, querying a collection with 30 documents is not different from querying a collection with 30,00,000 documents ?
Now that depends on how much you know about database design in general and how to scale database architecture. This is something that doesn't just apply to MongoDB but also to SQL. If set-up right SQL can easily query 30m records like 30.
What it all comes down to is sharding. As to whether it would be fast comes down to your indexes across those shards that the queries to run and their working set size (how much data is needed in RAM, is it in RAM?). By the looks of it a shard index over image_id (ObjectId
) and date might give you what you want. However this will need more testing and since I believe you are a little new to scaling databases you should really do some searching on this subject via Google or something.
NB again: 30m documents might not need sharding so this could be just a case of making good indexes.
Hopefully this helps and I haven't gone round in circles here,
I don't see your problem with the collections. Photos are one single scheme, and they should be in a single collection.
Each photo gets a timestamp. The rest is done by querying. You can query documents per hour without a problem:
var begin_hour = new Date(date.year, date.month, date.day, hour);
var end_hour = new Date(date.year, date.month, date.day, hour + 1);
db.photos.find({taken: {$gte: begin_hour, $lt: end_hour}})
This selects the photos by the selected hour
.
If that doesn't satisfy you, there's also MapReduce.