Having large number of collections in MongoDB ( Ne

Published 2020-06-28 03:29

This continues from: Pros and Cons of using MongoDB instead of MS SQL Server.

I am unsure why you are trying to take the advice of using many collections.

Using many collections in this way in MongoDB is considered a bad idea (and you would most likely have to increase the ns size to fit them once index overhead is counted). You should instead scale a single collection of common documents out horizontally. The other answerers seem to agree.

I would use a single collection with a document structure maybe of (quick off the top of my head):

{
    _id: {},
    camera_id: ObjectId(),
    image: {},
    hour: ts_of_hour,
    day: ts_of_day
}

That way you have all the data you need to select images at whatever granularity you want.
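A minimal sketch of how the hour and day fields in the structure above could be filled in. The helper names (hourBucket, dayBucket, makeImageDoc), the use of UTC Date values for the buckets, and the string camera id are all assumptions made for illustration; the original only says ts_of_hour and ts_of_day.

```javascript
// Hypothetical helper: truncate a Date to the start of its hour (UTC).
function hourBucket(date) {
  return new Date(Date.UTC(
    date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate(), date.getUTCHours()
  ));
}

// Hypothetical helper: truncate a Date to the start of its day (UTC).
function dayBucket(date) {
  return new Date(Date.UTC(
    date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()
  ));
}

// Build a document in the shape sketched above.
// _id is omitted so the driver can generate it on insert.
function makeImageDoc(cameraId, image, takenAt) {
  return {
    camera_id: cameraId,
    image: image,
    hour: hourBucket(takenAt),
    day: dayBucket(takenAt)
  };
}

const doc = makeImageDoc("camera-1", { path: "example.jpg" }, new Date("2020-06-28T03:29:45Z"));
console.log(doc.hour.toISOString()); // 2020-06-28T03:00:00.000Z
console.log(doc.day.toISOString());  // 2020-06-28T00:00:00.000Z
```

Storing the truncated values means "all images for this hour" or "for this day" becomes an equality match on one field rather than a range scan.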

NB: Consider as well that MongoDB's lock is database-level, not collection-level. You won't gain anything useful here; you will only make your querying harder and more complex, and maybe make your data harder to maintain.

Edit

To answer some of your concerns:

NB: I have not designed your app and this is a late answer (late at night too) so basically this is me fleshing out basic concepts that immediately come to mind.

1 collection for each camera, i.e. 100 collections almost.

Again, I don't really see the point. If you were doing this for optimisation reasons you would do it as one camera per DB, but that is plain overkill. Honestly, 30m records is nothing, so I will resolve that concern right now: whether you are talking about SQL or MongoDB, a 30m-record collection is normally considered small, minute even, relative to the database's potential (MS SQL, for instance, says it can store petabytes per table).

Select all images between FromDate and ToDate

You can use the answer above to accomplish that using a BSON date field on your document.
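As a rough sketch of that range query: the field name taken, the function names, and the optional camera filter are assumptions here, not part of the structure given earlier. The in-memory inRange helper exists only to show the half-open range semantics.

```javascript
// Build a filter for images taken in [from, to).
// `taken` is an assumed name for the BSON date field on each image document.
function dateRangeFilter(from, to, cameraId) {
  const filter = { taken: { $gte: from, $lt: to } };
  if (cameraId !== undefined) {
    filter.camera_id = cameraId; // optionally restrict to a single camera
  }
  return filter;
}

// In-memory equivalent of the range test: includes `from`, excludes `to`.
function inRange(doc, from, to) {
  return doc.taken >= from && doc.taken < to;
}

const from = new Date("2020-06-01T00:00:00Z");
const to = new Date("2020-07-01T00:00:00Z");
const filter = dateRangeFilter(from, to, "camera-1");

// In the mongo shell the filter would be used as:
//   db.images.find(filter).sort({taken: 1})
console.log(JSON.stringify(filter));
```

Using $gte with $lt keeps adjacent ranges from overlapping, so an image taken exactly at ToDate is not returned twice when ranges are queried back to back.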

Select Top(COUNT) images between FromDate and ToDate

You can just count().

TOP is not implemented in all DB systems, so it is MS SQL specific. In this particular query it does nothing useful anyway, since a count query will always return one row.

You can aggregate this particular data to another collection. That is fine, so in another collection you would have a set of days:

{
     count: 3,
     day: (date|ts)
}

And then you can just sum up over the days, since count() can get slow on a large working set. The aim of the collection is to summarise your data so that the working set for your queries is more manageable.

So other collections are fine to use to hold a "cache" of aggregation results which would be slow to compute, or of course to hold other entities within your app (like a relational DB would).
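A small sketch of summing over the per-day summary documents. The function names are hypothetical, and the sample data is made up purely to show the arithmetic. The pipeline uses the standard $match and $group stages; the in-memory version does the same sum on a plain array.

```javascript
// Aggregation pipeline summing the pre-computed counts for days in [fromDay, toDay).
// It would be run against the summary collection, whatever name you give it.
function dailyCountPipeline(fromDay, toDay) {
  return [
    { $match: { day: { $gte: fromDay, $lt: toDay } } },
    { $group: { _id: null, total: { $sum: "$count" } } }
  ];
}

// In-memory equivalent of the pipeline, for illustration only.
function sumDailyCounts(days, fromDay, toDay) {
  return days
    .filter((d) => d.day >= fromDay && d.day < toDay)
    .reduce((total, d) => total + d.count, 0);
}

// Made-up sample summary documents.
const sampleDays = [
  { day: new Date("2020-06-26T00:00:00Z"), count: 3 },
  { day: new Date("2020-06-27T00:00:00Z"), count: 5 },
  { day: new Date("2020-06-28T00:00:00Z"), count: 2 }
];

const total = sumDailyCounts(
  sampleDays,
  new Date("2020-06-26T00:00:00Z"),
  new Date("2020-06-28T00:00:00Z")
);
console.log(total); // 8 (the 28th is excluded by the half-open range)
```

A month of images then costs about thirty small documents to total, rather than a count over every image in the range.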

Basically, like in SQL, common schemas or documents get grouped in collections. So really I would design your app in SQL with only one table: images, and maybe a camera table as well.

All others except for 5 have been covered loosely here so:

Select previous/next images from/to an Image with an ID

You can use the _id here like so:

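db.images.find({_id: {$gt: last_id}}).sort({_id: 1}).limit(1)   // next image
db.images.find({_id: {$lt: last_id}}).sort({_id: -1}).limit(1)  // previous image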

And that should work pretty well.

As for the comment you posted here as well:

Do you mean that in MongoDB, querying a collection with 30 documents is not different from querying a collection with 30,00,000 documents ?

Now that depends on how much you know about database design in general and how to scale database architecture. This is something that doesn't just apply to MongoDB but also to SQL. Set up right, SQL can query 30m records about as easily as 30.

What it all comes down to is sharding. Whether it will be fast comes down to the indexes your queries use across those shards and their working set size (how much data is needed in RAM, and is it actually in RAM?). By the looks of it, a shard key over image_id (ObjectId) and date might give you what you want. However, this will need more testing, and since I believe you are a little new to scaling databases you should really do some reading on the subject.

NB again: 30m documents might not need sharding so this could be just a case of making good indexes.

Hopefully this helps and I haven't gone round in circles here.

Answer 2:

I don't see your problem with the collections. Photos share one single schema, and they should be in a single collection.

Each photo gets a timestamp. The rest is done by querying. You can query documents per hour without a problem:

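// `date` is a JavaScript Date for the day of interest, `hour` an integer 0-23 (local time)
var begin_hour = new Date(date.getFullYear(), date.getMonth(), date.getDate(), hour);
var end_hour = new Date(date.getFullYear(), date.getMonth(), date.getDate(), hour + 1);

// Half-open range: includes begin_hour, excludes end_hour
db.photos.find({taken: {$gte: begin_hour, $lt: end_hour}})

This selects the photos taken in the chosen hour.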

If that doesn't satisfy you, there's also MapReduce.