Is moving documents between collections a good way

Posted 2020-06-08 13:29

Question:

I have two collections, one (A) containing items to be processed (relatively small) and one (B) with those already processed (fairly large, with extra result fields).

Items are read from A, get processed and save()'d to B, then remove()'d from A.

The rationale is that indices can be different across these, and that the "incoming" collection can be kept very small and fast this way.

I've run into two issues with this:

  • if either remove() or save() times out or otherwise fails under load, I either lose the item completely or process it twice
  • if both fail, the side effects happen but there is no record of that

I can sidestep the double-failure case with findAndModify locks (not needed otherwise, since we have a process-level lock), but then we have stale-lock issues, and partial failures can still happen. There's no way to atomically remove+save to different collections, as far as I can tell (maybe by design?).

Is there a Best Practice for this situation?

Answer 1:

"There's no way to atomically remove+save to different collections, as far as I can tell (maybe by design?)"

Yes, this is by design. MongoDB explicitly does not provide joins or transactions, and remove + save is a form of transaction.

"Is there a Best Practice for this situation?"

You really have two low-complexity options here, both of which involve findAndModify.

Option #1: a single collection

Based on your description, you are basically building a queue with some extra features. If you use a single collection, you can use findAndModify to atomically update the status of each item as it moves through processing.

Unfortunately, that means you lose the property you wanted, namely that the "incoming" collection can be kept very small and fast this way.
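As a rough illustration of the single-collection idea, here is a minimal Python sketch. It assumes `items` is a pymongo Collection (find_one_and_update is pymongo's counterpart of findAndModify). The field names `status`, `claimed_at` and `result`, and both function names, are made up for this example and are not from the question.

```python
from datetime import datetime, timezone


def claim_next(items):
    """Atomically claim one pending item; returns None if nothing is pending.

    `items` is assumed to be a pymongo Collection. The returned document is
    the version from before the update, which is pymongo's default.
    """
    return items.find_one_and_update(
        {"status": "pending"},
        {"$set": {"status": "processing",
                  "claimed_at": datetime.now(timezone.utc)}},
        sort=[("_id", 1)],  # oldest first, assuming default ObjectId _ids
    )


def mark_done(items, item_id, result):
    """Store the result and flip the status in one single-document update."""
    items.update_one(
        {"_id": item_id, "status": "processing"},
        {"$set": {"status": "done", "result": result}},
    )
```

Because each state change is a single-document update, an item is never in both states or in neither. The `claimed_at` timestamp is only there so that stale claims could be detected and reset later.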

Option #2: two collections

The other option is basically a two phase commit, leveraging findAndModify.

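Take a look at the MongoDB docs on two-phase commits for this.

Once an item is processed in A, you set a field to flag it for deletion. You then copy that item over to B. Once it has been copied to B, you can remove the item from A. If any step fails, the flag is still there in A, so the move can be detected and finished later instead of the item being lost or processed twice.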
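The flag, copy, remove sequence might be sketched like this in Python. It assumes `coll_a` and `coll_b` are pymongo Collections. The `state` field, its `"moving"` value and the function names are hypothetical, and the recovery pass is one possible way to finish interrupted moves, not something prescribed above.

```python
def finish_move(coll_a, coll_b, doc):
    """Copy a flagged document to B, then remove it from A.

    Both steps are safe to repeat: the upsert is keyed by _id, and the
    delete only matches a document that is still flagged.
    """
    copy = {k: v for k, v in doc.items() if k != "state"}
    coll_b.replace_one({"_id": doc["_id"]}, copy, upsert=True)
    coll_a.delete_one({"_id": doc["_id"], "state": "moving"})


def move_processed(coll_a, coll_b, item_id, result):
    """Flag the item in A (storing its result), then copy and remove it."""
    flagged = coll_a.find_one_and_update(
        {"_id": item_id},
        {"$set": {"state": "moving", "result": result}},
    )
    if flagged is None:
        return False  # item is no longer in A
    # find_one_and_update returns the pre-update document by default,
    # so the result is added to the copy here.
    finish_move(coll_a, coll_b, {**flagged, "result": result})
    return True


def recover(coll_a, coll_b):
    """Finish any moves that were interrupted after the flag was set."""
    count = 0
    for doc in coll_a.find({"state": "moving"}):
        finish_move(coll_a, coll_b, doc)
        count += 1
    return count
```

Storing the result together with the flag means a crash after the first write never requires reprocessing; running `recover` at startup or periodically is enough to complete the move.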



Answer 2:

I've not tried this myself yet, but the new book 50 Tips and Tricks for MongoDB Developers mentions several times the use of cron jobs (or a service/scheduler) to clean up data like this. You could leave the documents in Collection A flagged for deletion and run a daily job to clear them out, reducing the overall scope of the original transaction.
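A scheduled cleanup along those lines could look roughly like the Python sketch below. It assumes `coll_a` is a pymongo Collection and that flagged documents carry a boolean `deleted` field plus a `deleted_at` timestamp. Those field names, the function name and the one-hour grace period are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def purge_flagged(coll_a, min_age=timedelta(hours=1), now=None):
    """Delete documents in A that were flagged at least `min_age` ago.

    Returns the number of documents removed. The grace period avoids
    racing with a writer that has only just set the flag.
    """
    now = now or datetime.now(timezone.utc)
    result = coll_a.delete_many(
        {"deleted": True, "deleted_at": {"$lt": now - min_age}}
    )
    return result.deleted_count
```

Calling this from cron or any scheduler keeps the delete out of the request path, so the original operation only has to get the flag written.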

From what I've learned so far, I'd never leave the database in a state where I rely on the next database action succeeding unless it is the last action (journalling will resend the last db action upon recovery). For example, I have a three-phase account registration process where I create a user in CollectionA and then add another related document to CollectionB. When I create the user, I embed the details of the CollectionB document in CollectionA in case the second write fails. Later I will write a process that removes the embedded data from CollectionA if the document in CollectionB exists.

Not having transactions does cause pain points like this, but I think in some cases there are new ways of thinking about it. In my case, time will tell as I progress with my app.



Tags: mongodb