How to check for duplication before creating a new

Posted 2019-08-18 05:45

Question:

We want to check whether the database already contains a document with the same fields and values as a new object we are trying to save, in order to prevent duplicate items.

Note: This question is not about updating documents or about duplicate document IDs; we only want to check the data, to avoid saving a new document with the same data as an existing one.

Preferably we'd like to accomplish this with Mango/Cloudant queries and not rely on views.

The idea so far is:

1) Scan the data that we are trying to save and dynamically create a selector that matches that document's structure. (We can't hardcode the selectors because we have many types of documents.)

2) Query the DB for any documents matching that selector, to see whether a document matching those criteria already exists.
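
As a rough sketch of steps 1 and 2 (an illustration, not a tested solution): the hypothetical helper `buildSelector` below walks a document and turns every leaf value into a Mango `$eq` condition, using dot notation for nested fields. It assumes field names contain no dots and it skips `_id` and `_rev`. Note that such a selector also matches documents that carry extra fields, so it tests "contains the same data" rather than strict equality.

```javascript
// Fields managed by CouchDB itself; they are not part of the data being compared.
const RESERVED = new Set(['_id', '_rev']);

// Recursively turn a document into a flat Mango selector.
// Nested objects become dot-notation paths; scalars, arrays and
// empty objects are compared with an explicit $eq.
function buildSelector(doc, prefix = '', selector = {}) {
  for (const [key, value] of Object.entries(doc)) {
    if (prefix === '' && RESERVED.has(key)) continue;
    const path = prefix ? `${prefix}.${key}` : key;
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value);
    if (isPlainObject && Object.keys(value).length > 0) {
      buildSelector(value, path, selector);
    } else {
      selector[path] = { $eq: value };
    }
  }
  return selector;
}

// Request body for POST /{db}/_find; one match is enough to know a duplicate exists.
function buildFindRequest(doc) {
  return { selector: buildSelector(doc), limit: 1 };
}
```

The body returned by `buildFindRequest` would be sent to `POST /{db}/_find`; an empty `docs` array in the response means no matching document was found.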

However, I wonder about the performance of this approach, since many of the selector fields will not be indexed.

I would also much rather follow best practices than create something out of the blue, but I haven't been able to find any known solutions for this specific scenario.

If you happen to know of any, please share.

Answer 1:

Option 1 - Define a meaningful ID for your documents

The ID could be a logical composition of, or a hash computed from, the values that should be unique.

If you want to check whether a document ID already exists, you can use the HEAD method:

HEAD /db/docId

which returns 200 OK if the docId exists in the database.
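
A minimal sketch of that HEAD check, assuming Node 18+ (for the global `fetch`), an unauthenticated server, and hypothetical helper names `docUrl` and `docExists`:

```javascript
// Build the document URL. The ID is percent-encoded so that characters such
// as "/" or ":" in a composed ID do not change the request path.
function docUrl(baseUrl, db, docId) {
  const base = baseUrl.replace(/\/+$/, '');
  return `${base}/${encodeURIComponent(db)}/${encodeURIComponent(docId)}`;
}

// Resolves to true when the document exists (200) and false when it does not (404).
// fetchImpl is injectable so the function can be exercised without a server.
async function docExists(baseUrl, db, docId, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(docUrl(baseUrl, db, docId), { method: 'HEAD' });
  if (res.status === 200) return true;
  if (res.status === 404) return false;
  throw new Error(`Unexpected status ${res.status}`);
}
```

Keep in mind that another client could create the document between this check and a later write, so the result of the write itself remains the authoritative answer.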

If you would like to check whether the new document has the same content as the previous one, you can use a validate document update function (validate_doc_update), which lets you compare both documents.

function(newDoc, oldDoc, userCtx, secObj) {
...
}
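
One possible body for such a function is sketched below; it rejects a write whose data is identical to the stored version. This is an assumption about what the comparison could look like, not the answer's own code. The variable name `validate` and the helpers `canonical` and `dataOnly` are made up for illustration; in a design document only the function expression would be stored under `validate_doc_update`. The function only sees the previous revision of the same document ID, so it cannot detect a duplicate stored under a different ID.

```javascript
var validate = function (newDoc, oldDoc, userCtx, secObj) {
  // Serialize a value with object keys sorted, so key order does not matter.
  function canonical(value) {
    if (value === null || typeof value !== 'object') {
      return JSON.stringify(value);
    }
    if (Array.isArray(value)) {
      return '[' + value.map(canonical).join(',') + ']';
    }
    var keys = Object.keys(value).sort();
    return '{' + keys.map(function (k) {
      return JSON.stringify(k) + ':' + canonical(value[k]);
    }).join(',') + '}';
  }

  // Drop CouchDB's own top-level fields (_id, _rev, _revisions, ...).
  function dataOnly(doc) {
    var out = {};
    Object.keys(doc).forEach(function (k) {
      if (k.charAt(0) !== '_') { out[k] = doc[k]; }
    });
    return out;
  }

  // oldDoc is null when the document is being created.
  if (oldDoc && !newDoc._deleted &&
      canonical(dataOnly(newDoc)) === canonical(dataOnly(oldDoc))) {
    throw({ forbidden: 'Document content is unchanged.' });
  }
};
```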

Option 2 - Use content hash computed outside CouchDB

  • Before creating or updating a document, compute a hash from the values of the attributes that should be unique.

  • Include the hash in the document as a new attribute, e.g. "key_hash".

  • Create a Mango index on the "key_hash" attribute.

  • When a new doc is about to be inserted, compute its hash and use a Mango query to look for documents with the same hash value before inserting the doc.
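
The steps of Option 2 could look roughly like this in Node.js. The function names, the index name `key-hash-index` and the choice of SHA-256 are assumptions made for illustration; the sketch also assumes the unique attributes hold scalars (or values whose key order is stable), since it relies on `JSON.stringify`.

```javascript
const { createHash } = require('crypto');

// Hash only the attributes that define uniqueness. Attribute names are sorted
// so the same data always yields the same hash, whatever the key order.
function computeKeyHash(doc, uniqueFields) {
  const picked = [...uniqueFields]
    .sort()
    .map((f) => [f, doc[f] === undefined ? null : doc[f]]);
  return createHash('sha256').update(JSON.stringify(picked)).digest('hex');
}

// Copy of the document with the hash stored in "key_hash".
function withKeyHash(doc, uniqueFields) {
  return { ...doc, key_hash: computeKeyHash(doc, uniqueFields) };
}

// Body for POST /{db}/_index: a JSON (Mango) index on the hash attribute.
const keyHashIndex = {
  index: { fields: ['key_hash'] },
  name: 'key-hash-index', // hypothetical index name
  type: 'json',
};

// Body for POST /{db}/_find: any hit means a document with the same hash exists.
function duplicateQuery(hash) {
  return { selector: { key_hash: { $eq: hash } }, limit: 1 };
}
```

Because the query touches only the indexed "key_hash" field, it avoids the unindexed-selector problem raised in the question.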

Option 3 - Compute hash in a View

  • Define a view which emits the computed hash for each document as the key.

    • CouchDB's JavaScript support does not include hashing functions, so this could be difficult to implement in a design document.
    • Use Erlang to define the map function, where you have access to Erlang's hashing support.
  • Before creating a new document, compute its hash and query the view using that hash as the key.
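
The query step on the client side might be sketched as follows. The design document and view names are placeholders, and the hash is assumed to be computed the same way the view computes it. View keys are JSON, so a string key has to be sent with its quotes and then URL-encoded.

```javascript
// Build the URL for querying a view by a single key.
// designDoc and viewName are hypothetical names chosen by the caller.
function viewQueryUrl(baseUrl, db, designDoc, viewName, hash) {
  const params = new URLSearchParams({ key: JSON.stringify(hash), limit: '1' });
  const base = baseUrl.replace(/\/+$/, '');
  return `${base}/${encodeURIComponent(db)}/_design/${encodeURIComponent(designDoc)}` +
    `/_view/${encodeURIComponent(viewName)}?${params}`;
}

// A duplicate exists when the view response contains at least one row.
function hasDuplicate(viewResponse) {
  return Array.isArray(viewResponse.rows) && viewResponse.rows.length > 0;
}
```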



Answer 2:

One solution would be to take Juanjo's and Alexis's comment one step further.

  1. Select the keys you wish to keep unique.
  2. Put their values in a string and generate a hash.
  3. Set the document's _id to that hash.
  4. PUT the document to the database.
  5. Check the response for failure.

If another document with the same _id value already exists in the database, the PUT request will fail with 409 Conflict.
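
The five steps above could be sketched like this, assuming Node 18+ (for the global `fetch`), no authentication, and SHA-256 as the hash. All function names are hypothetical.

```javascript
const { createHash } = require('crypto');

// Steps 1-3: derive the _id from the values that must be unique.
// Keys are sorted so the same data always produces the same _id.
function withHashedId(doc, uniqueKeys) {
  const values = [...uniqueKeys]
    .sort()
    .map((k) => [k, doc[k] === undefined ? null : doc[k]]);
  const id = createHash('sha256').update(JSON.stringify(values)).digest('hex');
  return { ...doc, _id: id };
}

// Step 5: interpret the HTTP status of the PUT.
function interpretPutStatus(status) {
  if (status === 201 || status === 202) return 'created';
  if (status === 409) return 'duplicate'; // a document with this _id already exists
  return 'error';
}

// Step 4: PUT the document. The hex _id needs no URL encoding.
async function saveIfUnique(baseUrl, db, doc, uniqueKeys, fetchImpl = globalThis.fetch) {
  const body = withHashedId(doc, uniqueKeys);
  const url = `${baseUrl.replace(/\/+$/, '')}/${encodeURIComponent(db)}/${body._id}`;
  const res = await fetchImpl(url, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return { id: body._id, result: interpretPutStatus(res.status) };
}
```

Since the database itself rejects the second write, this approach needs no separate lookup before inserting.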