CouchDB replication strategy with dynamic groups o

Posted 2019-04-13 03:59

This is the situation:
We have a set of users who share some documents. Which documents they share can change throughout the day, and so can the documents themselves (edits and deletions). The users can also change some information on the documents.
E.g.
Users | Documents
A | X
A | Y
A | Z
B | X
B | Z
C | Y

Possible groups: A+C, A+B
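
To make the example concrete, the table can be inverted to see which set of users shares each document. This is only an illustrative sketch: the `shares` array and the `groupsFromShares` helper are hypothetical names, not part of the actual setup.

```javascript
// Hypothetical data mirroring the Users | Documents table above.
const shares = [
    ['A', 'X'], ['A', 'Y'], ['A', 'Z'],
    ['B', 'X'], ['B', 'Z'],
    ['C', 'Y']
];

// Map each document to its users, then group documents by that user set.
function groupsFromShares(pairs) {
    const usersByDoc = new Map();
    for (const [user, doc] of pairs) {
        if (!usersByDoc.has(doc)) usersByDoc.set(doc, []);
        usersByDoc.get(doc).push(user);
    }
    const docsByGroup = {};
    for (const [doc, users] of usersByDoc) {
        const key = users.slice().sort().join('+');
        (docsByGroup[key] = docsByGroup[key] || []).push(doc);
    }
    return docsByGroup;
}

const groups = groupsFromShares(shares);
console.log(groups); // { 'A+B': [ 'X', 'Z' ], 'A+C': [ 'Y' ] }
```

If B were also granted access to Y, document Y would move from group A+C to a new group A+B+C, which is the kind of shift that makes the groups dynamic.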

The CouchDB server is a replica of a SQL Server database that holds this data; an ETL job takes care of applying changes to CouchDB. The CouchDB database is in turn replicated to each user's phone via PouchDB.

The goal:
To replicate changes and deletions accordingly.

What we've tried:
1) We figured we'd structure our documents with a list of the users who can access them. Each document would have a "Users" array, and a filter in the design document would take care of replication to the clients. Unfortunately, document deletions and document changes that no longer pass the filter (e.g. a user is removed from the array) do not appear in the filtered _changes feed, so they cannot be replicated to the clients.
2) Database per user. This is not possible, because users need to see each other's work on the documents (they share them).
3) Database per group of users. Pretty much the same problem as the first solution, but worse. In fact:
- groups of users can change and cease to exist: how do we reflect that client-side?
- a document can shift to a new group: it will have to be redownloaded from scratch. This greatly increases the download size
- the same document can be in more than one group! (see example above)
- each client would have to find out which groups she belongs to every time she logs in, and replicate multiple databases. Then on the return trip you'd have to know which databases the document was present in
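
For approach 1, the filter could look roughly like the sketch below. It is an assumption, not our exact code: it uses the "Users" array described above and a hypothetical `user` query parameter. In the design document it would be stored as an anonymous function string; it is named here only so that it can be run outside CouchDB.

```javascript
// Sketch of a filter for the design document (e.g. filters.by_user).
// Assumes each document carries a "Users" array and the client passes
// a "user" query parameter (hypothetical name).
function byUser(doc, req) {
    return Array.isArray(doc.Users) && doc.Users.indexOf(req.query.user) !== -1;
}

const reqForB = { query: { user: 'B' } };

// Shared with B: the change is sent to B's client.
console.log(byUser({ _id: 'X', Users: ['A', 'B'] }, reqForB)); // true

// B removed from the array: the update is filtered out, so B's copy goes stale.
console.log(byUser({ _id: 'X', Users: ['A'] }, reqForB)); // false

// A plain DELETE leaves a tombstone without the Users field: also filtered out.
console.log(byUser({ _id: 'X', _deleted: true }, reqForB)); // false
```

The last two calls are the problematic cases: the very change that revokes access or deletes the document is the one the filter hides from the client.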

Is there a recipe for this situation? Am I missing an obvious solution?

EDIT

Partial solution for case 1:

    localDB.sync(remoteDB, {
        live: true,
        retry: true,
        filter: 'app/by_user',
        query_params: { "agente": agent }
    })
    .on('paused', function(info){
        console.log("paused");
        localDB.allDocs().then(function(docs){
            console.log("allDocs");
            docs.rows.forEach(function(row){
                console.log(row);
                remoteDB.get(row.id).then(function(doc){
                    if(doc.Agents.indexOf(agent) < 0){
                        localDB.remove(doc);
                    }
                });

            });
        });
    })
    .on('change', function(result){
        console.log("change!");
        result.change.docs.forEach(function(change) {
            if(!change.deleted){
                $rootScope.$apply(function(){
                    $rootScope.$broadcast('upsert', change);
                });
            }
        });
    });

Each remove() is giving me a 409 (conflict), and rightfully so. Is there a way to tell Pouch "no longer consider this as replicable and just remove it from my DB?"

3 answers
做自己的国王
#2 · 2019-04-13 04:18

We arrived at the conclusion that:
1) our use case might not be what CouchDB is good at
2) we value our mental health. After almost a month of struggling with this problem, we'd rather try and fail
3) documents are relatively inexpensive, so even if they stay on the user's phone that won't cause any major distress. If the data builds up too much, users can simply clear the data and start fresh

Solution:
1) Keep the architecture from approach 1
2) After each 'paused' event, compare the local docs with the remote docs; if a remote doc doesn't pass the filter, remove it from the UI. If there turns out to be a way to remove only the local document, we'd be very interested in upgrading to that logic.
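
A minimal sketch of that comparison, assuming `localDB` and `remoteDB` are the PouchDB instances from the question and that access is stored in the `Agents` array used there. The helper name `findRevokedIds` is hypothetical. It fetches the remote copies in one bulk `allDocs` call and only reports ids; nothing is deleted locally.

```javascript
// Returns the ids of local documents whose remote copy is no longer shared
// with `agent`. `localDB` and `remoteDB` are assumed to be PouchDB instances.
async function findRevokedIds(localDB, remoteDB, agent) {
    const local = await localDB.allDocs();
    const ids = local.rows.map(function (row) { return row.id; });
    if (ids.length === 0) return [];

    // One bulk request instead of one remoteDB.get() per document.
    const remote = await remoteDB.allDocs({ keys: ids, include_docs: true });
    return remote.rows
        .filter(function (row) {
            // Missing or deleted remote docs have no usable doc: treat them
            // as revoked too (a design choice of this sketch).
            if (row.error || !row.doc) return true;
            const agents = row.doc.Agents || [];
            return agents.indexOf(agent) < 0;
        })
        .map(function (row) { return row.key; });
}
```

It could be called from the 'paused' handler, with the returned ids used to hide the corresponding entries in the UI.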

虎瘦雄心在
#3 · 2019-04-13 04:19

Approach 1) still sounds like the simplest one to me.

I don't know PouchDB very well, but in plain CouchDB, the missing changes for deleted documents can be worked around by keeping extra attributes on the deleted document, using your own custom delete function.

I mean, a delete is just an update that sets the _deleted attribute to true.

So, instead of deleting documents directly with the normal CouchDB DELETE on the document, you can create an update function like this:

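    function(doc, req){
        // Nothing to delete if the document does not exist.
        if (!doc) {
            return [null, "No such document"];
        }

        // Optional ACL check before deleting, e.g. doc is owned by req.userCtx.name.

        // doc.users are the users already granted access to this doc.
        // Keeping them on the tombstone lets a filter still match the deletion.
        return [{
            "_id": doc._id,
            "_rev": doc._rev,
            "_deleted": true,
            "users": doc.users
        }, "Ok doc deleted"];
    }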

Furthermore, using rewrite rules, this update function can also be called when an HTTP DELETE request is submitted (not only on PUT or POST). That way your delete behaviour becomes totally transparent to the client, and you delete in a way that is more useful for your use case.
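
On the replication side, the filter then has to look at the `users` array that the update function keeps on the tombstone. A minimal sketch, assuming a hypothetical `user` query parameter; in the design document the function would be stored anonymously, and the revision string below is made up:

```javascript
// Sketch of a filter that lets "extended" tombstones through.
// Assumes deletions go through the update function above, so deleted
// documents still carry their users array.
function byUserWithDeletes(doc, req) {
    var users = doc.users || [];
    return users.indexOf(req.query.user) !== -1;
}

var reqFromB = { query: { user: 'B' } };

// Tombstone written by the update function: still matches B.
var extendedTombstone = { _id: 'X', _rev: '3-abc', _deleted: true, users: ['A', 'B'] };
console.log(byUserWithDeletes(extendedTombstone, reqFromB)); // true

// Tombstone left by a plain DELETE: no users array, so it is filtered out.
var plainTombstone = { _id: 'X', _rev: '3-abc', _deleted: true };
console.log(byUserWithDeletes(plainTombstone, reqFromB)); // false
```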

The Smileupps Chatty couchapp tutorial app uses this approach: extended deletes for the different document types are performed in the user/drop.js, profile/drop.js and chat/drop.js files.

小情绪 Triste *
#4 · 2019-04-13 04:32

(3) Seems like the simplest solution to me, i.e. the "database per role" solution.

I think your difficulty stems from trying to manage permissions inside the documents themselves (and then using filtered replication). When you do that, you are basically trying to mirror CouchDB's permission system inside your documents, which is going to cause headaches.

Why not create a database per role, and assign roles to users using the normal _users database? If roles change, then users will lose or gain access to a set of documents. You would need to have server endpoints to handle the role-shuffling, or you would need to set up separate "admin" databases with special privileges, where users can change the roles.

Then on the client side, you can either replicate from multiple CouchDB databases into multiple PouchDB databases (and then collate the results together yourself), or into a single PouchDB (probably a bad idea if you need to sync bidirectionally). Obviously you would need an initial step where you determine which databases the user has access to, but that's a small downside in my opinion.

Then if the user loses access to a document, they will simply get normal 401 errors during replication (which will show up in the 'denied' event during live replication). No need for ddocs or filtered replication - much simpler!
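
A rough sketch of the client side, under several assumptions: `PouchDBCtor` is the PouchDB constructor, the list of roles has already been fetched for the user, and the `baseUrl` argument and the `role_<name>` database naming are hypothetical. It starts one live pull replication per role into its own local database and reports 'denied' events.

```javascript
// Start one live pull replication per role database into its own local database.
// `PouchDBCtor` is assumed to be the PouchDB constructor; `baseUrl` and the
// "role_<name>" naming scheme are hypothetical.
function replicateRoles(PouchDBCtor, baseUrl, roles, onDenied) {
    return roles.map(function (role) {
        var name = 'role_' + role;
        var local = new PouchDBCtor(name);
        var remote = new PouchDBCtor(baseUrl + '/' + name);
        return local.replicate.from(remote, { live: true, retry: true })
            .on('denied', function (err) {
                // Access was refused: let the app decide what to do.
                onDenied(role, err);
            });
    });
}
```

The returned replication objects can be kept around so that each one can be cancelled when the user logs out or a role disappears.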
