CouchDB Views: remove duplicates *and* order by time

Posted 2019-02-17 04:01

Based on a great answer to my previous question, I've partially solved a problem I'm having with CouchDB.

This resulted in a new view.

Now, the next thing I need to do is remove duplicates from this view while ordering by date.

For example, here is how I might query that view:

GET http://scoates-test.couchone.com/follow/_design/asset/_view/by_userid_following?endkey=[%22c988a29740241c7d20fc7974be05ec54%22]&startkey=[%22c988a29740241c7d20fc7974be05ec54%22,{}]&descending=true&limit=3

Resulting in this:

HTTP 200 http://scoates-test.couchone.com/follow/_design/asset/_view/by_userid_following
http://scoates-test.couchone.com > $_.json.rows
[ { id: 'c988a29740241c7d20fc7974be067295'
  , key: 
     [ 'c988a29740241c7d20fc7974be05ec54'
     , '2010-11-26T17:00:00.000Z'
     , 'clementine'
     ]
  , value: 
     { _id: 'c988a29740241c7d20fc7974be062ee8'
     , owner: 'c988a29740241c7d20fc7974be05f67d'
     }
  }
, { id: 'c988a29740241c7d20fc7974be068278'
  , key: 
     [ 'c988a29740241c7d20fc7974be05ec54'
     , '2010-11-26T15:00:00.000Z'
     , 'durian'
     ]
  , value: 
     { _id: 'c988a29740241c7d20fc7974be065115'
     , owner: 'c988a29740241c7d20fc7974be060bb4'
     }
  }
, { id: 'c988a29740241c7d20fc7974be068026'
  , key: 
     [ 'c988a29740241c7d20fc7974be05ec54'
     , '2010-11-26T14:00:00.000Z'
     , 'clementine'
     ]
  , value: 
     { _id: 'c988a29740241c7d20fc7974be063b6d'
     , owner: 'c988a29740241c7d20fc7974be05ff71'
     }
  }
]

As you can see, "clementine" shows up twice.

If I change the view to emit the fruit/asset name as the second key (instead of the time), I can change the grouping depth to collapse these, but that doesn't solve my order-by-time requirement. Similarly, with the above setup, I can order by time, but I can't collapse duplicate asset names into single rows (to allow e.g. 10 assets per page).

Unfortunately, this is not a simple question to explain. Maybe this chat transcript will help a little.

Please help. I'm afraid that what I need to do is still not possible.

S

2 Answers
Lonely孤独者°
#2 · 2019-02-17 04:37

You can do this using a list function. Here is an example that generates a very simple list of all the owner fields without dupes. You can easily modify it to produce JSON, XML, or anything else you want.

Put it into your assets design doc under lists.nodupes and use it like this: http://admin:123@127.0.0.1:5984/follow/_design/assets/_list/nodupes/by_userid_following_reduce?group=true

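function(head, req) {
    start({
        "headers": {
            "Content-Type": "text/html"
        }
    });
    var row;
    var dupes = [];  // asset names that have already been sent
    while ((row = getRow())) {
        // key[2] is the asset name; only the first row per name is sent
        if (dupes.indexOf(row.key[2]) === -1) {
            dupes.push(row.key[2]);
            send(row.value[0].owner + "<br>");
        }
    }
}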
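
As a rough sketch of the JSON variant mentioned above: the function below keeps the first row seen for each asset name and sends the result as a JSON array. It assumes rows shaped like the ones in the question (key = [userid, time, asset]) rather than the reduced view, so adjust the field access to your own view. `start()`, `getRow()` and `send()` come from CouchDB; `runList` and `sampleRows` are hypothetical stand-ins added only so the sketch can be tried outside CouchDB.

```javascript
// JSON-producing variant of the list function (a sketch, not tested in CouchDB).
function nodupesJson(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var seen = {};   // asset names already emitted
  var out = [];    // kept rows, in the order the view returned them
  var row;
  while ((row = getRow())) {
    var asset = row.key[2];
    if (!Object.prototype.hasOwnProperty.call(seen, asset)) {
      seen[asset] = true;
      out.push({ asset: asset, time: row.key[1], value: row.value });
    }
  }
  send(JSON.stringify(out));
}

// Hypothetical harness: mimics the globals CouchDB provides to list functions.
function runList(listFn, rows) {
  var chunks = [];
  var i = 0;
  globalThis.start = function () {};
  globalThis.getRow = function () { return i < rows.length ? rows[i++] : null; };
  globalThis.send = function (s) { chunks.push(s); };
  listFn({}, {});
  return chunks.join("");
}

// Made-up rows in the same shape and order as the question's output.
var sampleRows = [
  { key: ["u1", "2010-11-26T17:00:00.000Z", "clementine"], value: { owner: "a" } },
  { key: ["u1", "2010-11-26T15:00:00.000Z", "durian"], value: { owner: "b" } },
  { key: ["u1", "2010-11-26T14:00:00.000Z", "clementine"], value: { owner: "c" } }
];
var result = JSON.parse(runList(nodupesJson, sampleRows));
console.log(result.map(function (r) { return r.asset; }));
```

Because the rows arrive already sorted by time, keeping the first occurrence of each asset preserves the time ordering. The whole result is buffered in memory here, which is fine for a page of results but not for a very large view.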
#3 · 2019-02-17 04:37

Ordering by one field and uniquing on another isn't something basic map/reduce can do. All it can do is sort your data and apply reduce rollups to dynamic key ranges.

To find the latest entry for each type of fruit, you'd need to query once per fruit.

There are some ways to do this that are kinda sane.

You'll want a view with keys like [fruit_type, date], and then you can query like this:

for fruit in fruits
  GET /db/_design/foo/_view/bar?startkey=["<fruit>",{}]&endkey=["<fruit>"]&descending=true&limit=1

This will give you the latest entry for each fruit.
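
Here is a small sketch of how the per-fruit query URLs could be built, assuming the view is keyed by [fruit_type, date]. The `/db`, `foo` and `bar` names are the placeholders from the query above, and the `fruits` list is made up. Note that with descending=true the range runs from high keys to low keys, so the start key is [fruit, {}] and the end key is [fruit], the same pattern the question uses.

```javascript
// Builds the "latest entry" query URL for one fruit (illustrative only).
function latestEntryUrl(base, fruit) {
  var params = [
    // {} sorts after any string in CouchDB's collation, so this is the top of the fruit's block
    "startkey=" + encodeURIComponent(JSON.stringify([fruit, {}])),
    // [fruit] sorts before every [fruit, date] key, so this is the bottom of the block
    "endkey=" + encodeURIComponent(JSON.stringify([fruit])),
    "descending=true",
    "limit=1"
  ];
  return base + "/_design/foo/_view/bar?" + params.join("&");
}

var fruits = ["clementine", "durian"]; // hypothetical list of fruit names
var urls = fruits.map(function (f) { return latestEntryUrl("/db", f); });
console.log(urls.join("\n"));
```

Each URL returns at most one row, so the client issues one request per fruit and can then sort the collected rows by date itself.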

A list function could also be used to do this: it would just echo the first row from each fruit's block. This would be efficient enough as long as each fruit has a small number of entries. Once there are many entries per fruit, you'll be discarding more data than you echo, so the multi-query approach actually scales better than the list approach on a large data set. Luckily, both can work on the same view index, so switching later won't be a big deal.
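
A minimal sketch of such a list function, assuming the view is keyed by [fruit_type, date] and queried with descending=true so that the newest row comes first within each fruit's block. `start()`, `getRow()` and `send()` are CouchDB's list-function globals; `runListFn` and `blockRows` are hypothetical, added only so the sketch runs on its own.

```javascript
// Echoes only the first row of each fruit's block (a sketch, not tested in CouchDB).
function firstPerFruit(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var row;
  var last = null;   // fruit name of the block currently being skipped
  var first = true;  // whether a comma is needed before the next element
  send("[");
  while ((row = getRow())) {
    var fruit = row.key[0];
    if (fruit !== last) {
      // a new block starts here; its first row is the latest entry
      send((first ? "" : ",") + JSON.stringify(row));
      first = false;
      last = fruit;
    }
  }
  send("]");
}

// Hypothetical harness standing in for CouchDB's query server.
function runListFn(listFn, rows) {
  var chunks = [];
  var i = 0;
  globalThis.start = function () {};
  globalThis.getRow = function () { return i < rows.length ? rows[i++] : null; };
  globalThis.send = function (s) { chunks.push(s); };
  listFn({}, {});
  return chunks.join("");
}

// Made-up rows in descending [fruit, date] order.
var blockRows = [
  { key: ["durian", "2010-11-26T15:00:00.000Z"], value: 1 },
  { key: ["clementine", "2010-11-26T17:00:00.000Z"], value: 2 },
  { key: ["clementine", "2010-11-26T14:00:00.000Z"], value: 3 }
];
var latest = JSON.parse(runListFn(firstPerFruit, blockRows));
console.log(latest.map(function (r) { return r.key.join(" @ "); }));
```

This version streams its output and only remembers the previous fruit name, but it still reads every row in the view, which is the cost described above. The result comes out ordered by fruit name, so any final ordering by time would still have to happen in the client.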
