Built-in way to read CouchDB document size?

Posted 2020-07-19 03:25

Question:

I'm experimenting with using CouchDB as a message store and would like to report the message size.

Ideally it would be nice to read a _size attribute. At worst I could check the string length of the entire document's JSON. I may even want to use the size as a view key.

What do you think is the best way to record document size and why do you think that method is best?

Answer 1:

You could make a view:

function (doc) {
    // Emits the length of the serialized document in UTF-16 code units, not bytes.
    emit(doc._id, JSON.stringify(doc).length);
}


Answer 2:

You can make a HEAD request:

$ curl -X HEAD -I http://USER:PASS@localhost:5984/db/doc_id
HTTP/1.1 200 OK
Server: CouchDB/1.1.1 (Erlang OTP/R14B03)
Etag: "1-c0b6a87a64fa1b1f63ee2aa7828a5390"
Date: Tue, 17 Jan 2012 21:32:43 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 740047
Cache-Control: must-revalidate

The Content-Length header contains the length of the document in bytes. This is very fast because a HEAD request returns only the headers, so you don't need to download the full document.

But there's a caveat: Content-Length is the number of bytes in the UTF-8 encoding of the document (see the Content-Type header), whereas String.length is the number of 16-bit UTF-16 code units in a string.

That is, the two count different things (bytes versus code units) in different encodings of the document (UTF-8 versus UTF-16).
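
To make the bytes-versus-code-units difference concrete, here is a minimal sketch of a helper that counts UTF-8 bytes for a JavaScript string. The name utf8ByteLength and the sample document are made up for illustration, and it avoids TextEncoder and Buffer on the assumption that they may not be available inside a view function. Even so, the result may not match Content-Length exactly, since the server's JSON serialization can differ from what JSON.stringify produces.

```javascript
// Hypothetical helper: count the bytes a string would occupy as UTF-8.
function utf8ByteLength(str) {
    var bytes = 0;
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (code < 0x80) {
            bytes += 1;                      // ASCII
        } else if (code < 0x800) {
            bytes += 2;                      // e.g. accented Latin letters
        } else if (code >= 0xD800 && code <= 0xDBFF &&
                   i + 1 < str.length &&
                   str.charCodeAt(i + 1) >= 0xDC00 &&
                   str.charCodeAt(i + 1) <= 0xDFFF) {
            bytes += 4;                      // surrogate pair: one code point, 4 bytes
            i++;                             // skip the low surrogate
        } else {
            bytes += 3;                      // rest of the BMP (lone surrogates counted as 3)
        }
    }
    return bytes;
}

// Hypothetical sample document with non-ASCII content.
var sample = JSON.stringify({ _id: "msg-1", body: "héllo €" });
var codeUnits = sample.length;           // what String.length reports
var utf8Bytes = utf8ByteLength(sample);  // closer to what Content-Length counts
```

For pure ASCII documents the two numbers agree; they diverge as soon as the document contains characters outside ASCII.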



Answer 3:

Based on the accepted answer, I suggest the following improvement:

function (doc) {
    emit([JSON.stringify(doc).length, doc._id], doc._id);
}

This has the following advantages:

  • doc length as the first key part lets you sort by document size.

  • doc id as second key part ensures that documents with the same size show up as separate entries.

  • doc id in the value part makes it easier to copy the ID when in Futon (since the key part is rendered as a link there).
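
As a rough illustration of how the composite key behaves, the sketch below runs the same map logic outside CouchDB over a few made-up documents. The emit parameter, the buildIndex helper, and the sort comparator are assumptions for this example only; the comparator is a simplified stand-in for CouchDB's view collation, covering just a numeric size followed by a string id.

```javascript
// Same map logic as above, with emit passed in so it can run outside CouchDB.
function map(doc, emit) {
    emit([JSON.stringify(doc).length, doc._id], doc._id);
}

// Hypothetical sample documents; "b" and "c" serialize to the same length.
var docs = [
    { _id: "a", text: "a longer message body" },
    { _id: "b", text: "hi" },
    { _id: "c", text: "yo" }
];

// Hypothetical helper: collect emitted rows and order them by key.
function buildIndex(docs) {
    var rows = [];
    docs.forEach(function (doc) {
        map(doc, function (key, value) {
            rows.push({ key: key, value: value });
        });
    });
    // Simplified ordering: size first (numeric), then id (string).
    rows.sort(function (x, y) {
        if (x.key[0] !== y.key[0]) return x.key[0] - y.key[0];
        return x.key[1] < y.key[1] ? -1 : (x.key[1] > y.key[1] ? 1 : 0);
    });
    return rows;
}

var rows = buildIndex(docs);
```

Documents "b" and "c" have the same size but still appear as separate rows, ordered by id, ahead of the larger document "a".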



Tags: couchdb