I'm experimenting with using CouchDB as a message store and would like to report the message size.
Ideally it would be nice to read a _size attribute. At worst I could check the string length of the entire document's JSON. I may even want to use the size as a view key.
What do you think is the best way to record document size and why do you think that method is best?
You could make a view:
function (doc) {
  // Key: the document ID; value: length of the serialized document in UTF-16 code units.
  emit(doc._id, JSON.stringify(doc).length);
}
You can make a HEAD request:
$ curl -I http://USER:PASS@localhost:5984/db/doc_id
HTTP/1.1 200 OK
Server: CouchDB/1.1.1 (Erlang OTP/R14B03)
Etag: "1-c0b6a87a64fa1b1f63ee2aa7828a5390"
Date: Tue, 17 Jan 2012 21:32:43 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 740047
Cache-Control: must-revalidate
The Content-Length header contains the length in bytes of the document. This is fast because the server returns only the headers, so the document body is never transferred.
But there's a caveat: Content-Length is the number of bytes in the UTF-8 encoding of the document (see the Content-Type header), whereas String.length is the number of 16-bit UTF-16 code units in a string.
In other words, the two count different things (bytes versus code units) in different encodings of the document (UTF-8 versus UTF-16). They agree only when the document is pure ASCII.
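To make the caveat concrete, here is a minimal sketch of counting UTF-8 bytes from a JavaScript string. The helper name utf8ByteLength is hypothetical and is not part of CouchDB. Even with it, the result may not match Content-Length exactly, since CouchDB's own serialization of the document (escaping, whitespace) can differ from JSON.stringify.

```javascript
// Hypothetical helper (not a CouchDB API): number of bytes the string
// would occupy when encoded as UTF-8.
function utf8ByteLength(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;                 // ASCII
    } else if (code < 0x800) {
      bytes += 2;                 // e.g. Latin letters with accents
    } else if (code >= 0xD800 && code <= 0xDBFF &&
               i + 1 < str.length &&
               str.charCodeAt(i + 1) >= 0xDC00 &&
               str.charCodeAt(i + 1) <= 0xDFFF) {
      bytes += 4;                 // surrogate pair: one code point outside the BMP
      i++;                        // skip the low surrogate
    } else {
      bytes += 3;                 // rest of the BMP (lone surrogates counted as 3)
    }
  }
  return bytes;
}

var json = JSON.stringify({ text: "héllo €" });
var codeUnits = json.length;          // 18 UTF-16 code units
var utf8Bytes = utf8ByteLength(json); // 21 bytes: "é" takes 2, "€" takes 3
```

The same helper could be pasted into a view's map function in place of .length if a byte count is wanted as the key or value.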
Based on the accepted answer, I suggest the following improvement:
function (doc) {
  // Key: [size, id] so rows sort by size; value: the ID, for easy copying.
  emit([JSON.stringify(doc).length, doc._id], doc._id);
}
This has the following advantages:
- Doc length as the first key part lets you sort by document size.
- Doc ID as the second key part ensures that documents with the same size show up as separate entries.
- Doc ID in the value part makes it easier to copy the ID when in Futon (as the key part gives you a link pointer there).
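With the size as the first key part, the view can also be queried for a size range using the standard startkey and endkey view parameters. The following is a rough sketch. The function name sizeRangeQuery and the design document path in the example URL are assumptions made for illustration.

```javascript
// Hypothetical helper: builds a view query URL that selects rows whose
// size (first key part) lies between minSize and maxSize, inclusive.
function sizeRangeQuery(viewUrl, minSize, maxSize) {
  // [minSize] collates before [minSize, "any id"].
  var startkey = JSON.stringify([minSize]);
  // In CouchDB's collation, objects sort after strings, so
  // [maxSize, {}] collates after [maxSize, "any id"].
  var endkey = JSON.stringify([maxSize, {}]);
  return viewUrl +
    "?startkey=" + encodeURIComponent(startkey) +
    "&endkey=" + encodeURIComponent(endkey);
}

// Assumed view location; replace it with your own design document and view name.
var url = sizeRangeQuery(
  "http://localhost:5984/db/_design/sizes/_view/by_size", 1000, 5000);
// url ends with "?startkey=%5B1000%5D&endkey=%5B5000%2C%7B%7D%5D"
```

Adding descending=true (and swapping startkey and endkey) would list the largest documents first.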