An app (Spring, JPA/Hibernate, Sybase 12, webapp), when run locally, consumes about 40MB of its 256MB heap on startup, according to VisualVM. When I trigger a search that returns 70,000+ rows (text data, no blobs), the heap graph shoots up to 256MB and the app throws an out-of-memory error. I have resolved this by using setMaxResults(limit). However, when I ran the same query, copy-pasted the results to a text file, and saved it to the filesystem, I could see that the file is only 26MB of text.
So in effect, 216MB (from 256 - 40) is consumed by loading 26MB worth of text from the database. Who is consuming the other 190MB by the time the out-of-memory error occurs? Presumably the frameworks, but I don't see how they could consume so much more than the actual data being loaded...
**Note:** again, I resolved this with setMaxResults(limit); my question is NOT what to do, but rather why this happens, for educational purposes.
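For reference, the workaround looked roughly like this (a sketch, not my exact code; the entity name, HQL, and limit are placeholders):

```java
// Cap how many rows Hibernate materializes into the heap.
// SearchResult and the HQL here are illustrative placeholders.
List<?> results = session.createQuery("from SearchResult")
        .setMaxResults(limit)   // fetch at most `limit` rows instead of all 70,000+
        .list();
```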
Some things to consider:
Your operating system probably stores the text file with an 8-bit-per-character encoding. Java strings are internally encoded at 16 bits per character, so that doubles the space right there.
Numbers with only a few digits are smaller encoded as text than as binary numeric types. E.g., '1' is a one-byte character in your text file, but a long with the value 1 takes 8 bytes in memory, eight times that size (and a boxed Long costs even more).
There will be duplication from Hibernate pulling values out of the SQL result set and mapping them onto your Java objects. It may need to wrap or translate the contents of the result set into the types you defined in your mapping.
If your data-per-entity is small but the number of entities is large, the ratio of object overhead to data size will obviously be high.
If you have small pieces of data in collections, the size of the collection structure can add up quickly relative to the data. As an extreme example, a LinkedList of one- or two-character strings spends 192 bits on pointers (previous, next, and element references) for every 16-32 bits of actual character data. In an ArrayList it would still be 64 bits of pointer per 16-32 bits of data. (Assuming a 64-bit JVM without compressed references, of course.)
Every object you load in Hibernate is tracked for dirty checking in what's called the L1 cache (the Session's persistence context). For large numbers of entities with small amounts of data, the internal data structures and instrumentation used to do this can indeed add quite a bit of overhead relative to the data size; see the sketch after this list.
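To make the L1-cache point concrete: until the session is cleared or closed, every loaded entity stays referenced by the persistence context, so none of it can be garbage collected even if your own code discards the list. A minimal sketch, assuming a plain Hibernate Session and a made-up entity name:

```java
// Every entity returned by list() is also held by the Session's persistence
// context (the L1 cache) for dirty checking. SearchResult is a made-up entity.
Session session = sessionFactory.openSession();
List<SearchResult> rows = session.createQuery("from SearchResult").list();
// ... read-only processing of rows ...
session.clear();   // drop the L1-cache references so the entities become collectable
```

For read-only bulk work, Hibernate's StatelessSession avoids this tracking entirely.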
--
So the 26MB of data is already 52MB in memory in Java, assuming it is all strings; with numbers, dates, and other types it will be bigger still.
And if it is split into many small pieces, 700,000 short strings rather than 1,000 really long ones, it is entirely plausible for the data-structure overhead to be triple the size of the actual data, pushing you over 200MB easily.
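Spelling that out as back-of-envelope arithmetic (the 4x factor is the "overhead is triple the data" assumption above, not a measured value):

```java
// Rough arithmetic for the figures above; every number here is an estimate.
long onDisk  = 26L * 1024 * 1024;  // ~26 MB of 8-bit text on the filesystem
long asChars = onDisk * 2;         // ~52 MB once stored as 16-bit Java chars
long inHeap  = asChars * 4;        // chars + ~3x structural overhead
System.out.println(inHeap / (1024 * 1024) + " MB");  // prints 208
```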
--

All sorts of things.
Let's consider, for example, that your rows have 10 text columns, represented as a simple Java bean with 10 String fields.
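Concretely, something shaped like this (the class and field names are hypothetical):

```java
// Hypothetical entity shape for this example: 10 text columns, 10 String fields.
public class ResultRow {
    private String col1;
    private String col2;
    private String col3;
    // ... and so on through col10, all Strings
}
```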
A String has 4 fields: a char[] and 3 ints. It also descends from Object, which contributes an int and a reference to its class.
On a 64-bit JVM, those references could well be 8 bytes (not necessarily, but we'll stick with that for the sake of argument).
A 10-character string will have a char[10] plus the 3 ints, which are 4 bytes each.
The char[10] field is a pointer to an array object. An array has to track its length, which is likely another 4 bytes, and it is also an Object (thus the class pointer), plus the data. Characters in Java are represented internally as UTF-16, 2 bytes per character, so the data for 10 characters takes 20 bytes, or 24 with the length field. And the reference to that array is yet another pointer.
So a single String instance is: 8 + 4 for the Object, 8 + 4 + 4 + 4 for the String's own fields, and 8 + 4 + 20 for the array holding the actual data, or 64 bytes.
Your bean has 10 String fields, plus it extends Object, so that's 8 + 4 + (10 * 8) = 92 bytes.
So a single row from your database, with its 100 chars of text, is 8 + 4 + (10 * 8) + (10 * 64), which equals 732 bytes.
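Written out as code, with the 70,000-row total extrapolated from these per-row figures (my extrapolation, not a measurement):

```java
// Rough heap cost per row, using the ~64-byte-per-String estimate above
// (64-bit JVM, uncompressed references, 10 columns of ~10 characters each).
int bean    = 8 + 4 + (10 * 8);   //  92 bytes: object header + 10 String references
int strings = 10 * 64;            // 640 bytes: the 10 String instances themselves
int perRow  = bean + strings;     // 732 bytes per row for ~200 bytes of char data
long total  = 70_000L * perRow;   // ~51 million bytes (~49 MiB), raw data alone
```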
These are not perfect numbers; I can't speak specifically to how arrays are stored, and the object references may well not be 8 bytes on a 64-bit JVM (compressed references can cut them to 4).
But it gives you some idea of the overhead involved, and this is just for your raw data. If those rows are stored in an ArrayList, there's another 70,000 * 8 bytes just to point at your objects: 560K for the structure alone.
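If you'd rather measure than estimate, a crude before/after heap check looks like the sketch below; loadAllRows() is a hypothetical stand-in for running the query, and System.gc() is only a hint to the JVM, so treat the result as approximate. A profiler such as VisualVM (which the question already uses) gives more reliable numbers.

```java
// Crude retained-size measurement. loadAllRows() is a hypothetical stand-in
// for the query; System.gc() is only a hint, so the result is approximate.
Runtime rt = Runtime.getRuntime();
System.gc();
long before = rt.totalMemory() - rt.freeMemory();

List<ResultRow> rows = loadAllRows();   // materialize all 70,000+ entities

System.gc();
long after = rt.totalMemory() - rt.freeMemory();
System.out.println("Approx. retained: " + (after - before) / (1024 * 1024) + " MB");
```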