I am trying to reduce the memory usage of a javascript web application that stores a lot of information in memory in the form of a large number of small strings. When I changed the code to use Uint8Array
instead of String
, I noticed that memory usage went up.
For example, consider the following code that creates many small strings:
// (1000000 strings) x (10 characters)
var a=[];
for (let i=0; i<1000000; i++)
a.push("a".repeat(10).toUpperCase());
If you put it in an empty page and let the memory usage settle for a few seconds, it settles at 70 MiB on Google Chrome. On the other hand, the following code:
// (1000000 arrays) x (10 bytes)
var a=[];
for (let i=0; i<1000000; i++)
a.push(new Uint8Array(10));
uses 233 MiB of memory. An empty page without any code uses about 20 MiB. On the other hand, if I create a small number of large strings/arrays, the difference becomes smaller and in the case of a single string/array with 10000000 characters/entries, the memory usage is virtually identical.
So why do typed arrays have such a large memory overhead?
V8 developer here. Your conclusion makes sense: If you compare characters in a string to elements in a Uint8Array, the string will have less overhead. TypedArrays are great at providing fast access to typed elements; however having a large number of small TypedArrays is not memory efficient.
The difference is in the object header size for strings and typed arrays.
For a string, the object header is:
- hidden class pointer
- hash
- length
- payload
where the payload is rounded up to pointer size alignment, so 16 bytes in this case.
For a Uint8Array, you need the following:
- hidden class pointer
- properties pointer (unused)
- elements pointer (see below)
- array buffer pointer (see below)
- offset into array buffer
- byte length
- length of view into array buffer
- length (user-visible)
- embedder field #1
embedder field #2
array buffer: hidden class pointer
- array buffer: properties pointer (unused)
- array buffer: elements pointer (see below)
- array buffer: byte length
- array buffer: backing store
- array buffer: allocation base
- array buffer: allocation length
- array buffer: bit field (internal flags)
- array buffer: embedder field #1
array buffer: embedder field #2
elements object: hidden class pointer
- elements object: length (of the backing store)
- elements object: base pointer (of the backing store)
- elements object: offset to data start
- elements object: payload
where, again, the payload is rounded up to pointer size alignment, so consumes 16 bytes here.
In summary, each string consumes 5*8 = 40 bytes, each typed array consumes 26*8 = 208 bytes. That does seem like a lot of overhead; the reason is due to the various flexible options that TypedArrays provide (they can be overlapping views into ArrayBuffers, which can be allocated directly from JavaScript, or shared with WebGL and whatnot, etc).
(It's not about "optimizing memory allocation" nor being "better at garbage collecting strings" -- since you're holding on to all the objects, GC does not play a role.)
The typed arrays are not supposed to be used that way.
If you want high memory efficiency, use just one typed array to hold all of your integer numbers. Instead of use a huge number of arrays to hold your integer numbers due to low level reasons.
Those low level reasons are related to how much overhead is need to hold one object in memory, and that quantity depends on a few aspects like immutability and garbage collection. In this case hold one typed array has higher overhead than hold one simple string. Thats why you should pay that price one time only
You should take advantage of:
var a = []; for (let i=0; i<1000000; i++) a.push("1");
var b = new Uint8Array(10000000); for (let i=0; i<1000000; i++) a[i] = 1;
// 'b' is more memory efficient than 'a', just pay the price of Uint8Array one time
// and save the wasted memory in string allocation overhead