TCL max size of array

Posted 2019-02-17 12:58

Question:

I'm working on an engineering application whose interface is written in Tcl/Tk.

Everything went fine until I needed to use an extremely large array: 370,000,000 elements, each between 2 and 10 characters long (growing linearly).

My question is: where is the size limit for Tcl arrays? After reading and investigating, the only figure I've found is "2GB" of string data, but I don't know whether it's reliable, because the source doesn't explain the reason.

I did an experiment:

set lista [list ]
catch {
    for {set i 0} {$i < 370000000} {incr i} {
        lappend lista $i
    }
}
puts $i

This stops at roughly $i = 50,000,000 on 32-bit Windows 7.

Answer 1:

It's a bit complicated to explain. The 2GB limit comes from the low-level memory allocator, which has a size limit because it uses a signed 32-bit integer to describe how much memory to allocate. That was fine on 32-bit systems, but it's an open bug (which might be assigned to me) that it's still true on 64-bit systems; the right type in the C API is actually ssize_t (yeah, still signed; negative values are used for signalling) but fixing it completely wrecks a lot of API, so it requires a major version change to sort out.

But the maximum size of a list is something else. That is fundamentally linked to a combination of a few things. Firstly, there's the maximum size of a single memory structure that can be allocated (the 2GB limit), which means that you probably can't reliably get more than 256M elements in a list on a 64-bit system (2GB divided by the 8 bytes needed for each element pointer). Then there's the total number of items allocated, though that's less of a problem in practice, particularly if you actually put items in the list multiple times (as they share references). Finally, there's the size of the string representation of the list: if you're generating that a lot, you're doing it wrong anyway, but that would be the real limiting factor in your example if you were creating it (as that will hit the 2GB limit sooner).
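The two limits above can be checked with some quick arithmetic. The following is only a rough sketch: it assumes an allocation cap of 2**31 bytes and 8-byte element pointers (a typical 64-bit build), and `listStringLength` is a hypothetical helper written for this illustration, not part of Tcl.

```tcl
# Assumed figures: 2**31-byte allocation cap, 8-byte pointers (64-bit build).
set cap [expr {2**31}]
set ptrSize 8
puts "approx. max list elements: [expr {$cap / $ptrSize}]"

# Hypothetical helper: length of the string form of the list 0 1 2 ... n-1,
# counted as decimal digits plus one space between neighbouring elements.
proc listStringLength {n} {
    if {$n <= 0} {
        return 0
    }
    set total 0
    set lo 0       ;# first number in the current digit-count band
    set hi 10      ;# one past the last number in that band
    set digits 1
    while {$lo < $n} {
        set upper [expr {min($hi, $n)}]
        incr total [expr {($upper - $lo) * $digits}]
        set lo $upper
        set hi [expr {$hi * 10}]
        incr digits
    }
    # n - 1 separating spaces
    return [expr {$total + $n - 1}]
}

puts "string form of the example list: [listStringLength 370000000] characters"
```

Under these assumptions the element cap works out to 268435456 (256M), while the string form of the example list would need 3588888889 characters, already past the 2147483648-byte cap. That is why the string representation, if it were ever generated, would be the first limit hit.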

The actual point where you hit the memory limit might be lower, depending on when your system starts to deny requests to allocate memory. That's all up to the OS, which tends to base its decision on what else is going on on the system, so it's incredibly hard to give any kind of general rule there. My (64-bit, OSX) system took ages, but succeeded in running your sample code:

$ tclsh8.6
% eval {
set lista [list ]
catch {
    for {set i 0} {$i < 370000000} {incr i} {
        lappend lista $i
    }
}
puts $i
}
370000000
% llength $lista
370000000
% unset lista
% exit

The llength was the only truly quick operation (since it could pull the length out of the list metadata). The unset took ages. The exit was pretty quick, but took a few seconds.
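To see the same pattern without waiting on 370 million elements, the built-in `time` command can be applied to a scaled-down run. This is just a sketch: the element count `n` is an arbitrary choice for illustration, and the absolute timings will vary by machine and Tcl build.

```tcl
# Scaled-down version of the experiment; n is an arbitrary illustrative size.
set n 1000000
set lista [list]

# Building the list: one lappend per element.
puts "build:   [time {
    for {set i 0} {$i < $n} {incr i} {
        lappend lista $i
    }
}]"

# llength reads the element count from the list's internal metadata.
puts "llength: [time {llength $lista}]"

# unset has to release every element held by the list.
puts "unset:   [time {unset lista}]"
```

Each line printed has the form "N microseconds per iteration". The llength figure should stay roughly flat as `n` grows, while the build and unset figures grow with the number of elements.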