C++ has std::vector and Java has ArrayList, and many other languages have their own form of dynamically allocated array. When a dynamic array runs out of space, it gets reallocated into a larger region and the old values are copied into the new array. A question central to the performance of such an array is how fast the array grows in size. If you only ever grow just large enough to fit the current push, you'll end up reallocating every time. So it makes sense to double the array size, or multiply it by, say, 1.5x.
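For concreteness, here is a bare-bones sketch (my own illustration, not taken from any particular library) of the reallocate-and-copy scheme described above, with the growth factor as a parameter:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct IntArray {
    int*        data     = nullptr;
    std::size_t size     = 0;
    std::size_t capacity = 0;

    void push(int value, double growth_factor = 2.0) {
        if (size == capacity) {
            // Pick the next capacity: start small, then multiply by the growth factor.
            std::size_t next = capacity ? static_cast<std::size_t>(capacity * growth_factor) : 4;
            if (next <= capacity) next = capacity + 1;  // guard against factors too close to 1
            int* bigger = static_cast<int*>(std::malloc(next * sizeof(int)));
            if (data) {
                std::memcpy(bigger, data, size * sizeof(int));  // copy the old values across
                std::free(data);                                 // the old block becomes a "hole"
            }
            data = bigger;
            capacity = next;
        }
        data[size++] = value;
    }
};

int main() {
    IntArray a;
    for (int i = 0; i < 1000; ++i) a.push(i);  // triggers a handful of reallocations
    std::free(a.data);
}
```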
Is there an ideal growth factor? 2x? 1.5x? By ideal I mean mathematically justified, best balancing performance and wasted memory. I realize that theoretically, given that your application could have any potential distribution of pushes, this is somewhat application dependent. But I'm curious whether there is a value that's "usually" best, or is considered best within some rigorous constraint.

I've heard there's a paper on this somewhere, but I've been unable to find it.
It will entirely depend on the use case. Do you care more about the time wasted copying data around (and reallocating arrays) or about the extra memory? How long is the array going to last? If it's not going to be around for long, using a bigger buffer may well be a good idea - the penalty is short-lived. If it's going to hang around (e.g. in Java, moving into older and older generations), that's obviously more of a penalty.

There's no such thing as an "ideal growth factor." It's not just theoretically application dependent, it's definitely application dependent.
2 is a very common growth factor - I'm pretty sure that's what ArrayList and List<T> in .NET use. ArrayList<T> in Java uses 1.5.

EDIT: As Erich points out, Dictionary<,> in .NET uses "double the size, then increase to the next prime number" so that hash values can be distributed reasonably between buckets. (I'm sure I've recently seen documentation suggesting that primes aren't actually that great for distributing hash buckets, but that's an argument for another answer.)
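As an illustration only (this is not the actual .NET source), the "double, then go to the next prime" policy might look like this:

```cpp
#include <cstddef>
#include <cstdio>

static bool is_prime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Double the current bucket count, then advance to the next prime at or above it.
static std::size_t next_bucket_count(std::size_t current) {
    std::size_t candidate = current * 2;
    while (!is_prime(candidate)) ++candidate;
    return candidate;
}

int main() {
    std::size_t buckets = 3;
    for (int i = 0; i < 8; ++i) {
        std::printf("%zu ", buckets);   // e.g. 3 7 17 37 79 ...
        buckets = next_bucket_count(buckets);
    }
    std::printf("\n");
}
```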
I remember reading many years ago why 1.5 is preferred over two, at least as applied to C++ (this probably doesn't apply to managed languages, where the runtime system can relocate objects at will).

The reasoning is this:

- Say you start with a 16-byte allocation.
- When you need more, you allocate 32 bytes, then free up the 16 bytes. This leaves a 16-byte hole in memory.
- When you need more, you allocate 64 bytes, freeing up the 32 bytes. This leaves a 48-byte hole (if the 16 and 32 were adjacent).
- When you need more, you allocate 128 bytes, freeing up the 64 bytes. This leaves a 112-byte hole (assuming all previous allocations are adjacent).
- And so on and so forth.

The idea is that, with a 2x expansion, there is no point in time at which the resulting hole is ever going to be large enough to reuse for the next allocation. Using a 1.5x allocation, we have this instead:

- Start off with 16 bytes.
- When you need more, allocate 24 bytes, then free up the 16, leaving a 16-byte hole.
- When you need more, allocate 36 bytes, then free up the 24, leaving a 40-byte hole.
- When you need more, allocate 54 bytes, then free up the 36, leaving a 76-byte hole.
- When you need more, allocate 81 bytes, then free up the 54, leaving a 130-byte hole.
- When you need more, use 122 bytes (rounding up) from the 130-byte hole.

Ideally (in the limit as n → ∞), it's the golden ratio: φ = 1.618...

In practice, you want something close to that, like 1.5.
The reason is that you want to be able to reuse older memory blocks, to take advantage of caching and to avoid constantly making the OS give you more memory pages. The equation you would solve to ensure that boils down to x^(n-1) - 1 = x^(n+1) - x^n, whose solution approaches x = φ for large n.
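Here is a small simulation (my addition, under the same "all previous blocks are adjacent" assumption) that tracks the combined size of the freed blocks and reports when it first becomes large enough to hold the next allocation:

```cpp
#include <cmath>
#include <cstdio>

static void simulate(double factor, int steps) {
    double block = 16.0;   // size of the currently live block, in bytes
    double hole  = 0.0;    // combined size of all previously freed (adjacent) blocks
    std::printf("growth factor %.2f:\n", factor);
    for (int i = 1; i <= steps; ++i) {
        double next = std::ceil(block * factor);   // size of the next block we need
        if (next <= hole) {
            std::printf("  step %d: next block (%.0f bytes) fits in the %.0f-byte hole\n",
                        i, next, hole);
            return;
        }
        hole += block;   // the old block is freed and joins the hole
        block = next;
    }
    std::printf("  the hole never caught up within %d steps\n", steps);
}

int main() {
    simulate(2.0, 20);
    simulate(1.5, 20);
}
```

With a factor of 2 the hole never catches up; with 1.5 the fifth allocation (122 bytes, rounded up) fits into the 130-byte hole, matching the numbers above.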
One approach when answering questions like this is to just "cheat" and look at what popular libraries do, under the assumption that a widely used library is, at the very least, not doing something horrible.

So just checking very quickly, Ruby (1.9.1-p129) appears to use 1.5x when appending to an array, and Python (2.6.2) uses 1.125x plus a constant (in Objects/listobject.c):
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
/* check for integer overflow */
if (new_allocated > PY_SIZE_MAX - newsize) {
PyErr_NoMemory();
return -1;
} else {
new_allocated += newsize;
}
newsize above is the number of elements in the array. Note well that newsize is added to new_allocated, so the expression with the bitshifts and the ternary operator is really just computing the over-allocation.
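If you want to see what capacities that rule actually produces, here is a standalone sketch (mine, in C++ rather than CPython's C) that applies the same formula while appending one element at a time; it reproduces the growth pattern quoted in the comment:

```cpp
#include <cstddef>
#include <cstdio>

// Same over-allocation rule as the C snippet above: newsize plus newsize/8 plus a small constant.
static std::size_t grow(std::size_t newsize) {
    return newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
}

int main() {
    std::size_t capacity = 0;
    for (std::size_t n = 1; n <= 100; ++n) {   // append 100 elements one at a time
        if (n > capacity) {
            capacity = grow(n);
            std::printf("%zu ", capacity);     // prints 4 8 16 25 35 46 58 72 88 ...
        }
    }
    std::printf("\n");
}
```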
Let's say you grow the array size by x. So assume you start with size T. The next time you grow the array, its size will be T*x. Then it will be T*x^2, and so on.

If your goal is to be able to reuse the memory that has been freed before, then you want to make sure the new memory you allocate is less than the sum of the previous chunks of memory you freed. Hence we have this inequality:
T*x^n <= T + T*x + T*x^2 + ... + T*x^(n-2)
We can remove T from both sides, so we get this:
x^n <= 1 + x + x^2 + ... + x^(n-2)
Informally, what we are saying is that at the nth allocation, we want all of our previously freed memory to add up to at least the memory needed for the nth allocation, so that we can reuse that previously freed memory.
For instance, if we want to be able to do this at the 3rd step (i.e., n = 3), then we need
x^3 <= 1 + x
This inequality holds for all x such that 0 < x <= 1.3 (roughly): at x = 1.3 the left-hand side is about 2.197 and the right-hand side is 2.3.
The maximum x we get for various n is shown below:
n maximum-x (roughly)
3 1.3
4 1.4
5 1.53
6 1.57
7 1.59
22 1.61
Note that the growth factor has to be less than 2, since x^n > x^(n-2) + ... + x^2 + x + 1 for all x >= 2.
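For what it's worth, here is a small program (my addition) that finds the largest x satisfying x^n <= 1 + x + ... + x^(n-2) by bisection; it reproduces the table above and shows the limit creeping toward the golden ratio φ ≈ 1.618 mentioned in another answer:

```cpp
#include <cmath>
#include <cstdio>

// f(x) = x^n - (1 + x + ... + x^(n-2)); the largest usable growth factor is its root in (1, 2).
static double f(double x, int n) {
    double sum = 0.0;
    for (int k = 0; k <= n - 2; ++k) sum += std::pow(x, k);
    return std::pow(x, n) - sum;
}

int main() {
    const int ns[] = {3, 4, 5, 6, 7, 22, 100};
    for (int n : ns) {
        double lo = 1.0, hi = 2.0;        // f(1) < 0 and f(2) > 0, so the root is bracketed
        for (int i = 0; i < 60; ++i) {    // plain bisection
            double mid = 0.5 * (lo + hi);
            (f(mid, n) < 0.0 ? lo : hi) = mid;
        }
        std::printf("n = %3d   max x ~ %.4f\n", n, lo);
    }
}
```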
It really depends. Some people analyze common usage cases to find the optimal number.

I've seen 1.5x, 2.0x, phi x, and powers of 2 used before.

If you have a distribution over array lengths, and you have a utility function that says how much you like wasting space versus wasting time, then you can definitely choose an optimal resizing (and initial sizing) strategy.

The reason the simple constant multiple is used is obviously so that each append has amortized constant time. But that doesn't mean you can't use a different (larger) ratio for small sizes.

In Scala, you can override loadFactor for the standard library hash tables with a function that looks at the current size. Oddly enough, the resizable arrays just double, which is what most people do in practice.

I don't know of any doubling (or 1.5x-ing) arrays that actually catch out-of-memory errors and grow less in that case. It seems that if you had a huge single array, you'd want to do that.

I'd further add that if you're keeping the resizable array around long enough, and you favor space over time, it might make sense to dramatically overallocate (for most cases) initially and then reallocate to exactly the right size when you're done.
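In std::vector terms that last idea is roughly the following (my sketch, with arbitrary numbers): reserve() does the up-front overallocation and shrink_to_fit() is the non-binding "reallocate to exactly the right size" step:

```cpp
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1 << 20);               // overallocate: room for about a million ints up front
    for (int i = 0; i < 150000; ++i)  // fill with however many elements we actually need
        v.push_back(i);
    v.shrink_to_fit();                // non-binding request to drop the unused capacity
    std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
}
```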
I agree with Jon Skeet, even my theorycrafter friend insists that this can be proven to be O(1) when setting the factor to 2x.
The ratio between CPU time and memory is different on each machine, and so the factor will vary just as much. If you have a machine with gigabytes of RAM and a slow CPU, copying the elements to a new array is a lot more expensive than on a fast machine, which might in turn have less memory. It's a question that can be answered in theory, for a uniform computer, which in real scenarios doesn't help you at all.
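For reference, the amortized-O(1) claim does not depend on the factor being 2; a compressed version of the usual argument (my summary) for any growth factor r > 1:

```latex
% After n appends, the final capacity C is at most about r*n. The earlier
% reallocations copied at most C/r, C/r^2, ... elements, a geometric series:
\[
  \text{total copies} \;\le\; \frac{C}{r} + \frac{C}{r^2} + \cdots
  \;=\; \frac{C}{r-1} \;\le\; \frac{r}{r-1}\,n,
\]
% i.e. O(n) copy work over n appends, hence O(1) amortized per append,
% for every fixed r > 1.
```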
I know it is an old question, but there are several things that everyone seems to be missing.
First, this is multiplication by 2: size << 1. This is multiplication by anything between 1 and 2: int(float(size) * x), where x is the number, the * is floating point math, and the processor has to run additional instructions for casting between float and int. In other words, at the machine level, doubling takes a single, very fast instruction to find the new size. Multiplying by something between 1 and 2 requires at least one instruction to cast size to a float, one instruction to multiply (which is float multiplication, so it probably takes at least twice as many cycles, if not 4 or even 8 times as many), and one instruction to cast back to int, and that assumes that your platform can perform float math on the general purpose registers, instead of requiring the use of special registers. In short, you should expect the math for each allocation to take at least 10 times as long as a simple left shift. If you are copying a lot of data during the reallocation though, this might not make much of a difference.
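To make the comparison concrete, here is a sketch (mine, not from this answer) of the size computations in play; the last one anticipates a later answer's point that 1.5x can be computed entirely in integers:

```cpp
#include <cstddef>
#include <cstdio>

// 2x growth: a single shift.
static std::size_t grow_double(std::size_t size) { return size << 1; }

// Growth by an arbitrary factor between 1 and 2: int -> float -> multiply -> int.
static std::size_t grow_float(std::size_t size, double x) {
    return static_cast<std::size_t>(static_cast<double>(size) * x);
}

// 1.5x growth kept entirely in integers: size + size/2, i.e. old*3/2.
static std::size_t grow_three_halves(std::size_t size) { return size + (size >> 1); }

int main() {
    std::size_t s = 100;
    std::printf("%zu %zu %zu\n", grow_double(s), grow_float(s, 1.5), grow_three_halves(s));
    // prints: 200 150 150
}
```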
Second, and probably the big kicker: Everyone seems to assume that the memory that is being freed is both contiguous with itself, as well as contiguous with the newly allocated memory. Unless you are pre-allocating all of the memory yourself and then using it as a pool, this is almost certainly not the case. The OS might occasionally end up doing this, but most of the time, there is going to be enough free space fragmentation that any half decent memory management system will be able to find a small hole where your memory will just fit. Once you get to really big chunks, you are more likely to end up with contiguous pieces, but by then, your allocations are big enough that you are not doing them frequently enough for it to matter anymore. In short, it is fun to imagine that using some ideal number will allow the most efficient use of free memory space, but in reality, it is not going to happen unless your program is running on bare metal (as in, there is no OS underneath it making all of the decisions).
My answer to the question? Nope, there is no ideal number. It is so application specific that no one really even tries. If your goal is ideal memory usage, you are pretty much out of luck. For performance, less frequent allocations are better, but if we went just with that, we could multiply by 4 or even 8! Of course, when Firefox jumps from using 1GB to 8GB in one shot, people are going to complain, so that does not even make sense. Here are some rules of thumb I would go by though:
If you cannot optimize memory usage, at least don't waste processor cycles. Multiplying by 2 is at least an order of magnitude faster than doing floating point math. It might not make a huge difference, but it will make some difference at least (especially early on, during the more frequent and smaller allocations).
Don't overthink it. If you just spent 4 hours trying to figure out how to do something that has already been done, you just wasted your time. Totally honestly, if there was a better option than *2, it would have been done in the C++ vector class (and many other places) decades ago.
Lastly, if you really want to optimize, don't sweat the small stuff. Nowadays, no one cares about 4KB of memory being wasted, unless they are working on embedded systems. When you get to 1GB of objects that are between 1MB and 10MB each, doubling is probably way too much (I mean, that is between 100 and 1,000 objects). If you can estimate the expected expansion rate, you can level it out to a linear growth rate at a certain point. If you expect around 10 objects per minute, then growing at 5 to 10 object sizes per step (once every 30 seconds to a minute) is probably fine.
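A sketch (mine, with arbitrary placeholder numbers) of that "level off to linear growth" idea:

```cpp
#include <cstddef>
#include <cstdio>

// Grow geometrically while the array is small, then by a fixed chunk once doubling
// would waste too much. The threshold and chunk size are arbitrary placeholders.
static std::size_t next_capacity(std::size_t current) {
    const std::size_t linear_threshold = 64u * 1024 * 1024;  // switch-over point
    const std::size_t linear_chunk     = 16u * 1024 * 1024;  // fixed step past the threshold
    if (current < linear_threshold)
        return current < 16 ? 16 : current * 2;               // geometric phase
    return current + linear_chunk;                            // linear phase
}

int main() {
    std::size_t c = 16;
    for (int i = 0; i < 30 && c < 300u * 1024 * 1024; ++i) {
        std::printf("%zu\n", c);
        c = next_capacity(c);
    }
}
```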
What it all comes down to is, don't over think it, optimize what you can, and customize to your application (and platform) if you must.
Another two cents:
- Most computers have virtual memory! In physical memory you can have random pages everywhere, which are presented to your program as one contiguous space in virtual memory; resolving the indirection is done by the hardware. Virtual memory exhaustion was a problem on 32-bit systems, but it is really not a problem anymore. So filling the hole is not a concern anymore (except in special environments). Since Windows 7 even Microsoft supports 64-bit without extra effort. @ 2011
- O(1) is reached with any factor r > 1. The same mathematical proof works not only for 2 as the parameter.
- r = 1.5 can be computed with old*3/2, so there is no need for floating point operations. (I say /2 because compilers will replace it with bit shifting in the generated assembly code if they see fit.) MSVC went with r = 1.5, so there is at least one major compiler that does not use 2 as the ratio.
As somebody mentioned, 2 feels better than 8. And 2 also feels better than 1.1.

My feeling is that 1.5 is a good default. Apart from that, it depends on the specific case.