Force freeing memory in PHP

Posted 2019-01-08 09:19

In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.

Unfortunately, with each iteration (I iterate over an array containing the filenames), some memory seems to be lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code: I am not storing unneeded data in the loop (and the data I do need hardly exceeds 10 MB overall), but I am frequently overwriting variables (strings in an array, to be precise). Apparently, PHP does not free the memory correctly and uses more and more RAM until it hits the limit.
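For reference, a minimal sketch of the kind of loop in question (simplified; the file list and the analyze() step are placeholders for my real code):

$results = array();
foreach ($filenames as $filename) {
    $raw  = file_get_contents($filename);  // read the whole gzipped file
    $json = gzdecode($raw);                // decompress it
    $data = json_decode($json, true);      // decode the JSON into an array
    $results[] = analyze($data);           // hypothetical step: keep ~1% of the data
    unset($raw, $json, $data);             // explicitly drop the large temporaries
}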

Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?

8 Answers
Melony?
#2 · 2019-01-08 09:32

Found the solution: it was string concatenation. I was generating the output line by line by concatenating some variables (the output is a CSV file). However, PHP does not seem to free the memory used for the old copy of the string, effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding the array with commas just before fputs-ing it to the outfile) avoided this behavior.
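In code, the change looked roughly like this (the field names are made up for illustration):

// Before: repeated concatenation, each step leaving a dead copy behind.
// $line = $id . ',' . $name . ',' . $value . "\n";

// After: collect the fields, allocate the final string once per row.
$fields = array($id, $name, $value);           // hypothetical CSV columns
fputs($outfile, implode(',', $fields) . "\n");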

For some reason - not obvious to me - PHP reported the increased memory usage during json_decode calls, which misled me into assuming that the json_decode function was the problem.

我欲成王,谁敢阻挡
#3 · 2019-01-08 09:33

It has to do with memory fragmentation.

Consider two strings concatenated into one. Each original must remain in memory until the output is created, and the output is longer than either input.
Therefore, a new allocation must be made to store the result of the concatenation. The original strings are freed, but they are small blocks of memory.
In a case of 'str1' . 'str2' . 'str3' . 'str4' you have several temporaries being created at each . -- and none of them fits in the space that has been freed up. The strings are likely not laid out in contiguous memory (that is, each string is, but the various strings are not laid end to end) due to other uses of the memory. So freeing a string creates a problem, because the space cannot be reused effectively: you grow with each temporary you create, and you never re-use anything.

Using the array-based implode, you create only one output -- exactly the length you require -- performing only one additional allocation. So it's much more memory efficient, and it doesn't suffer from the concatenation fragmentation. The same is true of Python. If you need to concatenate strings, more than one concatenation should always be array based:

''.join(['str1','str2','str3'])

in Python

implode('', array('str1', 'str2', 'str3'))

in PHP

sprintf equivalents are also fine.

The memory reported by memory_get_peak_usage is basically always the "last" bit of memory in the virtual map it had to use. Since that is always growing, it reports rapid growth, as each allocation falls "at the end" of the currently used memory block.
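A rough way to observe the difference yourself (a sketch; the numbers vary by PHP version and allocator, so treat them as illustrative only):

$parts = array_fill(0, 10000, str_repeat('x', 100));

$before = memory_get_usage();
$concat = '';
foreach ($parts as $p) {
    $concat .= $p;               // repeated reallocation, one temporary per step
}
printf("concat:  %d bytes\n", memory_get_usage() - $before);
unset($concat);

$before = memory_get_usage();
$joined = implode('', $parts);   // single allocation, exactly the final length
printf("implode: %d bytes\n", memory_get_usage() - $before);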

Root(大扎)
#4 · 2019-01-08 09:33

I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem - since presumably the files are no longer mapped to zvals. But did you check that gc_enable was called before loading any files?
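If you want to try it anyway, the calls look like this (a sketch; process_file() stands in for your per-file work, and note that gc_collect_cycles() only reclaims cyclic references, so it may not help here):

gc_enable();                        // must be enabled before the cycles are created

foreach ($filenames as $filename) {
    process_file($filename);        // hypothetical per-file work
    $freed = gc_collect_cycles();   // force a collection pass
    echo "collected $freed cycles" . PHP_EOL;
}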

I've noticed that PHP seems to gobble up memory when doing includes - much more than is required for the source and the tokenized file - this may be a similar problem. I'm not saying that this is a bug though.

I believe one workaround would be not to use file_get_contents, but rather fopen()/fgets()/fclose(), instead of reading the whole file into memory in one go. But you'd need to try it to confirm.
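Something like this (a sketch; since your files are gzipped, gzopen()/gzgets() would be the analogous calls):

$handle = fopen($filename, 'r');
while (($line = fgets($handle)) !== false) {
    // process one line at a time instead of holding the whole file
}
fclose($handle);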

HTH

C.

We Are One
#5 · 2019-01-08 09:40

I've found that PHP's internal memory manager is most likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:

while (condition) {
  // do
  // cool
  // stuff
}

to

while (condition) {
  do_cool_stuff();
}

function do_cool_stuff() {
  // do
  // cool
  // stuff
}

EDIT

I ran this quick benchmark and did not see an increase in memory usage, which leads me to believe the leak is not in json_decode():

for ($x = 0; $x < 10000000; $x++)
{
  do_something_cool();
}

function do_something_cool() {
  $json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
  $result = json_decode($json);
  echo memory_get_peak_usage() . PHP_EOL;
}
Animai°情兽
#6 · 2019-01-08 09:45

There was recently a similar issue with System_Daemon. Today I isolated my problem to file_get_contents.

Could you try using fread instead? I think this may solve your problem. If it does, it's probably time to file a bug report over at PHP.
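For example (a sketch; filesize() supplies the number of bytes to request in a single read):

$handle   = fopen($filename, 'rb');
$contents = fread($handle, filesize($filename));
fclose($handle);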

一夜七次
#7 · 2019-01-08 09:47

I just had the same problem and found a possible workaround.

SITUATION: I was writing from a db query into CSV files. I always allocated one $row, then reassigned it in the next step. Unsetting $row didn't help; putting a 5 MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it and unsetting the whole thing on every 5000th step) didn't help. I really tried a number of things.

BUT.

When I made a separate function that opens the file, transfers 100,000 lines (just enough not to eat up the whole memory) and closes the file, THEN made subsequent calls to this function (appending to the existing file), I found that for every function exit, PHP removed the garbage. It was a local-variable-space thing.
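In code, the pattern looked roughly like this (a sketch, assuming an open mysqli connection $db and a total row count $total; the table name is made up):

function write_chunk(mysqli $db, $csvPath, $offset, $limit) {
    $fp  = fopen($csvPath, 'a');      // append to the existing file
    $res = $db->query("SELECT * FROM big_table LIMIT $offset, $limit");
    while ($row = $res->fetch_row()) {
        fputs($fp, implode(',', $row) . "\n");
    }
    $res->free();
    fclose($fp);                      // all locals are freed when the function returns
}

for ($offset = 0; $offset < $total; $offset += 100000) {
    write_chunk($db, 'out.csv', $offset, 100000);
}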

CONCLUSION: Whenever your function exits, it frees all local variables.

This is the rule, as far as I found out. Just one side note, however: when I tried to make my "do_only_a_smaller_subset()" function take some variables by reference (namely the query object and the file pointer), garbage collection did not happen. Now maybe I'm misunderstanding something, and maybe the query object (mysqli) was leaking; I don't know. However, since it was passed by reference, it obviously couldn't get cleaned up, since it still existed after the small function's exit point.

So, worth a try! It saved my day to find this out.
