I think I have found a problem that seems to create a memory leak in Apache / PHP when Unicode characters are used as a delimiter, or sometimes anywhere in a regular expression, with preg_match and preg_replace. It is possible that this happens in more preg_* functions.
Test case 1
Create a new PHP file test.php with the following contents:
<?php
preg_match( '°test°i', 'test', $matches );
Test case 2
Create a new PHP file test.php with the following contents:
<?php
preg_match( '°', 'test', $matches );
The Unicode character ° used as a delimiter is the degree sign. Try any other Unicode character to see what happens, if you like.
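To make the test cases easier to reason about, here is a minimal sketch showing that the degree sign is not a single byte when encoded as UTF-8. It assumes the test file is saved as UTF-8; the variable names are only illustrative. As far as I understand, PHP's preg_* functions inspect the pattern byte by byte when looking for the delimiter, so a two-byte character may not be treated as one delimiter, but that is an assumption and not something the log output confirms.

```php
<?php
// The degree sign spelled out as its UTF-8 bytes, so the result does not
// depend on the encoding this file happens to be saved in.
$degree = "\xC2\xB0";

// strlen() counts bytes, not characters.
$byteLength = strlen($degree);

// bin2hex() shows the raw bytes in hexadecimal.
$hexBytes = bin2hex($degree);

echo "bytes: $byteLength, hex: $hexBytes\n"; // prints "bytes: 2, hex: c2b0"
```

So a pattern such as '°test°i' starts with the byte 0xC2 and not with one self-contained delimiter character.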
Result
Upload the file to a web server running Apache 2.4.10 (Debian) and PHP 5.6.0-1+b1, then request it from your favourite browser. Expect to see a blank page or a message saying either "invalid response" or "this page could not be loaded".
This will result in the following two lines in your Apache error.log (on Debian usually /var/log/apache2/error.log):
[Mon Dec 15 10:31:09.941622 2014] [:error] [pid 6292] [client ###.###.###.###:64413] PHP Warning: preg_match(): in /path/to/test.php on line 2
[Mon Dec 15 10:31:09.941796 2014] [:error] [pid 6292] [client ###.###.###.###:64413] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 139979759686648 bytes) in Unknown on line 0
Note that the number of bytes PHP tried to allocate is just over 127 TiB (roughly 140 TB).
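The arithmetic behind that figure can be checked with a short sketch. It assumes a 64-bit PHP build, since the value does not fit into a 32-bit integer; the variable names are only illustrative.

```php
<?php
// Size copied verbatim from the fatal error in the Apache log.
// Assumption: 64-bit PHP, otherwise this literal becomes a float.
$requested = 139979759686648;

// 1 TiB = 1024^4 bytes.
$tebibytes = round($requested / pow(1024, 4), 2);

// The same value written in hexadecimal.
$hex = dechex($requested);

echo "TiB: $tebibytes, hex: 0x$hex\n"; // prints "TiB: 127.31, hex: 0x7f4f93da13f8"
```

Written in hexadecimal, the value has the shape of a typical user-space address on 64-bit Linux. That might hint that a pointer or an uninitialised value ended up being used as an allocation size, but this is only a guess and not a confirmed diagnosis.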
You need to restart Apache now
Running PHP scripts after trying out the above script will result in all kinds of notices or fatal errors that pop up even in code that shouldn't be able to produce them. For instance, autoloading extended classes no longer seems to work correctly and may produce errors like the following:
Class MyClass not found in file MyExtendingClass.php on Line 3
And the file MyExtendingClass.php would look like this:
<?php
class MyExtendingClass extends MyClass
{
}
As you can see, MyClass is clearly referenced on line 2, and even though it exists and the autoloader has been set up correctly, PHP can't find it anymore.
Obviously, the workaround is not to use Unicode characters in regular expressions. But why does PHP leak memory when certain Unicode characters are used? Is there an explanation for this behavior? I'd like to know why PHP thinks it should allocate such a vast number of bytes.
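If a pattern really does need a non-ASCII character, a possible alternative is to keep the delimiter ASCII and add the u modifier so that pattern and subject are treated as UTF-8. The following is a minimal sketch of that idea, not a verified fix for the crash described above; the subject strings and variable names are made up for illustration.

```php
<?php
// Degree sign as explicit UTF-8 bytes, independent of the file encoding.
$degree = "\xC2\xB0";

// ASCII delimiter "/" plus the "u" modifier (UTF-8 mode).
$result = preg_match('/' . $degree . '/u', '20' . $degree . 'C', $matches);

if ($result === false) {
    // preg_match() returns false on failure; preg_last_error() gives the PCRE error code.
    echo 'PCRE error code: ' . preg_last_error() . "\n";
} else {
    echo "matches found: $result\n";
}

// Test case 1 rewritten with an ASCII delimiter.
$caseInsensitive = preg_match('/test/i', 'TEST');
```

Checking the return value against false, and not only against 0, distinguishes "no match" from "the pattern could not be run".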
System information
Apache/2.4.10 (Debian) PHP/5.6.0-1+b1 OpenSSL/1.0.1i configured