For some reason, the zlib.deflate filter doesn't seem to be working with socket pairs generated by stream_socket_pair(). All that can be read from the second socket is the two-byte zlib header; every read after that returns an empty string.
Example:
<?php
list($in, $out) = stream_socket_pair(STREAM_PF_UNIX,
STREAM_SOCK_STREAM,
STREAM_IPPROTO_IP);
$params = array('level' => 6, 'window' => 15, 'memory' => 9);
stream_filter_append($in, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
stream_set_blocking($in, 0);
stream_set_blocking($out, 0);
fwrite($in, 'Some big long string.');
$compressed = fread($out, 1024);
var_dump($compressed);
fwrite($in, 'Some big long string, take two.');
$compressed = fread($out, 1024);
var_dump($compressed);
fwrite($in, 'Some big long string - third time is the charm?');
$compressed = fread($out, 1024);
var_dump($compressed);
Output:
string(2) "x�"
string(0) ""
string(0) ""
If I comment out the call to stream_filter_append(), writing and reading work correctly, with the data dumped in its entirety all three times. And if I direct the zlib-filtered stream into a file instead of through the socket pair, the compressed data is written correctly. So both parts work separately, but not together. Is this a PHP bug that I should report, or an error on my part?
This question is branched from a solution to this related question.
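The two bytes that do come through (printed in the output above as "x" followed by an unprintable byte) are the standard zlib header 0x78 0x9c, which zlib emits for a 32 KB window ('window' => 15) at the default compression level 6. A quick way to confirm this, sketched here assuming the zlib extension is loaded, is to compress the same string with gzcompress(), which produces the same zlib (RFC 1950) framing:

```php
<?php
// Assumes the zlib extension is loaded (gzcompress() comes from it).
// gzcompress() produces zlib-format output, the same framing the
// zlib.deflate filter uses when 'window' => 15 is passed.
$compressed = gzcompress('Some big long string.', 6);

// The first two bytes are the zlib header: CMF (0x78 = deflate, 32 KB window)
// and FLG (0x9c = default compression level plus a check value).
$header = substr($compressed, 0, 2);

echo bin2hex($header), PHP_EOL;   // prints 789c
var_dump($header === "x\x9c");    // bool(true): 0x78 is the ASCII letter "x"
```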
I worked on the PHP source code and found a fix.
To understand what happens, I traced the code during a loop and found that the deflate function is never called with the Z_SYNC_FLUSH flag set, because no new data is present in the buckets_in brigade. My fix is to handle the case where the PSFS_FLAG_FLUSH_INC flag is set AND no iterations of the deflate function are performed, extending the existing FLUSH_INC handling to cover it too. The downloadable patch is for the Debian Squeeze version of PHP, but the current git version of the file is close to it, so I expect porting the fix to be simple (a few lines). If any side effects arise, please contact me.
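The fix hinges on deflate() eventually being called with Z_SYNC_FLUSH: until then, zlib keeps compressed data buffered internally. That buffering can be observed from userland with the incremental deflate_init()/deflate_add() API. The following is a minimal sketch, assuming PHP 7.0 or later and the zlib extension; it illustrates the zlib behaviour the patch relies on, not the patch itself:

```php
<?php
// Assumes PHP >= 7.0 with the zlib extension (deflate_init()/deflate_add()).
// Level 6 and window 15 mirror the filter parameters used in the question.
$ctx = deflate_init(ZLIB_ENCODING_DEFLATE, ['level' => 6, 'window' => 15, 'memory' => 9]);

// With ZLIB_NO_FLUSH, zlib holds the compressed data back internally;
// at most the 2-byte header comes out, just as with the zlib.deflate filter.
$first = deflate_add($ctx, 'Some big long string.', ZLIB_NO_FLUSH);

// A sync flush forces everything buffered so far out, ending on a byte
// boundary with the empty stored block 00 00 ff ff.
$flushed = deflate_add($ctx, ' More text.', ZLIB_SYNC_FLUSH);

echo bin2hex($first), PHP_EOL;
echo bin2hex($flushed), PHP_EOL;

// The receiving side can decode everything sent so far without waiting
// for the compressed stream to be finished.
$inflater = inflate_init(ZLIB_ENCODING_DEFLATE);
$decoded = inflate_add($inflater, $first . $flushed);
echo $decoded, PHP_EOL;   // prints Some big long string. More text.
```

Compressing with deflate_add() and writing the result to the socket with a plain fwrite(), instead of attaching the zlib.deflate filter, is one possible way to get flushed output through a socket pair; the chunking shown here is only illustrative.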
Looking through the C source code, the problem is that the filter always lets zlib's deflate() function decide how much data to accumulate before producing compressed output. The deflate filter does not create a new data bucket to pass on unless deflate() outputs some data (see line 235) or the PSFS_FLAG_FLUSH_CLOSE flag bit is set (line 250). That's why you only see the header bytes until you close $in: the first call to deflate() outputs the two header bytes, so data->strm.avail_out ends up 2 less than data->outbuf_len, and a new bucket is created to pass these two bytes on.
Note that fflush() does not work because of a known issue with the zlib filter. See: Bug #48725 Support for flushing in zlib stream.
Unfortunately, there does not appear to be a nice work-around for this. I started writing a filter in PHP by extending php_user_filter, but quickly ran into the problem that php_user_filter does not expose the flag bits, only whether flags & PSFS_FLAG_FLUSH_CLOSE is set (the fourth parameter to the filter() method, a boolean argument commonly named $closing). You would need to modify the C sources yourself to fix Bug #48725, or alternatively re-write the filter.
Personally, I would consider re-writing it, because there seem to be a few eyebrow-raising issues with the code:
- status = deflate(&(data->strm), flags & PSFS_FLAG_FLUSH_CLOSE ? Z_FULL_FLUSH : (flags & PSFS_FLAG_FLUSH_INC ? Z_SYNC_FLUSH : Z_NO_FLUSH));
  seems odd because, when writing, I don't know why flags would be anything other than PSFS_FLAG_NORMAL. Is it possible to write and flush at the same time? In any case, the flags should be handled outside the while loop over the "in" bucket brigade, the same way PSFS_FLAG_FLUSH_CLOSE is handled outside that loop.
- On line 221, the memcpy to data->strm.next_in seems to ignore the fact that data->strm.avail_in may be non-zero, so the compressed output might skip some data of a write. The zlib manual's description of deflate() says it "compresses as much data as possible, and stops when the input buffer becomes empty or the output buffer becomes full". In other words, it is possible that avail_in is still non-zero after a call.
- In the if statement on line 235, if (data->strm.avail_out < data->outbuf_len) should probably be if (data->strm.avail_out) or perhaps if (data->strm.avail_out > 2).
- I'm not sure why *bytes_consumed = consumed; isn't *bytes_consumed += consumed;. The example filters at http://www.php.net/manual/en/function.stream-filter-register.php all use += to update $consumed.
EDIT:
*bytes_consumed = consumed; is correct. The standard filter implementations all use = rather than += to update the size_t value pointed to by the fifth parameter. Also, even though $consumed += ... on the PHP side effectively translates to += on the size_t (see lines 206 and 231 of ext/standard/user_filters.c), the native filter function is called with either a NULL pointer or a pointer to a size_t set to 0 for the fifth argument (see lines 361 and 452 of main/streams/filter.c).
You need to close the stream after the write to flush it before the data will come in from the read.
That produces:
Compressed:789c0bcecf4d5548ca4c57c8c9cf4b57282e29cacc4bd70300532b079c
Compressed:789c0bcecf4d5548ca4c57c8c9cf4b57282e29cacc4bd7512849cc4e552829cfd70300b1b50b07
Compressed:789c0bcecf4d5548ca4c57c8c9cf4b57282e29ca0452ba0a25199945290a259940c9cc62202f55213923b128d71e008e4c108c
Also, I switched $in and $out, because writing to $in confused me.
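The answer's own code is not shown, so what follows is only a minimal sketch of the close-to-flush approach, assuming PHP 7 or later on a Unix-like system (STREAM_PF_UNIX) with the zlib extension; the helper name compress_via_socket_pair() is hypothetical. Closing the writing end makes PHP run the write filter one last time with PSFS_FLAG_FLUSH_CLOSE, so the filter finishes the zlib stream and writes it to the socket before the socket closes, and the reader can then read until end-of-file. A closed stream cannot be written again, so each message needs a fresh socket pair and filter, which is presumably why each Compressed: line above starts with its own 789c header.

```php
<?php
// Sketch only: assumes a Unix-like OS (STREAM_PF_UNIX) and the zlib extension.
// compress_via_socket_pair() is a hypothetical helper name.
function compress_via_socket_pair(string $message): string
{
    list($reader, $writer) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

    // Same filter parameters as the question: zlib framing, level 6.
    $params = array('level' => 6, 'window' => 15, 'memory' => 9);
    stream_filter_append($writer, 'zlib.deflate', STREAM_FILTER_WRITE, $params);

    // Fine for small messages; a large one could fill the socket buffer and
    // block fwrite() before anything reads from the other end.
    fwrite($writer, $message);

    // Closing the writer flushes the filter with PSFS_FLAG_FLUSH_CLOSE, so the
    // finished zlib stream reaches the socket before the write end shuts down.
    fclose($writer);

    // The reader stays in blocking mode and reads until EOF from the closed peer.
    $compressed = stream_get_contents($reader);
    fclose($reader);

    return $compressed;
}

foreach (array('Some big long string.', 'Some big long string, take two.') as $message) {
    $compressed = compress_via_socket_pair($message);
    echo 'Compressed:', bin2hex($compressed), PHP_EOL;
    // Each result is a complete zlib stream, so gzuncompress() can decode it.
    var_dump(gzuncompress($compressed) === $message);
}
```

Closing works only because the whole zlib stream is finished at that point; keeping one socket open across several messages would need an incremental flush, which is exactly what the zlib.deflate filter does not provide.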