Using zlib filter with a socket pair

Published 2019-04-08 13:38

Question:

For some reason, the zlib.deflate filter doesn't seem to work with socket pairs generated by stream_socket_pair(). All that can be read from the second socket is the two-byte zlib header; every read after that returns an empty string.

Example:

<?php
list($in, $out) = stream_socket_pair(STREAM_PF_UNIX,
                                     STREAM_SOCK_STREAM,
                                     STREAM_IPPROTO_IP);

$params = array('level' => 6, 'window' => 15, 'memory' => 9);

stream_filter_append($in, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
stream_set_blocking($in, 0);
stream_set_blocking($out, 0);

fwrite($in, 'Some big long string.');
$compressed = fread($out, 1024);
var_dump($compressed);

fwrite($in, 'Some big long string, take two.');
$compressed = fread($out, 1024);
var_dump($compressed);

fwrite($in, 'Some big long string - third time is the charm?');
$compressed = fread($out, 1024);
var_dump($compressed);

Output:

string(2) "x�"
string(0) ""
string(0) ""

If I comment out the call to stream_filter_append(), the stream writing/reading functions correctly, with the data being dumped in its entirety all three times, and if I direct the zlib filtered stream into a file instead of through the socket pair, the compressed data is written correctly. So both parts function correctly separately, but not together. Is this a PHP bug that I should report, or an error on my part?

This question is branched from a solution to this related question.

Answer 1:

Looking through the C source code, the problem is that the filter always lets zlib's deflate() function decide how much data to accumulate before producing compressed output. The deflate filter does not create a new data bucket to pass on unless deflate() outputs some data (see line 235) or the PSFS_FLAG_FLUSH_CLOSE flag bit is set (line 250). That's why you only see the header bytes until you close $in: the first call to deflate() outputs the two header bytes, so data->strm.avail_out drops to data->outbuf_len - 2, and a new bucket is created to pass those two bytes on.
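The buffering described above happens inside zlib itself, so it can be observed without any stream filter or socket. The sketch below assumes PHP 7.0 or later, which added the incremental deflate_init()/deflate_add() functions. With ZLIB_NO_FLUSH (what the filter effectively uses for an ordinary fwrite()), a short first input should yield only the two-byte zlib header; with ZLIB_SYNC_FLUSH, zlib emits the compressed data and ends it with an empty stored block, 00 00 ff ff, which a reader can inflate immediately.

```php
<?php
// Assumes PHP >= 7.0, which provides deflate_init()/deflate_add()/inflate_add().
$text = 'Some big long string.';

// Let zlib decide when to emit output, as the filter does for a normal
// write (flags == PSFS_FLAG_NORMAL maps to Z_NO_FLUSH).
$lazy = deflate_init(ZLIB_ENCODING_DEFLATE, array('level' => 6));
$noFlush = deflate_add($lazy, $text, ZLIB_NO_FLUSH);
echo 'ZLIB_NO_FLUSH:   ', bin2hex($noFlush), "\n";   // 789c: just the header

// Same input on a fresh context, but force zlib to emit what it has buffered.
$eager = deflate_init(ZLIB_ENCODING_DEFLATE, array('level' => 6));
$syncFlush = deflate_add($eager, $text, ZLIB_SYNC_FLUSH);
echo 'ZLIB_SYNC_FLUSH: ', bin2hex($syncFlush), "\n"; // ends in 0000ffff

// Sync-flushed bytes can be inflated without waiting for the end of the stream.
$reader = inflate_init(ZLIB_ENCODING_DEFLATE);
echo 'Inflated:        ', inflate_add($reader, $syncFlush), "\n";
```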

Note that fflush() does not work because of a known issue with the zlib filter. See: Bug #48725 Support for flushing in zlib stream.

Unfortunately, there does not appear to be a nice work-around to this. I started writing a filter in PHP by extending php_user_filter, but quickly ran into the problem that php_user_filter does not expose the flag bits, only whether flags & PSFS_FLAG_FLUSH_CLOSE is set (the fourth parameter to the filter() method, a boolean argument commonly named $closing). You would need to modify the C sources yourself to fix Bug #48725, or else rewrite the filter.
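PHP 7.0 added deflate_init()/deflate_add(), which make a user-space work-around possible: instead of depending on the flush flags it cannot see, a php_user_filter can compress each incoming bucket with ZLIB_SYNC_FLUSH, so every fwrite() reaches the socket at once. This is a minimal sketch, not a drop-in replacement for zlib.deflate: the class SyncDeflateFilter and the filter name sync.deflate are hypothetical, it assumes a POSIX system for STREAM_PF_UNIX, and sync-flushing every write costs a few bytes of compression per write.

```php
<?php
// Hypothetical filter: SyncDeflateFilter / 'sync.deflate' are illustrative names.
// Assumes PHP >= 7.0 (deflate_init/deflate_add) and STREAM_PF_UNIX support.
class SyncDeflateFilter extends php_user_filter
{
    private $ctx;

    public function onCreate(): bool
    {
        $level = (is_array($this->params) && isset($this->params['level']))
            ? $this->params['level'] : 6;
        $this->ctx = deflate_init(ZLIB_ENCODING_DEFLATE, array('level' => $level));
        return $this->ctx !== false;
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        $compressed = '';
        while ($bucket = stream_bucket_make_writeable($in)) {
            // ZLIB_SYNC_FLUSH makes zlib emit everything buffered so far,
            // so the reader can inflate this write right away.
            $compressed .= deflate_add($this->ctx, $bucket->data, ZLIB_SYNC_FLUSH);
            $consumed += $bucket->datalen;
        }
        if ($closing) {
            // Finish the zlib stream (final block plus Adler-32 trailer).
            $compressed .= deflate_add($this->ctx, '', ZLIB_FINISH);
        }
        if ($compressed === '') {
            return PSFS_FEED_ME; // nothing to pass on yet
        }
        stream_bucket_append($out, stream_bucket_new($this->stream, $compressed));
        return PSFS_PASS_ON;
    }
}

stream_filter_register('sync.deflate', 'SyncDeflateFilter');

list($in, $out) = stream_socket_pair(STREAM_PF_UNIX,
                                     STREAM_SOCK_STREAM,
                                     STREAM_IPPROTO_IP);
stream_filter_append($in, 'sync.deflate', STREAM_FILTER_WRITE, array('level' => 6));
stream_set_blocking($out, false);

// One inflate context for the whole connection: each write arrives as a
// sync-flushed chunk of the same zlib stream.
$reader = inflate_init(ZLIB_ENCODING_DEFLATE);
$messages = array('Some big long string.',
                  'Some big long string, take two.',
                  'Some big long string - third time is the charm?');
$received = array();
foreach ($messages as $message) {
    fwrite($in, $message);
    $received[] = inflate_add($reader, (string) fread($out, 1024));
}
print_r($received);
```

The trade-off is deliberate: zlib.deflate lets zlib batch small writes for a better ratio, while this sketch favours latency, which is usually what a socket needs.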

Personally, I would consider rewriting it, because there seem to be a few eyebrow-raising issues with the code:

  • status = deflate(&(data->strm), flags & PSFS_FLAG_FLUSH_CLOSE ? Z_FULL_FLUSH : (flags & PSFS_FLAG_FLUSH_INC ? Z_SYNC_FLUSH : Z_NO_FLUSH)); seems odd because when writing, I don't know why flags would be anything other than PSFS_FLAG_NORMAL. Is it possible to write & flush at the same time? In any case, handling the flags should be done outside of the while loop through the "in" bucket brigade, like how PSFS_FLAG_FLUSH_CLOSE is handled outside of this loop.
  • On line 221, the memcpy into data->strm.next_in seems to ignore the fact that data->strm.avail_in may be non-zero, so the compressed output might skip some of the data from a write. See, for example, the following text from the zlib manual:

    If not all input can be processed (because there is not enough room in the output buffer), next_in and avail_in are updated and processing will resume at this point for the next call of deflate().

    In other words, avail_in may still be non-zero when deflate() returns.

  • The if statement on line 235, if (data->strm.avail_out < data->outbuf_len) should probably be if (data->strm.avail_out) or perhaps if (data->strm.avail_out > 2).
  • I'm not sure why *bytes_consumed = consumed; isn't *bytes_consumed += consumed;. The example filters at http://www.php.net/manual/en/function.stream-filter-register.php all use += to update $consumed.

EDIT: *bytes_consumed = consumed; is correct. The standard filter implementations all use = rather than += to update the size_t value pointed to by the fifth parameter. Also, even though $consumed += ... on the PHP side effectively translates to += on the size_t (see lines 206 and 231 of ext/standard/user_filters.c), the native filter function is called with either a NULL pointer or a pointer to a size_t set to 0 for the fifth argument (see lines 361 and 452 of main/streams/filter.c).



Answer 2:

I worked through the PHP source code and found a fix.

To understand what happens, I traced the code during the following loop:

....
for ($i = 0; $i < 3; $i++) {
    fwrite($s[0], ...);
    fread($s[1], ...);
    fflush($s[0]);
    fread($s[1], ...);
}

I found that deflate() is never called with the Z_SYNC_FLUSH flag, because no new data is present in the buckets_in brigade when fflush() invokes the filter.

My fix handles the case where the PSFS_FLAG_FLUSH_INC flag is set and no call to deflate() has been made, by extending the

if (flags & PSFS_FLAG_FLUSH_CLOSE) {

condition to handle FLUSH_INC too:

if (flags & PSFS_FLAG_FLUSH_CLOSE || (flags & PSFS_FLAG_FLUSH_INC && to_be_flushed)) {

The downloadable patch is for the Debian Squeeze version of PHP, but the current Git version of the file is close to it, so porting the fix should be simple (a few lines).
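Whether a particular PHP build flushes zlib.deflate on fflush() (that is, whether it includes a fix along these lines) can be checked with a short probe. This is a sketch assuming a POSIX system for STREAM_PF_UNIX; it only reports what it observes and makes no assumption about which PHP versions contain such a fix.

```php
<?php
// Probe: does fflush() on a zlib.deflate write filter push the data through?
list($a, $b) = stream_socket_pair(STREAM_PF_UNIX,
                                  STREAM_SOCK_STREAM,
                                  STREAM_IPPROTO_IP);
stream_filter_append($a, 'zlib.deflate', STREAM_FILTER_WRITE, array('level' => 6));
stream_set_blocking($b, false);

$text = 'Some big long string.';
fwrite($a, $text);
$beforeFlush = (string) fread($b, 1024); // normally just the 2-byte header
fflush($a);                              // calls the filter with PSFS_FLAG_FLUSH_INC
$afterFlush = (string) fread($b, 1024);

// inflate_init() needs PHP >= 7.0; '@' hides a warning if the data is incomplete.
$decoded = (string) @inflate_add(inflate_init(ZLIB_ENCODING_DEFLATE),
                                 $beforeFlush . $afterFlush);
echo $decoded === $text
    ? "fflush() flushed the zlib filter\n"
    : "fflush() did not flush the zlib filter\n";
```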

If some side effect arises please contact me.



Answer 3:

You need to close the stream after the write: closing flushes the filter, and only then does the compressed data arrive at the reading end.

<?php
$params = array('level' => 6, 'window' => 15, 'memory' => 9);
$messages = array('Some big long string.',
                  'Some big long string, take two.',
                  'Some big long string - third time is the charm?');

foreach ($messages as $message) {
    // A fresh socket pair per message, because fclose() ends the stream.
    list($in, $out) = stream_socket_pair(STREAM_PF_UNIX,
                                         STREAM_SOCK_STREAM,
                                         STREAM_IPPROTO_IP);

    stream_filter_append($out, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
    stream_set_blocking($out, 0);
    stream_set_blocking($in, 0);

    fwrite($out, $message);
    // Closing flushes the filter (PSFS_FLAG_FLUSH_CLOSE), so deflate()
    // emits the rest of the compressed data before the socket closes.
    fclose($out);
    $compressed = fread($in, 1024);
    echo "Compressed:" . bin2hex($compressed) . "<br>\n";
}

That produces:

Compressed:789c0bcecf4d5548ca4c57c8c9cf4b57282e29cacc4bd70300532b079c
Compressed:789c0bcecf4d5548ca4c57c8c9cf4b57282e29cacc4bd7512849cc4e552829cfd70300b1b50b07
Compressed:789c0bcecf4d5548ca4c57c8c9cf4b57282e29ca0452ba0a25199945290a259940c9cc62202f55213923b128d71e008e4c108c

Also, I swapped $in and $out, because writing to $in confused me.
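Closing the socket is not the only way to trigger that final flush. The PHP manual states that stream_filter_remove() flushes any data remaining in a filter before removing it, so, as a sketch, the filter can instead be appended and removed around each message on a single socket pair. Each message then becomes its own complete zlib stream (header, data and trailer), which gzuncompress() can decode. This assumes a POSIX system for STREAM_PF_UNIX.

```php
<?php
list($in, $out) = stream_socket_pair(STREAM_PF_UNIX,
                                     STREAM_SOCK_STREAM,
                                     STREAM_IPPROTO_IP);
stream_set_blocking($in, 0);

$params = array('level' => 6, 'window' => 15, 'memory' => 9);
$messages = array('Some big long string.',
                  'Some big long string, take two.',
                  'Some big long string - third time is the charm?');
$decoded = array();

foreach ($messages as $message) {
    // Attach a fresh deflate filter for this message only.
    $filter = stream_filter_append($out, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
    fwrite($out, $message);
    // Removing the filter flushes its remaining data (finishing the zlib
    // stream) to $out; the socket pair itself stays open.
    stream_filter_remove($filter);

    $compressed = fread($in, 1024);
    echo "Compressed:" . bin2hex($compressed) . "\n";
    $decoded[] = gzuncompress($compressed);
}
```

The cost is one zlib header and trailer per message, and the receiver must know where each message ends; for a continuous stream, a sync flush per write is the usual alternative.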