Response XML contains "2000" and "20a0" characters

Posted 2019-07-27 07:43

Question:

I am sending a WebDAV PROPFIND request from PHP. The HTTP request looks like this:

PROPFIND /path/to/whatever HTTP/1.1
User-Agent: My Client
Accept-Encoding: deflate
Depth: 1
Host: example.com
Content-Type: text/xml;charset=UTF-8
Authorization: Basic bLahDeBlah=
Content-Length: 82
Connection: close

<?xml version='1.0' encoding='utf-8'?><propfind xmlns='DAV:'><allprop/></propfind>

It works fine when the response XML is smaller than about 1.5 MB. When the response is bigger, the XML contains character sequences like \r\n2000\r\n and occasionally \r\n20a0\r\n.

I am using this PHP code to retrieve the response:

<?php
$output = "";
while (!feof($this->socket)) {
        $output .= fgets($this->socket, 1024);
}

I can get around this issue by stripping the unwanted characters from the response - but I'd like to prevent this. Any idea what could cause this?

Update: The response headers contain Transfer-Encoding: chunked. We run PHP on Windows, and I believe there is no DLL available that would let us use http_chunked_decode().

Answer 1:

As several people have already pointed out in the comments, the hexadecimal values appear because the response uses chunked transfer encoding: each chunk of the body is preceded by a line giving its size in hex (2000 hex is 8192 bytes).
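To make the framing concrete, here is a small check of what those values mean. The two-chunk body in $sample is made up for illustration and is not taken from the actual response:

```php
<?php
// The stray values are chunk sizes written in hexadecimal.
$sizes = array_map('hexdec', ['2000', '20a0']);
// $sizes is [8192, 8352]: byte counts, not part of the XML.

// A made-up chunked body: "<a>" (3 bytes), "</a>" (4 bytes), then the
// zero-size chunk that terminates the body.
$sample = "3\r\n<a>\r\n4\r\n</a>\r\n0\r\n\r\n";

// Every size line and every chunk of data ends with CRLF, which is why
// the values show up wrapped in \r\n inside the raw response.
$lines = explode("\r\n", $sample);
```

Here $lines alternates between size lines ("3", "4", "0") and the data they announce.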

This Stack Overflow question deals with the same issue (without using the PECL extension) and suggests a snippet like the following for decoding the response:

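function decode_chunked($str) {
  $res = '';
  while (($pos = strpos($str, "\r\n")) !== false) {
    // Each chunk starts with its size in hex on a line of its own.
    $len = hexdec(substr($str, 0, $pos));
    if ($len == 0) {
      break; // the zero-size chunk marks the end of the body
    }
    $res .= substr($str, $pos + 2, $len);
    // Skip the chunk data plus the CRLF that terminates it.
    $str = substr($str, $pos + 2 + $len + 2);
  }
  return $res;
}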
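Since the response is read from a socket anyway, the decoding can also happen while reading rather than on the buffered string. The following is only a sketch under a few assumptions: the stream is positioned just after the response headers, read_chunked_body() is a hypothetical helper name, and any trailer headers after the last chunk are ignored.

```php
<?php
// Reads a chunked body from an open stream and returns the decoded payload.
// read_chunked_body() is a hypothetical name, not part of PHP.
function read_chunked_body($stream) {
    $body = '';
    while (($line = fgets($stream)) !== false) {
        // Size line: hex digits, optionally followed by ";extension".
        $size = hexdec(trim(explode(';', $line, 2)[0]));
        if ($size == 0) {
            break; // zero-size chunk: end of body
        }
        // fread() may return fewer bytes than requested, so loop until
        // the whole chunk has arrived.
        $remaining = $size;
        while ($remaining > 0 && !feof($stream)) {
            $part = fread($stream, $remaining);
            if ($part === false || $part === '') {
                break;
            }
            $body .= $part;
            $remaining -= strlen($part);
        }
        fgets($stream); // consume the CRLF that follows the chunk data
    }
    return $body;
}
```

Reading exactly the announced number of bytes means whitespace or line breaks inside the XML are never mistaken for chunk boundaries.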

As pointed out in the linked question, make sure the header Transfer-Encoding: chunked is set before applying the decoding.
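One way to perform that check is sketched below, assuming the whole raw response (status line, headers and body) is already in a string. split_response() and is_chunked() are hypothetical helper names introduced here for illustration.

```php
<?php
// Splits a raw HTTP response into its header block and its body.
// split_response() is a hypothetical name.
function split_response($raw) {
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], isset($parts[1]) ? $parts[1] : ''];
}

// Reports whether the header block announces a chunked body.
// Header names and the "chunked" token are case-insensitive.
function is_chunked($headers) {
    return preg_match('/^Transfer-Encoding:.*\bchunked\b/mi', $headers) === 1;
}

// Intended use, with a decoder such as decode_chunked():
//   list($headers, $body) = split_response($output);
//   if (is_chunked($headers)) {
//       $body = decode_chunked($body);
//   }
```

Without this check, a response sent with a plain Content-Length would be passed through the decoder and mangled.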

Update: Zend Framework features a Response class that also supports chunked decoding. Note that the Zend\Http classes can be used as stand-alone components (no need to include the full framework in your app!).