TRestClient/TRestRequest incorrectly decodes gzip

Posted 2019-04-29 07:57

Question:

I tried to read a REST API, which is gzip encoded. To be exact, I tried to read the StackExchange API.

I already found the question "Automatically Decode GZIP In TRESTResponse?", but that answer doesn't solve my issue for some reason.

Test setup

In XE5, I added a TRestClient, a TRestRequest and a TRestResponse with the following relevant properties. I set the BaseURL of the client, the resource and parameters of the request, and I set AcceptEncoding of the request to gzip, deflate, which should make it automatically decode gzipped responses.

  object RESTClient1: TRESTClient
    BaseURL = 'https://api.stackexchange.com/2.2'
  end
  object RESTRequest1: TRESTRequest
    AcceptEncoding = 'gzip, deflate'
    Client = RESTClient1
    Params = <
      item
        Kind = pkURLSEGMENT
        name = 'id'
        Options = [poAutoCreated]
        Value = '511529'
      end
      item
        name = 'site'
        Value = 'stackoverflow'
      end>
    Resource = 'users/{id}'
    Response = RESTResponse1
  end
  object RESTResponse1: TRESTResponse
  end

This results in the url:

https://api.stackexchange.com/2.2/users/511529?site=stackoverflow

I invoke the request like this, with two message boxes to show the url and the outcome of the request:

ShowMessage(RESTRequest1.GetFullRequestURL());
RESTRequest1.Execute; // Actual call
ShowMessage(RESTResponse1.Content);

If I call that url in a browser, I get a proper result, which is a json object with some of my user information in it.

Problem

However, in Delphi, I don't get the JSON response. Instead, I get a bunch of bytes that seem to be a mangled gzip response. I tried to decompress it with TIdCompressorZlib.DecompressGZipStream(), but it fails with a ZLib Error (-3). When I inspect the bytes of the response myself, I see that it starts with #1F#3F#08. This is especially odd, since a gzip header should start with #1F#8B#08, so #8B has been transformed into #3F, which is a question mark.

So it seems to me that the RESTClient has attempted to decode the gzip stream as if it were a UTF-8 response, and has replaced invalid sequences (a lone #8B is not a valid UTF-8 sequence) with a question mark.
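One quick way to confirm this diagnosis is to compare the first three bytes of the response with the gzip magic number. The following is only a minimal sketch: DescribeGZipHeader is a hypothetical helper written for illustration and is not part of the REST components. It only distinguishes the two byte patterns described above.

```delphi
program CheckGZipHeader;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: classifies the first three bytes of a response body.
function DescribeGZipHeader(const Data: TBytes): string;
begin
  if Length(Data) < 3 then
    Result := 'too short'
  else if (Data[0] = $1F) and (Data[1] = $8B) and (Data[2] = $08) then
    Result := 'gzip'            // intact gzip header
  else if (Data[0] = $1F) and (Data[1] = $3F) and (Data[2] = $08) then
    Result := 'mangled gzip'    // #8B was replaced by '?' (#3F)
  else
    Result := 'other';
end;

begin
  Writeln(DescribeGZipHeader(TBytes.Create($1F, $8B, $08))); // prints: gzip
  Writeln(DescribeGZipHeader(TBytes.Create($1F, $3F, $08))); // prints: mangled gzip
end.
```

In the form from the test setup, the same helper could be called with RESTResponse1.RawBytes after Execute to see which of the two cases applies.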

Attempts (superficial)

I've done quite some experimenting, like

  • Use RESTResponse.RawBytes and try to decode it. I noticed the bytes in this byte array are already invalid. Comments in the source of TRESTResponse taught me that 'RawBytes' is already decoded, so that makes sense.
  • Saved RESTResponse.RawBytes in a file and tried to decompress it with 7zip and a couple of online gzip decompressors. They all failed, of course, since even the gzip header is incorrect.
  • Assigned the value 'gzip, deflate' to TRESTClient.AcceptEncoding, TRESTResponse.AcceptEncoding and a combination of those. Also tried to append it to the pre-filled Accept property of each of those components.
  • Switched from an authenticated to an unauthenticated request. I had the whole OAuth part working, but I thought that would make the question too complex. The anonymous API which I used in this question has the same issue, though.

Unfortunately it still doesn't work and I still get a mangled response.

Attempts (digging into the VCL)

Eventually, I dug a little deeper, and dove into TRestRequest.Execute. I won't paste all the code here, but eventually it performs the request by calling

FClient.HTTPClient.Get(LURL, LResponseStream);

FClient is the TRESTClient that is linked to the request and LResponseStream is a TMemoryStream. I added LResponseStream.SaveToFile('...') to the watches, so it would save this unprocessed result, et voilà, it gave me a valid gz file, which I could decompress to get my JSON.

A bug in the work-around?

But then, a couple of lines down, I see this piece of code:

  if FClient.HTTPClient.Response.CharSet > '' then
  begin
    LResponseStream.Position := 0;
    S := FClient.HTTPClient.ReadStringAsCharset(LResponseStream, FClient.HTTPClient.Response.CharSet);
    LResponseStream.Free;
    LResponseStream := TStringStream.Create(S);
  end;

According to the comment above this block, this is done because the contents of the memory stream are "NOT encoded accordingly to a possibly present Encoding or Content-Type Charset parameter", which is considered a bug in Indy by the writer of this VCL code.

So basically, what happens here: the raw response is treated as a string and converted to the 'right' encoding. FClient.HTTPClient.Response.CharSet is 'UTF-8', which is indeed the encoding of the JSON, but unfortunately, this conversion should only be done after decompressing the stream, which isn't done yet. So this is considered a bug by me. ;)
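To illustrate the order of operations I would have expected, here is a rough sketch that first decompresses the raw stream and only then applies the charset. It is not the code from the REST library. GZipStreamToText is a hypothetical helper, it assumes Indy's TIdCompressorZLib (the class mentioned above) is available, and it hard-codes UTF-8 because that is the charset this particular response reports.

```delphi
unit GZipText;

interface

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: Raw must contain an intact gzip stream.
function GZipStreamToText(Raw: TStream): string;

implementation

uses
  IdCompressorZLib;

function GZipStreamToText(Raw: TStream): string;
var
  Zip: TIdCompressorZLib;
  Plain: TBytesStream;
begin
  Zip := TIdCompressorZLib.Create(nil);
  try
    Plain := TBytesStream.Create;
    try
      Raw.Position := 0;
      // Step 1: undo the content encoding (gzip) on the raw bytes.
      Zip.DecompressGZipStream(Raw, Plain);
      // Step 2: only now interpret the plain bytes using the charset.
      // Assumption: the charset is UTF-8, as reported for this response.
      Result := TEncoding.UTF8.GetString(Plain.Bytes, 0, Integer(Plain.Size));
    finally
      Plain.Free;
    end;
  finally
    Zip.Free;
  end;
end;

end.
```

Applied to the unprocessed LResponseStream, this order would keep the #8B byte intact until the decompressor has consumed it.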

I tried to dig deeper, but I couldn't find the place where this decompression should have taken place. The actual request is performed by an IIPHTTP instance, which lives in IPPeerAPI.dcu, for which I don't have the source.

So...

So my question is twofold:

  1. Why does this happen? TRestClient should automatically decode the gzip stream when you set AcceptEncoding to 'gzip, deflate'. What setting did I miss? Or isn't this supported yet in XE5?
  2. How do I prevent this incorrect translation of the gzip stream? I don't mind decoding the response myself, as long as it works, although ideally the REST components should do it automatically.

My setup: VCL Forms application, Windows 8.1, Delphi XE5 professional Update 2.

Update

  • Work-around was found (see my answer)
  • Bug report RSP-9855 filed in quality central
  • It's supposedly fixed in Delphi 10.1 (Berlin), but I have yet to test this.

Answer 1:

Remy Lebeau's input in his answer to this question, as well as his comment on the answer to the question "Automatically Decode GZIP In TRESTResponse?", put me on the right track.

Like he said, setting AcceptEncoding doesn't suffice, because the TIdHTTP that performs the actual request doesn't have a decompressor attached, so it can't decompress the gzip response. Based on the sparse resources, I got the idea that setting AcceptEncoding would automatically decompress the response too, but that idea was wrong.

Still, leaving AcceptEncoding empty doesn't work either in this case, since the API in question, the StackExchange API, always sends a compressed response, regardless of whether you specify that you accept gzip or not.

So the combination of a) an always compressed response, b) an HTTP client that cannot decompress and c) a TRESTRequest object that (incorrectly) assumes that the response is already properly decompressed together led to this situation.

I see only two solutions, the first being to discard TRESTClient altogether and just perform the request with a plain TIdHTTP. A pity, since my goal was to explore the possibilities of the new REST components to see how they can make life easier.

So the other solution is to assign a compressor to the TIdHTTP that is used internally.

I managed to succeed, although unfortunately it undoes a lot of the abstraction that the TREST components are trying to introduce. This is the code that solves it:

// Requires IdHTTP and IdCompressorZLib in the uses clause.
var
  Http: TIdCustomHTTP;
begin
  // Get the TIdHTTP that performs the request.
  Http := (RESTRequest1 // The TRESTRequest object
    .Client // The TRESTClient
    .HTTPClient // A TRESTHTTP object that wraps HTTP communication
    .Peer // An IIPHTTP interface which is obtained through PeerFactory.CreatePeer
    .GetObject // A method to get the object instance of the interface
    as TIdCustomHTTP // The object instance, which is a TIdCustomHTTP.
  );

  // Attach a gzip decompressor to it. Http owns the compressor and frees it.
  Http.Compressor := TIdCompressorZLib.Create(Http);
end;

After this, I can use the RESTRequest1 component to successfully fetch the JSON response (at least as text).



Answer 2:

AcceptEncoding = 'gzip, deflate'

This is the root of your problem. You are manually telling the server that the response is allowed to be gzip encoded, but as far as I can see in the REST source code, the underlying TIdHTTP object that TRESTClient uses internally does not have a gzip decompressor assigned to it (even if it had one, assigning AcceptEncoding manually would still be wrong, because TIdHTTP sets up its own Accept-Encoding header if a decompressor is assigned). I commented on that in the other question you linked to. So TIdHTTP ends up returning the raw gzip bytes without decoding them, and then TRESTClient converts them as-is to a charset-decoded UnicodeString (since you are reading the Content property). That is why you are seeing the bytes getting messed up.

You need to get rid of the AcceptEncoding assignment.

"Why does this happen?"

Because TRestClient does not assign a gzip decompressor to its internal TIdHTTP object, but you are tricking the server into thinking it did.

"... should automatically decode the gzip stream when you set AcceptEncoding to 'gzip, deflate'"

No, because there is no decompressor assigned.

Update: that being said, I would probably just drop TRESTClient and use TIdHTTP directly. The following works for me when I try it:

var
  HTTP: TIdHTTP;
  JSON: string;
begin
  HTTP := TIdHTTP.Create;
  try
    HTTP.Compressor := TIdCompressorZLib.Create(HTTP);
    // starting with SVN rev 5224, the TIdHTTP.IOHandler property no longer
    // needs to be explicitly set in order to request HTTPS urls.  TIdHTTP
    // now creates a default SSLIOHandler internally if needed.  But if you
    // are using an older release, you will have to assign the IOHandler... 
    //
    // HTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
    //
    JSON := HTTP.Get('https://api.stackexchange.com/2.2/users/511529?site=stackoverflow');
  finally
    HTTP.Free;
  end;
  ShowMessage(JSON);
end;

Displays:

{"items":[{"badge_counts":{"bronze":96,"silver":53,"gold":4},"account_id":240984,"is_employee":false,"last_modified_date":1419235802,"last_access_date":1419293282,"reputation_change_year":15259,"reputation_change_quarter":2983,"reputation_change_month":1301,"reputation_change_week":123,"reputation_change_day":0,"reputation":61014,"creation_date":1290042241,"user_type":"registered","user_id":511529,"accept_rate":100,"location":"Netherlands","website_url":"http://www.eftepedia.nl","link":"https://stackoverflow.com/users/511529/goleztrol","display_name":"GolezTrol","profile_image":"https://www.gravatar.com/avatar/b07c67edfcc5d1496365503712de5c2a?s=128&d=identicon&r=PG"}],"has_more":false,"quota_max":300,"quota_remaining":295}