I tried to read from a REST API that returns gzip-encoded responses. To be exact, I tried to read the StackExchange API.
I already found the question Automatically Decode GZIP In TRESTResponse?, but that answer doesn't solve my issue for some reason.
Test setup
In XE5, I added a TRestClient, a TRestRequest and a TRestResponse with the relevant properties shown below. I set the BaseURL of the client and the resource and parameters of the request, and I set AcceptEncoding of the request to 'gzip, deflate', which should make it automatically decode gzipped responses.
object RESTClient1: TRESTClient
  BaseURL = 'https://api.stackexchange.com/2.2'
end
object RESTRequest1: TRESTRequest
  AcceptEncoding = 'gzip, deflate'
  Client = RESTClient1
  Params = <
    item
      Kind = pkURLSEGMENT
      name = 'id'
      Options = [poAutoCreated]
      Value = '511529'
    end
    item
      name = 'site'
      Value = 'stackoverflow'
    end>
  Resource = 'users/{id}'
  Response = RESTResponse1
end
object RESTResponse1: TRESTResponse
end
This results in the URL:
https://api.stackexchange.com/2.2/users/511529?site=stackoverflow
I invoke the request like this, with two message boxes to show the URL and the outcome of the request:
ShowMessage(RESTRequest1.GetFullRequestURL());
RESTRequest1.Execute; // Actual call
ShowMessage(RESTResponse1.Content);
If I call that URL in a browser, I get a proper result, which is a JSON object with some of my user information in it.
Problem
However, in Delphi, I don't get the JSON response. In fact, I get a bunch of bytes which seem to be a mangled gzip response. I tried to decompress it with TIdCompressorZlib.DecompressGZipStream(), but it fails with a ZLib Error (-3). When I inspect the bytes of the response myself, I see that it starts with #1F#3F#08. This is especially weird, since the gzip header should be #1F#8B#08, so #8B has been transformed into #3F, which is a question mark.
So it seems to me that the RESTClient has attempted to decode the gzip stream as if it were a UTF-8 response, and has replaced invalid sequences (a lone #8B byte is not a valid UTF-8 sequence) with a question mark.
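A quick way to confirm this kind of corruption before even attempting to decompress is to compare the first three bytes with the gzip signature. The following is only a minimal sketch: LooksLikeGzip is a hypothetical helper, not something the REST components provide, and the two byte arrays are simply the expected header and the mangled header described above.

```delphi
program GzipSignatureCheck;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: True when Data starts with the gzip signature
// #1F #8B followed by #08 (the deflate compression method).
function LooksLikeGzip(const Data: TBytes): Boolean;
begin
  Result := (Length(Data) >= 3) and
            (Data[0] = $1F) and (Data[1] = $8B) and (Data[2] = $08);
end;

begin
  // The header a gzip stream is expected to start with.
  Writeln(LooksLikeGzip(TBytes.Create($1F, $8B, $08)));
  // The mangled header observed in the response, with #8B replaced by '?'.
  Writeln(LooksLikeGzip(TBytes.Create($1F, $3F, $08)));
end.
```

In the test form, the same check could presumably be applied to RESTResponse1.RawBytes right after RESTRequest1.Execute.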
Attempts (superficial)
I've done quite some experimenting, like
- Use RESTResponse.RawBytes and try to decode it. I noticed the bytes in this byte array are already invalid. Comments in the source of TRESTResponse taught me that 'RawBytes' is already decoded, so that makes sense.
- Saved RESTResponse.RawBytes in a file and tried to decompress it with 7zip and a couple of online gzip decompressors. They all failed, of course, since even the gzip header is incorrect.
- Assigned the value 'gzip, deflate' to TRESTClient.AcceptEncoding, TRESTResponse.AcceptEncoding and a combination of those. Also tried to append it to the pre-filled Accept property of each of those components.
- Switched from an authenticated to an unauthenticated request. I had the whole OAuth part working, but I thought that would make the question too complex. The anonymous API which I used in this question has the same issue, though.
Unfortunately it still doesn't work and I still get a mangled response.
Attempts (digging into the VCL)
Eventually, I dug a little deeper, and dove into TRestRequest.Execute. I won't paste all the code here, but eventually it performs the request by calling
FClient.HTTPClient.Get(LURL, LResponseStream);
FClient is the TRESTClient that is linked to the request and LResponseStream is a TMemoryStream. I added LResponseStream.SaveToFile('...') to the watches, so it would save this unprocessed result, et voilà, it gave me a valid gz file, which I could decompress to get my JSON.
A bug in the work-around?
But then, a couple of lines down, I see this piece of code:
if FClient.HTTPClient.Response.CharSet > '' then
begin
  LResponseStream.Position := 0;
  S := FClient.HTTPClient.ReadStringAsCharset(LResponseStream, FClient.HTTPClient.Response.CharSet);
  LResponseStream.Free;
  LResponseStream := TStringStream.Create(S);
end;
According to the comment above this block, this is done because the contents of the memory stream are "NOT encoded accordingly to a possibly present Encoding or Content-Type Charset parameter", which is considered a bug in Indy by the writer of this VCL code.
So basically, what happens here is that the raw response is treated as a string and converted to the 'right' encoding. FClient.HTTPClient.Response.CharSet is 'UTF-8', which is indeed the encoding of the JSON, but this conversion should only be done after the stream has been decompressed, which hasn't happened at this point. So this is considered a bug by me. ;)
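To make the expected order explicit (decompress first, apply the charset second), here is a minimal sketch using the Indy decompressor mentioned earlier. GUnzipToString is a hypothetical helper, not part of the VCL or Indy. It assumes the untouched gzip bytes are available in a stream, for example the file saved from LResponseStream, whose path is passed as the first command-line argument, and that the payload is UTF-8.

```delphi
program GunzipThenDecode;
{$APPTYPE CONSOLE}

uses
  System.Classes, System.SysUtils, IdCompressorZLib;

// Hypothetical helper: decompress the raw gzip bytes first and only then
// interpret the resulting bytes as UTF-8 text.
function GUnzipToString(RawGzip: TStream): string;
var
  Zip: TIdCompressorZLib;
  Plain: TStringStream;
begin
  Zip := TIdCompressorZLib.Create(nil);
  try
    // The stream's encoding is only used when DataString is read.
    Plain := TStringStream.Create('', TEncoding.UTF8);
    try
      RawGzip.Position := 0;
      Zip.DecompressGZipStream(RawGzip, Plain); // step 1: bytes to bytes
      Result := Plain.DataString;               // step 2: bytes to string
    finally
      Plain.Free;
    end;
  finally
    Zip.Free;
  end;
end;

var
  Src: TFileStream;
begin
  // Assumption: ParamStr(1) is the path of a saved, unprocessed response.
  Src := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    Writeln(GUnzipToString(Src));
  finally
    Src.Free;
  end;
end.
```

Run against the bytes that have already passed through the charset conversion, the same call would be expected to fail in the way described in the Problem section, since the header is no longer valid.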
I tried to dig deeper, but I couldn't find the place where this decompression should have taken place. The actual request is performed by an IIPHTTP instance, which lives in IPPeerAPI.dcu, for which I don't have the source.
So...
So my question is twofold:
- Why does this happen? TRestClient should automatically decode the gzip stream when you set AcceptEncoding to 'gzip, deflate'. What setting did I miss? Or isn't this supported yet in XE5?
- How do I prevent this incorrect translation of the gzip stream? I don't mind decoding the response myself, as long as it works, although ideally the REST components should do it automatically.
My setup: VCL Forms application, Windows 8.1, Delphi XE5 professional Update 2.
Update
- Work-around was found (see my answer)
- Bug report RSP-9855 filed in quality central
- It's supposedly fixed in Delphi 10.1 (Berlin), but I have yet to test this.