I am trying to read a REST API whose responses are gzip-encoded: specifically, the StackExchange API.
I already found the question Automatically Decode GZIP In TRESTResponse?, but that answer doesn't solve my issue for some reason.
Test setup
In XE5, I added a TRESTClient, a TRESTRequest and a TRESTResponse with the following relevant properties. I set the BaseURL of the client and the resource and parameters of the request. I also set the request's AcceptEncoding to gzip, deflate, which should make it automatically decode gzipped responses.
object RESTClient1: TRESTClient
  BaseURL = 'https://api.stackexchange.com/2.2'
end
object RESTRequest1: TRESTRequest
  AcceptEncoding = 'gzip, deflate'
  Client = RESTClient1
  Params = <
    item
      Kind = pkURLSEGMENT
      name = 'id'
      Options = [poAutoCreated]
      Value = '511529'
    end
    item
      name = 'site'
      Value = 'stackoverflow'
    end>
  Resource = 'users/{id}'
  Response = RESTResponse1
end
object RESTResponse1: TRESTResponse
end
This results in the url:
https://api.stackexchange.com/2.2/users/511529?site=stackoverflow
I invoke the request like this, with two message boxes to show the url and the outcome of the request:
ShowMessage(RESTRequest1.GetFullRequestURL());
RESTRequest1.Execute; // Actual call
ShowMessage(RESTResponse1.Content);
If I open that URL in a browser, I get the expected result: a JSON object containing some of my user information.
Problem
However, in Delphi, I don't get the JSON response. Instead, I get a bunch of bytes that seem to be a mangled gzip response. I tried to decompress it with TIdCompressorZLib.DecompressGZipStream(), but that fails with ZLib error -3 (Z_DATA_ERROR). When I inspect the bytes of the response myself, I see that it starts with #$1F#$3F#$08. This is especially weird, since the gzip header should be #$1F#$8B#$08. So #$8B has been transformed into #$3F, which is a question mark.
So it seems to me that the REST client has tried to decode the gzip stream as if it were UTF-8 text, and has replaced invalid sequences with a question mark. A lone $8B byte is not valid UTF-8.
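You can distinguish an intact gzip stream from one that went through a charset conversion by checking the first bytes. Every gzip member starts with the magic bytes $1F $8B, followed by compression method $08 (deflate), as defined in RFC 1952.

The sketch below assumes the bytes come from something like RESTResponse1.RawBytes or the raw response stream. HasGzipMagic is a hypothetical helper, not part of the REST or Indy libraries.

```delphi
program CheckGzipHeader;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: True when the buffer starts with the gzip magic
// bytes $1F $8B followed by compression method $08 (deflate), per RFC 1952.
function HasGzipMagic(const Bytes: TBytes): Boolean;
begin
  // Short-circuit evaluation ({$B-}, the default) prevents indexing
  // past the end of a buffer shorter than 3 bytes.
  Result := (Length(Bytes) >= 3) and
    (Bytes[0] = $1F) and (Bytes[1] = $8B) and (Bytes[2] = $08);
end;

var
  Intact, Mangled: TBytes;
begin
  // The header as sent by the server...
  Intact := TBytes.Create($1F, $8B, $08);
  // ...and as seen after $8B was replaced by '?' ($3F).
  Mangled := TBytes.Create($1F, $3F, $08);

  Writeln('Intact:  ', BoolToStr(HasGzipMagic(Intact), True));
  Writeln('Mangled: ', BoolToStr(HasGzipMagic(Mangled), True));
end.
```

This prints True for the intact header and False for the mangled one. In the application itself, you would call something like HasGzipMagic(RESTResponse1.RawBytes) to see whether the bytes have already been damaged.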
Attempts (superficial)
I've done quite a bit of experimenting, including:
- Used RESTResponse.RawBytes and tried to decode it. I noticed that the bytes in this array are already invalid. Comments in the TRESTResponse source taught me that RawBytes is already decoded, so that makes sense.
- Saved RESTResponse.RawBytes to a file and tried to decompress it with 7-Zip and a couple of online gzip decompressors. They all failed, of course, since even the gzip header is incorrect.
- Assigned the value 'gzip, deflate' to TRESTClient.AcceptEncoding, TRESTResponse.AcceptEncoding, and combinations of those. I also tried appending it to the pre-filled Accept property of each of those components.
- Switched from an authenticated to an unauthenticated request. I had the whole OAuth part working, but I thought that would make the question too complex. The anonymous API call I use in this question has the same issue, though.
Unfortunately it still doesn't work and I still get a mangled response.
Attempts (digging into the VCL)
Eventually, I dug a little deeper into TRESTRequest.Execute. I won't paste all the code here, but ultimately it performs the request by calling
FClient.HTTPClient.Get(LURL, LResponseStream);
FClient is the TRESTClient that is linked to the request, and LResponseStream is a TMemoryStream. I added LResponseStream.SaveToFile('...') to the watches so it would save this unprocessed result. Et voilà: it gave me a valid .gz file, which I could decompress to get my JSON.
A bug in the work-around?
But then, a couple of lines down, I see this piece of code:
if FClient.HTTPClient.Response.CharSet > '' then
begin
  LResponseStream.Position := 0;
  S := FClient.HTTPClient.ReadStringAsCharset(LResponseStream, FClient.HTTPClient.Response.CharSet);
  LResponseStream.Free;
  LResponseStream := TStringStream.Create(S);
end;
According to the comment above this block, this is done because the contents of the memory stream are "NOT encoded accordingly to a possibly present Encoding or Content-Type Charset parameter". The author of this code considers that a bug in Indy.
So basically, this is what happens: the raw response is treated as text and decoded using the 'right' charset. FClient.HTTPClient.Response.CharSet is 'UTF-8', which is indeed the encoding of the JSON. Unfortunately, this conversion should only be done after the stream has been decompressed, and that hasn't happened yet. So I consider this a bug. ;)
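The correct order is to decompress first and apply the charset second. This minimal sketch does that using Indy's TIdCompressorZLib.DecompressGZipStream, the same method mentioned above.

It makes three assumptions. The function name DecompressThenDecode is hypothetical. The file name 'response.gz' is a hypothetical name for the output of the SaveToFile watch. The charset is assumed to be UTF-8 because that is what the server reported.

```delphi
program DecodeSavedResponse;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes,
  IdCompressorZLib;

// Hypothetical helper: decompresses a raw gzip stream first and only then
// decodes the resulting bytes as text. UTF-8 is assumed because that is
// the charset the server reported (Response.CharSet = 'UTF-8').
function DecompressThenDecode(Raw: TStream): string;
var
  Decompressor: TIdCompressorZLib;
  Plain: TBytesStream;
begin
  Decompressor := TIdCompressorZLib.Create(nil);
  try
    Plain := TBytesStream.Create;
    try
      Raw.Position := 0;
      // Step 1: undo the gzip content encoding on the raw bytes.
      Decompressor.DecompressGZipStream(Raw, Plain);
      // Step 2: only now apply the charset, to the decompressed bytes.
      // Plain.Bytes may be larger than the data, so Size limits the count.
      Result := TEncoding.UTF8.GetString(Plain.Bytes, 0, Integer(Plain.Size));
    finally
      Plain.Free;
    end;
  finally
    Decompressor.Free;
  end;
end;

var
  Saved: TFileStream;
begin
  // 'response.gz' is a hypothetical name for the file written by
  // LResponseStream.SaveToFile before the charset conversion ran.
  Saved := TFileStream.Create('response.gz', fmOpenRead or fmShareDenyWrite);
  try
    Writeln(DecompressThenDecode(Saved));
  finally
    Saved.Free;
  end;
end.
```

The same function could be applied to LResponseStream before the CharSet block runs, which is exactly where the library skips this step.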
I tried to dig deeper, but I couldn't find where this decompression should have taken place. The actual request is performed by an IIPHTTP instance, which comes from IPPeerAPI.dcu, for which I don't have the source.
So...
So my question is twofold:
- Why does this happen? TRESTClient should automatically decode the gzip stream when you set AcceptEncoding to 'gzip, deflate'. What setting did I miss? Or is this simply not supported yet in XE5?
- How do I prevent this incorrect translation of the gzip stream? I don't mind decoding the response myself, as long as it works, although ideally the REST components should do it automatically.
My setup: VCL Forms application, Windows 8.1, Delphi XE5 professional Update 2.
Update
- Work-around was found (see my answer)
- Bug report RSP-9855 filed in Quality Central
- It's supposedly fixed in Delphi 10.1 (Berlin), but I have yet to test this.
Remy Lebeau's input in his answer to this question as well as his comment to the answer in the question Automatically Decode GZIP In TRESTResponse? put me on the right track.
As he said, setting AcceptEncoding isn't enough. The TIdHTTP that performs the actual request doesn't have a decompressor attached, so it can't decompress the gzip response. From the sparse resources available, I had gotten the impression that setting AcceptEncoding would also decompress the response automatically, but that was wrong.
Still, leaving AcceptEncoding empty doesn't work either in this case. The StackExchange API always compresses its responses, regardless of whether you specify that you accept gzip.
So three things together led to this situation: a) an always-compressed response, b) an HTTP client that cannot decompress, and c) a TRESTRequest object that incorrectly assumes the response has already been decompressed.
I see only two solutions, the first being to discard TRESTClient altogether and just perform the request with a plain TIdHTTP. A pity, since my goal was to explore the possibilities of the new REST components to see how they can make life easier.
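For comparison, the first option might look like the minimal sketch below, using a plain TIdHTTP with a decompressor assigned. It is not the code from either answer.

It assumes Indy 10 as shipped with XE5 and that the OpenSSL DLLs are available at runtime for https. The program name is hypothetical, and the SSL settings are left at their defaults.

```delphi
program StackExchangeViaIndy;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  IdHTTP, IdCompressorZLib, IdSSLOpenSSL;

var
  HTTP: TIdHTTP;
  Json: string;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    // The API is served over https, so an SSL IOHandler is needed;
    // it is owned by HTTP and freed with it.
    HTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
    // With a compressor assigned, TIdHTTP sends its own Accept-Encoding
    // header and decompresses gzip/deflate responses itself, so
    // Request.AcceptEncoding is deliberately not set by hand.
    HTTP.Compressor := TIdCompressorZLib.Create(HTTP);
    // Get returns the response body as a string.
    Json := HTTP.Get('https://api.stackexchange.com/2.2/users/511529?site=stackoverflow');
    Writeln(Json);
  finally
    HTTP.Free;
  end;
end.
```

The design point is that the compressor, not the AcceptEncoding property, is what makes decompression happen. The header alone only tells the server it may compress.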
So the other solution is to assign a compressor to the TIdHTTP that is used internally.
I managed to get it working, although unfortunately it undoes a lot of the abstraction that the TREST components are trying to introduce. This is the code that solves it:
After this, I can use the RESTRequest1 component to successfully fetch the JSON response (at least as text).
Setting AcceptEncoding = 'gzip, deflate' is the root of your problem. You are manually telling the server that the response is allowed to be gzip encoded. However, as far as I can see in the REST source code, the underlying TIdHTTP object that TRESTClient uses internally does not have a gzip decompressor assigned to it.

Even if it had one, assigning AcceptEncoding manually would still be wrong, because TIdHTTP sets up its own Accept-Encoding header if a decompressor is assigned. I commented on that in the other question you linked to.

So TIdHTTP ends up returning the raw gzip bytes without decoding them. TRESTClient then converts them as-is to a charset-decoded UnicodeString, since you are reading the Content property. That is why you are seeing the bytes getting messed up.

You need to get rid of the AcceptEncoding assignment.

Why does this happen? Because TRESTClient does not assign a gzip decompressor to its internal TIdHTTP object, but you are tricking the server into thinking it did.

As for TRESTClient automatically decoding the gzip stream when AcceptEncoding is set: no, because there is no decompressor assigned.
Update: that being said, I would probably just drop TRESTClient and use TIdHTTP directly. The following works for me when I try it:
Displays: