empty buffer but IdTCPClient.IOHandler.InputBuffer

2019-07-04 17:14发布

I have problem in below code with idTCPClient for reading buffer from a telnet server:

procedure TForm2.ReadTimerTimer(Sender: TObject);
var
   S: String; 
begin
   if IdTCPClient.IOHandler.InputBufferIsEmpty then
   begin
     IdTCPClient.IOHandler.CheckForDataOnSource(10);
     if IdTCPClient.IOHandler.InputBufferIsEmpty then Exit;
   end;
   s := idTCPClient.IOHandler.InputBufferAsString(TEncoding.UTF8);
   CheckText(S);
end;

this procedure run every 1000 milliseconds and when the buffer have a value CheckText called.

this code works but sometimes this return the empty buffer to CheckText.

what's the problem?

thanks

3条回答
我只想做你的唯一
2楼-- · 2019-07-04 17:36

While your code is correct, the problem is most likely that the inputBuffer contains data that might contain null characters (#0) which would end the string.

Try Remy's solution, and check what you get in the rawbytestring.

Edit

I didn't read that the OP was reading from a TelnetServer. OP should use TidTelnet instead of IdTCPClient.

Edit2

I just read an older post of OP which explains the reason why he is not using TidTelnet.

/Daddy

查看更多
Rolldiameter
3楼-- · 2019-07-04 17:36

Telnet servers send a null character (#0) after each carriage return. This is most likely what you are seeing.

A null character encoded to UTF8 is still a single byte with the value of 0. Check to see if that's what you are receiving.

查看更多
Juvenile、少年°
4楼-- · 2019-07-04 17:43

Your code is attempting to read arbitrary blocks of data from the InputBuffer and expects them to be complete and valid strings. It is doing this without ANY consideration for what kind of data you are receiving. That is a recipe for disaster on multiple levels.

You are connected to a Telnet server, but you are using TIdTCPClient directly instead of using TIdTelnet, so you MUST manually decode any Telnet sequences that are received BEFORE you can then process any remaining string data. Look at the source code for TIdTelnet. There is a lot of decoding logic that takes place before the OnDataAvailable event is fired. All Telnet sequence data is handled internally, then the OnDataAvailable event provides whatever non-Telnet data is left over after decoding.

Once you have Telnet decoding taken care of, another problem you have to watch out for is that TEncoding.UTF8 only handles properly encoded COMPLETE UTF-8 sequences. If it encounters a badly encoded sequence, or more importantly encounters an incomplete sequence, THE ENTIRE DECODE FAILS and it returns a blank string. This has already been reported as a bug (see QC #79042).

CheckForDataOnSource() stores whatever raw bytes are in the socket at that moment into the InputBuffer. InputBufferAsString() extracts whatever raw bytes are in the InputBuffer at that moment and attempts to decode them using the specified encoding. It is very possible and likely that the raw bytes that are in the InputBuffer when you call InputBufferAsString() do not always contain COMPLETE UTF-8 sequences. Chances are that sometimes the last sequence in the InputBuffer is still waiting for bytes to arrive in the socket and they will not be read until the next call to CheckForDataOnSource(). That would explain why your CheckText() function is receiving blank strings when using TEncoding.UTF8.

You should use IndyUTF8Encoding() instead (Indy implements its own UTF-8 encoder/decoder to avoid the decoding bug in TEncoding.UTF8). At the very least, you will not get blank strings anymore, however you can still lose data when a UTF-8 sequence spans multiple CheckForDataOnSource() calls (incomplete UTF-8 sequences will be converted to ? characters). For that reason alone, you should not be using InputBufferAsString() in this situation (even if TEncoding.UTF8 did work properly). To handle this properly, you should either:

1) scan through the InputBuffer manually, calculating how many bytes constitute COMPLETE UTF-8 sequences only, and then pass that count to InputBuffer.Extract() or TIdIOHandler.ReadString(). Any left over bytes will remain in the InputBuffer for the next time. For that to work, you will have to get rid of the first InputBufferIsEmpty() call and just call CheckForDataOnSource() unconditionally so that you are always checking for more bytes even if you already have some.

2) use TIdIOHandler.ReadChar() instead and get rid of the calls to InputBufferIsEmpty() and CheckForDataOnSource() altogether. The downside is that you will lose data if a UTF-8 sequence decodes into a UTF-16 surrogate pair. ReadChar() can decode surrogates, but it cannot return the second character in the pair (I have started working on new ReadChar() overloads for a future release of Indy that return String instead of Char so full surrogate pairs can be returned).

查看更多
登录 后发表回答