Java InputStream read buffer

Posted 2019-07-27 10:27

Say I'm trying to read from a Java InputStream like this:

ZipInputStream zis = new ZipInputStream(new FileInputStream("C:\\temp\\sample3.zip"));
zis.getNextEntry();
byte[] buffer2 = new byte[2];
int count = zis.read(buffer2);
if(count != -1) //process...
else...//something wrong, abort

I'm parsing a binary file, and I set my buffer size to 2 in this case because I want to read the next short. I would use a buffer of size 4 to read the next int, and so on for other types. The problem is that sometimes zis.read(buffer2) doesn't fill the buffer even when I know there is enough unread data to fill it. I could simply dump the entire file contents into an array and parse that, but then I end up implementing my own stream reader, which seems like reinventing the wheel. I could also implement a read() function that checks the read count and, if it is less than the buffer size, requests more data to fill the buffer, but that's inefficient and ugly. Is there a better way to do this?

This is a follow-up question to a question posted here:

Java ZipInputStream extraction errors

3 answers
可以哭但决不认输i
#2 · 2019-07-27 10:34

Is there a better way to do this?

Well ... a ZipInputStream ultimately inherits from InputStream so you should be able to wrap it with a BufferedInputStream and then a DataInputStream and read data using readShort, readInt and so on.

Something like this:

while (zis.getNextEntry() != null) {
  DataInputStream dis = new DataInputStream(new BufferedInputStream(zis));
  boolean done = false;
  do {
    short s = dis.readShort();
    int i = dis.readInt();
    ...
  } while (!done);
}

NB: you shouldn't close the dis stream as that would cause the zis to be closed. (Obviously, the zis needs to be closed at an outer level to avoid a resource leak.)

The BufferedInputStream in the stack ensures that you don't do lots of small reads on the underlying stream ... which would be bad.
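The stacking described above can be fleshed out into a runnable sketch. Everything here is an assumption made for illustration: the class name ZipRecordReader, the record layout (one short followed by one int, repeated), the in-memory sample zip, and the choice to treat an EOFException on the first field as the normal end of an entry.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipRecordReader {

    // Sums every (short, int) record found in every entry of the zip data.
    static long sumRecords(InputStream zipData) throws IOException {
        long total = 0;
        // Only the ZipInputStream is closed, once, at the outer level.
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            while (zis.getNextEntry() != null) {
                // Per-entry wrappers are deliberately not closed:
                // closing them would close zis too.
                DataInputStream dis = new DataInputStream(new BufferedInputStream(zis));
                while (true) {
                    short s;
                    try {
                        s = dis.readShort();
                    } catch (EOFException endOfEntry) {
                        break; // no more records (a lone trailing byte also ends up here)
                    }
                    int i = dis.readInt(); // an EOFException here means a truncated record
                    total += (long) s + i;
                }
            }
        }
        return total;
    }

    // Builds a one-entry zip in memory so the sketch needs no file on disk.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("records.bin"));
            DataOutputStream dos = new DataOutputStream(zos);
            dos.writeShort(7);
            dos.writeInt(100);
            dos.writeShort(-2);
            dos.writeInt(5);
            dos.flush();
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumRecords(new ByteArrayInputStream(sampleZip())));
    }
}
```

A fresh BufferedInputStream per entry is safe in this sketch because ZipInputStream reports end of stream at the end of each entry, so the buffer never reads ahead into the next one.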

The only possible gotcha is that DataInputStream's methods have particular ideas about how the binary data is represented; e.g. numbers are big-endian. If that is an issue, consider reading the entire zip entry into a byte array and wrapping it in a ByteBuffer, whose byte order can be changed.
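For the ByteBuffer route, a minimal sketch might look like the following. The class and method names (LittleEndianEntry, readAll, wrapLittleEndian) are made up for this example, and it assumes the file stores its numbers little-endian; the same readAll call would work on a ZipInputStream positioned on an entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianEntry {

    // Drains the stream into memory; for a ZipInputStream this stops at the
    // end of the current entry, because that is where read() returns -1.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // ByteBuffer defaults to big-endian, so the order is switched explicitly.
    static ByteBuffer wrapLittleEndian(byte[] entryBytes) {
        return ByteBuffer.wrap(entryBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {0x34, 0x12, 0x78, 0x56, 0x34, 0x12};
        ByteBuffer buf = wrapLittleEndian(readAll(new ByteArrayInputStream(raw)));
        short s = buf.getShort(); // bytes 34 12       -> 0x1234
        int i = buf.getInt();     // bytes 78 56 34 12 -> 0x12345678
        System.out.println(Integer.toHexString(s) + " " + Integer.toHexString(i));
    }
}
```

Once the entry is in a ByteBuffer, short reads stop being a concern: getShort and getInt either succeed or throw BufferUnderflowException.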

一夜七次
#3 · 2019-07-27 10:57

You need to check the byte count and keep reading until you have all the data you need:

zis.getNextEntry();
byte[] buffer2 = new byte[2];
int count = 0;
while (count < 2) {
  int bytesRead = zis.read(buffer2, count, 2 - count);
  if(bytesRead != -1) {
    count += bytesRead;
  }
  else...//something wrong, abort
}
//process...
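The same loop can be pulled into a reusable helper so that it works for any buffer size. This is only a sketch: StreamUtil, readFully and trickle are hypothetical names, and throwing EOFException is one possible way to handle the "something wrong, abort" branch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class StreamUtil {

    // Fills buf completely, looping over short reads; throws if the stream ends first.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int count = 0;
        while (count < buf.length) {
            int bytesRead = in.read(buf, count, buf.length - count);
            if (bytesRead == -1) {
                throw new EOFException(
                        "stream ended after " + count + " of " + buf.length + " bytes");
            }
            count += bytesRead;
        }
    }

    // Demo-only stream that hands out at most one byte per bulk read,
    // imitating a source that returns short reads.
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer4 = new byte[4];
        readFully(trickle(new byte[] {1, 2, 3, 4, 5}), buffer4);
        System.out.println(Arrays.toString(buffer4)); // [1, 2, 3, 4]
    }
}
```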
干净又极端
#4 · 2019-07-27 10:58

ZipInputStream conforms to the contract defined by InputStream. The read(byte[] ...) methods are allowed, and documented, to return either -1 for end of stream or any value from 1 up to the requested length.

And there is good reason the API is defined that way: it gives the implementation the freedom to return partial data as soon as it is available, without blocking for extended periods while waiting for more data to arrive (think of SocketInputStream).

If you require a minimum amount of data you need to call read repeatedly until you have read as much data as needed to continue processing.
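The JDK already ships this repeat-until-full loop: DataInputStream.readFully fills the whole array or throws EOFException (on Java 9 and later, InputStream.readNBytes(byte[], int, int) is a similar option that returns the count instead). A small sketch, where ReadFullyDemo and readExactly are names invented for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class ReadFullyDemo {

    // Returns exactly n bytes from the stream, or throws EOFException if fewer remain.
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        // readFully keeps calling read until buf is full. DataInputStream does
        // not read ahead, so a temporary wrapper does not swallow extra bytes.
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30});
        System.out.println(Arrays.toString(readExactly(in, 2))); // [10, 20]
    }
}
```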

As for "that's inefficient and ugly": reading tiny amounts of data through the bulk-read methods incurs its own overhead, and the code you show possibly also creates a garbage byte[] for each value you read. For reading a handful of bytes, you could simply use the read() method that returns a single byte, wrapped in a small utility method, e.g.:

 static short readShort(InputStream in) throws IOException {
      short s = 0;
      for (int i=0; i<2; ++i) {
          int read = in.read();
          if (read < 0)
              throw new IOException("unexpected end of stream");
          s = (short) ((s << 8) | read);
      }
      return s;
 }

(this can be easily adapted to other primitive types)
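As an illustration of that adaptation, here is a sketch of int and long variants written in the same style. The class name BigEndianReads is made up, and like readShort above the methods assume big-endian data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BigEndianReads {

    // Same pattern as readShort, with four bytes accumulated into an int.
    static int readInt(InputStream in) throws IOException {
        int value = 0;
        for (int i = 0; i < 4; ++i) {
            int read = in.read();
            if (read < 0)
                throw new IOException("unexpected end of stream");
            value = (value << 8) | read;
        }
        return value;
    }

    // Eight bytes accumulated into a long; read is 0..255, so widening it is safe.
    static long readLong(InputStream in) throws IOException {
        long value = 0;
        for (int i = 0; i < 8; ++i) {
            int read = in.read();
            if (read < 0)
                throw new IOException("unexpected end of stream");
            value = (value << 8) | read;
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x12, 0x34, 0x56, 0x78};
        System.out.println(Integer.toHexString(readInt(new ByteArrayInputStream(data)))); // 12345678
    }
}
```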

Single-byte I/O is in most cases totally acceptable, as long as you take care to ensure the InputStream is wrapped in a BufferedInputStream. The average overhead then reduces to a few array index bounds checks inside the BufferedInputStream. It won't cause an excessive number of calls down to the native data source.
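One way to check this claim is to count how often the underlying stream is actually asked for data. The sketch below uses a hypothetical CountingInputStream wrapper and an in-memory source; the exact count depends on the BufferedInputStream buffer size, so treat the printed number as indicative only.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Demo-only wrapper that counts read calls reaching the wrapped stream.
final class CountingInputStream extends FilterInputStream {
    int calls;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

final class SingleByteDemo {

    // Reads `size` bytes one at a time through a BufferedInputStream and
    // returns how many read calls reached the underlying stream.
    static int underlyingCalls(int size) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = new BufferedInputStream(counter);
        int bytes = 0;
        while (in.read() != -1) {
            bytes++;
        }
        if (bytes != size) {
            throw new IllegalStateException("expected " + size + " bytes, got " + bytes);
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        // 10,000 single-byte reads should turn into only a handful of bulk reads.
        System.out.println(underlyingCalls(10_000));
    }
}
```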
