1) How do buffered streams work in the background, how do they differ from normal streams, and what are the advantages of using them?
2) DataInputStream is also byte-based, but it has a readLine() method. What's the point of that?
Buffered Readers/Writers/InputStreams/OutputStreams read from and write to the OS in large chunks for optimization. In the case of writers and output streams, the data is buffered in memory until enough has been collected to write out a big chunk. In the case of readers and input streams, a large chunk is read from disk/network/... into the buffer, and all reads are served from that buffer until it is empty, at which point a new chunk is read in.
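To make the difference visible, here is a minimal sketch that counts how many calls actually reach the underlying source when reading byte by byte, with and without a BufferedInputStream. The in-memory ByteArrayInputStream stands in for a file or socket, and the CountingInputStream helper and class names are hypothetical, written only for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferedReadDemo {
    // Hypothetical helper: forwards to the wrapped stream and counts every read call.
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads `size` bytes one at a time and returns how many calls reached the source.
    static int countSourceCalls(int size, boolean buffered) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // consume byte by byte, as naive code often does
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + countSourceCalls(10_000, false));
        System.out.println("buffered:   " + countSourceCalls(10_000, true));
    }
}
```

Without buffering, every single-byte read goes to the source (one call per byte plus the final end-of-stream call); with the default BufferedInputStream only a handful of bulk reads reach it. With a real file or socket, each of those source calls would be a native OS call.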
DataInputStream is indeed byte-based. Its readLine method is deprecated: according to its Javadoc it does not properly convert bytes to characters, and BufferedReader.readLine() is the preferred way to read lines of text. Internally, readLine reads from disk/network/... one byte at a time until it has collected a complete line. So this stream can be sped up by using a BufferedInputStream as its source, so that the bytes for the line are read from the in-memory buffer instead of directly from disk.
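A minimal sketch of both points, assuming in-memory byte arrays as the data source: binary data is read with a DataInputStream stacked on a BufferedInputStream, and text lines are read with BufferedReader.readLine() instead of the deprecated DataInputStream.readLine(). The length-prefixed format (a count followed by that many ints) and all class and method names are hypothetical, chosen only for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineReadingDemo {
    // Binary data: DataInputStream on top of BufferedInputStream, so each
    // readInt() is served from the in-memory buffer rather than the source.
    static int sumLengthPrefixedInts(byte[] data) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data)))) {
            int n = in.readInt();      // hypothetical format: count first...
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += in.readInt();   // ...then n big-endian 4-byte ints
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Text data: BufferedReader decodes bytes to characters with an explicit
    // charset and splits on \n, \r or \r\n.
    static List<String> readLines(byte[] data) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(sumLengthPrefixedInts(new byte[] {0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 4}));
        System.out.println(readLines("a\nb\r\nc".getBytes(StandardCharsets.UTF_8)));
    }
}
```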
From the BufferedInputStream javadoc:
Internally a buffer array is used: instead of reading bytes individually from the underlying input stream, enough bytes are read to fill the buffer. This generally results in faster performance, as fewer reads are required on the underlying input stream.
The converse is true for BufferedOutputStream: bytes are collected in the buffer and written to the underlying output stream in larger chunks.
mark() and reset() could be used as follows:
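A minimal sketch, assuming an in-memory ByteArrayInputStream as the source so the example is self-contained; the class and method names are hypothetical:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class MarkResetDemo {
    // Reads one byte, marks, reads three more, resets, then reads one more byte.
    static String readWithReset(String text) {
        StringBuilder seen = new StringBuilder();
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            seen.append((char) in.read());      // first byte
            in.mark(10);                        // remember this position; allow up to 10 more bytes
            for (int i = 0; i < 3; i++) {
                seen.append((char) in.read());  // advance the position
            }
            in.reset();                         // jump back to the marked position
            seen.append((char) in.read());      // re-reads the byte right after the mark
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(readWithReset("ABCDEFGH")); // prints ABCDB
    }
}
```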
To explain mark/reset some more...
The BufferedInputStream internally remembers the current position in the buffer. As you read bytes, the position increments. A call to mark(10) saves the current position. Subsequent calls to read continue to increment the current position, but a call to reset sets the current position back to its value when mark was called.
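The argument to mark (the read limit) specifies how many bytes you can read after calling mark before the mark position may be invalidated. Once the mark position is invalidated, you can no longer return to it: reset throws an IOException.
For example, if mark(2) had been used instead of mark(10), reading more than 2 bytes before calling reset() would allow the stream to invalidate the mark, and reset() would then throw an IOException. Note that BufferedInputStream only actually discards the mark when it needs to refill its buffer, so with small inputs that fit in the buffer, reset() may still succeed.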
To reduce this kind of overhead, the Java platform implements buffered I/O streams. Buffered input streams read data from a memory area known as a buffer; the native input API is called only when the buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called only when the buffer is full.
With unbuffered I/O, each read or write request is passed directly to the operating system. Java's buffered I/O streams read and write data to their own memory buffer (usually a byte array). Calls to the operating system are made only when the buffer is empty (when reading) or full (when writing). It is sometimes a good idea to flush the buffer manually after critical points in your application.
Since operating system API calls may result in disk access, network activity and the like, they can be quite expensive. Using buffers to batch the native I/O into larger chunks often improves performance significantly.
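A minimal sketch of the write side and of manual flushing, assuming a ByteArrayOutputStream as a stand-in for a file or socket; the class and method names are hypothetical. It checks how many bytes have reached the underlying stream before and after flush():

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class FlushDemo {
    // Returns {bytes visible in the sink before flush(), bytes visible after flush()}.
    static int[] sinkSizesAroundFlush(int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a file or socket
        try (BufferedOutputStream out = new BufferedOutputStream(sink)) {
            for (int i = 0; i < count; i++) {
                out.write('x');          // lands in BufferedOutputStream's internal buffer
            }
            int before = sink.size();    // nothing written through yet while count < buffer size
            out.flush();                 // hand all buffered bytes to the sink
            int after = sink.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sinkSizesAroundFlush(100);
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Without the explicit flush() the bytes would still reach the sink when the stream is closed, since close() flushes first; flushing manually matters when another party needs to see the data before that point.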
Buffered streams write or read data in larger chunks by, as the name suggests, buffering. Depending on the underlying stream, this can increase performance dramatically.
From java.io.BufferedOutputStream's Javadocs: