Is TarArchiveInputStream buffered or unbuffered in

Posted 2019-09-14 16:13

Question:

Is TarArchiveInputStream a buffered or an unbuffered InputStream?

InputStream inputStream = new TarArchiveInputStream(new GZIPInputStream(new BufferedInputStream(new FileInputStream(file))));

Does this inputStream object load the entire file into heap memory? Or is it just a pointer to the file that keeps nothing in memory?

Answer 1:

Based on the source code of commons-compress.jar ver 1.4,

What happens when we create an instance of TarArchiveInputStream?

Apart from other initialization, the important object that gets created is a TarBuffer instance, which internally holds a byte[] blockBuffer whose default size is DEFAULT_RCDSIZE * 20, i.e. 512 * 20 = 10,240 bytes (10 KB).

This TarBuffer object performs the actual read operations and loads data from the underlying tar file into blockBuffer: its readBlock() method gets called internally as we invoke TarArchiveInputStream.read(..).

Does the TarArchiveInputStream object store the entire file in heap memory?

No. In general, whenever we call the read method of an InputStream, it first tries to serve the data from its application-level buffer, if the stream is buffered. If the requested data is already there, it is served from the buffer. If not, the stream asks the OS (through a system call, i.e. a trap) to read the data from the OS file cache or the disk and copy it into that buffer. (Memory-mapped files are a bit different, since this copy is not needed, but we will leave them out of this discussion.)
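To make the buffering flow above concrete, here is a minimal sketch using only JDK classes. It is not taken from commons-compress: CountingInputStream and underlyingCalls are hypothetical names made up for this illustration, and a ByteArrayInputStream stands in for the real file. It reads a stream one byte at a time through a BufferedInputStream and counts how many read calls actually reach the underlying source.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadDemo {

    /** Hypothetical helper: counts the read calls that reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /**
     * Reads totalBytes one byte at a time through a BufferedInputStream and
     * returns how many read calls were forwarded to the underlying source.
     */
    static int underlyingCalls(int totalBytes, int bufferSize) {
        // In-memory stand-in for a file (assumption made for this sketch).
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (InputStream in = new BufferedInputStream(source, bufferSize)) {
            while (in.read() != -1) {
                // Each call is served from the buffer; the source is only
                // touched when the buffer has to be refilled.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        int calls = underlyingCalls(100_000, 8192);
        System.out.println("100000 single-byte reads, underlying read calls: " + calls);
    }
}
```

The caller issues 100,000 reads, but the source should only see roughly one call per buffer refill, which is the "serve from the buffer, otherwise go to the OS" behaviour described above.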

This holds for TarArchiveInputStream as well. When we call the read method on a TarArchiveInputStream, it delegates to the inner InputStream, and the same flow applies.

Or is it just a pointer to a file and stores nothing into the memory?

While creating the TarArchiveInputStream we pass an InputStream as an argument, and that stream is ultimately backed by a handle to the file rather than by its contents (as far as I can recollect, a FileInputStream holds an OS file descriptor, which on *-nix systems the kernel resolves to the file's actual inode structure).

It does store content in memory, as explained before, but not the entire file. Beyond the fixed-size internal buffers, how much data is read into memory at a time depends on the size of the byte[] passed when invoking the read(...) method on TarArchiveInputStream.
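As a rough illustration of that last point, the sketch below drains a gzip stream with a fixed-size byte[] and records the largest single read. It uses only JDK classes; the class and method names (ChunkedReadDemo, gzip, drain, drainGzip) are made up for this example, and the in-memory gzip data is an assumed stand-in for a real .tar.gz file. The same loop shape should apply to a TarArchiveInputStream, since it is also an InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ChunkedReadDemo {

    /** Gzips `size` zero bytes in memory (stand-in for a real compressed file). */
    static byte[] gzip(int size) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(new byte[size]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Drains the stream with a fixed-size buffer; returns {totalBytes, largestSingleRead}. */
    static long[] drain(InputStream in, int bufferSize) {
        byte[] buffer = new byte[bufferSize]; // the only array the caller allocates
        long total = 0;
        long largest = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
                largest = Math.max(largest, n);
                // A real program would process buffer[0..n) here before the next read.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {total, largest};
    }

    /** Convenience wrapper: decompresses `compressed` and drains it in chunks. */
    static long[] drainGzip(byte[] compressed, int bufferSize) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return drain(in, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = drainGzip(gzip(1_000_000), 4096);
        System.out.println("total bytes = " + result[0] + ", largest single read = " + result[1]);
    }
}
```

No single read can return more than the length of the buffer passed in, so the caller's memory use stays bounded by the buffer size no matter how large the stream is.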

Also, if it helps, this is the link that I used to see how to read entries using TarArchiveInputStream.