I am using the gzip utility on a Windows machine. I compressed a file and stored it in the DB as a blob. To decompress this file with the gzip utility, I write the byte stream to process.getOutputStream. But after 30KB, it is unable to read any more of the file; it hangs there.
I tried memory arguments and read-and-flush logic. But if I write the same data to a file, it is pretty fast.
OutputStream stdin = proc.getOutputStream();
Blob blob = Hibernate.createBlob(inputFileReader);
InputStream source = blob.getBinaryStream();
byte[] buffer = new byte[256];
long readBufferCount = 0;
int bytesRead;
while ((bytesRead = source.read(buffer)) > 0)
{
    // Write only the bytes actually read, not the whole buffer.
    stdin.write(buffer, 0, bytesRead);
    stdin.flush();
    readBufferCount = readBufferCount + bytesRead;
    log.info("Reading the file - Read bytes: " + readBufferCount);
}
stdin.flush();
Regards,
Mani Kumar Adari.
I suspect that the problem is that the external process (connected to proc) is either
- not reading its standard input, or
- writing stuff to its standard output that your Java application is not reading.
Bear in mind that Java talks to the external process using a pair of "pipes", and these have a limited amount of buffering. If you exceed the buffering capacity of a pipe, the writer process will be blocked writing to the pipe until the reader process has read enough data from the pipe to make space. If the reader doesn't read, then the pipeline locks up.
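The blocking behaviour can be seen without any external process. What follows is a minimal in-process sketch using java.io.PipedOutputStream and java.io.PipedInputStream, which also have a fixed-size buffer. It is only an analogy: the capacity of a real OS pipe is decided by the operating system, not by Java. The class name PipeCapacityDemo, the method writerStalls and the sizes used are made up for illustration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: an in-process analogy for an OS pipe, not the real thing.
class PipeCapacityDemo {

    /**
     * Writes payloadSize bytes into a pipe that buffers at most pipeCapacity bytes.
     * Returns true if the writer was still blocked after waitMillis with nobody reading.
     */
    static boolean writerStalls(int pipeCapacity, int payloadSize, long waitMillis)
            throws IOException, InterruptedException {
        PipedInputStream readEnd = new PipedInputStream(pipeCapacity);
        PipedOutputStream writeEnd = new PipedOutputStream(readEnd);

        Thread writer = new Thread(() -> {
            // Closing the write end signals end-of-stream to the reader.
            try (PipedOutputStream out = writeEnd) {
                out.write(new byte[payloadSize]);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        // Nobody is reading yet: give the writer time to finish if it can.
        writer.join(waitMillis);
        boolean stalled = writer.isAlive();

        // Now act as the reader; draining the pipe lets a blocked writer finish.
        byte[] buffer = new byte[4096];
        long drained = 0;
        int n;
        while ((n = readEnd.read(buffer)) != -1) {
            drained += n;
        }
        writer.join();
        if (drained != payloadSize) {
            throw new IllegalStateException("expected " + payloadSize + " bytes, got " + drained);
        }
        return stalled;
    }
}
```

With a 1024-byte pipe, a 100-byte payload fits in the buffer and the writer finishes on its own, while a 64KB payload leaves the writer blocked until something starts reading.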
If you provided more context (e.g. the part of the application that launches the gzip process) I'd be able to be more definitive.
FOLLOWUP
gzip.exe is a Unix utility that we are using on Windows. gzip.exe works fine from the command prompt, but not from the Java program. Is there any way to increase the size of the buffer that Java uses when writing to a pipe? I am concerned about the input part at present.
On UNIX, the gzip utility is typically used one of two ways:
- gzip file, which compresses file and turns it into file.gz.
- ... | gzip | ... (or something similar), which writes a compressed version of its standard input to its standard output.
I suspect that you are doing the equivalent of the latter, with the Java application as both the source of the gzip command's input and the destination of its output. And this is precisely the scenario that can lock up ... if the Java application is not implemented correctly. For instance:
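Process proc = Runtime.getRuntime().exec(...);  // gzip.exe pathname.
OutputStream out = proc.getOutputStream();
// Write phase: nothing reads gzip's output while this loop runs.
while (...) {
    out.write(...);
}
out.flush();
// Read phase: only starts after all of the input has been written.
InputStream in = proc.getInputStream();
while (...) {
    in.read(...);
}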
If the write phase of the application above writes too much data, it is guaranteed to lock up.
Communication between the Java application and gzip is via two pipes. As I stated above, a pipe will buffer a certain amount of data, but that amount is relatively small, and certainly bounded. This is the cause of the lockup. Here is what happens:
1. The gzip process is created with a pair of pipes connecting it to the Java application process.
2. The Java application writes data to its out stream.
3. The gzip process reads that data from its standard input, compresses it and writes to its standard output.
4. Steps 2 and 3 are repeated a few times, until finally the gzip process's attempt to write to its standard output blocks.
What has been happening is that gzip has been writing into its output pipe, but nothing has been reading from it. Eventually, we reach the point where we've exhausted the output pipe's buffer capacity, and the write to the pipe blocks.
Meanwhile, the Java application is still writing to the out stream, and after a couple more rounds, this too blocks because we've filled the other pipe.
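The sequence above can be reproduced inside a single JVM. In this rough simulation a thread stands in for gzip (it just copies its input to its output, with no compression) and java.io piped streams stand in for the two OS pipes. The names TwoPipeLockupDemo, pump and bytesAcceptedBeforeStall, the 256-byte chunk size and the pipe capacity are all assumptions made for the sketch; real pipe capacities depend on the OS.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation: a thread plays gzip, piped streams play the OS pipes.
class TwoPipeLockupDemo {

    /** Starts a thread that copies in to out in 256-byte chunks, then closes both. */
    static Thread pump(InputStream in, OutputStream out, AtomicLong accepted) {
        Thread t = new Thread(() -> {
            try (InputStream src = in; OutputStream dst = out) {
                byte[] buffer = new byte[256];
                int n;
                while ((n = src.read(buffer)) != -1) {
                    dst.write(buffer, 0, n);   // blocks while the pipe is full
                    dst.flush();               // wakes a reader waiting on the pipe
                    accepted.addAndGet(n);     // counts only completed writes
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    /**
     * Writes payload towards the simulated child without reading the child's output.
     * Returns the number of bytes accepted when the writer was found stalled,
     * or -1 if the writer finished within waitMillis.
     */
    static long bytesAcceptedBeforeStall(int pipeCapacity, byte[] payload, long waitMillis)
            throws IOException, InterruptedException {
        PipedInputStream childStdin = new PipedInputStream(pipeCapacity);
        PipedOutputStream toChild = new PipedOutputStream(childStdin);
        PipedInputStream fromChild = new PipedInputStream(pipeCapacity);
        PipedOutputStream childStdout = new PipedOutputStream(fromChild);

        // Steps 1 and 3: the stand-in for gzip reads its stdin and writes its stdout.
        Thread child = pump(childStdin, childStdout, new AtomicLong());
        // Step 2: the stand-in for the application's write loop.
        AtomicLong accepted = new AtomicLong();
        Thread writer = pump(new ByteArrayInputStream(payload), toChild, accepted);

        // Step 4: nobody reads fromChild during this wait, so both pipes fill up.
        writer.join(waitMillis);
        boolean stalled = writer.isAlive();
        long acceptedAtStall = accepted.get();

        // Reading the child's output is what finally lets everything drain.
        byte[] buffer = new byte[4096];
        while (fromChild.read(buffer) != -1) {
            // discard
        }
        writer.join();
        child.join();
        return stalled ? acceptedAtStall : -1;
    }
}
```

With two 1024-byte pipes and 256-byte chunks, the writer can never get more than 2 * 1024 + 256 bytes accepted before it stalls, however large the payload is: one pipe's worth on each side plus the chunk the stand-in child is holding.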
The only solution is for the Java application to read and write at the same time. The simple way to do this is to create a second thread and do the writing to the external process from one thread and the reading from the process in the other one.
(Changing the Java buffering or the Java read / write sizes won't help. The buffering that matters is in the OS implementations of the pipes, and there's no way to change that from pure Java, if at all.)
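The two-thread approach described above might look something like the following. This is a sketch under assumptions, not the original application's code: the class name ProcessFilter and the methods copy and filterThrough are hypothetical, the command used to launch gzip is left to the caller because the original does not show it, and the external process's standard error is simply passed through to the parent so that it cannot fill a third pipe. Note that the writer thread closes the process's standard input when it is done; without that, a filter like gzip never sees end-of-input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: feeds a process on one thread while reading its output on another.
class ProcessFilter {

    /** Copies everything from in to out, writing only the bytes actually read. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Runs command, sends source to its stdin and returns everything it wrote to stdout. */
    static byte[] filterThrough(List<String> command, InputStream source)
            throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr goes to the parent's stderr
                .start();

        // Writer thread: feeds the process and closes its stdin to signal end-of-input.
        AtomicReference<IOException> writeError = new AtomicReference<>();
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = proc.getOutputStream()) {
                copy(source, stdin);
            } catch (IOException e) {
                writeError.set(e);
            }
        });
        writer.start();

        // Calling thread: drains the process's stdout at the same time.
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream stdout = proc.getInputStream()) {
            copy(stdout, result);
        }

        writer.join();
        int exitCode = proc.waitFor();
        if (writeError.get() != null) {
            throw writeError.get();
        }
        if (exitCode != 0) {
            throw new IOException("command exited with status " + exitCode);
        }
        return result.toByteArray();
    }
}
```

In the application from the question, the call would be along the lines of filterThrough(command, blob.getBinaryStream()), where command is whatever list of strings launches gzip.exe there. Collecting the output in memory is only reasonable for modest sizes; for large files the reading side could copy to a FileOutputStream instead.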