I am programming in C on Windows. I encountered this problem while trying to read a .tar.gz file.
The file looks like this (opened with Notepad++):
And the code I used to read it is as follows:
iFile = fopen("my.tar.gz", "r");
while ((oneChar = fgetc(iFile)) != EOF) {
    printf("%c", oneChar);
}
The following figure shows the result of my program:
The problem I have is, the result only has several lines while the original file has thousands of lines (6310 lines, as you can see). My guess is that the .tar.gz file contains some strange characters (like an EOF in the middle of the file?).
My question is why Notepad++ can display the whole file while my program cannot. And is there a solution to this problem?
A .tar.gz file is conventionally a gzip-compressed tar archive. It is of course a binary file (any '\n' or '\r' inside it does not delimit lines, and '\0' may appear inside), so you need to open it in binary mode, i.e. with "rb" as the second argument to fopen. Also, feof(iFile) is valid only after some <stdio.h> input operation, so while(!feof(iFile)) is wrong just after the fopen... But that won't help you extract any files from the archive.
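To make the binary-mode point concrete, here is a minimal sketch of the read loop. The helper name copy_bytes is made up for illustration, and I am assuming the truncated output comes from text mode: as far as I know, the Microsoft C runtime treats a Ctrl-Z byte (0x1A) in a text-mode stream as end of file, and compressed data will contain such bytes sooner or later.

```c
#include <stdio.h>

/* Copy every byte of `path` to `out`.
   Returns the number of bytes copied, or -1 on error.
   copy_bytes is a hypothetical helper name, not from the original post. */
long copy_bytes(const char *path, FILE *out)
{
    FILE *in = fopen(path, "rb");   /* "b": no newline translation, no Ctrl-Z end of file */
    if (in == NULL)
        return -1;

    long count = 0;
    int c;                          /* int, not char, so that byte 0xFF differs from EOF */
    while ((c = fgetc(in)) != EOF) {
        putc(c, out);
        count++;
    }

    int failed = ferror(in);        /* tell a read error apart from a real end of file */
    fclose(in);
    return failed ? -1 : count;
}
```

Calling copy_bytes("my.tar.gz", stdout) would dump the raw compressed bytes, which is still not readable text. The returned count should match the file size reported by the operating system, which is a quick way to check that nothing was cut short.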
So you need to first uncompress that file, then extract or list the relevant archived files in it.
You can find libraries (and command executables) for both the uncompression step (the zlib library, or the gunzip or zcat commands) and the archive extraction step (the libarchive library, or libtar, or the tar command). If your operating system provides it, you could consider making appropriate use of the popen function.
BTW, using putchar(oneChar) is shorter, simpler and faster than printf("%c", oneChar).
Usually a file ending in .tar.gz is a compressed tar file (a binary file). Therefore I would suggest you use popen (http://linux.die.net/man/3/popen) instead of fopen, so that you read the file through a command.
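A sketch of the popen idea follows. It assumes a tar program that understands gzip is on the PATH, which is not guaranteed on Windows. The helper name copy_command_output is made up for illustration, and the _WIN32 part relies on the Microsoft C runtime naming these functions _popen and _pclose.

```c
#define _POSIX_C_SOURCE 200809L   /* asks POSIX systems to declare popen and pclose */
#include <stdio.h>

#ifdef _WIN32
#define popen  _popen             /* the Microsoft C runtime uses underscore names */
#define pclose _pclose
#endif

/* Run `command` and copy everything it prints to `out`.
   Returns the number of bytes copied, or -1 if the command could not be started.
   copy_command_output is a hypothetical helper name. */
long copy_command_output(const char *command, FILE *out)
{
    FILE *pipe = popen(command, "r");
    if (pipe == NULL)
        return -1;

    long count = 0;
    int c;
    while ((c = fgetc(pipe)) != EOF) {
        putc(c, out);
        count++;
    }

    pclose(pipe);
    return count;
}
```

With that, copy_command_output("tar -tzf my.tar.gz", stdout) should list the names of the files in the archive. The listing is plain text, so reading the pipe in the default mode is fine here; piping the file contents themselves through would again call for binary handling.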