I'm programming in C on Windows (the system language is Japanese), and I have a problem with EOF in binary and ASCII files.
I asked this question last week and a kind person helped me, but I still don't really understand how the program behaves when reading a binary or an ASCII file.
I ran the following tests:
Test1:
    int oneChar;
    iFile = fopen("myFile.tar.gz", "rb");  /* binary mode */
    while ((oneChar = fgetc(iFile)) != EOF) {
        printf("%d ", oneChar);
    }
Test2:
    int oneChar;
    iFile = fopen("myFile.tar.gz", "r");   /* text mode */
    while ((oneChar = fgetc(iFile)) != EOF) {
        printf("%d ", oneChar);
    }
In Test1, things worked perfectly for both binary and ASCII files. But in Test2, the program stopped reading when it encountered 0x1A in a binary file. (Does this mean that 0x1A == EOF?) The ASCII table tells me that 0x1A is a control character called substitute (whatever that means...). However, when I printf("%d", EOF), it gives me -1...
I also found this question, which tells me that the OS knows exactly where a file ends, so I don't really need to look for EOF in the file, because EOF is outside the range of a byte (but what about 0x1A?).
Can someone clear things up a little for me? Thanks in advance.
Found a terrific article that answers all the questions! https://latedev.wordpress.com/2012/12/04/all-about-eof/
The convention of ending a file with Ctrl-Z originated with CP/M, a very old operating system for 8080/Z80 microcomputers. Its file system did not keep track of file sizes down to the byte level, only to the 128-byte sector level, so there needed to be another way to mark the end-of-file.
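To make the sector-granularity point concrete, here is a rough sketch in C. The helper names padded_size and logical_text_length are made up for illustration and are not part of any real CP/M or C library API. It only assumes the 128-byte sector size mentioned above and that the marker is the byte 0x1A (Ctrl-Z).

```c
#include <stddef.h>

#define SECTOR_SIZE 128   /* sector size from the paragraph above */
#define CTRL_Z      0x1A  /* ASCII SUB, used as the end-of-text marker */

/* Hypothetical helper: bytes a file of `len` bytes occupies once its
   size is rounded up to whole sectors. */
size_t padded_size(size_t len)
{
    return (len + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/* Hypothetical helper: with only a sector-granular size available, the
   logical end of the text is the offset of the first Ctrl-Z, or the
   whole stored length if no marker is present. */
size_t logical_text_length(const unsigned char *buf, size_t stored_len)
{
    size_t i;
    for (i = 0; i < stored_len; i++) {
        if (buf[i] == CTRL_Z) {
            return i;
        }
    }
    return stored_len;
}
```

So a 5-byte text stored in one 128-byte sector would be followed by a Ctrl-Z, and a reader scanning for that marker would recover a length of 5 rather than 128.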
Microsoft's DOS was made to be as compatible with CP/M as possible, so it kept the convention when reading text files. By this time the file size was kept by the file system so it wasn't strictly necessary, just retained for backward compatibility.
This convention has persisted to the present day in the C and C++ libraries for Windows; when you open a file in text mode, every character is checked for Ctrl-Z and the end-of-file flag is set if it's detected. You're seeing the effects of backwards compatibility taken to an extreme, back to systems that are almost 40 years old.
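The text-mode check described above can be imitated on any platform with a small wrapper. This is only a sketch of the observable behaviour, not how the Windows C runtime is actually implemented; text_mode_getc and count_until_ctrl_z are hypothetical names, and the stream is assumed to be opened in binary mode.

```c
#include <stdio.h>

#define CTRL_Z 0x1A  /* ASCII SUB */

/* Hypothetical helper: reads one byte from a binary-mode stream and
   reports EOF at the first Ctrl-Z, imitating the text-mode check. */
int text_mode_getc(FILE *fp)
{
    int c = fgetc(fp);
    if (c == CTRL_Z) {
        return EOF;   /* treat the marker as end-of-file */
    }
    return c;         /* ordinary byte, or the real EOF */
}

/* Counts the bytes visible before the emulated end-of-file. */
long count_until_ctrl_z(FILE *fp)
{
    long n = 0;
    while (text_mode_getc(fp) != EOF) {
        n++;
    }
    return n;
}
```

Run over a binary file such as a .tar.gz, a loop like this stops at the first 0x1A byte even though more data follows, which matches what Test2 in the question showed.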
This is a Windows-specific trick for text files: the SUB character, which is represented by the Ctrl+Z sequence, is interpreted as EOF by fgetc. You do not have to have 1A in your text file in order to get an EOF back from fgetc, though: once you reach the actual end of file, EOF would be returned.
The standard does not define 1A as the char value to represent an EOF. The constant for EOF is of type int, with a negative value outside the range of unsigned char. In fact, the reason why fgetc returns an int, not char, is to let it return a special value for EOF.
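As a small illustration of why the result of fgetc is kept in an int, the sketch below counts every byte of a stream, including 0x1A and 0xFF. The function name count_bytes is made up for this example, and the stream is assumed to have been opened in binary mode ("rb"), so no Ctrl-Z translation takes place.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of bytes read from fp,
   or -1 if the loop ended because of a read error rather than a
   real end-of-file. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;  /* int, not char: must hold every byte value plus EOF */

    while ((c = fgetc(fp)) != EOF) {
        n++;  /* c is an unsigned char value here, so it never equals EOF */
    }
    if (ferror(fp)) {
        return -1;  /* EOF was returned because of an error */
    }
    return n;
}
```

If c were declared as a char instead, a data byte such as 0xFF could become indistinguishable from EOF on implementations where char is signed, and the loop could stop early.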