Where is hex code of the “EOF” character?

Posted 2019-01-22 18:24

Question:

As far as I know, at the end of every file, especially text files, there is a hex code for an EOF or NULL character. When we write a program that reads the contents of a text file, we keep calling the read function until we receive that EOF hex code.

My question: I downloaded some tools to see a hex view of a text file, but I can't see any hex code for EOF (End Of File/NULL) or EOT (End Of Text).


[Image: ASCII/hex code tables]

[Image: output of the hex viewer tool]


Note: my input file is a text file whose content is "Where is hex code of "EOF"?"

Appreciate your time and consideration.

Answer 1:

There is no such thing as an EOF character. The operating system knows exactly how many bytes a file contains (this is stored alongside other metadata like permissions, creation date, and the name), and hence can tell programs that try to read the eleventh byte of a ten-byte file: you've reached the end of the file, there are no more bytes to read.
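As a rough illustration of the point that the length is known without any marker byte, here is a minimal C sketch. The helper name `file_size` is made up for this example, and it assumes a platform (such as Linux or Windows) where seeking to the end of a binary stream is supported, which the C standard does not strictly guarantee.

```c
#include <stdio.h>

/* Hypothetical helper: returns the file's length in bytes, or -1 on failure.
 * It asks the library to seek to the end and report the position, so it
 * never looks for a terminator byte in the contents. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no text translation */
    if (fp == NULL)
        return -1;

    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);           /* position at the end == byte count */

    fclose(fp);
    return size;
}
```

Calling `file_size` on a file that contains the ten characters `0123456789` should report 10, even though none of those bytes is an "end" marker.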

In fact, the "EOF" value returned for example by C functions like getchar is explicitly an int value outside the range of a byte, so it cannot possibly be stored in a file!
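A small sketch of the usual C read loop may make this concrete. The function name `count_bytes` is just an illustration; the important detail is that the result of `fgetc` is kept in an `int`, so that every possible byte value (including 0xFF, 0x00, 0x1A, and 0x04) stays distinguishable from `EOF`.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a file by reading until
 * fgetc() reports EOF. Returns -1 if the file cannot be opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;  /* int, not char: must hold every byte value and also EOF */
    while ((c = fgetc(fp)) != EOF)
        count++;    /* any byte value, even 0xFF or 0x00, is counted as data */

    fclose(fp);
    return count;
}
```

If `c` were declared as `char`, a 0xFF byte in the file could compare equal to `EOF` on some platforms and end the loop early, which is exactly why `EOF` is defined as a negative `int` rather than as a character.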

Sometimes, certain file formats insist on adding NUL terminators (probably because that's how strings are usually stored in C), though usually these delimit multiple records in a single file, not the file as a whole. And such decoration usually disqualifies a file from being considered a "text file".

ASCII codes like ETX and NUL date back to the days of teletypewriters and friends. NUL is used in C for in-memory strings, but this has no bearing on file systems.



Answer 2:

There was - a long long time ago - an End Of File marker but it hasn't been used in files for many years.

You can demonstrate a distant echo of it on Windows using:

C:\>copy con junk.txt
Hello
Hello again
- Press <Ctrl> and <z>
C:\>dump junk.txt
junk.txt:
00000000  4865 6c6c 6f0d 0a48 656c 6c6f 2061 6761 Hello..Hello aga
00000010  696e 0d0a                               in..
C:\>

Note the use of Ctrl-Z as the end-of-input marker.

However, notice also that the Ctrl-Z does not appear in the file any more - it used to appear as a 0x1a but only on some operating systems and even then not consistently.
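For comparison, this is roughly how a reader following the old convention could locate the end of the text: scan for the first 0x1A byte and ignore everything after it. This is only a sketch of the idea, not how any particular system implemented it, and the name `logical_text_length` is invented for the example.

```c
#include <stdio.h>

#define CTRL_Z 0x1A  /* ASCII SUB, typed as Ctrl-Z */

/* Hypothetical helper: returns the offset of the first 0x1A byte,
 * the total byte count if the file contains none, or -1 if the file
 * cannot be opened. */
long logical_text_length(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode so 0x1A is passed through */
    if (fp == NULL)
        return -1;

    long offset = 0;
    int c;
    while ((c = fgetc(fp)) != EOF && c != CTRL_Z)
        offset++;

    fclose(fp);
    return offset;
}
```

On a file such as the `junk.txt` shown above, which contains no 0x1A, the function simply returns the full length; the real end of the file is still detected through the EOF condition, not through a stored byte.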

Use of ETX (0x03) stopped even before those dim and distant times.



Answer 3:

There is no such thing as EOF. EOF is just a value returned by file reading functions to tell you the file pointer reached the end of the file.



Answer 4:

There once were even different EOF characters (for different operating systems). I have not seen one in a long time. (Typically files were stored in blocks of 128 bytes.) They were a pain to handle in code, much like BOMs are nowadays.

Instead there is still an int read() (for example in Java's InputStream) that normally returns a byte value, but returns -1 at EOF.

The NUL character is a string terminator in C. In Java you can have a NUL character in the middle of a string. To cooperate with C, the "modified UTF-8" that Java generates in some places (for example DataOutputStream.writeUTF and JNI strings) uses a multi-byte encoding both for Unicode characters > 127 and for NUL.
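The encoding described here is what the Java documentation calls "modified UTF-8", where NUL becomes the two bytes 0xC0 0x80 so that no 0x00 byte appears in the output. The following C sketch shows the rule for a single UTF-16 code unit; the function name `encode_modified_utf8` is made up for the example, and it does not pair up surrogates (in this scheme each surrogate is simply encoded as its own three-byte sequence).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one UTF-16 code unit in modified UTF-8.
 * Writes 1 to 3 bytes into out and returns how many were written. */
size_t encode_modified_utf8(uint16_t unit, unsigned char out[3])
{
    if (unit >= 0x0001 && unit <= 0x007F) {
        out[0] = (unsigned char)unit;               /* plain ASCII, one byte */
        return 1;
    }
    if (unit <= 0x07FF) {                           /* includes 0x0000 (NUL) */
        out[0] = (unsigned char)(0xC0 | (unit >> 6));
        out[1] = (unsigned char)(0x80 | (unit & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (unit >> 12));  /* 0x0800..0xFFFF */
    out[1] = (unsigned char)(0x80 | ((unit >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (unit & 0x3F));
    return 3;
}
```

The only difference from standard UTF-8 for these code units is the first branch: standard UTF-8 would encode NUL as the single byte 0x00, which a C string function would treat as a terminator.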

(Some of this is probably known already.)



Answer 5:

The EOT byte (0x04) is used to this day by Unix tty terminals to indicate end of input. You type it with Ctrl + D (i.e. ^D) to end input to shells or any other program reading from stdin.

However, as others have pointed out, this is distinct from EOF, which is a condition rather than a piece of data per se.



Answer 6:

You need the end-of-file character in certain instances, for example when sending a file to a printer from a Unix computer. Most Windows/DOS-enabled printers expect the end-of-file marker before they print the file stored in their memory. If no end-of-file marker is sent, the printer just sits until it times out (normally 2 minutes) and then prints the file. If you use lpr to print from Unix, you should make sure to include the end-of-file marker. Windows/DOS attach it automatically, and the printers are designed to wait for it.