I'm making a simple program in C that reads an input. It then displays the number of characters used.
What I tried first:
#include <stdio.h>
int main(int argc, char** argv) {
    int currentChar;
    int charCount = 0;
    while((currentChar = getchar()) != EOF) {
        charCount++;
    }
    printf("Display char count? [y/n]");
    int response = getchar();
    if(response == 'y' || response == 'Y')
        printf("Count: %d\n",charCount);
}
What happened:
I would enter some lines and end the input with ^D (I'm on a Mac). The program would not wait at the int response = getchar(); line. I found online that this is because there is still content left in the input stream.
My first question is: what content would that be? I don't enter anything after pressing ^D to input EOF, and when I tried to print anything left in the stream, it would print a ?.
What I tried next:
Assuming there were characters left in the input stream, I made a function to clear the input buffer:
void clearInputBuffer() {
    while(getchar() != '\n') {};
}
I called the function right after the while loop:
while((currentChar = getchar()) != EOF) {
    charCount++;
}
clearInputBuffer();
Now I would assume that if there is anything left after pressing ^D, it would be cleared up to the next \n.
But instead, I can't stop the input request. When I press ^D, rather than sending EOF to currentChar, a ^D is shown on the terminal.
I know there is probably a solution to this online, but since I'm not sure what exactly my problem is, I don't really know what to look for.
Why is this happening? Can someone also explain exactly what is going on behind the scenes of this program and the Terminal?
See man 3 termios and search for VEOF. That will tell you what it actually does.
If you need more explanation, I'll start by saying the ISO C stdin stream has a default buffer, so any bytes read are stored into that buffer unless this behavior is somehow overridden (e.g. setvbuf).
The getchar function reads from this default buffer as long as it still holds unread characters. Once the buffer is empty, getchar calls the read function to store new data in it; read returns the number of bytes it read.
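As a rough sketch of that refill logic (not how any particular C library implements it), the following models a buffered stream on top of the POSIX read function. The names sketch_stream, sketch_getc and SKETCH_EOF are hypothetical, and the choice to stop calling read once end of file has been seen is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* read, ssize_t (POSIX) */

#define SKETCH_EOF (-1)

/* Hypothetical stand-in for a FILE: a file descriptor plus its buffer. */
struct sketch_stream {
    int           fd;
    unsigned char buf[64];
    size_t        pos;   /* index of the next unread byte in buf */
    size_t        len;   /* number of valid bytes in buf */
    int           eof;   /* end-of-file indicator */
    int           err;   /* error indicator */
};

/* Return the next byte, calling read() only when the buffer is empty. */
int sketch_getc(struct sketch_stream *s)
{
    assert(s != NULL);
    if (s->pos == s->len) {              /* nothing left in the buffer */
        if (s->eof)                      /* assumption: indicator is sticky */
            return SKETCH_EOF;
        ssize_t n = read(s->fd, s->buf, sizeof s->buf);
        if (n == 0) {                    /* zero bytes: end of file */
            s->eof = 1;
            return SKETCH_EOF;
        }
        if (n < 0) {                     /* read failed */
            s->err = 1;
            return SKETCH_EOF;
        }
        s->pos = 0;
        s->len = (size_t)n;
    }
    return s->buf[s->pos++];
}
```

A caller would zero-initialize the struct, set fd (for example to STDIN_FILENO), and call sketch_getc in a loop until it returns SKETCH_EOF.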
However, your terminal has its own input buffer. It will wait for an input sequence recognized as an end-of-line (EOL) delimiter. This is where things get interesting. If ICANON is enabled and you press Ctrl+D while the terminal's input buffer already holds bytes, all of those pending bytes are sent to the program, as if you had entered an end-of-line delimiter. The read function receives those bytes and stores them in the input buffer used for stdin, so getchar returns an appropriate value.
If Ctrl+D is pressed with no pending bytes in the terminal's input buffer, no data is sent, read returns 0, and getchar sets the end-of-file indicator for the stdin stream and then returns EOF.
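This is also why a flush loop that only tests for '\n' can spin forever: once getchar has started returning EOF, it never returns a newline. A minimal sketch of a loop that stops on either condition follows; discard_line is a hypothetical name, and it takes a FILE pointer so it can be tried on streams other than stdin.

```c
#include <assert.h>
#include <stdio.h>

/* Discard the rest of the current line, stopping at '\n' or at EOF,
   whichever comes first. Returns how many characters were discarded
   before that point (the newline itself is consumed but not counted). */
long discard_line(FILE *in)
{
    assert(in != NULL);
    long discarded = 0;
    int c;
    while ((c = getc(in)) != '\n' && c != EOF)
        discarded++;
    return discarded;
}
```

Calling discard_line(stdin) in place of the question's clearInputBuffer() would return immediately when the stream is already at end of file, instead of looping.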
Given the two behaviors of Ctrl+D, it follows that pressing it twice sends all pending bytes on the first key press, effectively emptying the terminal's input buffer. The second key press then sends 0 bytes to read, which means getchar returns EOF and the end-of-file indicator for stdin is set.
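Once that indicator is set, later getchar calls on the same stream may keep returning EOF without waiting for the terminal, which would explain why the prompt in the question is skipped. The standard clearerr function resets the indicator. Here is a small sketch, with count_until_eof as a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Count characters until getc reports EOF, then clear the stream's
   end-of-file and error indicators so a later read can try again. */
long count_until_eof(FILE *in)
{
    assert(in != NULL);
    long count = 0;
    while (getc(in) != EOF)
        count++;
    clearerr(in);   /* resets both the EOF and the error indicator */
    return count;
}
```

In the question's program, this could replace the counting loop as count_until_eof(stdin) before the prompt is printed. Whether more input can actually be read afterwards depends on where stdin comes from: an interactive terminal normally accepts further lines, while a file or pipe that has ended will simply report EOF again.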
If an error occurs (e.g. stdin was closed), read itself will return -1, and getchar will return EOF after setting the error indicator for the stdin stream. The following may help to illustrate the idea of how it works, though there's likely a lot more going on behind the scenes with the TTY itself than just waiting for an EOL or VEOF and sending data after either one is detected:
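One way to picture it is the toy model below. It is only a sketch of the two Ctrl+D behaviors described above, not how a real TTY driver works, and every name in it (sim_tty, sim_tty_key, SIM_VEOF, SIM_BLOCKED) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_VEOF    0x04    /* the byte Ctrl+D produces */
#define SIM_BUFSZ   256
#define SIM_BLOCKED (-1L)   /* the program's read() is still waiting */

/* Toy model of the terminal's pending-input buffer in canonical mode. */
struct sim_tty {
    char   pending[SIM_BUFSZ];
    size_t len;
};

/* Feed one key press to the model. Returns SIM_BLOCKED while the line is
   still being edited; otherwise copies the pending bytes into out (which
   must have room for SIM_BUFSZ bytes) and returns the count that read()
   would report. A count of 0 is what the program sees as end of file. */
long sim_tty_key(struct sim_tty *t, char key, char *out)
{
    assert(t != NULL && out != NULL);
    if (key != SIM_VEOF) {
        if (t->len < SIM_BUFSZ)
            t->pending[t->len++] = key;  /* keys are dropped when full */
        if (key != '\n')
            return SIM_BLOCKED;
    }
    /* '\n' or Ctrl+D: hand over whatever is pending, possibly nothing. */
    long n = (long)t->len;
    memcpy(out, t->pending, t->len);
    t->len = 0;
    return n;
}
```

Feeding it h, i, Ctrl+D, Ctrl+D delivers two bytes on the first Ctrl+D and zero bytes on the second, which matches the "press it twice" behavior. Feeding a newline followed by Ctrl+D also ends with a zero-byte delivery.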
Of course, if ICANON isn't set on the controlling terminal, you will never receive EOF unless your input is not from a terminal: special key sequences like Ctrl+D are no longer recognized as special because that feature is turned off.
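To check which case applies, the terminal settings can be queried with tcgetattr. The sketch below assumes a POSIX system; canonical_mode is a hypothetical helper name, while tcgetattr, ICANON, VEOF and cc_t come from termios.h.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Returns 1 if fd is a terminal in canonical mode, 0 if it is a terminal
   with ICANON turned off, and -1 if tcgetattr fails (for example because
   fd is not a terminal). When the result is 1, *veof receives the
   configured end-of-file character. */
int canonical_mode(int fd, cc_t *veof)
{
    assert(veof != NULL);
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0)
        return -1;
    if (!(tio.c_lflag & ICANON))
        return 0;
    *veof = tio.c_cc[VEOF];
    return 1;
}
```

Calling canonical_mode(STDIN_FILENO, &c) from a program started in an ordinary interactive shell would commonly return 1 with c equal to 4 (Ctrl+D), though both depend on how the terminal has been configured.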
For a bit more completeness, please note that the ICANON bit and termios in general mostly do not apply on Windows. For one thing, the Windows Command Prompt uses Ctrl+Z, and the Windows operating system has no concept of terminals other than things like the _isatty C runtime function, which is used to detect whether a file descriptor points to a file description that involves a console handle.
Pressing Ctrl+Z with data pending will effectively cancel any remaining input that follows it, though an end-of-line character (Ctrl+M or Enter) still needs to be pressed for the data to be sent unless processed input was disabled by using the SetConsoleMode Windows API function.
If pressed with no input data pending and sent by entering an end-of-line character, it acts as EOF. For example, hello^Z1234^M results in hello^Z being read, and everything after it, including the ^M end-of-line character, is ignored. ^Z1234^M or just ^Z^M will trigger EOF.
Operating systems are weird.
Ctrl+D is a bit weird on Unix: it's not actually an EOF character. Rather, it tells the terminal driver to hand over whatever input is pending without waiting for a newline, and the program only sees end of file on stdin when that hand-over contains no bytes. As a result, the behavior can be somewhat unintuitive. Two Ctrl+Ds in a row, or a Return followed by a Ctrl+D, will give you the behavior you're looking for. I tested it with this code:
#include <stdio.h>
int main(void) {
    size_t charcount = 0;
    while (getchar() != EOF)
        charcount++;
    printf("Characters: %zu\n", charcount);
    return 0;
}
Edited to include chux's format character suggestion.
You can do it (also) this way:
fseek(stdin,0,SEEK_END);
This works fine for me.