I'm making a simple program in C that reads an input. It then displays the number of characters used.
What I tried first:
    #include <stdio.h>

    int main(int argc, char** argv) {
        int currentChar;
        int charCount = 0;
        while((currentChar = getchar()) != EOF) {
            charCount++;
        }
        printf("Display char count? [y/n]");
        int response = getchar();
        if(response == 'y' || response == 'Y')
            printf("Count: %d\n",charCount);
    }
What happened:
I would enter some lines and end the input with ^D (I'm on a Mac). The program would not wait at int response = getchar();. I found online that this is because there is still content left in the input stream.
My first question is: what content would that be? I don't enter anything after pressing ^D to input EOF, and when I tried to print anything left in the stream, it would print a ?.
What I tried next:
Assuming there were characters left in the input stream, I made a function to clear the input buffer:
    void clearInputBuffer() {
        while(getchar() != '\n') {};
    }
I called the function right after the while loop:
    while((currentChar = getchar()) != EOF) {
        charCount++;
    }
    clearInputBuffer();
Now I would assume that if there is anything left after pressing ^D, it would be cleared up to the next \n.
But instead, I can't stop the input request. When I press ^D, rather than sending EOF to currentChar, a ^D is shown on the terminal.
I know there is probably a solution to this online, but since I'm not sure what exactly my problem is, I don't really know what to look for.
Why is this happening? Can someone also explain exactly what is going on behind the scenes of this program and the Terminal?
man 3 termios - search for VEOF. That will tell you what it actually does.
If you need more explanation, I'll start by saying the ISO C stdin stream has a default buffer, so any bytes read are stored into that buffer unless this behavior is somehow overridden (e.g. setvbuf).
The getchar function will read from this default buffer unless the buffer has no characters left in it to read. In that case, it will call the read function to actually store new data into that buffer and return the number of bytes read.
However, your terminal has its own input buffer. It will wait for an input sequence recognized as an end-of-line (EOL) delimiter. This is where things get interesting. If ICANON is enabled, and you use Ctrl+D with bytes already in the terminal's input buffer, then you effectively send all of those pending bytes to the program, as if you had entered an end-of-line delimiter. The read function will receive those bytes and store them in the input buffer used for stdin, resulting in getchar returning an appropriate value.
If Ctrl+D is pressed with no pending bytes in the terminal's input buffer, no data will be sent, read will return 0, and EOF gets returned by getchar after getchar sets the end-of-file indicator for the stdin stream.
Given the two behaviors of Ctrl+D, it follows that pressing it twice will send all pending bytes on the first key press, effectively emptying the terminal's input buffer, followed by the second key press sending 0 bytes to read, which means getchar returns EOF and the end-of-file indicator for stdin is set.
If an error occurs (e.g. stdin was closed), read itself will return -1, and getchar will return EOF after setting the error indicator for the stdin stream. The following may help to illustrate the idea of how it works, though there's likely a lot more going on behind the scenes with the TTY itself than just waiting for an EOL or VEOF and sending data after either one is detected:
Of course, if ICANON isn't set on the controlling terminal, then you will never receive EOF unless your input is not from a terminal, because special key sequences like Ctrl+D are no longer recognized as special once the feature is turned off.
For a bit more completeness, please note that the ICANON bit and termios stuff in general do not necessarily apply much on Windows. The Windows Command Prompt uses Ctrl+Z for one thing, and the Windows operating system has no concept of terminals other than things like the _isatty C runtime function that is used to detect whether a file descriptor points to a file description that involves a console handle.
Pressing Ctrl+Z with data pending will effectively cancel any remaining input that follows it, though an end-of-line character (Ctrl+M or Enter) still needs to be pressed for the data to be sent unless processed input was disabled by using the SetConsoleMode Windows API function.
If pressed with no input data pending and sent by entering an end-of-line character, it acts as EOF. For example, hello^Z1234^M results in hello^Z being read, and everything after it, including the ^M end-of-line character, is ignored. ^Z1234^M or just ^Z^M will trigger EOF.
Operating systems are weird.
You can do it (also) this way:
This works fine for me.
Ctrl+D is a bit weird on Unix: it's not actually an EOF character. Rather, it tells the terminal driver to hand whatever input is pending to the program right away; only when nothing is pending does the read return zero bytes, which the program sees as end-of-file. As a result, the behavior can be somewhat unintuitive. Two Ctrl+Ds in a row, or a Return followed by a Ctrl+D, will give you the behavior you're looking for. I tested it with this code:
Edited to include chux's format character suggestion.