Issues with standard input in C

Posted 2019-09-04 06:25

Question:

I'm making a simple program in C that reads input and then displays the number of characters read.

What I tried first:

#include <stdio.h>

int main(int argc, char** argv) {
    int currentChar;
    int charCount = 0;

    while((currentChar = getchar()) != EOF) {
        charCount++;
    }

    printf("Display char count? [y/n]");
    int response = getchar();

    if(response == 'y' || response == 'Y')
        printf("Count: %d\n",charCount);
}

What happened:

I would enter some lines and end them with ^D (I'm on a Mac). The program would not wait at int response = getchar();. I found online that this is because there is still content left in the input stream.

My first question: what content would that be? I don't enter anything after pressing ^D to input EOF, and when I tried to print whatever was left in the stream, it printed a ?.

What I tried next:

Assuming there were characters left in the input stream, I made a function to clear the input buffer:

void clearInputBuffer() {
    while(getchar() != '\n') {};
}

I called the function right after the while loop:

while((currentChar = getchar()) != EOF) {
    charCount++;
}
clearInputBuffer();

I assumed that if anything was left after pressing ^D, it would be cleared up to the next \n.

Instead, I can't end the input at all. When I press ^D, rather than EOF being delivered to currentChar, ^D is simply shown on the terminal.

I know there is probably a solution to this online, but since I'm not sure exactly what my problem is, I don't really know what to look for.

Why is this happening? Can someone also explain exactly what is going on behind the scenes of this program and the Terminal?

Answer 1:

Run man 3 termios and search for VEOF. That will tell you what Ctrl+D actually does.

If you need more explanation, I'll start by saying that the ISO C stdin stream has a buffer by default, so any bytes read are stored in that buffer unless this behavior is overridden (e.g. with setvbuf).
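As a minimal sketch of overriding that default, setvbuf can switch a stream to unbuffered mode. The helper name make_unbuffered is hypothetical. Note that this only affects the stdio buffer; the terminal's own input buffer still applies.

```c
#include <stdio.h>  /* FILE, setvbuf, _IONBF */

/* Hypothetical helper: switch a stream to unbuffered mode.
 * setvbuf() must be called after the stream is opened but before any
 * other operation on it. It returns 0 on success and nonzero if the
 * request cannot be honored. */
int make_unbuffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IONBF, 0);
}
```

For stdin, this would be a call to make_unbuffered(stdin) at the very start of main, before the first getchar.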

The getchar function reads from this default buffer unless the buffer has no characters left to read. In that case, it calls the read function, which stores new data in that buffer and returns the number of bytes read.
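A heavily simplified model of that refill logic, assuming a POSIX system with read(2). The struct mini_stream type and mini_getc function are hypothetical illustrations, not how any real C library lays out FILE:

```c
#include <stdio.h>   /* EOF, BUFSIZ */
#include <unistd.h>  /* read(), ssize_t */

/* Hypothetical, heavily simplified model of a stdio input stream. */
struct mini_stream {
    int fd;                     /* underlying file descriptor */
    unsigned char buf[BUFSIZ];  /* the stream's own buffer */
    ssize_t len;                /* number of valid bytes in buf */
    ssize_t pos;                /* index of the next byte to hand out */
    int eof;                    /* end-of-file indicator */
    int err;                    /* error indicator */
};

/* Return the next byte (as an unsigned char converted to int) or EOF. */
int mini_getc(struct mini_stream *s)
{
    if (s->pos >= s->len) {
        /* Buffer exhausted: ask the OS for more data. */
        ssize_t n = read(s->fd, s->buf, sizeof s->buf);
        if (n == 0) {       /* read returned 0: end of file */
            s->eof = 1;
            return EOF;
        }
        if (n < 0) {        /* read returned -1: error */
            s->err = 1;
            return EOF;
        }
        s->len = n;
        s->pos = 0;
    }
    return s->buf[s->pos++];
}
```

Usage would look like struct mini_stream in = { .fd = STDIN_FILENO }; followed by calls to mini_getc(&in). One deliberate simplification: this model calls read again even after eof is set, whereas ISO C requires getc to return EOF immediately while a stream's end-of-file indicator is set.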

However, your terminal has its own input buffer. It waits for an input sequence recognized as an end-of-line (EOL) delimiter. This is where things get interesting. If ICANON is enabled and you press Ctrl+D while bytes are already in the terminal's input buffer, you effectively send all of those pending bytes to the program, as if you had entered an end-of-line delimiter. The read function receives those bytes and stores them in the buffer used for stdin, so getchar returns an appropriate value.

If Ctrl+D is pressed with no pending bytes in the terminal's input buffer, no data is sent, read returns 0, and getchar sets the end-of-file indicator for the stdin stream and returns EOF.

Given these two behaviors of Ctrl+D, pressing it twice sends all pending bytes on the first key press, emptying the terminal's input buffer. The second key press then sends 0 bytes to read, so getchar returns EOF and the end-of-file indicator for stdin is set.
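To observe the two Ctrl+D behaviors directly, one can bypass stdio and call read(2) in a loop, printing what each call returns. This is a sketch assuming a POSIX terminal in canonical mode; trace_reads is a hypothetical name:

```c
#include <stdio.h>   /* fprintf, size_t */
#include <unistd.h>  /* read(), ssize_t */

/* Hypothetical helper: call read() on fd repeatedly, recording and printing
 * how many bytes each call returned, until read() returns 0 (end of file),
 * -1 (error), or max calls have been made. Returns the number of calls. */
size_t trace_reads(int fd, ssize_t *sizes, size_t max)
{
    char buf[256];
    size_t calls = 0;

    while (calls < max) {
        ssize_t n = read(fd, buf, sizeof buf);
        sizes[calls++] = n;
        fprintf(stderr, "read() returned %ld\n", (long)n);
        if (n <= 0)
            break;  /* 0 = end of file, -1 = error */
    }
    return calls;
}
```

Calling trace_reads(STDIN_FILENO, sizes, 16) and typing abc followed by Ctrl+D should report 3; pressing Ctrl+D again on the now-empty line should report 0, the end-of-file case. Typing abc and pressing Enter instead should report 4, because the newline is included.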

If an error occurs (e.g. stdin was closed), read itself returns -1, and getchar returns EOF after setting the error indicator for the stdin stream. That said, there is likely a lot more going on behind the scenes in the TTY itself than just waiting for an EOL or VEOF and sending data once either one is detected.
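Because getchar returns the same EOF value in both the end-of-file and error cases, a program that cares must ask the stream which indicator was set, using feof and ferror. A minimal sketch; count_chars is a hypothetical helper:

```c
#include <stdio.h>  /* FILE, getc, feof, ferror, EOF */

/* Hypothetical helper: count characters until getc() returns EOF, then
 * check which indicator was set. *status receives 1 for end of file,
 * -1 for a read error. Returns the number of characters read. */
long count_chars(FILE *in, int *status)
{
    long count = 0;

    while (getc(in) != EOF)
        count++;

    if (ferror(in))
        *status = -1;  /* error indicator set (underlying read failed) */
    else if (feof(in))
        *status = 1;   /* end-of-file indicator set (read returned 0) */
    else
        *status = 0;   /* not expected after getc() returned EOF */
    return count;
}
```

With the question's loop, the equivalent call would be count_chars(stdin, &status).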

Of course, if ICANON isn't set on the controlling terminal, special key sequences like Ctrl+D are no longer recognized as special, because that feature is turned off, so you will never receive EOF from terminal input. Input that does not come from a terminal (a file or a pipe) is unaffected.
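Whether ICANON is set, and which character is configured as VEOF, can be queried with tcgetattr. A sketch assuming a POSIX system; canonical_mode is a hypothetical name:

```c
#include <stddef.h>   /* NULL */
#include <termios.h>  /* tcgetattr, struct termios, ICANON, VEOF, cc_t */

/* Hypothetical helper. Returns 1 if fd is a terminal in canonical (ICANON)
 * mode, 0 if it is a terminal with ICANON off, and -1 if tcgetattr fails
 * (for example, when fd is a file or pipe rather than a terminal).
 * If veof is non-NULL and tcgetattr succeeds, *veof receives the VEOF
 * character (usually 4, i.e. Ctrl+D). */
int canonical_mode(int fd, cc_t *veof)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1)
        return -1;
    if (veof != NULL)
        *veof = t.c_cc[VEOF];
    return (t.c_lflag & ICANON) ? 1 : 0;
}
```

canonical_mode(STDIN_FILENO, &c) returns -1 when input is redirected from a file or pipe. From the shell, stty -a shows the same settings (look for icanon and eof = ^D).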

For completeness, note that the ICANON bit, and termios in general, largely do not apply on Windows. For one thing, the Windows Command Prompt uses Ctrl+Z instead of Ctrl+D, and Windows has no concept of terminals beyond things like the _isatty C runtime function, which detects whether a file descriptor is associated with a console handle.

Pressing Ctrl+Z with data pending will effectively cancel any remaining input that follows it, though an end-of-line character (Ctrl+M or Enter) still needs to be pressed for the data to be sent unless processed input was disabled by using the SetConsoleMode Windows API function.

If Ctrl+Z is pressed with no input data pending and then sent by entering an end-of-line character, it acts as EOF. For example, hello^Z1234^M results in hello^Z being read, and everything after it, including the ^M end-of-line character, is ignored. ^Z1234^M or just ^Z^M will trigger EOF.

Operating systems are weird.



Answer 2:

Ctrl+D is a bit weird on Unix: it's not actually an EOF character. Rather, it tells the terminal driver to hand any pending input to the program immediately; only when nothing is pending does the program's read return 0, which getchar reports as EOF. As a result, the behavior can be somewhat unintuitive. Two Ctrl+Ds in a row, or a Return followed by a Ctrl+D, will give you the behavior you're looking for. I tested it with this code:

#include <stdio.h>

int main(void) {
    size_t charcount = 0;

    while (getchar() != EOF)
        charcount++;

    printf("Characters: %zu\n", charcount);

    return 0;
}

Edited to include chux's format specifier suggestion (%zu).



Answer 3:

You can also do it this way:

fseek(stdin,0,SEEK_END);

This works fine for me.
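ISO C does say that a successful fseek clears the end-of-file indicator, but fseek can fail on a non-seekable stream such as a pipe or terminal, so whether this works depends on the platform. A different technique, not the one this answer uses, is clearerr, which the standard defines to clear both the end-of-file and error indicators. This matters because ISO C specifies that getc returns EOF immediately while the end-of-file indicator is set, which is consistent with the question's second getchar not waiting. A minimal sketch; read_response_after_eof is a hypothetical name:

```c
#include <stdio.h>  /* FILE, clearerr, getc, EOF */

/* Hypothetical helper: after a loop has read `in` to EOF, clear the
 * stream's end-of-file and error indicators so that the next getc()
 * goes back to the underlying file instead of returning EOF at once.
 * Returns the character read, or EOF. */
int read_response_after_eof(FILE *in)
{
    clearerr(in);      /* reset the end-of-file and error indicators */
    return getc(in);   /* on a terminal, this waits for new input */
}
```

In the question's program, the equivalent change would be a call to clearerr(stdin); just before int response = getchar();.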