Undoing the effects of ungetc(): “How” do fseek()

Posted 2019-03-01 05:26

Huh! How shall I put the whole thing as a clear question? Let me try:

I know that files opened using fopen() are buffered in memory. We use a buffer for efficiency and ease. During a read from the file, the contents of the file are first read into the buffer, and we read from that buffer. Similarly, in a write to the file, the contents are written to the buffer first, and then to the file.

But what about fseek(), fsetpos() and rewind() dropping the effect of previous calls to ungetc()? Can you tell me how that is done? I mean, say we have opened a file for reading and it has been copied into the buffer. Now, using ungetc(), we've changed some characters in the buffer. Here is what I just fail to understand even after much effort:

  • Here's what is said about ungetc(): "A call to fseek, fsetpos or rewind on stream will discard any characters previously put back into it with this function." How can characters already put into the buffer be discarded? One approach is that the original characters that were removed are "remembered", and each new character that was put in is identified and replaced with the original character. But that seems very inefficient. The other option is to load a copy of the original file into the buffer and place the file pointer at the intended position. Which of these two approaches do fseek, fsetpos and rewind take to discard the characters put back using ungetc()?

  • For text streams, how does the presence of unread characters in the stream (characters that were put in using ungetc()) affect the return value of ftell()? My confusion arises from the following line about ftell() and ungetc() from this link about ftell (SOURCE):

"For text streams, the numerical value may not be meaningful but can still be used to restore the position to the same position later using fseek (if there are characters put back using ungetc still pending of being read, the behavior is undefined)."

  • Focusing on the last line of the above paragraph, what does "pending of being read" have to do with an "ungetc()-obtained" character being discarded? Each time we read a character that was put into the stream using ungetc(), is it discarded after the read?

2 answers
成全新的幸福
Post #2 · 2019-03-01 06:00

Let's start from the beginning.

int ungetc(int c, FILE *stream);

The ungetc() function shall push the byte specified by c (converted to an unsigned char) back onto the input stream pointed to by stream. A character is virtually put back into an input stream, decreasing its internal file position as if a previous getc operation was undone. This only affects further input operations on that stream, and not the content of the physical file associated with it, which is not modified by any calls to this function.

int fseek(FILE *stream, long offset, int whence);

The new position, measured in bytes from the beginning of the file, shall be obtained by adding offset to the position specified by whence. The specified point is the beginning of the file for SEEK_SET, the current value of the file-position indicator for SEEK_CUR, or end-of-file for SEEK_END. fseek either flushes any buffered output before setting the file position or else remembers it so it will be written later in its proper place in the file.

int fsetpos(FILE *stream, const fpos_t *pos);

The fsetpos() function sets the file position and state indicators for the stream pointed to by stream according to the value of the object pointed to by pos, which must be a value obtained from an earlier call to fgetpos() on the same stream.

void rewind(FILE *stream);

The rewind function repositions the file pointer associated with stream to the beginning of the file. A call to rewind is similar to

(void) fseek( stream, 0L, SEEK_SET );

So, as you can see from ungetc(), pushing back characters doesn't alter the file; only the internal buffering for the stream is affected. So your second option, "The other option is to load a copy of the original file into buffer and place the file pointer at the intended position", is correct.

Now, answering your second question: a successful intervening call (with the stream pointed to by stream) to a file-positioning function discards any pushed-back characters for the stream. The external storage corresponding to the stream is unchanged.

混吃等死
Post #3 · 2019-03-01 06:07

A good mental model of the put back character is simply that it's some extra little property which hangs off the FILE * object. Imagine you have:

typedef struct { 
   /* ... */
   int putback_char;
   /* ... */
} FILE;

Imagine putback_char is initialized to the value EOF, which indicates "there is no putback char", and ungetc simply stores the character in this member (and, for a binary stream, decrements current_position).

Imagine that every read operation goes through getc, and that getc does something like this:

int getc(FILE *stream)
{
   int ret = stream->putback_char;

   if (ret != EOF) {
     stream->putback_char = EOF;
     if (__is_binary(stream))
        stream->current_position++; /* undo the decrement done by ungetc */
     return ret;
   }

   return __internal_getc(stream); /* __internal_getc doesn't know about putback_char */
}

The functions which clear the pushback simply assign EOF to putback_char.

In other words, the put back character (and only one needs to be supported) can actually be a miniature buffer which is separate from the regular buffering. (Consider that even an unbuffered stream supports ungetc: such a stream has to put the byte or character somewhere.)

Regarding the position indicator, the C99 standard says this:

For a text stream, the value of its file position indicator after a successful call to the ungetc function is unspecified until all pushed-back characters are read or discarded. For a binary stream, its file position indicator is decremented by each successful call to the ungetc function; if its value was zero before a call, it is indeterminate after the call. [7.19.7.11 The ungetc function]

So the www.cplusplus.com reference you're using is incorrect: the behavior of ftell is not undefined when there are pending characters pushed back with ungetc. For text streams, the value is unspecified. Accessing an unspecified value isn't undefined behavior, because an unspecified value cannot be a trap representation. The undefined behavior exists for binary streams if a pushback occurs at position zero, because the position then becomes indeterminate. An indeterminate value is either an unspecified value or a trap representation, and accessing a trap representation could halt the program with an error message or trigger other behaviors.

It's better to get programming language and library specifications from the horse's mouth, rather than from random websites.
