Consider these two files:
file1.txt (Windows newline)
abc\r\n
def\r\n
file2.txt (Unix newline)
abc\n
def\n
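For anyone who wants to reproduce this, here is a minimal sketch that creates the two sample files described above. The helper names write_exact and make_sample_files are made up for illustration. The files are written in binary mode ("wb") so that the bytes land on disk exactly as listed, with no newline translation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes `len` bytes of `data` to `path` unchanged.
// "wb" disables newline translation, so "\n" stays a single byte on Windows.
bool write_exact(const char *path, const char *data, std::size_t len)
{
    assert(path != nullptr && data != nullptr);
    FILE *f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t written = std::fwrite(data, 1, len, f);
    return std::fclose(f) == 0 && written == len;
}

// Hypothetical helper: creates file1.txt (CRLF) and file2.txt (LF).
bool make_sample_files()
{
    const char crlf_text[] = "abc\r\ndef\r\n"; // 10 bytes on disk
    const char lf_text[]   = "abc\ndef\n";     // 8 bytes on disk
    return write_exact("file1.txt", crlf_text, sizeof crlf_text - 1)
        && write_exact("file2.txt", lf_text, sizeof lf_text - 1);
}
```

Calling make_sample_files() once before running the test programs should give file1.txt a size of 10 bytes and file2.txt a size of 8 bytes.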
I've noticed that for file2.txt, the position obtained with fgetpos is not incremented correctly. I'm working on Windows.
Let me show you an example. The following code:
#include <cstdio>

void read(FILE *file)
{
    int c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);
    fpos_t pos;
    fgetpos(file, &pos); // save the position
    c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);
    fsetpos(file, &pos); // restore the position - should point to previous
    c = fgetc(file);     // character, which is not the case for file2.txt
    printf("%c (%d)\n", (char)c, c);
    c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);
}

int main()
{
    FILE *file = fopen("file1.txt", "r");
    printf("file1:\n");
    read(file);
    fclose(file);
    file = fopen("file2.txt", "r");
    printf("\n\nfile2:\n");
    read(file);
    fclose(file);
    return 0;
}
produces this output:
file1:
a (97)
b (98)
b (98)
c (99)
file2:
a (97)
b (98)
(-1)
(-1)
file1.txt works as expected, while file2.txt behaves strangely. To find out what is going wrong, I tried the following code:
#include <cstdio>

void read(FILE *file)
{
    int c;
    fpos_t pos;
    while (1)
    {
        fgetpos(file, &pos);
        printf("pos: %d ", (int)pos); // print the position before each read
        c = fgetc(file);
        if (c == EOF) break;
        printf("c: %c (%d)\n", (char)c, c);
    }
}

int main()
{
    FILE *file = fopen("file1.txt", "r");
    printf("file1:\n");
    read(file);
    fclose(file);
    file = fopen("file2.txt", "r");
    printf("\n\nfile2:\n");
    read(file);
    fclose(file);
    return 0;
}
I got this output:
file1:
pos: 0 c: a (97)
pos: 1 c: b (98)
pos: 2 c: c (99)
pos: 3 c:
(10)
pos: 5 c: d (100)
pos: 6 c: e (101)
pos: 7 c: f (102)
pos: 8 c:
(10)
pos: 10
file2:
pos: 0 c: a (97) // something is going wrong here...
pos: -1 c: b (98)
pos: 0 c: c (99)
pos: 1 c:
(10)
pos: 3 c: d (100)
pos: 4 c: e (101)
pos: 5 c: f (102)
pos: 6 c:
(10)
pos: 8
I know that fpos_t is not meant to be interpreted by the programmer, because it is implementation-dependent. However, the example above illustrates the problem with fgetpos/fsetpos.
How is it possible that the newline sequence affects the internal position of the file, even before those characters are encountered?
I'm adding this as supporting information for teppic's answer:
When dealing with a FILE* that has been opened as text instead of binary, the fgetpos() function in VC++ 11 (VS 2012) may (and does for your file2.txt example) end up in this stretch of code:
It assumes that any \n character in the buffer was originally a \r\n sequence that had been normalized when the data was read into the buffer. So there are times when it tries to account for that (now missing) \r character that it believes previous processing of the file had removed from the buffer. This particular adjustment happens when you're near the end of the file; however, there are other similar adjustments to account for the removed \r bytes in the fgetpos() handling.
I would say the problem is probably caused by the second file confusing the implementation, since it's being opened in text mode but doesn't follow the requirements.
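If text-mode translation is what confuses the position bookkeeping, one way to sidestep it is to open the file in binary mode, where no \r\n conversion takes place. The following is only a minimal sketch of that idea; the helper name position_round_trips is made up for illustration, and it repeats the save/restore sequence from the question on a stream opened with "rb".

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: reads one character, saves the position, reads a
// second character, restores the position and reads again. Returns true if
// the character read after fsetpos() equals the one read after fgetpos().
// The stream is opened with "rb", so no newline translation is applied.
bool position_round_trips(const char *path)
{
    assert(path != nullptr);
    FILE *file = std::fopen(path, "rb");
    if (!file) return false;

    bool ok = false;
    std::fpos_t pos;
    if (std::fgetc(file) != EOF && std::fgetpos(file, &pos) == 0) {
        int before = std::fgetc(file);          // character after the saved position
        if (before != EOF && std::fsetpos(file, &pos) == 0) {
            int after = std::fgetc(file);       // should be the same character again
            ok = (after == before);
        }
    }
    std::fclose(file);
    return ok;
}
```

The trade-off is that in binary mode the program sees \r and \n as two separate characters in a Windows-style file, so it has to handle both newline conventions itself.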
In the standard,
Your second file stream contains no valid newline characters (since the implementation looks for \r\n to convert to the newline character internally). As a result, the implementation may not understand the line length properly, and may get hopelessly confused when you try to move about in the file.
Additionally,
Bear in mind that the library will not just read each byte from the file as you call fgetc; it will read the entire file (for one so small) into the stream's buffer and operate on that.
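To make the buffer accounting concrete, here is a simplified model of the arithmetic. It is not the actual CRT source, and the function name modelled_position is made up for illustration. It assumes the whole file is already in the buffer (that is, at least one fgetc call has been made) and that every \n still unread in the buffer is counted as two bytes on disk. Under those assumptions it reproduces the positions printed in the question after the first read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified model, NOT the real library code.
//   file_size - size of the file on disk, in bytes
//   buffer    - buffer contents as the program sees them (after translation)
//   consumed  - number of characters already returned by fgetc()
// Every unread '\n' is assumed to have been "\r\n" on disk (2 bytes).
long long modelled_position(long long file_size,
                            const std::string &buffer,
                            std::size_t consumed)
{
    assert(consumed <= buffer.size());
    long long assumed_unread = 0; // on-disk bytes the model believes are unread
    for (std::size_t i = consumed; i < buffer.size(); ++i)
        assumed_unread += (buffer[i] == '\n') ? 2 : 1;
    return file_size - assumed_unread;
}
```

For file2.txt (8 bytes on disk, buffer "abc\ndef\n"), one consumed character leaves 7 buffered characters that are counted as 9 bytes, so the model yields 8 - 9 = -1, matching the "pos: -1" line. For file1.txt (10 bytes on disk, same buffer contents after translation) the same call yields 10 - 9 = 1, which is correct because the assumption holds for that file.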