Why fstream::tellg() return value is enlarged by t

Posted 2019-08-20 17:05

Question:

The program opens an input file and prints the current reading/writing position several times.

If the file uses '\n' for newlines, the values are as expected: 0, 1, 2, 3.

On the other hand, if the newline is '\r\n', it appears that after some reading the position returned by every tellg() call is offset by the number of newlines in the file; the output is 0, 5, 6, 7.

All values returned after the first read are increased by 4, which is the number of newlines in the example input file.

#include <fstream>
#include <iostream>
#include <iomanip>
using std::cout;
using std::setw;
using std::endl;

int main()
{
    std::fstream ioff("su9.txt");
    if(!ioff) return -1;
    int c = 0;

    cout << setw(30) << std::left << " Before any operation " << ioff.tellg() << endl;

    c = ioff.get();
    cout << setw(30) << std::left << " After first 'get' " << ioff.tellg() << " Character read: " << (char)c << endl;

    c = ioff.get();
    cout << setw(30) << std::left << " After second 'get' " << ioff.tellg() << " Character read: " << (char)c << endl;

    c = ioff.get();
    cout << setw(30) << std::left << " Third 'get' " << ioff.tellg() << "\t\tCharacter read: " << (char)c << endl;

    return 0;
}

The input file is 5 lines long (it has 4 newlines), with the following content:

-------------------------------------------
abcd
efgh
ijkl


--------------------------------------------

output (\n):

Before any operation         0
After first 'get'            1      Character read: a
After second 'get'           2      Character read: b
Third 'get'                  3      Character read: c

output (\r\n):

Before any operation         0
After first 'get'            5      Character read: a
After second 'get'           6      Character read: b
Third 'get'                  7      Character read: c

Notice that the characters themselves are read correctly.

Answer 1:

The first, and most obvious, question is why you expect any particular values when the results of tellg are converted to an integral type. The only defined use of the result of tellg is as a later argument to seekg; it has no defined numerical significance whatsoever.
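To illustrate that intended use, here is a minimal sketch that treats the value returned by tellg purely as an opaque bookmark and hands it back to seekg. The helper name reread_same_char is hypothetical and not part of the original program.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: consumes one character, remembers the position,
// reads the next character, seeks back to the remembered position and
// reads again. Returns true if both reads yield the same character.
bool reread_same_char(const std::string& path)
{
    std::ifstream in(path);
    if (!in) return false;

    in.get();                                   // move away from the start
    std::ifstream::pos_type mark = in.tellg();  // opaque bookmark, never printed

    int first = in.get();
    in.seekg(mark);                             // the one guaranteed use of 'mark'
    int second = in.get();

    return in && first == second;
}
```

Nothing here depends on the numeric value of the position, so the sketch should behave the same for '\n' and '\r\n' files, in text or binary mode.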

Having said that: in Unix and Windows implementations, they will practically always correspond to the byte offset of the physical position in the file. This means they will have some meaning if the file is opened in binary mode. Under Windows, for example, text mode (the default) maps the two-character sequence 0x0D, 0x0A in the file to the single character '\n', and treats the single character 0x1A as end of file. (Binary and text mode are identical under Unix, so things often seem to work there even when they aren't guaranteed.)
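If the numbers themselves are wanted, opening the file in binary mode is the usual way to make them line up with byte offsets. The sketch below assumes a hypothetical helper, offset_after_reads, that reads n characters and reports the position; the conversion to an integer is meaningful only in the practical sense described above, not by any guarantee of the standard.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: opens 'path' in text or binary mode, reads n
// characters, and returns the stream position as an integer offset.
// Returns -1 if the file cannot be opened or a read fails.
long long offset_after_reads(const std::string& path, int n, bool binary)
{
    std::ios::openmode mode = std::ios::in;
    if (binary) mode |= std::ios::binary;   // disable newline translation

    std::ifstream in(path, mode);
    if (!in) return -1;

    for (int i = 0; i < n; ++i) in.get();

    // tellg() returns pos_type(-1) on failure, which converts to -1 here.
    return static_cast<long long>(std::streamoff(in.tellg()));
}
```

Calling it twice on the same '\r\n' file, once with binary set to false and once with true, shows whether text-mode translation is what shifts the reported positions on a given platform.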

I might add that I cannot reproduce your results with MSC++. Not that that means anything; as I said, the only requirement for tellg is that the returned value can be used in a seekg to return to the same place. (Another issue might be how you created the files. Might one of them start with a UTF-8 encoding of a BOM, for example, and the other not?)
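One way to rule out the BOM hypothesis is to look at the first three raw bytes of each file. This is only a sketch; the helper name has_utf8_bom is hypothetical, and it checks for the UTF-8 BOM byte sequence EF BB BF and nothing else.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: returns true if the file starts with the
// three-byte UTF-8 BOM (0xEF 0xBB 0xBF). Binary mode is used so the
// bytes are seen exactly as stored on disk.
bool has_utf8_bom(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return false;

    char head[3] = {0, 0, 0};
    in.read(head, 3);
    if (in.gcount() != 3) return false;     // file shorter than a BOM

    return static_cast<unsigned char>(head[0]) == 0xEF &&
           static_cast<unsigned char>(head[1]) == 0xBB &&
           static_cast<unsigned char>(head[2]) == 0xBF;
}
```

Running it on both versions of su9.txt would show whether the two files differ in more than their line endings.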