C++ reading from file puts three weird characters

Posted 2020-02-29 04:26

Question:

When I read from a file word by word, the >> operator reads the first word with "ï»¿" in front of it. For example, if the first word is "street", I get "ï»¿street".

The other words are fine. I tried several different .txt files and the result is the same: the first word always starts with "ï»¿". What is the problem?

Here is my code:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int cube(int x){ return (x*x*x);}

int main(){

int maxChar;
int lineLength=0;
int cost=0;

cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;

fstream inFile("bla.txt",ios::in);

if (!inFile) {
    cerr << "Unable to open file datafile.txt";
    exit(1);   // call system to stop
}

while(!inFile.eof()) {
    string word;

    inFile >> word;
    cout<<word<<endl;
    cout<<word.length()<<endl;
    if(word.length()+lineLength<=maxChar){
        lineLength +=(word.length()+1);
    }
    else {
        cost+=cube(maxChar-(lineLength-1));
        lineLength=(word.length()+1);
    }   
}

}

Answer 1:

You're seeing a UTF-8 Byte Order Mark (BOM). It was added by the application that created the file.

To detect and ignore the marker you could try this (untested) function:

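#include <cstring>
#include <istream>

bool SkipBOM(std::istream & in)
{
    char test[4] = {0};
    in.read(test, 3);
    if (std::strcmp(test, "\xEF\xBB\xBF") == 0)
        return true;
    in.clear();    // a file shorter than 3 bytes sets failbit; reset it so seekg works
    in.seekg(0);
    return false;
}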


Answer 2:

With reference to the excellent answer by Mark Ransom above, adding this code skips the BOM (Byte Order Mark) on an existing stream. Call it after opening a file.

// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
    char test[3] = {0};
    in.read(test, 3);
    if ((unsigned char)test[0] == 0xEF && 
        (unsigned char)test[1] == 0xBB && 
        (unsigned char)test[2] == 0xBF)
    {
        return;
    }
    in.clear();   // reset failbit/eofbit if the file had fewer than 3 bytes
    in.seekg(0);
}

To use:

ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
    // Process lines of input here.
}
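If seeking is not an option (for example, the input comes from a pipe), an alternative is to strip the BOM from the first string after it has been read. This is a minimal sketch of that idea; `stripUtf8Bom` is a hypothetical name introduced here for illustration.

```cpp
#include <string>

// Hypothetical helper (not from the answers above): removes a leading
// UTF-8 BOM (bytes EF BB BF) from `text` in place.
// Returns true if a BOM was found and removed, false otherwise.
bool stripUtf8Bom(std::string& text)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        text.erase(0, bom.size());
        return true;
    }
    return false;
}
```

In the question's loop this would be applied once, to the first word returned by `inFile >> word`; with `getline` it would be applied to the first line only.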


Answer 3:

Here are two more ideas.

  1. If you are the one who creates the files, save their length along with them. When reading, cut the extra prefix using this simple calculation: trueFileLength - savedFileLength = numOfBytesToCut
  2. Create your own prefix when saving the files. When reading, search for it and delete everything found before it.
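The two ideas above could be sketched roughly as follows, assuming the whole file has already been read into a `std::string`. The function names and the example marker `#DATA#` are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <string>

// Idea 1 (hypothetical helper): keep only the last `savedLength` bytes,
// i.e. cut content.size() - savedLength bytes from the front.
// If the content is not longer than the saved length, it is returned as-is.
std::string dropExtraPrefix(const std::string& content, std::size_t savedLength)
{
    if (content.size() <= savedLength)
        return content;
    return content.substr(content.size() - savedLength);
}

// Idea 2 (hypothetical helper): return everything after the first occurrence
// of `marker`. If the marker is absent, the content is returned unchanged.
std::string dropThroughMarker(const std::string& content, const std::string& marker)
{
    const std::string::size_type pos = content.find(marker);
    if (pos == std::string::npos)
        return content;
    return content.substr(pos + marker.size());
}
```

Both approaches only work for files you write yourself, since the saved length or the marker has to be stored when the file is created.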