C++ ifstream function and field separators

Published 2019-05-03 09:18

Question:

Until now I have only used field separators on data files in shell scripts. Now I am trying to use the standard library's std::ifstream class to read from a data file. The only problem is that the data looks like this:

A:KT5:14:executive desk:

This is for a hash table, and I need to separate the values in each line for the data structure as well as the transaction type. I've been looking around the web and haven't found much on field separators, and what I did find was quite confusing.

So my question is: is there a way to set a field separator on an ifstream, or is there another standard library I/O function I should be using?

Thanks.

Answer 1:

std::getline lets you specify a delimiter. You can then read the input from a stream as a sequence of strings separated by _Delim:

template<class CharType, class Traits, class Allocator>
   basic_istream< CharType, Traits >& getline(
       basic_istream< CharType, Traits >& _Istr,
       basic_string< CharType, Traits, Allocator >& _Str,
       CharType _Delim
   );
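As a minimal sketch of this (the helper name split_fields is hypothetical, not a standard function), here is how a single record from the question can be split by passing ':' as the delimiter:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one record such as "A:KT5:14:executive desk:" into its fields,
// using std::getline with ':' as the delimiter instead of the default '\n'.
std::vector<std::string> split_fields(const std::string& record, char delim = ':') {
    std::istringstream in(record);
    std::vector<std::string> fields;
    std::string field;
    // Each call reads up to (and discards) the next delimiter. The trailing ':'
    // in the sample does not produce an extra empty field: the next call finds
    // no more input, extracts nothing, and fails, which ends the loop.
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Note that the space inside "executive desk" is kept, because only the delimiter ends a field.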

If this is uniformly structured data, it might be useful to define a struct to hold each record and implement operator>> to load one instance at a time from the stream, calling the function above inside the operator's implementation.
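A sketch of that idea follows. The struct name Record and its field names and types are assumptions based only on the sample line A:KT5:14:executive desk: (the question doesn't say what each field means), and the code assumes every field, including the last one, is followed by ':'.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical layout for a record like "A:KT5:14:executive desk:".
// Field names and meanings are assumptions for illustration only.
struct Record {
    std::string type;         // e.g. "A" (assumed transaction type)
    std::string key;          // e.g. "KT5" (assumed hash-table key)
    int quantity = 0;         // e.g. 14
    std::string description;  // e.g. "executive desk" (may contain spaces)
};

// Reads one record, assuming every field (including the last) ends with ':'.
std::istream& operator>>(std::istream& in, Record& r) {
    std::string qty;
    if (std::getline(in, r.type, ':') &&
        std::getline(in, r.key, ':') &&
        std::getline(in, qty, ':') &&
        std::getline(in, r.description, ':')) {
        std::istringstream qty_in(qty);
        if (!(qty_in >> r.quantity)) {
            in.setstate(std::ios::failbit);  // quantity field was not a number
            return in;
        }
        // Discard the rest of the line (normally just '\n'). ignore() sets only
        // eofbit, not failbit, when the last record has no trailing newline.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return in;
}
```

With this in place, a loop such as `while (file >> rec) { ... }` reads one record per iteration from any std::istream, including a std::ifstream.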

If you have to process multiple lines (so that newline is a record separator and ':' a field separator), read each line in turn with std::getline using its default '\n' delimiter, load that line into a std::istringstream, and then split it into fields with the ':' delimiter as shown above.
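A minimal sketch of that two-level approach, assuming each line holds exactly one record (the function name read_records is hypothetical). Because it takes a std::istream&, the same function works on a std::ifstream:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a whole stream in which '\n' separates records and ':' separates
// fields. Each record is returned as a vector of its fields.
std::vector<std::vector<std::string>> read_records(std::istream& in) {
    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(in, line)) {        // record separator: '\n'
        if (line.empty()) continue;         // skip blank lines
        std::istringstream line_in(line);
        std::vector<std::string> fields;
        std::string field;
        while (std::getline(line_in, field, ':')) {  // field separator: ':'
            fields.push_back(field);
        }
        records.push_back(fields);
    }
    return records;
}
```

For example, `std::ifstream file("data.txt"); auto records = read_records(file);`, where data.txt stands in for your actual file name.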



Answer 2:

@Steve Townsend has already pointed out one possibility. If you prefer to use operator>> instead of std::getline, you can do that as well. When operator>> reads a string, the istream treats whitespace as a separator. Each stream has an associated locale, and each locale includes a ctype facet. That ctype facet is what the istream uses to determine which input characters are whitespace.
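To illustrate the default behavior, here is a small sketch (default_tokens is a hypothetical helper) showing what operator>> extracts from the sample line when the stream uses the default classic locale: the colons are not separators, but the space inside "executive desk" is.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collects every token that operator>> extracts with the stream's default
// locale, where only whitespace (not ':') ends a token.
std::vector<std::string> default_tokens(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

For "A:KT5:14:executive desk:" this yields two tokens, "A:KT5:14:executive" and "desk:", which is exactly the wrong split for this data.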

In your case, you apparently want the stream to treat only newlines and colons as "whitespace" (i.e., separators), while the actual space character is just treated as a "normal" character, not a separator.

To do that, you can create a ctype facet like this:

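#include <locale>
#include <vector>

// A ctype facet that classifies only '\n' and ':' as whitespace.
struct field_reader: std::ctype<char> {

    field_reader(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        // Start with every character classified as "nothing special"...
        static std::vector<std::ctype_base::mask> 
            rc(table_size, std::ctype_base::mask());

        // ...then mark just the two separators as whitespace.
        rc['\n'] = std::ctype_base::space;
        rc[':'] = std::ctype_base::space;
        return &rc[0];
    }
};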

To use this, you have to "imbue" the stream with a locale that contains this facet:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

int main() {
    std::stringstream input("A:KT5:14:executive desk:");

    // have the stream use our ctype facet:
    input.imbue(std::locale(std::locale(), new field_reader()));

    // copy fields from the stream to standard output, one per line:
    std::copy(std::istream_iterator<std::string>(input), 
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}
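Since the question is about reading a file, here is a minimal sketch of the same technique applied to a std::ifstream. The helper read_fields_from_file and any path you pass to it are hypothetical; field_reader is repeated so the block compiles on its own.

```cpp
#include <fstream>
#include <iterator>
#include <locale>
#include <string>
#include <vector>

// Same facet as above: only '\n' and ':' count as whitespace.
struct field_reader : std::ctype<char> {
    field_reader() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask>
            rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc[':'] = std::ctype_base::space;
        return rc.data();
    }
};

// Opens a file and returns every field separated by ':' or a newline.
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> read_fields_from_file(const std::string& path) {
    std::ifstream file(path);
    // Imbue before the first read so every extraction uses the custom facet.
    file.imbue(std::locale(file.getloc(), new field_reader()));
    return std::vector<std::string>(std::istream_iterator<std::string>(file),
                                    std::istream_iterator<std::string>());
}
```

Because '\n' is treated as just another separator here, fields from different lines come out as one flat sequence and the record boundaries are lost; if you need them, read line by line first.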

I'm the first to admit, however, that this has some shortcomings. The main one is that locales and facets are generally pretty poorly documented, so most C++ programmers are likely to find this fairly difficult to understand (especially since all the real work happens "under the covers", so to speak).

Another possibility is to use Boost Tokenizer. In all honesty, it's a little more work to use: you have to read a string first and then break it up in a separate step. At the same time, it's well documented, pretty widely known, and fits people's preconceptions about how to do this kind of thing better, so quite a few people will probably find it easier to follow despite the extra complexity.