C++ string parsing (python style)

Posted 2019-01-17 01:52

Question:

I love how in Python I can do something like:

points = []
for line in open("data.txt"):
    a,b,c = map(float, line.split(','))
    points += [(a,b,c)]

Basically, it reads a list of lines where each one represents a point in 3D space. Each point is given as three numbers separated by commas.

How can this be done in C++ without too much headache?

Performance is not very important: this parsing happens only once, so simplicity matters more.

P.S. I know it sounds like a newbie question, but believe me, I've written a lexer in D (pretty much like C++) that reads text char by char and recognizes tokens.
It's just that, coming back to C++ after a long period of Python, I don't wanna waste my time on such things.

Answer 1:

I'd do something like this:

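ifstream f("data.txt");
string str;
while (getline(f, str)) {
    Point p;
    // sscanf returns the number of fields converted; skip malformed lines.
    if (sscanf(str.c_str(), "%f, %f, %f", &p.x, &p.y, &p.z) == 3)
        points.push_back(p);
}

x, y, and z must be floats (for doubles, use %lf instead of %f).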

And include:

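#include <cstdio>    // sscanf
#include <fstream>   // ifstream
#include <string>    // string, getline
#include <vector>    // assuming points is a std::vector<Point>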


Answer 2:

The C++ String Toolkit Library (StrTk) has the following solution to your problem:

#include <string>
#include <deque>
#include "strtk.hpp"

struct point { double x,y,z; };

int main()
{
   std::deque<point> points;
   point p;
   strtk::for_each_line("data.txt",
                        [&points,&p](const std::string& str)
                        {
                           strtk::parse(str,",",p.x,p.y,p.z);
                           points.push_back(p);
                        });
   return 0;
}

More examples can be found Here



Answer 3:

All these good examples aside, in C++ you would normally overload operator>> for your point type to achieve something like this:

point p;
while (file >> p)
    points.push_back(p);

or even:

copy(
    istream_iterator<point>(file),
    istream_iterator<point>(),
    back_inserter(points)
);

The relevant implementation of the operator could look very much like the code by j_random_hacker.
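
A minimal sketch of what such an operator might look like, assuming a point type with three double members and input of the form "x,y,z". The names Point and read_points are hypothetical and not taken from any answer in this thread.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical point type; the member names are assumptions.
struct Point {
    double x, y, z;
};

// Reads "x,y,z". Sets failbit if a number is missing or a separator
// is not a comma. Whitespace (including newlines) between tokens is skipped.
std::istream& operator>>(std::istream& in, Point& p) {
    Point tmp;
    char c1 = 0, c2 = 0;
    if (in >> tmp.x >> c1 >> tmp.y >> c2 >> tmp.z) {
        if (c1 == ',' && c2 == ',')
            p = tmp;                          // commit only on success
        else
            in.setstate(std::ios::failbit);   // wrong separator
    }
    return in;
}

// Helper for illustration: collect every point found in a stream.
std::vector<Point> read_points(std::istream& in) {
    std::vector<Point> points;
    Point p;
    while (in >> p)
        points.push_back(p);
    return points;
}
```

With this in place, passing a std::ifstream opened on data.txt to read_points should fill the vector. Because formatted extraction skips whitespace, points do not have to sit on separate lines.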



Answer 4:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>     // For replace()

using namespace std;

struct Point {
    double a, b, c;
};

int main(int argc, char **argv) {
    vector<Point> points;

    ifstream f("data.txt");

    string str;
    while (getline(f, str)) {
        replace(str.begin(), str.end(), ',', ' ');
        istringstream iss(str);
        Point p;
        iss >> p.a >> p.b >> p.c;
        points.push_back(p);
    }

    // Do something with points...

    return 0;
}


Answer 5:

This answer is based on the previous answer by j_random_hacker and makes use of Boost Spirit.

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/spirit.hpp>

using namespace std;
using namespace boost;
using namespace boost::spirit;

struct Point {
    double a, b, c;
};

int main(int argc, char **argv) 
{
    vector<Point> points;

    ifstream f("data.txt");

    string str;
    Point p;
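    // space_p is used as a skip parser below, so the rule needs a phrase-level scanner.
    rule<phrase_scanner_t> point_p = 
           real_p[assign_a(p.a)] >> ',' 
        >> real_p[assign_a(p.b)] >> ',' 
        >> real_p[assign_a(p.c)] ; 

    while (getline(f, str)) 
    {
        // Classic Spirit's parse() takes a C string or an iterator pair.
        parse( str.c_str(), point_p, space_p );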
        points.push_back(p);
    }

    // Do something with points...

    return 0;
}


Answer 6:

Fun with Boost.Tuples:

#include <boost/tuple/tuple_io.hpp>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace boost::tuples;
    typedef boost::tuple<float,float,float> PointT;

    std::ifstream f("input.txt");
    f >> set_open(' ') >> set_close(' ') >> set_delimiter(',');

    std::vector<PointT> v;

    std::copy(std::istream_iterator<PointT>(f), std::istream_iterator<PointT>(),
             std::back_inserter(v)
    );

    std::copy(v.begin(), v.end(), 
              std::ostream_iterator<PointT>(std::cout)
    );
    return 0;
}

Note that this is not strictly equivalent to the Python code in your question because the tuples don't have to be on separate lines. For example, this:

1,2,3 4,5,6

will give the same output as:

1,2,3
4,5,6

It's up to you to decide if that's a bug or a feature :)



Answer 7:

You could read the file from a std::istream line by line, put each line into a std::string, and then use boost::tokenizer to split it. It won't be quite as elegant or short as the Python version, but it is a lot easier than reading things in one character at a time.
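
A rough sketch of that line-by-line approach. To stay free of dependencies it splits with std::getline and a delimiter rather than boost::tokenizer, which would fill the same role. The helper names split and parse_line are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Rough analogue of Python's line.split(','). Note: unlike Python, a
// trailing empty field ("1,2,") is dropped by std::getline.
std::vector<std::string> split(const std::string& line, char delim) {
    std::vector<std::string> tokens;
    std::istringstream iss(line);
    std::string token;
    while (std::getline(iss, token, delim))
        tokens.push_back(token);
    return tokens;
}

// Rough analogue of map(float, line.split(',')).
// std::stod throws std::invalid_argument on a non-numeric field.
std::vector<double> parse_line(const std::string& line) {
    std::vector<double> values;
    for (const std::string& token : split(line, ','))
        values.push_back(std::stod(token));
    return values;
}
```

Calling parse_line on each line returned by std::getline from a std::ifstream then gives one vector of numbers per line, much like the Python loop in the question.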



Answer 8:

It's nowhere near as terse, and of course I didn't compile this.

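#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
using namespace std;
using namespace boost::algorithm;

// Each split token arrives as a range of characters, not a std::string.
float atof_s( const boost::iterator_range<string::iterator> & r ) {
  return atof( string( r.begin(), r.end() ).c_str() );
}

int main() {
  ifstream f("data.txt");
  string str;
  vector<vector<float> > data;
  while( getline( f, str ) ) {
    vector<float> v;
    transform(
       make_split_iterator( str, token_finder( is_any_of( "," ) ) ),
       split_iterator<string::iterator>(), back_inserter( v ), atof_s );
    v.resize(3); // only grab the first 3
    data.push_back(v);
  }
}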


Answer 9:

One of Sony Pictures Imageworks' open-source projects is Pystring, which should make for a mostly direct translation of the string-splitting parts:

Pystring is a collection of C++ functions which match the interface and behavior of python’s string class methods using std::string. Implemented in C++, it does not require or make use of a python interpreter. It provides convenience and familiarity for common string operations not included in the standard C++ library

There are a few examples and some documentation.



Answer 10:

All of these are good examples, yet they don't cover the following:

  1. a CSV file with a varying number of columns (some rows have more columns than others)
  2. values that contain whitespace (ya yb,x1 x2,,x2,)

So for those who are still looking, this class is pretty cool, though some changes might be needed: http://www.codeguru.com/cpp/tic/tic0226.shtml
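
As a rough illustration of those two cases (and not the class from the link above), here is a standard-library sketch that keeps empty fields, allows any number of columns per row, and preserves spaces inside a value. The names trim and split_row are hypothetical, and quoted fields containing commas are not handled.

```cpp
#include <string>
#include <vector>

// Strip leading and trailing whitespace; inner spaces ("ya yb") are kept.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split one row on commas, keeping empty fields (including a trailing one),
// so rows may have any number of columns.
std::vector<std::string> split_row(const std::string& row) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type comma = row.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(trim(row.substr(start)));  // last field
            break;
        }
        fields.push_back(trim(row.substr(start, comma - start)));
        start = comma + 1;
    }
    return fields;
}
```

The fields stay as strings so that the caller can decide how to treat empty or non-numeric columns before converting anything to a number.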