C++, boost: which is fastest way to parse string l

Posted 2020-02-15 05:35

Question:

We have a std::string A of the form tcp://adr:port/. How can we parse it into a std::string for the address and an int for the port?

Answer 1:

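#include <boost/regex.hpp>
#include <string>

// Splits "tcp://address:port/" into address and service (the port, kept as text).
// Both outputs are left untouched if the input does not match.
void extract(std::string const& ip, std::string& address, std::string& service)
{
   // compiled once and reused on every call
   static const boost::regex e("tcp://(.+):(\\d+)/");
   boost::smatch what;
   if(boost::regex_match(ip, what, e))
   {
     // what[0] is the whole match, so the captures start at index 1
     address = what[1].str();
     service = what[2].str();
   }
}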

EDIT: service is kept as a string here because you'll need it as a string for the resolver anyway! ;)
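
If the port really is wanted as an int, as the question asks, the same idea can be sketched with std::regex from C++11 instead of Boost. This is only an illustration: the function name extract_endpoint, the bool return value and the 0 to 65535 range check are assumptions of this sketch, not part of the answer above.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses "tcp://address:port/" and converts the port to int.
// Returns false if the input does not match or the port is out of range.
bool extract_endpoint(const std::string& url, std::string& address, int& port)
{
    // Same pattern as the Boost version; compiled once.
    static const std::regex pattern("tcp://(.+):(\\d+)/");
    std::smatch what;
    if (!std::regex_match(url, what, pattern))
        return false;

    int value = 0;
    try {
        value = std::stoi(what[2].str());
    } catch (const std::out_of_range&) {
        return false;  // more digits than an int can hold
    }
    if (value > 65535)  // assumption: a TCP port must fit in 16 bits
        return false;

    address = what[1].str();
    port = value;
    return true;
}
```

Note that regex_match requires the whole string to match, so an input without the trailing slash is rejected.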



Answer 2:

Although some wouldn't consider it particularly kosher C++, probably the easiest way would be to use sscanf:

char addr[256]; int port; // %[ needs a char buffer, not a std::string
sscanf(A.c_str(), "tcp://%255[^:]:%d", addr, &port);
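
A slightly fuller sketch of the sscanf approach, wrapped in a function that checks the return value. The name parse_with_sscanf and the 256-byte buffer are assumptions made for illustration; the field width in the format string keeps sscanf from writing past the buffer.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: parses "tcp://host:port/" with sscanf.
// Returns true only if both the host and the port were converted.
bool parse_with_sscanf(const std::string& url, std::string& addr, int& port)
{
    char host[256];  // assumed maximum host length; %255[^:] cannot overflow it
    int p = 0;
    // sscanf returns the number of fields it converted, so 2 means success.
    if (std::sscanf(url.c_str(), "tcp://%255[^:]:%d", host, &p) != 2)
        return false;
    addr = host;
    port = p;
    return true;
}
```

Because the format stops after %d, anything following the port (such as the trailing slash) is simply ignored rather than validated.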

Another possibility would be to put the string into a stringstream, imbue the stream with a facet that treats most alphabetic and punctuation as whitespace, and just read the address and port like:

std::istringstream buffer(A);
buffer.imbue(std::locale(std::locale(), new digits_only));
buffer >> addr >> port;

The facet would look something like this:

struct digits_only: std::ctype<char> 
{
    digits_only(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        // everything is white-space:
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::space);

        // except digits, which are digits
        std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit);

        // and '.', which we'll call punctuation:
        rc['.'] = std::ctype_base::punct;
        return &rc[0];
    }
};

operator>> treats whitespace as separators between "fields", so this will treat something like 192.168.1.1:25 as two strings: "192.168.1.1" and "25".
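
Putting the pieces together, a complete version of the facet approach might look like the sketch below. The wrapper parse_numeric_endpoint is a hypothetical name introduced here; the facet itself follows the one above, with the fill range extended so that '9' is included.

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

struct digits_only : std::ctype<char>
{
    digits_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        // Start with every character classified as whitespace.
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        // '0' through '9' inclusive are digits.
        std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit);
        // '.' is punctuation, so dotted addresses stay in one piece.
        rc['.'] = std::ctype_base::punct;
        return &rc[0];
    }
};

// Hypothetical wrapper: reads a dotted numeric address and a port.
bool parse_numeric_endpoint(const std::string& url, std::string& addr, int& port)
{
    std::istringstream buffer(url);
    // The locale takes ownership of the facet and deletes it when done.
    buffer.imbue(std::locale(std::locale(), new digits_only));
    return static_cast<bool>(buffer >> addr >> port);
}
```

Since letters count as whitespace here, a host name such as localhost is skipped entirely, so this only suits numeric IPv4 addresses.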



Answer 3:

Fastest as in computer time or programmer time? I can't speak to benchmarks, but the URI library in the cpp-netlib framework works very well and is easy and straightforward to use.

http://cpp-netlib.github.com/0.8-beta/uri.html



Answer 4:

You could use a tool like re2c to create a fast custom scanner. I'm also unclear on what you consider to be "fastest" -- for the processor or development time or both?



Answer 5:

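Nowadays you may also encounter IPv6 addresses, where the host part itself contains a variable number of colons and possibly dots. Splitting URLs should then follow RFC 3986, which requires an IPv6 literal in the host part to be enclosed in square brackets. See the Wikipedia article on IPv6.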
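
As a rough illustration of the bracket rule, the sketch below splits the authority part by hand and treats a bracketed host as an IPv6 literal. The function name split_authority is hypothetical, and this is far from a full RFC 3986 parser: it does not validate the address or the port digits, and it ignores userinfo.

```cpp
#include <string>

// Hypothetical helper: splits "tcp://host:port/" into host and port,
// accepting a bracketed IPv6 literal such as "tcp://[2001:db8::1]:8080/".
bool split_authority(const std::string& url, std::string& host, std::string& port)
{
    typedef std::string::size_type size_type;
    const std::string scheme = "tcp://";
    if (url.compare(0, scheme.size(), scheme) != 0)
        return false;

    const size_type pos = scheme.size();
    size_type end = url.find('/', pos);  // the authority ends at the first '/'
    if (end == std::string::npos)
        end = url.size();

    size_type colon;
    if (pos < end && url[pos] == '[') {
        // IPv6 literal: the host is everything between '[' and ']'.
        const size_type close = url.find(']', pos);
        if (close == std::string::npos || close >= end)
            return false;
        host = url.substr(pos + 1, close - pos - 1);
        colon = close + 1;
        if (colon >= end || url[colon] != ':')
            return false;
    } else {
        // Host name or IPv4 address: the first ':' separates host and port.
        colon = url.find(':', pos);
        if (colon == std::string::npos || colon >= end)
            return false;
        host = url.substr(pos, colon - pos);
    }
    port = url.substr(colon + 1, end - colon - 1);
    return !host.empty() && !port.empty();
}
```

The port is returned as a string here so that it can be handed to a resolver unchanged or converted separately.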