Simple way to split a sequence of null-separated strings

Posted 2019-01-23 04:15

Question:

I have a series of strings stored in a single array, separated by nulls (for example ['f', 'o', 'o', '\0', 'b', 'a', 'r', '\0'...]), and I need to split this into a std::vector<std::string> or similar.

I could just write a 10-line loop to do this using std::find or strlen (in fact I just did), but I'm wondering if there is a simpler/more elegant way to do it, for example some STL algorithm I've overlooked, which can be coaxed into doing this.

It is a fairly simple task, and it wouldn't surprise me if there's some clever STL trickery that can be applied to make it even simpler.

Any takers?

Answer 1:

My two cents:

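// str points at the buffer; end is one past its last byte (assumed known).
const char* p = str;
std::vector<std::string> strings;

do {
  strings.push_back(std::string(p));  // copies up to the next '\0'
  p += strings.back().size() + 1;     // skip the string and its terminator
} while (p < end);  // or whatever condition applies, e.g. *p for "\0\0"-terminated lists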
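For reference, here is the same do/while idea as a complete function. This is a sketch: split_do_while is a hypothetical name, and it assumes the caller passes a pointer one past the buffer's last byte and that every string, including the last, is followed by a '\0' inside the buffer. Because a do/while body always runs at least once, the empty buffer is handled before the loop.

```cpp
#include <string>
#include <vector>

// Hypothetical helper. Assumes [str, end) is the whole buffer and that each
// string, including the last one, is '\0'-terminated within that range.
std::vector<std::string> split_do_while(const char* str, const char* end)
{
    std::vector<std::string> strings;
    if (str == end)                          // do/while would run once on an empty buffer
        return strings;

    const char* p = str;
    do {
        strings.push_back(std::string(p));   // reads up to the next '\0'
        p += strings.back().size() + 1;      // skip the string and its terminator
    } while (p < end);
    return strings;
}
```

Called as split_do_while(buf, buf + sizeof buf) on a char array, the string literal's implicit trailing '\0' terminates the last string.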


Answer 2:

Boost solution:

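#include <boost/algorithm/string.hpp>
#include <boost/range/as_array.hpp>
std::vector<std::string> strs;
// input_array must be a Range containing the input (e.g. a std::vector<char>).
// as_array treats the literal as the whole char array, so its '\0' characters
// become the separator set.
boost::split(
    strs,
    input_array,
    boost::is_any_of(boost::as_array("\0")));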
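Without Boost, std::string itself can do the splitting: when it is constructed from a pointer and an explicit length, it keeps embedded '\0' characters, so its find and substr members work on them. A sketch under these assumptions: split_string is a hypothetical name, the caller knows the buffer's exact size, consecutive '\0's yield empty strings, and a trailing '\0' does not add an empty last element.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split data[0, size) at every '\0'.
std::vector<std::string> split_string(const char* data, std::size_t size)
{
    const std::string s(data, size);             // keeps embedded '\0's
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    while (pos < s.size()) {
        std::string::size_type next = s.find('\0', pos);
        if (next == std::string::npos)
            next = s.size();                     // last string has no terminator
        result.push_back(s.substr(pos, next - pos));
        pos = next + 1;                          // skip the separator
    }
    return result;
}
```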


Answer 3:

The following relies on std::string having an implicit constructor taking a const char*, making the loop a very simple two-liner:

#include <iostream>
#include <string>
#include <vector>

template< std::size_t N >
std::vector<std::string> split_buffer(const char (&buf)[N])
{
    std::vector<std::string> result;

    for(const char* p=buf; p!=buf+sizeof(buf); p+=result.back().size()+1)
        result.push_back(p);

    return result;
}

int main()
{
    std::vector<std::string> test = split_buffer("wrgl\0brgl\0frgl\0srgl\0zrgl");

    for (auto it = test.begin(); it != test.end(); ++it)
        std::cout << '"' << *it << "\"\n";

    return 0;
}

This solution assumes that the buffer's size is known and that the end of the buffer marks the end of the list of strings. If the list is terminated by "\0\0" instead, the loop condition needs to change from p!=buf+sizeof(buf) to *p.
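For the "\0\0"-terminated case, such as Windows REG_MULTI_SZ data, a minimal sketch of the *p variant follows. split_double_null is a hypothetical name, and it assumes the buffer really ends with an empty string; nothing else bounds the loop.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for lists like "a\0b\0\0": stop at the first empty string.
std::vector<std::string> split_double_null(const char* buf)
{
    std::vector<std::string> result;
    for (const char* p = buf; *p; p += result.back().size() + 1)
        result.push_back(p);   // implicit const char* -> std::string, up to the next '\0'
    return result;
}
```

With a string literal such as "wrgl\0brgl\0", the compiler's implicit trailing '\0' supplies the second terminator.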



Answer 4:

Here's the solution I came up with myself, assuming the buffer ends immediately after the last string:

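#include <algorithm>
#include <string>
#include <vector>

std::vector<std::string> split(const std::vector<char>& buf) {
    std::vector<std::string> result;
    auto cur = buf.begin();
    while (cur != buf.end()) {
        auto next = std::find(cur, buf.end(), '\0');  // end of the current string
        result.push_back(std::string(cur, next));
        if (next == buf.end())
            break;                                    // last string had no '\0'
        cur = next + 1;                               // skip the '\0'
    }
    return result;
}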
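If the buffer outlives the results, the same std::find loop can return C++17 std::string_view objects that point into the buffer instead of copying each string. A sketch, not a drop-in replacement: split_views is a hypothetical name, and the views dangle once buf is destroyed or resized.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: views into buf, one per '\0'-separated string.
std::vector<std::string_view> split_views(const std::vector<char>& buf)
{
    std::vector<std::string_view> result;
    const char* cur = buf.data();
    const char* const end = cur + buf.size();
    while (cur != end) {
        const char* next = std::find(cur, end, '\0');
        result.emplace_back(cur, static_cast<std::size_t>(next - cur));
        if (next == end)
            break;            // last string had no terminator
        cur = next + 1;       // skip the '\0'
    }
    return result;
}
```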


Answer 5:

Admittedly a bad answer, but I doubted your claim that manual splitting takes a 10-line loop. Four lines do it for me:

#include <vector>
#include <iostream>
int main() {
    using std::vector;

    const char foo[] = "meh\0heh\0foo\0bar\0frob";

    vector<vector<char> > strings(1);
    for (const char *it=foo, *end=foo+sizeof(foo); it!=end; ++it) {
        strings.back().push_back(*it);
        if (*it == '\0' && it + 1 != end) strings.push_back(vector<char>()); // no empty entry after the final '\0'
    }

    std::cout << "number of strings: " << strings.size() << '\n';
    for (vector<vector<char> >::iterator it=strings.begin(), end=strings.end(); 
         it!=end; ++it)
        std::cout << it->data() << '\n';
}


Answer 6:

A more elegant and practical solution (compared to my other answer) uses getline. It boils down to two lines of plain C++03 and needs no manual loop bookkeeping or end conditions:

#include <iostream>
#include <sstream>
#include <string>

int main() {
    const char foo[] = "meh\0heh\0foo\0bar\0frob";

    std::istringstream ss (std::string(foo, foo + sizeof foo));
    std::string str;

    while (getline (ss, str, '\0'))
        std::cout << str << '\n';
}

However, note how the range-based string constructor already points to an inherent problem with splitting at '\0': you must know the exact size, or pick some other character combination as the Ultimate Terminator.
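The getline loop can be packaged as a function that takes that exact size explicitly. A sketch: split_getline is a hypothetical name, and the caller is assumed to pass the full byte count, including any trailing '\0'.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read '\0'-delimited records from data[0, size).
std::vector<std::string> split_getline(const char* data, std::size_t size)
{
    const std::string bytes(data, size);   // keeps embedded '\0's
    std::istringstream ss(bytes);
    std::vector<std::string> result;
    std::string str;
    while (std::getline(ss, str, '\0'))    // '\0' is consumed, not stored
        result.push_back(str);
    return result;
}
```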



Answer 7:

In C, string.h has this guy:

char * strtok ( char * str, const char * delimiters );

The example from cplusplus.com:

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

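It's not C++, and it won't split on '\0' itself: strtok treats its input as a C string, so the first '\0' ends it, and a '\0' cannot appear in the delimiter list. It only helps if the strings are separated by some other character.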
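The C library function that does fit this problem is memchr: unlike strtok, it searches an explicit number of bytes, so it can locate embedded '\0's. A sketch under the assumption that the buffer's size is known; split_memchr is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split data[0, size) at each '\0' using std::memchr.
std::vector<std::string> split_memchr(const char* data, std::size_t size)
{
    std::vector<std::string> result;
    const char* p = data;
    const char* const end = data + size;
    while (p < end) {
        const void* hit = std::memchr(p, '\0', static_cast<std::size_t>(end - p));
        if (!hit) {                                // last string has no terminator
            result.push_back(std::string(p, end));
            break;
        }
        const char* next = static_cast<const char*>(hit);
        result.push_back(std::string(p, next));
        p = next + 1;                              // skip the '\0'
    }
    return result;
}
```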



Tags: c++ stl std