What's wrong with passing C++ iterator by refe

2019-02-01 05:14发布

问题:

I've written a few functions with a prototype like this:

template <typename input_iterator>
int parse_integer(input_iterator &begin, input_iterator end);

The idea is that the caller would provide a range of characters, and the function would interpret the characters as an integer value and return it, leaving begin at one past the last-used character. For example:

std::string sample_text("123 foo bar");
std::string::const_iterator p(sample_text.begin());
std::string::const_iterator end(sample_text.end());
int i = parse_integer(p, end);

This would leave i set to 123 and p "pointing" at the space before foo.

I've since been told (without explanation) that it's bad form to pass an iterator by reference. Is it bad form? If so, why?

回答1:

There is nothing really wrong, but it will certainly limit the use of the template. You won't be able to just put an iterator returned by something else or generated like v.begin(), since those will be temporaries. You will always first have to make a local copy, which is some kind of boilerplate not really nice to have.

One way is to overload it:

int parse_integer(input_iterator begin, input_iterator end, 
                  input_iterator &newbegin);

template<typename input_iterator>
int parse_integer(input_iterator begin, input_iterator end) {
    return parse_integer(begin, end, begin);
} 

Another option is to have an output iterator where the number will be written into:

template<typename input_iterator, typename output_iterator>
input_iterator parse_integer(input_iterator begin, input_iterator end,
                             output_iterator out);

You will have the return value to return the new input iterator. And you could then use a inserter iterator to put the parsed numbers into a vector or a pointer to put them directly into an integer or an array thereof if you already know the amount of numbers.

int i;
b = parse_integer(b, end, &i);

std::vector<int> numbers;
b = parse_integer(b, end, std::back_inserter(numbers));


回答2:

In general:

If you pass a non-const reference, the caller doesn't know if the iterator is being modified.

You could pass a const reference, but usually iterators are small enough that it gives no advantage over passing by value.

In your case:

I don't think there's anything wrong with what you do, except that it's not too standard-esque regarding iterator usage.



回答3:

When they say "don't pass by reference" maybe that's because it's more normal/idiomatic to pass iterators as value parameters, instead of passing them by const reference: which you did, for the second parameter.

In this example however you need to return two values: the parsed int value, and, the new/modified iterator value; and given that a function can't have two return codes, coding one of the return codes as a non-const reference is IMO normal.

An alternative would be to code it something like this:

//Comment: the return code is a pair of values, i.e. the parsed int and etc ...
pair<int, input_iterator> parse(input_iterator start, input_iterator end)
{
}


回答4:

In my opinion, if you want to do this the argument should be a pointer to the iterator you'll be changing. I'm not a big fan of non-const reference arguments because they hide the fact that the passed parameter might change. I know there's a lot of C++ users who disagree with my opinion on this - and that's fine.

However, in this case it's so common for iterators to be treated as value arguments that I think it's a particularly bad idea to pass iterators by non-const reference and modify the passed iterator. It just goes against the idiomatic way iterators are usually used.

Since there is a great way to do what you want that doesn't have this problem, I think you should use it:

template <typename input_iterator>
int parse_integer(input_iterator* begin, input_iterator end);

Now a caller would have to do:

int i = parse_integer(&p, end);

And it'll be obvious that the iterator can be changed.

By the way, I also like litb's suggestion of returning the new iterator and putting the parsed values into a location specified by an output iterator.



回答5:

In this context, I think that passing an iterator by reference is perfectly sensible, as long as it's well-documented.

It's worth noting that your approach (passing an iterator by reference to keep track of where you are when tokenizing a stream) is exactly the approach that is taken by boost::tokenizer. In particular, see the definition of the TokenizerFunction Concept. Overall, I find boost::tokenizer to be pretty well designed and well thought out.



回答6:

I think the Standard Library algorithms pass iterators by value exclusively (someone will now post an obvious exception to this) - this may be the origin of the idea. Of course, nothing says that your own code has to look like the Standard Library!



回答7:

Your function declaration's second parameter is missing the reference, is it?

Anyway, back to your question: No, I haven't ever read anything that says you should not pass iterators by reference. The problem with references is that they allow you to change the referenced object. In this case, if you are to change the iterator, you are potentially screwing up the entire sequence beyond that point thereby rendering further processing impossible.

Just one suggestion: type your parameters carefully.