To support move semantics, should function paramet

2019-02-02 08:23发布

站内文章 / C++

53 0

太酷不给撩

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

One of my function takes a vector as a parameter and stores it as a member variable. I am using const reference to a vector as described below.

class Test {
 public:
  void someFunction(const std::vector<string>& items) {
   m_items = items;
  }

 private:
  std::vector<string> m_items;
};

However, sometimes items contains a large number of strings, so I'd like to add a function (or replace the function with a new one) that supports move semantics.

I am thinking of several approaches, but I'm not sure which one to choose.

1) unique_ptr

void someFunction(std::unique_ptr<std::vector<string>> items) {
   // Also, make `m_itmes` std::unique_ptr<std::vector<string>>
   m_items = std::move(items);
}

2) pass by value and move

void someFunction(std::vector<string> items) {
   m_items = std::move(items);
}

3) rvalue

void someFunction(std::vector<string>&& items) {
   m_items = std::move(items);
}

Which approach should I avoid and why?

回答1:

Unless you have a reason for the vector to live on the heap, I would advise against using unique_ptr

The vector's internal storage lives on the heap anyway, so you'll be requiring 2 degrees of indirection if you use unique_ptr, one to dereference the pointer to the vector, and again to dereference the internal storage buffer.

As such, I would advise to use either 2 or 3.

If you go with option 3 (requiring an rvalue reference), you are foisting a requirement on the users of your class that they pass an rvalue (either directly from a temporary, or move from an lvalue), when calling someFunction.

The requirement of moving from an lvalue is onerous.

If your users want to keep a copy of the vector, they have to jump through hoops to do so.

std::vector<string> items = { "1", "2", "3" };
Test t;
std::vector<string> copy = items; // have to copy first
t.someFunction(std::move(items));

However, if you go with option 2, the user can decide if they want to keep a copy, or not - the choice is theirs

Keep a copy:

std::vector<string> items = { "1", "2", "3" };
Test t;
t.someFunction(items); // pass items directly - we keep a copy

Don't keep a copy:

std::vector<string> items = { "1", "2", "3" };
Test t;
t.someFunction(std::move(items)); // move items - we don't keep a copy

回答2:

On the surface, option 2 seems like a good idea since it handles both lvalues and rvalues in a single function. However, as Herb Sutter notes in his CppCon 2014 talk Back to the Basics! Essentials of Modern C++ Style, this is a pessimization for the common case of lvalues.

If m_items was "bigger" than items, your original code will not allocate memory for the vector:

// Original code:
void someFunction(const std::vector<string>& items) {
   // If m_items.capacity() >= items.capacity(),
   // there is no allocation.
   // Copying the strings may still require
   // allocations
   m_items = items;
}

The copy-assignment operator on std::vector is smart enough to reuse the existing allocation. On the other hand, taking the parameter by value will always have to make another allocation:

// Option 2:
// When passing in an lvalue, we always need to allocate memory and copy over
void someFunction(std::vector<string> items) {
   m_items = std::move(items);
}

To put it simply: copy construction and copy assignment do not necessarily have the same cost. It's not unlikely for copy assignment to be more efficient than copy construction — it is more efficient for std::vector and std::string ^†.

The easiest solution, as Herb notes, is to add an rvalue overload (basically your option 3):

// You can add `noexcept` here because there will be no allocation‡
void someFunction(std::vector<string>&& items) noexcept {
   m_items = std::move(items);
}

Do note that the copy-assignment optimization only works when m_items already exists, so taking parameters to constructors by value is totally fine - the allocation would have to be performed either way.

TL;DR: Choose to add option 3. That is, have one overload for lvalues and one for rvalues. Option 2 forces copy construction instead of copy assignment, which can be more expensive (and is for std::string and std::vector)

_{† If you want to see benchmarks showing that option 2 can be a pessimization, at this point in the talk, Herb shows some benchmarks}

_{‡ We shouldn't have marked this as noexcept if std::vector's move-assignment operator wasn't noexcept. Do consult the documentation if you are using a custom allocator.

As a rule of thumb, be aware that similar functions should only be marked noexcept if the type's move-assignment is noexcept}

回答3:

It depends on your usage patterns:

Option 1

Pros:

Responsibility is explicitly expressed and passed from the caller to the callee

Cons:

Unless the vector was already wrapped using a unique_ptr, this doesn't improve readability
Smart pointers in general manage dynamically allocated objects. Thus, your vector must become one. Since standard library containers are managed objects that use internal allocations for the storage of their values, this means that there are going to be two dynamic allocations for each such vector. One for the management block of the unique ptr + the vector object itself and an additional one for the stored items.

Summary:

If you consistently manage this vector using a unique_ptr, keep using it, otherwise don't.

Option 2

Pros:

This option is very flexible, since it allows the caller to decide whether he wan't to keep a copy or not:

std::vector<std::string> vec { ... };
Test t;
t.someFunction(vec); // vec stays a valid copy
t.someFunction(std::move(vec)); // vec is moved

When the caller uses std::move() the object is only moved twice (no copies), which is efficient.

Cons:

When the caller doesn't use std::move(), a copy constructor is always called to create the temporary object. If we were to use void someFunction(const std::vector<std::string> & items) and our m_items was already big enough (in terms of capacity) to accommodate items, the assignment m_items = items would have been only a copy operation, without the extra allocation.

Summary:

If you know in advance that this object is going to be re-set many times during runtime, and the caller doesn't always use std::move(), I would have avoided it. Otherwise, this is a great option, since it is very flexible, allowing both user-friendliness and higher performance by demand despite the problematic scenario.

Option 3

Cons:

This option forces the caller to give up on his copy. So if he wants to keep a copy to himself, he must write additional code:
```
std::vector<std::string> vec { ... };
Test t;
t.someFunction(std::vector<std::string>{vec});
```

Summary:

This is less flexible than Option #2 and thus I would say inferior in most scenarios.

Option 4

Given the cons of options 2 and 3, I would deem to suggest an additional option:

void someFunction(const std::vector<int>& items) {
    m_items = items;
}

// AND

void someFunction(std::vector<int>&& items) {
    m_items = std::move(items);
}

Pros:

It solves all the problematic scenarios described for options 2 & 3 while enjoying their advantages as well
Caller decided to keep a copy to himself or not
Can be optimized for any given scenario

Cons:

If the method accepts many parameters both as const references and/or rvalue references the number of prototypes grows exponentially

Summary:

As long as you don't have such prototypes, this is a great option.

回答4:

The current advice on this is to take the vector by value and move it into the member variable:

void fn(std::vector<std::string> val)
{
  m_val = std::move(val);
}

And I just checked, std::vector does supply a move-assignment operator. If the caller doesn't want to keep a copy, they can move it into the function at the call site: fn(std::move(vec));.