Strange values in a lambda returning initializer_l

2019-02-04 00:34发布

问题:

Consider this C++11 code snippet:

#include <iostream>
#include <set>
#include <stdexcept>
#include <initializer_list>


int main(int argc, char ** argv)
{
    enum Switch {
        Switch_1,
        Switch_2,
        Switch_3,
        Switch_XXXX,
    };

    int foo_1 = 1;
    int foo_2 = 2;
    int foo_3 = 3;
    int foo_4 = 4;
    int foo_5 = 5;
    int foo_6 = 6;
    int foo_7 = 7;

    auto get_foos = [=] (Switch ss) -> std::initializer_list<int> {
        switch (ss) {
            case Switch_1:
                return {foo_1, foo_2, foo_3};
            case Switch_2:
                return {foo_4, foo_5};
            case Switch_3:
                return {foo_6, foo_7};
            default:
                throw std::logic_error("invalid switch");
        }
    };

    std::set<int> foos = get_foos(Switch_1);
    for (auto && foo : foos) {
        std::cout << foo << " ";
    }
    std::cout << std::endl;
    return 0;
}

Whatever compiler I try, all seem to handle it incorrectly. This makes me think that I am doing something wrong rather than it's a common bug across multiple compilers.

clang 3.5 output:

-1078533848 -1078533752 134518134

gcc 4.8.2 output:

-1078845996 -1078845984 3

gcc 4.8.3 output (compiled on http://www.tutorialspoint.com):

1 2 267998238

gcc (unknown version) output (compiled on http://coliru.stacked-crooked.com)

-1785083736 0 6297428 

The problem seems to be caused by using std::initializer_list<int> as a return value of lambda. When changing lambda definition to [=] (Switch ss) -> std::set<int> {...} returned values are correct.

Please, help me solve this mystery.

回答1:

From: http://en.cppreference.com/w/cpp/utility/initializer_list

The underlying array is not guaranteed to exist after the lifetime of the original initializer list object has ended. The storage for std::initializer_list is unspecified (i.e. it could be automatic, temporary, or static read-only memory, depending on the situation).

I don't think the initializer list is copy-constructable. std::set and other containers are. Basically it looks like your code behaves similar to "returning a reference to a temporary".

C++14 has something slightly different to say about the underlying storage - extending its lifetime - but that does not fix anything having to do with the lifetime of the initializer_list object, let alone copies thereof. Hence, the issue remains, even in C++14.

The underlying array is a temporary array, in which each element is copy-initialized (except that narrowing conversions are invalid) from the corresponding element of the original initializer list. The lifetime of the underlying array is the same as any other temporary object, except that initializing an initializer_list object from the array extends the lifetime of the array exactly like binding a reference to a temporary (with the same exceptions, such as for initializing a non-static class member). The underlying array may be allocated in read-only memory.



回答2:

The problem is that you are referencing an object that no longer exists and therefore you are invoking undefined behavior. initializer_list seems underspecified in the C++11 draft standard, there are no normative sections that actually specify this behavior. Although there are plenty of notes that indicate this will not work and in general although notes are not normative if they don't conflict with the normative text they are strongly indicative.

If we go to section 18.9 Initializer lists it has a note which says:

Copying an initializer list does not copy the underlying elements.

and in section 8.5.4 we have the following examples:

typedef std::complex<double> cmplx;
std::vector<cmplx> v1 = { 1, 2, 3 };

void f() {
    std::vector<cmplx> v2{ 1, 2, 3 };
    std::initializer_list<int> i3 = { 1, 2, 3 };
}

with the following notes:

For v1 and v2, the initializer_list object and array created for { 1, 2, 3 } have full-expression lifetime. For i3, the initializer_list object and array have automatic lifetime.

These notes are consistent with the initializer_list proposal: N2215 which gives the following example:

std::vector<double> v = {1, 2, 3.14};

and says:

Now add vector(initializer_list<E>) to vector<E> as shown above. Now, the example works. The initializer list {1, 2, 3.14} is interpreted as a temporary constructed like this:

const double temp[] = {double(1), double(2), 3.14 } ;
initializer_list<double> tmp(temp,
sizeof(temp)/sizeof(double));
vector<double> v(tmp);

[...]

Note that an initializer_list is a small object (probably two words), so passing it by value makes sense. Passing by value also simplifies inlining of begin() and end() and constant expression evaluation of size().

An initializer_list s will be created by the compiler, but can be copied by users. Think of it as a pair of pointers.

The initializer_list in this case just holds pointers to an automatic variable which will not exist after exiting the scope.

Update

I just realized the proposal actually points out this misuse scenario:

One implication is that an initializer_list is “ pointer like” in that it behaves like a pointer in respect to the underlying array. For example:

int * f(int a)
{ 
   int* p = &a;
   return p; //bug waiting to happen
}

initializer_list<int> g(int a, int b, int c)
{
   initializer_list<int> v = { a, b, c };
   return v; // bug waiting to happen
} 

It actually takes a minor amount of ingenuity to misuse an initializer_list this way. In particular, variables of type initializer_list are going to be rare.

I find the last statement(emphasis mine) particularly ironic.

Update 2

So defect report 1290 fixes the normative wording and so it now covers this behavior, although the copy case could be more explicit. It says:

A question has arisen over expected behavior when an initializer_list is a non-static data member of a class. Initialization of an initializer_list is defined in terms of construction from an implicitly allocated array whose lifetime "is the same as that of the initializer_list object". That would mean that the array needs to live as long as the initializer_list does, which would on the face of it appear to require the array to be stored in something like a std::unique_ptr within the same class (if the member is initialized in this manner).

It would be surprising if that was the intent, but it would make initializer_list usable in this context.

The resolution fixes the wording and we can find the new wording in the N3485 version of the draft standard. So section 8.5.4 [dcl.init.list] now says:

The array has the same lifetime as any other temporary object (12.2), except that initializing an initializer_- list object from the array extends the lifetime of the array exactly like binding a reference to a temporary.

and 12.2 [class.temporary] says:

The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.



回答3:

So, initializer_lists do not extend the lifetime of their referenced array when they are themselves copied or moved to the result of the copy/move. This makes returning them problematic. (they do extend the lifetime of the referenced array to their own lifetime, but this extension is not transitive over elision or copies of the list).

To fix this problem, store the data, and manage its lifetime manually:

template<size_t size, class T>
std::array<T, size> partial_array( T const* begin, T const* end ) {
  std::array<T, size> retval;
  size_t delta = (std::min)( size, end-begin );
  end = begin+delta;
  std::copy( begin, end, retval.begin() );
  return retval;
}
template<class T, size_t max_size>
struct capped_array {
  std::array<T, max_size> storage;
  size_t used = 0;
  template<size_t osize, class=std::enable_if_t< (size<=max_size) >>
  capped_array( std::array<T, osize> const& rhs ):
    capped_array( rhs.data(), rhs.data()+osize )
  {}
  template<size_t osize, class=std::enable_if_t< (size<=max_size) >>
  capped_array( capped_array<T, osize> const& rhs ):
    capped_array( rhs.data(), rhs.data()+rhs.used )
  {}
  capped_array(capped_array const& o)=default;
  capped_array(capped_array & o)=default;
  capped_array(capped_array && o)=default;
  capped_array(capped_array const&& o)=default;
  capped_array& operator=(capped_array const& o)=default;
  capped_array& operator=(capped_array & o)=default;
  capped_array& operator=(capped_array && o)=default;
  capped_array& operator=(capped_array const&& o)=default;

  // finish-start MUST be less than max_size, or we will truncate
  capped_array( T const* start, T const* finish ):
    storage( partial_array(start, finish) ),
    used((std::min)(finish-start, size))
  {}
  T* begin() { return storage.data(); }
  T* end() { return storage.data()+used; }
  T const* begin() const { return storage.data(); }
  T const* end() const { return storage.data()+used; }
  size_t size() const { return used; }
  bool empty() const { return !used; }
  T& front() { return *begin(); }
  T const& front() const { return *begin(); }
  T& back() { return *std::prev(end()); }
  T const& back() const { return *std::prev(end()); }

  capped_array( std::initializer_list<T> il ):
    capped_array(il.begin(), il.end() )
  {}
};

the goal here is simple. Create a stack based data type that stores a bunch of Ts, up to a cap, and can handle having fewer.

Now we replace your std::initializer_list with:

auto get_foos = [=] (Switch ss) -> capped_array<int,3> {
    switch (ss) {
        case Switch_1:
            return {foo_1, foo_2, foo_3};
        case Switch_2:
            return {foo_4, foo_5};
        case Switch_3:
            return {foo_6, foo_7};
        default:
            throw std::logic_error("invalid switch");
    }
};

and your code works. The free store is not used (no heap allocation).

A more advanced version would use an array of uninitialized data and manually construct each T.