Segmentation fault when sending struct having std:

Why do I get the error below when I run the following code with the mpirun -np 2 ./out command? I call make_layout() after resizing the std::vector, so I would not expect this error. It works if I do not resize. What is the reason?

main.cpp:

#include <iostream>
#include <vector>
#include "mpi.h"

MPI_Datatype MPI_CHILD;

struct Child
{
    std::vector<int> age;

    void make_layout();
};

void Child::make_layout()
{
    int nblock = 1;
    int age_size = age.size();
    int block_count[nblock] = {age_size};
    MPI_Datatype block_type[nblock] = {MPI_INT};
    MPI_Aint offset[nblock] = {0};
    MPI_Type_struct(nblock, block_count, offset, block_type, &MPI_CHILD);
    MPI_Type_commit(&MPI_CHILD);
}

int main()
{
    int rank, size;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);    

    Child kid;
    kid.age.resize(5);
    kid.make_layout();
    int datasize;
    MPI_Type_size(MPI_CHILD, &datasize);
    std::cout << datasize << std::endl; // output: 20 (5x4 seems OK).

    if (rank == 0)
    {
        MPI_Send(&kid, 1, MPI_CHILD, 1, 0, MPI_COMM_WORLD);
    }

    if (rank == 1)
    {
        MPI_Recv(&kid, 1, MPI_CHILD, 0, 0, MPI_COMM_WORLD, NULL);
    }

    MPI_Finalize();

    return 0;
}

Error message:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x14ae7b8
[ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7fe1ad91c3d0]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x22)[0x7fe1ad5c5a92]
[ 2] ./out[0x400de4]
[ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fe1ad562830]
[ 4] ./out[0x400ec9]
*** End of error message ***

Tags: c++ linux mpi c++14 openmpi

3 answers

走好不送

Post #2 · 2019-03-02 19:58

Here is an example with several std::vector members that uses MPI datatypes with absolute addresses:

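struct Child
{
    int foo;
    std::vector<float> bar;
    std::vector<int> baz;

    Child() : dtype(MPI_DATATYPE_NULL) {}
    // Must run before MPI_Finalize(); MPI_Type_free resets dtype to MPI_DATATYPE_NULL.
    ~Child() { invalidate_dtype(); }

    // Copying would leave two objects freeing the same datatype handle.
    Child(const Child&) = delete;
    Child& operator=(const Child&) = delete;

    MPI_Datatype mpi_dtype();
    void invalidate_dtype();

private:
    MPI_Datatype dtype;
    void make_dtype();
};

MPI_Datatype Child::mpi_dtype()
{
    // Build the datatype lazily: on first use, or after invalidate_dtype().
    if (dtype == MPI_DATATYPE_NULL)
        make_dtype();
    return dtype;
}

void Child::invalidate_dtype()
{
    if (dtype != MPI_DATATYPE_NULL)
        MPI_Type_free(&dtype);
}

void Child::make_dtype()
{
    const int nblock = 3;
    // Explicit casts: size() returns size_t, which may not narrow inside braces.
    int block_count[nblock] = {1, static_cast<int>(bar.size()), static_cast<int>(baz.size())};
    MPI_Datatype block_type[nblock] = {MPI_INT, MPI_FLOAT, MPI_INT};
    MPI_Aint offset[nblock];
    // Absolute addresses; data() avoids the undefined &bar[0] on an empty vector.
    MPI_Get_address(&foo, &offset[0]);
    MPI_Get_address(bar.data(), &offset[1]);
    MPI_Get_address(baz.data(), &offset[2]);

    // MPI_Type_create_struct replaces the deprecated MPI_Type_struct.
    MPI_Type_create_struct(nblock, block_count, offset, block_type, &dtype);
    MPI_Type_commit(&dtype);
}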

Sample use of that class:

Child kid;
kid.foo = 5;
kid.bar.resize(5);
kid.baz.resize(10);

if (rank == 0)
{
    MPI_Send(MPI_BOTTOM, 1, kid.mpi_dtype(), 1, 0, MPI_COMM_WORLD);
}

if (rank == 1)
{
    MPI_Recv(MPI_BOTTOM, 1, kid.mpi_dtype(), 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

Notice the use of MPI_BOTTOM as the buffer address. MPI_BOTTOM specifies the bottom of the address space, which is 0 on architectures with a flat address space. Since the offsets passed to MPI_Type_create_struct are the absolute addresses of foo and of the first elements of bar and baz, adding them to 0 yields those absolute addresses again. Child::mpi_dtype() returns a lazily constructed MPI datatype that is specific to that instance.

Since resize() may reallocate the vector's storage, moving the data to a different location in memory, invalidate_dtype() should be called to force the recreation of the MPI datatype after resize() or any other operation that might trigger a reallocation:

// ...
kid.bar.resize(100);
kid.invalidate_dtype();
// MPI_Send / MPI_Recv
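The reallocation that makes invalidate_dtype() necessary can be observed without MPI at all. In this minimal sketch, resize_moves_storage is a hypothetical helper (not part of the answer above) that records data() as an integer before and after resize(). Growing a vector past its capacity() allocates a new block while the old one is still alive, so the address has to change, and a datatype built from the old addresses would then describe freed memory. Shrinking with resize() never reallocates.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if resizing `v` to `new_size` moved its
// element storage. The old address is captured as an integer *before*
// resize(), because comparing a pointer into freed storage afterwards is
// not meaningful.
template <typename T>
bool resize_moves_storage(std::vector<T>& v, std::size_t new_size)
{
    const auto before = reinterpret_cast<std::uintptr_t>(v.data());
    v.resize(new_size);
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());
    return before != after;
}
```

Since the move can only be detected after the fact, calling invalidate_dtype() unconditionally after every resize(), as the answer suggests, is the simpler policy.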

Please excuse any sloppy C++ code above.


冷血范

Post #3 · 2019-03-02 20:03

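The problem here is that you're telling MPI to send a block of integers from &kid, but that's not where your data is. &kid points to a Child object whose only member is a std::vector, and the vector holds an internal pointer to your block of integers, which is allocated somewhere on the heap. So rank 0 sends the bytes of its vector object (its internal pointers), and on rank 1 MPI_Recv writes those bytes over the receiving vector's own pointers. When kid goes out of scope, the vector's destructor then passes a pointer copied from rank 0's address space to free(), which is consistent with the cfree frame in the backtrace.

Replace &kid with kid.age.data() and it should work. The reason it "works" when you don't resize is that the vector then has size 0, so the datatype has size 0, MPI sends an empty message, and no actual memory access takes place.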
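A minimal, MPI-free sketch of that fix: fake_transfer is a hypothetical stand-in for the MPI_Send/MPI_Recv pair that simply copies count ints from one buffer to another, which is what MPI does for the question's contiguous datatype. What matters is that both sides pass kid.age.data(), and that the receiving vector already has the right size (in the question both ranks call resize(5), so the sizes agree).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Child
{
    std::vector<int> age;
};

// Hypothetical stand-in for MPI_Send on one rank and MPI_Recv on the other:
// copies `count` ints from the send buffer into the receive buffer.
void fake_transfer(const int* send_buf, int* recv_buf, std::size_t count)
{
    std::copy_n(send_buf, count, recv_buf);
}

// Hypothetical helper showing the corrected pattern: pass the vectors'
// element buffers, never &kid. The receiver is sized first, as both ranks
// do with resize(5) in the question.
void send_ages(const Child& sender, Child& receiver)
{
    receiver.age.resize(sender.age.size());
    fake_transfer(sender.age.data(), receiver.age.data(), sender.age.size());
}
```

In the question's program this corresponds to passing kid.age.data() instead of &kid as the buffer argument of both MPI_Send and MPI_Recv.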


Bombasti

Post #4 · 2019-03-02 20:03

Be careful: you are facing several problems.

First, std::vector stores its elements on the heap, so the data is not actually stored inside your struct.
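This first point can be checked directly, without MPI. In the following minimal sketch, elements_inside_object is a hypothetical helper and Child is reduced to the question's single age member. It tests whether the vector's element buffer lies within the bytes of the Child object; it never does, because sizeof(Child) is fixed at compile time while the elements live in a separate heap allocation.

```cpp
#include <cstdint>
#include <vector>

// The question's struct: the object itself only holds the vector's
// bookkeeping (typically a few pointers), never the elements.
struct Child
{
    std::vector<int> age;
};

// Hypothetical helper: true if kid.age's element buffer lies inside the
// Child object. sizeof(Child) is a compile-time constant, so the object
// cannot grow to hold the elements when the vector is resized.
bool elements_inside_object(const Child& kid)
{
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&kid);
    const auto obj_end = obj_begin + sizeof(Child);
    const auto data = reinterpret_cast<std::uintptr_t>(kid.age.data());
    return data >= obj_begin && data < obj_end;
}
```

On a typical 64-bit implementation, this is also why the 20 bytes that MPI_Send reads from &kid in the question are the vector's internal pointers rather than the five ints.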

Second, you cannot safely pass STL containers as raw objects even between dynamic libraries, and the same is true between application instances, because each side may be compiled with a different version of the STL and the containers may be laid out differently on different architectures.

Here is a good answer about this part of the question: https://stackoverflow.com/a/22797419/440168
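One common workaround, offered here as an illustration rather than as the content of the linked answer, is to never ship the container object and instead flatten its contents into a plain buffer. In this minimal sketch, pack and unpack are hypothetical helpers for the question's Child: the buffer holds the element count followed by the elements, all as int, so it could be sent with count buf.size() and datatype MPI_INT, and MPI still knows the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Child
{
    std::vector<int> age;
};

// Hypothetical helper: flattens a Child into one contiguous int buffer,
// laid out as [count, age[0], ..., age[count - 1]].
std::vector<int> pack(const Child& kid)
{
    std::vector<int> buf;
    buf.reserve(kid.age.size() + 1);
    buf.push_back(static_cast<int>(kid.age.size()));
    buf.insert(buf.end(), kid.age.begin(), kid.age.end());
    return buf;
}

// Hypothetical helper: rebuilds a Child on the receiving side from a
// buffer produced by pack().
Child unpack(const std::vector<int>& buf)
{
    assert(!buf.empty() && buf[0] >= 0);
    assert(buf.size() == static_cast<std::size_t>(buf[0]) + 1);
    Child kid;
    kid.age.assign(buf.begin() + 1, buf.end());
    return kid;
}
```

On the receiving side the buffer length is not known in advance; calling MPI_Probe and then MPI_Get_count with MPI_INT is one way to size the buffer before MPI_Recv.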
