Why do I get the following error when I run the code below with mpirun -np 2 ./out? I call make_layout() after resizing the std::vector, so I would not expect this error. It works if I do not resize. What is the reason?
main.cpp:
#include <iostream>
#include <vector>
#include "mpi.h"

MPI_Datatype MPI_CHILD;

struct Child
{
    std::vector<int> age;
    void make_layout();
};

void Child::make_layout()
{
    int nblock = 1;
    int age_size = age.size();
    int block_count[nblock] = {age_size};
    MPI_Datatype block_type[nblock] = {MPI_INT};
    MPI_Aint offset[nblock] = {0};
    MPI_Type_struct(nblock, block_count, offset, block_type, &MPI_CHILD);
    MPI_Type_commit(&MPI_CHILD);
}

int main()
{
    int rank, size;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    Child kid;
    kid.age.resize(5);
    kid.make_layout();

    int datasize;
    MPI_Type_size(MPI_CHILD, &datasize);
    std::cout << datasize << std::endl; // output: 20 (5x4 seems OK).

    if (rank == 0)
    {
        MPI_Send(&kid, 1, MPI_CHILD, 1, 0, MPI_COMM_WORLD);
    }
    if (rank == 1)
    {
        MPI_Recv(&kid, 1, MPI_CHILD, 0, 0, MPI_COMM_WORLD, NULL);
    }

    MPI_Finalize();
    return 0;
}
Error message:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x14ae7b8
[ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7fe1ad91c3d0]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x22)[0x7fe1ad5c5a92]
[ 2] ./out[0x400de4]
[ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fe1ad562830]
[ 4] ./out[0x400ec9]
*** End of error message ***
Here is an example with several std::vector members that uses MPI datatypes with absolute addresses:

Sample use of that class:

Notice the use of MPI_BOTTOM as the buffer address. MPI_BOTTOM specifies the bottom of the address space, which is 0 on architectures with a flat address space. Since the offsets passed to MPI_Type_create_struct are the absolute addresses of the structure members, when those are added to 0, the result is again the absolute address of each structure member.

Child::mpi_dtype() returns a lazily constructed MPI datatype specific to that instance.

Since resize() reallocates memory, which could result in the data being moved to a different location in memory, the invalidate_dtype() method should be used to force the recreation of the MPI datatype after resize() or any other operation that might trigger memory reallocation:

Please excuse any sloppy C++ code above.
The problem here is that you're telling MPI to send a block of integers from &kid, but that's not where your data is. &kid points to a std::vector object, which has an internal pointer to your block of integers allocated somewhere on the heap. On the receiving rank, the incoming bytes overwrite the vector's internal pointers instead of its elements, which would explain why the backtrace ends in free() when kid is destroyed.

Replace &kid with kid.age.data() and it should work. The reason it "works" when you don't resize is that the vector has size 0, so MPI tries to send an empty message and no actual memory access takes place.

Be careful, you faced several problems.
First, std::vector stores its elements on the heap, so the data is not really stored inside your struct.

Second, you cannot safely pass STL containers even between dynamic libraries, and the same is true between application instances, because they may be compiled with different versions of the STL and may behave differently on different architectures.

Here is a good answer about this part of the question: https://stackoverflow.com/a/22797419/440168
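The claim that the elements are not stored inside the struct, and that growing the vector can move them, can be checked without MPI. The following is a minimal sketch; the helper names elements_live_outside(), growth_moves_elements() and run_checks() are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Child
{
    std::vector<int> age;
};

// True if the vector's elements lie outside the bytes occupied by the struct itself.
bool elements_live_outside(const Child& kid)
{
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&kid);
    const auto obj_end   = obj_begin + sizeof(Child);
    const auto data      = reinterpret_cast<std::uintptr_t>(kid.age.data());
    return data < obj_begin || data >= obj_end;
}

// True if growing the vector beyond its current capacity moved the elements.
// The old address is saved as an integer so the stale pointer is never used.
bool growth_moves_elements(Child& kid)
{
    const auto before = reinterpret_cast<std::uintptr_t>(kid.age.data());
    kid.age.resize(kid.age.capacity() + 1);   // forces a reallocation
    const auto after = reinterpret_cast<std::uintptr_t>(kid.age.data());
    return after != before;
}

void run_checks()
{
    Child kid;
    kid.age.resize(5);
    assert(elements_live_outside(kid));   // &kid is not where the ints are
    assert(growth_moves_elements(kid));   // a datatype built earlier is now stale
}
```

This is why a datatype built with offset 0 relative to &kid describes the vector's bookkeeping rather than its integers, and why any address captured before a reallocation has to be recomputed.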