I would like to know if there is a way to create the stack of a thread on a specific NUMA node.
I have written this code but i'm not sure if it does the trick or not.
pthread_t thread1;
int main(int argc, char**argv) {
pthread_attr_t attr;
pthread_attr_init(&attr);
char** stackarray;
int numanode = 1;
stackarray = (char**) numa_alloc_onnode(sizeof(char*), numanode);
// considering that the newly
// created thread will be running on a core on node1
pthread_attr_setstack(&attr, stackarray[0], 1000000);
pthread_create(&thread1, &attr, function, (void*)0);
...
...
}
Thank you for your help
Here's the code I use for this (slightly adapted to remove some constants defined elsewhere). Note that I first create the thread normally, and then call the SetAffinityAndRelocateStack()
below from within the thread. I think this is much better than trying to create your own stack, since stacks have special support for growing in case the bottom is reached.
The code can also be adapted to operate on the newly created thread from outside, but this could give rise to race conditions (e.g. if the thread performs I/O into its stack), so I wouldn't recommend it.
void* PreFaultStack()
{
const size_t NUM_PAGES_TO_PRE_FAULT = 50;
const size_t size = NUM_PAGES_TO_PRE_FAULT * numa_pagesize();
void *allocaBase = alloca(size);
memset(allocaBase, 0, size);
return allocaBase;
}
void SetAffinityAndRelocateStack(int cpuNum)
{
assert(-1 != cpuNum);
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(cpuNum, &cpuset);
const int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
assert(0 == rc);
pthread_attr_t attr;
void *stackAddr = nullptr;
size_t stackSize = 0;
if ((0 != pthread_getattr_np(pthread_self(), &attr)) || (0 != pthread_attr_getstack(&attr, &stackAddr, &stackSize))) {
assert(false);
}
const unsigned long nodeMask = 1UL << numa_node_of_cpu(cpuNum);
const auto bindRc = mbind(stackAddr, stackSize, MPOL_BIND, &nodeMask, sizeof(nodeMask), MPOL_MF_MOVE | MPOL_MF_STRICT);
assert(0 == bindRc);
PreFaultStack();
// TODO: Also lock the stack with mlock() to guarantee it stays resident in RAM
return;
}