As per this article:
If you try and lock a non-recursive mutex twice from the same thread without unlocking in between, you get undefined behavior.
My very naive mind tells me why don't they just return an error? Is there a reason why this has to be UB?
Undefined behavior allows implementations to do whatever is fastest/most convenient. For example, an efficient implementation of a non-recursive mutex might be a single bit where the lock operation is implemented with an atomic compare-and-swap instruction in a loop. If the thread that owns the mutex tries to lock it again it will deadlock because it is waiting for the mutex to unlock but since nobody else can unlock it (unless there's some other bug where some thread that doesn't own it unlocks it) the thread will wait forever.
Because it never happens in a correct program, and making a check for something that never happens is wasteful (and to make that check it needs to store the owning thread ID, which is also wasteful).
Note that it being undefined allows debug implementations to throw an exception, for example, while still allowing release implementations to be as efficient as possible.