How many least-significant bits are the same for b

ARM processors, for example, have a mul instruction, which performs a 32-bit x 32-bit multiplication and returns the least-significant 32-bits of the result. They also have umull and smull instructions, which again do a 32-bit x 32-bit multiplication, but return a full 64-bit result. umull does an unsigned multiplication, and smull does a signed multiplication.

Why is it not necessary to have separate unsigned and signed versions of mul? In the case of a 32-bit x 32-bit multiplication, presumably the least-significant 32-bits of the result are the same in both cases? Are exactly 32 bits the same, or more than 32?

More generally, in the case of an m-bit x n-bit multiplication (producing an (m+n)-bit result), how many least-significant bits are the same for both an unsigned and a signed multiplication?

You can do this with pencil and paper...Elementary school style

-1 * 3 = -3, 7 * 3 = 21 smull vs umull

0b111 * 0b011

   111111
*  000011
==========
   111111
  111111
 000000
 ...
+
==========
 11111101

(technically the signs are extended as wide as the internal alu inputs)

which is -3

take the same 3 bit numbers but use umull

0b111 * 0b011

      000111
*     000011
=============
      000111
     000111
    000000
+  ...
==============
     0010101

the result is 21.

The beauty of twos complement is that add and subtract use the same logic, but multiply and divide you have to sign extend to get the right answer and there in lies the rub. Unsigned sign extends zeros signed sign extends the sign. Multiplies at worse double the number of bits required to store the result so a 16 bit by 16 multiply needs to be done with 32 bit operands and what you fill into those upper 16 bits varies between a signed multiply and unsigned. Once you sign extend then sure you can feed the same multiply logic because there is no difference there. I guess one could argue that is how add and subtract works too, what you feed the same adder logic varies depending add vs subtract likewise what you pull out at the end may vary (if you invert the carry bit to call it a borrow)

Now per your question, yes if you take a 3 bit by 3 bit in and only look at the 3 lower bits of the output which is almost always the wrong answer, but anyway same elementary school math

   AAAabc
   DDDdef
=========
   AAAabc
  AAAabc
 AAAabc
AAAabc
...
===========

The lower 3 bits for a 3x3 bit input are strictly determined by the original 3 bit inputs, the sign extension which is what varies between umull and smull do not come into play. It is an interesting exercise but doesnt have too much real world use, most operand combinations will overflow, not all but a high percentage.

If you repeat this exercise for M * N bits then it should be the smaller of M or N bits that are unaffected by the sign extension. That simple exercise is left to the reader.