Mapping two integers to one, in a unique and deter

2018-12-31 13:52发布

问题:

Imagine two positive integers A and B. I want to combine these two into a single integer C.

There can be no other integers D and E which combine to C. So combining them with the addition operator doesn\'t work. Eg 30 + 10 = 40 = 40 + 0 = 39 + 1 Neither does concatination work. Eg \"31\" + \"2\" = 312 = \"3\" + \"12\"

This combination operation should also be deterministic (always yield the same result with the same inputs) and should always yield an integer on either the positive or the negative side of integers.

回答1:

You\'re looking for a bijective NxN -> N mapping. These are used for e.g. dovetailing. Have a look at this PDF for an introduction to so-called pairing functions. Wikipedia introduces a specific pairing function, namely the Cantor pairing function:

\"pi(k1,

Three remarks:

  • As others have made clear, if you plan to implement a pairing function, you may soon find you need arbitrarily large integers (bignums).
  • If you don\'t want to make a distinction between the pairs (a, b) and (b, a), then sort a and b before applying the pairing function.
  • Actually I lied. You are looking for a bijective ZxZ -> N mapping. Cantor\'s function only works on non-negative numbers. This is not a problem however, because it\'s easy to define a bijection f : Z -> N, like so:
    • f(n) = n * 2 if n >= 0
    • f(n) = -n * 2 - 1 if n < 0


回答2:

Cantor pairing function is really one of the better ones out there considering its simple, fast and space efficient, but there is something even better published at Wolfram by Matthew Szudzik, here. The limitation of Cantor pairing function (relatively) is that the range of encoded results doesn\'t always stay within the limits of a 2N bit integer if the inputs are two N bit integers. That is, if my inputs are two 16 bit integers ranging from 0 to 2^16 -1, then there are 2^16 * (2^16 -1) combinations of inputs possible, so by the obvious Pigeonhole Principle, we need an output of size at least 2^16 * (2^16 -1), which is equal to 2^32 - 2^16, or in other words, a map of 32 bit numbers should be feasible ideally. This may not be of little practical importance in programming world.

Cantor pairing function:

(a + b) * (a + b + 1) / 2 + a; where a, b >= 0

The mapping for two maximum most 16 bit integers (65535, 65535) will be 8589803520 which as you see cannot be fit into 32 bits.

Enter Szudzik\'s function:

a >= b ? a * a + a + b : a + b * b;  where a, b >= 0

The mapping for (65535, 65535) will now be 4294967295 which as you see is a 32 bit (0 to 2^32 -1) integer. This is where this solution is ideal, it simply utilizes every single point in that space, so nothing can get more space efficient.


Now considering the fact that we typically deal with the signed implementations of numbers of various sizes in languages/frameworks, let\'s consider signed 16 bit integers ranging from -(2^15) to 2^15 -1 (later we\'ll see how to extend even the ouput to span over signed range). Since a and b have to be positive they range from 0 to 2^15 - 1.

Cantor pairing function:

The mapping for two maximum most 16 bit signed integers (32767, 32767) will be 2147418112 which is just short of maximum value for signed 32 bit integer.

Now Szudzik\'s function:

(32767, 32767) => 1073741823, much smaller..

Let\'s account for negative integers. That\'s beyond the original question I know, but just elaborating to help future visitors.

Cantor pairing function:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
(A + B) * (A + B + 1) / 2 + A;

(-32768, -32768) => 8589803520 which is Int64. 64 bit output for 16 bit inputs may be so unpardonable!!

Szudzik\'s function:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;

(-32768, -32768) => 4294967295 which is 32 bit for unsigned range or 64 bit for signed range, but still better.

Now all this while the output has always been positive. In signed world, it will be even more space saving if we could transfer half the output to negative axis. You could do it like this for Szudzik\'s:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
C = (A >= B ? A * A + A + B : A + B * B) / 2;
a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;

(-32768, 32767) => -2147483648

(32767, -32768) => -2147450880

(0, 0) => 0 

(32767, 32767) => 2147418112

(-32768, -32768) => 2147483647

What I do: After applying a weight of 2 to the the inputs and going through the function, I then divide the ouput by two and take some of them to negative axis by multiplying by -1.

See the results, for any input in the range of a signed 16 bit number, the output lies within the limits of a signed 32 bit integer which is cool. I\'m not sure how to go about the same way for Cantor pairing function but didn\'t try as much as its not as efficient. Furthermore, more calculations involved in Cantor pairing function means its slower too.

Here is a C# implementation.

public static long PerfectlyHashThem(int a, int b)
{
    var A = (ulong)(a >= 0 ? 2 * (long)a : -2 * (long)a - 1);
    var B = (ulong)(b >= 0 ? 2 * (long)b : -2 * (long)b - 1);
    var C = (long)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

public static int PerfectlyHashThem(short a, short b)
{
    var A = (uint)(a >= 0 ? 2 * a : -2 * a - 1);
    var B = (uint)(b >= 0 ? 2 * b : -2 * b - 1);
    var C = (int)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

Since the intermediate calculations can exceed limits of 2N signed integer, I have used 4N integer type (the last division by 2 brings back the result to 2N).

The link I have provided on alternate solution nicely depicts a graph of the function utilizing every single point in space. Its amazing to see that you could uniquely encode a pair of coordinates to a single number reversibly! Magic world of numbers!!



回答3:

If A and B can be expressed with 2 bytes, you can combine them on 4 bytes. Put A on the most significant half and B on the least significant half.

In C language this gives (assuming sizeof(short)=2 and sizeof(int)=4):

int combine(short A, short B)
{
    return A<<16 | B;
}

short getA(int C)
{
    return C>>16;
}

short getB(int C)
{
    return C & 0xFFFF;
}


回答4:

Is this even possible?
You are combining two integers. They both have the range -2,147,483,648 to 2,147,483,647 but you will only take the positives. That makes 2147483647^2 = 4,61169E+18 combinations. Since each combination has to be unique AND result in an integer, you\'ll need some kind of magical integer that can contain this amount of numbers.

Or is my logic flawed?



回答5:

Let number a be the first, b the second. Let p be the a+1-th prime number, q be the b+1-th prime number

Then, the result is pq, if a<b, or 2pq if a>b. If a=b, let it be p^2.



回答6:

The standard mathematical way for positive integers is to use the uniqueness of prime factorization.

f( x, y ) -> 2^x * 3^y

The downside is that the image tends to span quite a large range of integers so when it comes to expressing the mapping in a computer algorithm you may have issues with choosing an appropriate type for the result.

You could modify this to deal with negative x and y by encoding a flags with powers of 5 and 7 terms.

e.g.

f( x, y ) -> 2^|x| * 3^|y| * 5^(x<0) * 7^(y<0)


回答7:

It isn\'t that tough to construct a mapping:

   1  2  3  4  5  use this mapping if (a,b) != (b,a)
1  0  1  3  6 10
2  2  4  7 11 16
3  5  8 12 17 23
4  9 13 18 24 31
5 14 19 25 32 40

   1  2  3  4  5 use this mapping if (a,b) == (b,a) (mirror)
1  0  1  2  4  6
2  1  3  5  7 10
3  2  5  8 11 14
4  4  8 11 15 19
5  6 10 14 19 24


    0  1 -1  2 -2 use this if you need negative/positive
 0  0  1  2  4  6
 1  1  3  5  7 10
-1  2  5  8 11 14
 2  4  8 11 15 19
-2  6 10 14 19 24

Figuring out how to get the value for an arbitrary a,b is a little more difficult.



回答8:

f(a, b) = s(a+b) + a, where s(n) = n*(n+1)/2

  • This is a function -- it is deterministic.
  • It is also injective -- f maps different values for different (a,b) pairs. You can prove this using the fact: s(a+b+1)-s(a+b) = a+b+1 < a.
  • It returns quite small values -- good if your are going to use it for array indexing, as the array does not have to be big.
  • It is cache-friendly -- if two (a, b) pairs are close to each other, then f maps numbers to them which are close to each other (compared to other methods).

I did not understand what You mean by:

should always yield an integer on either the positive or the negative side of integers

How can I write (greater than), (less than) characters in this forum?



回答9:

For positive integers as arguments and where argument order doesn\'t matter:

  1. Here\'s an unordered pairing function:

    <x, y> = x * y + trunc((|x - y| - 1)^2 / 4) = <y, x>
    
  2. For x ≠ y, here\'s a unique unordered pairing function:

    <x, y> = if x < y:
               x * (y - 1) + trunc((y - x - 2)^2 / 4)
             if x > y:
               (x - 1) * y + trunc((x - y - 2)^2 / 4)
           = <y, x>
    


回答10:

Although Stephan202\'s answer is the only truly general one, for integers in a bounded range you can do better. For example, if your range is 0..10,000, then you can do:

#define RANGE_MIN 0
#define RANGE_MAX 10000

unsigned int merge(unsigned int x, unsigned int y)
{
    return (x * (RANGE_MAX - RANGE_MIN + 1)) + y;
}

void split(unsigned int v, unsigned int &x, unsigned int &y)
{
    x = RANGE_MIN + (v / (RANGE_MAX - RANGE_MIN + 1));
    y = RANGE_MIN + (v % (RANGE_MAX - RANGE_MIN + 1));
}

Results can fit in a single integer for a range up to the square root of the integer type\'s cardinality. This packs slightly more efficiently than Stephan202\'s more general method. It is also considerably simpler to decode; requiring no square roots, for starters :)



回答11:

Check this: http://en.wikipedia.org/wiki/Pigeonhole_principle. If A, B and C are of same type, it cannot be done. If A and B are 16-bit integers, and C is 32-bit, then you can simply use shifting.

The very nature of hashing algorithms is that they cannot provide a unique hash for each different input.



回答12:

Here is an extension of @DoctorJ \'s code to unbounded integers based on the method given by @nawfal. It can encode and decode. It works with normal arrays and numpy arrays.

#!/usr/bin/env python
from numbers import Integral    

def tuple_to_int(tup):
    \"\"\":Return: the unique non-negative integer encoding of a tuple of non-negative integers.\"\"\"
    if len(tup) == 0:  # normally do if not tup, but doesn\'t work with np
        raise ValueError(\'Cannot encode empty tuple\')
    if len(tup) == 1:
        x = tup[0]
        if not isinstance(x, Integral):
            raise ValueError(\'Can only encode integers\')
        return x
    elif len(tup) == 2:
        # print(\"len=2\")
        x, y = tuple_to_int(tup[0:1]), tuple_to_int(tup[1:2])  # Just to validate x and y

        X = 2 * x if x >= 0 else -2 * x - 1  # map x to positive integers
        Y = 2 * y if y >= 0 else -2 * y - 1  # map y to positive integers
        Z = (X * X + X + Y) if X >= Y else (X + Y * Y)  # encode

        # Map evens onto positives
        if (x >= 0 and y >= 0):
            return Z // 2
        elif (x < 0 and y >= 0 and X >= Y):
            return Z // 2
        elif (x < 0 and y < 0 and X < Y):
            return Z // 2
        # Map odds onto negative
        else:
            return (-Z - 1) // 2
    else:
        return tuple_to_int((tuple_to_int(tup[:2]),) + tuple(tup[2:]))  # ***speed up tuple(tup[2:])?***


def int_to_tuple(num, size=2):
    \"\"\":Return: the unique tuple of length `size` that encodes to `num`.\"\"\"
    if not isinstance(num, Integral):
        raise ValueError(\'Can only encode integers (got {})\'.format(num))
    if not isinstance(size, Integral) or size < 1:
        raise ValueError(\'Tuple is the wrong size ({})\'.format(size))
    if size == 1:
        return (num,)
    elif size == 2:

        # Mapping onto positive integers
        Z = -2 * num - 1 if num < 0 else 2 * num

        # Reversing Pairing
        s = isqrt(Z)
        if Z - s * s < s:
            X, Y = Z - s * s, s
        else:
            X, Y = s, Z - s * s - s

        # Undoing mappint to positive integers
        x = (X + 1) // -2 if X % 2 else X // 2  # True if X not divisible by 2
        y = (Y + 1) // -2 if Y % 2 else Y // 2  # True if Y not divisible by 2

        return x, y

    else:
        x, y = int_to_tuple(num, 2)
        return int_to_tuple(x, size - 1) + (y,)


def isqrt(n):
    \"\"\"\":Return: the largest integer x for which x * x does not exceed n.\"\"\"
    # Newton\'s method, via http://stackoverflow.com/a/15391420
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x


回答13:

What you suggest is impossible. You will always have collisions.

In order to map two objects to another single set, the mapped set must have a minimum size of the number of combinations expected:

Assuming a 32-bit integer, you have 2147483647 positive integers. Choosing two of these where order doesn\'t matter and with repetition yields 2305843008139952128 combinations. This does not fit nicely in the set of 32-bit integers.

You can, however fit this mapping in 61 bits. Using a 64-bit integer is probably easiest. Set the high word to the smaller integer and the low word to the larger one.



回答14:

How about something much simpler: Given two numbers, A and B let str be the concatenation: \'A\' + \';\' + \'B\'. Then let the output be hash(str). I know that this is not a mathematical answer, but a simple python (which has an in built hash function) script should do the job.



回答15:

let us have two number B and C , encoding them into single number A

A = B + C * N

where

B=A % N = B

C=A / N = C