Giving unique IDs to all nodes?

2019-03-01 09:10发布

问题:

I am making a class in Python that relates a lot of nodes and edges together. I also have other operations that can take two separate objects and merge them into a single object of the same type, and so on.

However, I need a way to give every node a unique ID for easy lookup. Is there a "proper way" to do this, or do I just have to keep an external ID variable that I increment and pass into my class methods every time I add more nodes to any object?

I also considered generating a random string for each node upon creation, but there is still a risk of collision error (even if this probability is near-zero, it still exists and seems like a design flaw, if not a longwinded overengineered way of going about it anyway).

回答1:

You could keep a class variable and use it for ordinal ids:

class Node(object):
    _id = 0

    def __init__(self):
        self._id = Node._id
        Node._id += 1

It also has the benefit that your class will be able to know how many objects were altogether created.

This is also way cheaper than random ids.



回答2:

If you just need a unique identifier, the built-in Python id() function would do it:

Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.



回答3:

Pretty much both of your solutions are what is done in practice.

Your first solution is to just increment a number will give you uniqueness, as long as you don't overflow (with python bigintegers this isnt really a problem). The disadvantage of this approach is if you start doing concurrency you have to make sure you use locking to prevent data races when increment and reading your external value.

The other approach where you generate a random number works well in the concurrency situation. The larger number of bits you use, the less likely it is you will run into a collision. In fact you can pretty much guarantee that you won't have collisions if you use say 128-bits for your id.

An approach you can use to further guarantee you don't have collisions, is to make your unique ids something like TIMESTAMP_HASHEDMACHINENAME_PROCESSID/THREADID_UNIQUEID. Then pretty much can't have collisions unless you generate two of the same UNIQUEID on the same process/thread within 1 second. MongoDB does something like this where they just increment the UNIQUEID. I am not sure what they do in the case of an overflow (which I assume doesn't happen too often in practice). One solution might be just to wait till the next second before generating more ids.

This is probably overkill for what you are trying to do, but it is a somewhat interesting problem indeed.



回答4:

UUID is good for this sort of thing.

>>> from uuid import uuid4
>>> uuid4().hex
'461dd72c63db4ae9a969978daadc59f0'

Universally Unique ID's have very low collision rate -- unless you are creating billions of nodes, it should do the trick.