I want to use unique hashes for each model rather than ids.
I implemented the following function to use it across the board easily.
import random,hashlib
from base64 import urlsafe_b64encode

def set_unique_random_value(model_object,field_name='hash_uuid',length=5,use_sha=True,urlencode=False):
    while 1:
        uuid_number = str(random.random())[2:]
        uuid = hashlib.sha256(uuid_number).hexdigest() if use_sha else uuid_number
        uuid = uuid[:length]
        if urlencode:
            uuid = urlsafe_b64encode(uuid)[:-1]
        hash_id_dict = {field_name:uuid}
        try:
            model_object.__class__.objects.get(**hash_id_dict)
        except model_object.__class__.DoesNotExist:
            setattr(model_object,field_name,uuid)
            return
I'm seeking feedback: how else could I do it? How can I improve it? What is good, bad, and ugly about it?
The ugly:
From the documentation:
If anything, please use os.urandom
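As a rough illustration of the os.urandom suggestion, here is a minimal sketch written for Python 3. The helper name random_token and the default of 16 bytes are my own assumptions, not something from the question; the point is only that the random bytes come from the operating system rather than from random.random().

```python
import binascii
import os


def random_token(num_bytes=16):
    """Return a hex string built from num_bytes of OS-provided randomness.

    random_token is a hypothetical helper name; 16 bytes (128 bits) is an
    assumed default, not a value taken from the original question.
    """
    return binascii.hexlify(os.urandom(num_bytes)).decode("ascii")


# Each call reads fresh bytes from the operating system's random source.
token = random_token()
print(token)
```

Each byte becomes two hex characters, so the default produces a 32-character string.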
This is how I use it in my models:
Django 1.8+ has a built-in UUIDField. Here's the suggested implementation, using the standard library's uuid module, from the docs:
For older Django versions you can use the django-uuidfield package.
Use your database engine's UUID support instead of making up your own hash. Almost everything beyond SQLite supports them, so there's little reason to not use them.
I do not like this bit:
Even in the best case (uniformly distributed values), the probability of at least one collision is already about 0.38 after 1k elements and passes 0.5 at roughly 1.2k.
This follows from the birthday problem: in brief, the probability of a collision reaches about one half once the number of elements is on the order of the square root of the number of possible labels.
With length=5 you have 16^5 = 1,048,576 ≈ 10^6 possible labels (0x00000 through 0xFFFFF), so you should expect collisions after roughly a thousand generated values.
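To put numbers on the birthday estimate, here is a small sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / (2N)) for n uniform draws from N possible values. The function names are hypothetical, and the formula is an approximation that assumes perfectly uniform values, which is the best case for the original scheme.

```python
import math


def collision_probability(n, space):
    """Approximate chance of at least one collision among n uniform draws
    from `space` equally likely values (birthday approximation)."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


def draws_for_half(space):
    """Approximate number of draws at which the collision chance reaches 0.5."""
    return math.sqrt(2.0 * space * math.log(2))


space = 16 ** 5  # five hex digits, as with length=5
print(collision_probability(1000, space))
print(draws_for_half(space))
```

For five hex digits the 50% point lands a little above 1,000 draws, which is why a 5-character value is too short for anything but a tiny table.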
Even if you increase length to -1, you still have a problem here:
You will start seeing collisions after about 3 * 10^6 values (the same calculation applies).
I think your best bet is to use a UUID, which is far more likely to be unique; here is an example:
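A minimal sketch of that idea with the standard library's uuid module, written for Python 3. The helper name new_hash_uuid is hypothetical, and using the 32-character hex form is an assumption about how the value would be stored; uuid4 itself carries 122 random bits.

```python
import uuid


def new_hash_uuid():
    """Return a random (version 4) UUID as a 32-character hex string.

    new_hash_uuid is a hypothetical name chosen for this sketch.
    """
    return uuid.uuid4().hex


value = new_hash_uuid()
print(value)
```

With this much randomness the lookup-and-retry loop from the question becomes a safety net at most; a unique constraint on the column would still catch the astronomically unlikely duplicate.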
Update: if you do not trust the math, just run the following sample to see the collisions:
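The sample from the original post is not reproduced here, so what follows is only a sketch of such an experiment, written for Python 3. It mimics the question's scheme (SHA-256 of str(random.random())[2:], truncated to length hex characters) and counts how many values are generated before the first repeat. The function name and the choice of 20 trials are my own assumptions.

```python
import hashlib
import random


def draws_until_collision(length=5):
    """Generate truncated hashes the way the question does and return how
    many values were drawn when the first duplicate appeared."""
    seen = set()
    while True:
        raw = str(random.random())[2:]
        value = hashlib.sha256(raw.encode("ascii")).hexdigest()[:length]
        if value in seen:
            # This draw repeated an earlier value.
            return len(seen) + 1
        seen.add(value)


# Repeat the experiment a few times; results vary from run to run.
trials = [draws_until_collision() for _ in range(20)]
print(sorted(trials))
```

The exact counts differ on every run, but with length=5 they should cluster around the low thousands rather than anywhere near the 10^6 size of the label space.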