I want to use unique hashes for each model rather than ids.
I implemented the following function to use it across the board easily.
import random,hashlib
from base64 import urlsafe_b64encode

def set_unique_random_value(model_object,field_name='hash_uuid',length=5,use_sha=True,urlencode=False):
    while 1:
        uuid_number = str(random.random())[2:]
        uuid = hashlib.sha256(uuid_number).hexdigest() if use_sha else uuid_number
        uuid = uuid[:length]
        if urlencode:
            uuid = urlsafe_b64encode(uuid)[:-1]
        hash_id_dict = {field_name:uuid}
        try:
            model_object.__class__.objects.get(**hash_id_dict)
        except model_object.__class__.DoesNotExist:
            setattr(model_object,field_name,uuid)
            return
I'm seeking feedback: how else could I do it? How can I improve it? What is good, bad, and ugly about it?
I do not like this bit:
uuid = uuid[:5]
In the best case (the values are uniformly distributed) the probability of a collision already exceeds 0.5 after only about 1.2k elements!
This is the birthday problem. In brief, the probability of a collision exceeds 0.5 once the number of elements is larger than roughly the square root of the number of possible labels (about 1.18 times the square root).
You have 16^5 = 0x100000, about 10^6, labels (different values), so collisions become likely after roughly 1000 generated values.
Even if you increase length, or pass -1 to keep almost the whole digest, you still have a problem here:
str(random.random())[2:]
You will start seeing collisions after roughly 3 * 10^6 values (the same calculation applies).
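The birthday estimate above can be checked numerically. The following is a minimal sketch using the standard approximation 1 - exp(-n(n-1)/(2N)); the function names are made up for illustration and the result assumes uniformly distributed values.

```python
import math


def collision_probability(num_items, num_labels):
    """Approximate chance that at least two of num_items uniformly
    random picks from num_labels possible values coincide."""
    # Birthday approximation: 1 - exp(-n(n-1) / (2N))
    exponent = -num_items * (num_items - 1) / (2.0 * num_labels)
    return 1.0 - math.exp(exponent)


def items_for_half_chance(num_labels):
    """Approximate number of items at which a collision becomes
    more likely than not: sqrt(2 * N * ln 2)."""
    return math.sqrt(2.0 * num_labels * math.log(2.0))


labels = 16 ** 5  # five hex characters give 1,048,576 possible values
print(collision_probability(1000, labels))
print(items_for_half_chance(labels))
```

Under this approximation the break-even point for five hex characters lands a little above one thousand items, which is the order of magnitude quoted above.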
I think your best bet is to use a UUID, which is far more likely to be unique. Here is an example:
>>> import uuid
>>> uuid.uuid1().hex
'7e0e52d0386411df81ce001b631bdd31'
Update
If you do not trust the math, just run the following sample to see the collision:
>>> len(set(hashlib.sha256(str(i)).hexdigest()[:5] for i in range(0,2000)))
1999 # it would print 2000 if there were no collisions
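The one-liner above passes a str to sha256, which only works on Python 2. On Python 3 the same experiment might look like the sketch below; `distinct_prefixes` is a hypothetical helper name, and the only real change is the explicit encode to bytes.

```python
import hashlib


def distinct_prefixes(count, length):
    """Count distinct hex-digest prefixes of sha256(str(i)) for i < count."""
    # sha256 requires bytes on Python 3, hence the encode()
    return len({
        hashlib.sha256(str(i).encode("ascii")).hexdigest()[:length]
        for i in range(count)
    })


# Fewer distinct prefixes than inputs means at least one collision.
print(distinct_prefixes(2000, 5))
# With the full 64-character digest every input stays distinct.
print(distinct_prefixes(2000, 64))
```

Varying `length` shows how quickly truncation eats into uniqueness: a two-character prefix can never yield more than 256 distinct values, no matter how many inputs are hashed.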
The ugly:
import random
From the documentation:
This module implements pseudo-random number generators for various distributions.
If anything, please use os.urandom instead:
Return a string of n random bytes suitable for cryptographic use.
This is how I use it in my models:
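import os
from binascii import hexlify
from django.db import models

def _createId():
    # 16 random bytes -> 32 hex characters; decode so the default is text, not bytes
    return hexlify(os.urandom(16)).decode('ascii')

class Book(models.Model):
    id_book = models.CharField(max_length=32, primary_key=True, default=_createId)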
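To see why max_length=32 fits a 16-byte value, here is a small standalone sketch without Django. The `create_id` name and its `num_bytes` parameter are assumptions for illustration, not part of the answer's model code.

```python
import os
from binascii import hexlify


def create_id(num_bytes=16):
    """Return a random hex string built from num_bytes of OS randomness."""
    # Each byte becomes two hex characters; hexlify returns bytes on
    # Python 3, so decode to get a text string.
    return hexlify(os.urandom(num_bytes)).decode("ascii")


for size in (8, 16):
    value = create_id(size)
    print(size, len(value), value)
```

The string is always twice as long as the byte count, so a CharField storing these IDs needs max_length of at least 2 * num_bytes.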
Use your database engine's UUID support instead of making up your own hash. Almost everything beyond SQLite supports them, so there's little reason to not use them.
Django 1.8+ has a built-in UUIDField. Here's the suggested implementation, using the standard library's uuid module, from the docs:
import uuid
from django.db import models

class MyUUIDModel(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    # other fields
For older Django versions you can use the django-uuidfield package.
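For reference, this is what the default=uuid.uuid4 callable in the UUIDField example produces. The sketch below uses only the standard library; the variable names are illustrative.

```python
import uuid

value = uuid.uuid4()          # random (version 4) UUID
print(value.version)          # UUID version number
print(str(value))             # canonical form with hyphens
print(value.hex)              # same value as 32 hex characters

# A UUID can be rebuilt from either text form.
restored = uuid.UUID(value.hex)
print(restored == value)
```

The hyphenated form is 36 characters and the hex form is 32, which matters when a UUID ends up in a URL or in a character column on a backend without a native UUID type.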