Question:
I am trying to compute 8-character short unique random filenames for, let's say, thousands of files without probable name collision. Is this method safe enough?
base64.urlsafe_b64encode(hashlib.md5(os.urandom(128)).digest())[:8]
Edit
To be clearer, I am trying to achieve the simplest possible obfuscation of filenames being uploaded to storage.
I figured that an 8-character string, if random enough, would be a very efficient and simple way to store tens of thousands of files without probable collision, when implemented right. I don't need guaranteed uniqueness, only a high enough improbability of name collision (we are talking about only thousands of names).
Files are being stored in a concurrent environment, so incrementing a shared counter is achievable, but complicated. Storing the counter in a database would be inefficient.
I am also facing the fact that random() under some circumstances returns the same pseudorandom sequences in different processes.
Answer 1:
Is there a reason you can't use tempfile to generate the names?
Functions like mkstemp and NamedTemporaryFile are absolutely guaranteed to give you unique names; nothing based on random bytes is going to give you that.
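A minimal sketch of the mkstemp route, assuming Python 3 and a target directory on a local filesystem. The helper name create_unique_file and its parameters are made up for illustration; only tempfile.mkstemp itself comes from the standard library:

```python
import os
import tempfile


def create_unique_file(directory, suffix=""):
    """Create an empty, uniquely named file in `directory` and return its path.

    mkstemp creates the file atomically, so two concurrent callers
    cannot end up with the same name.
    """
    # prefix="" drops the default "tmp" prefix from the generated name.
    fd, path = tempfile.mkstemp(prefix="", suffix=suffix, dir=directory)
    # mkstemp returns an open OS-level descriptor; close it if only the name is needed.
    os.close(fd)
    return path
```

The caller is responsible for deleting the file afterwards; mkstemp does not clean up after itself.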
If for some reason you don't actually want the file created yet (e.g., you're generating filenames to be used on some remote server or something), you can't be perfectly safe, but mktemp is still safer than random names.
Or just keep a 48-bit counter stored in some "global enough" location, so you guarantee going through the full cycle of names before a collision, and you also guarantee knowing when a collision is going to happen.
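As a rough sketch of the counter idea, assuming Python 3: six bytes (48 bits) encode to exactly eight URL-safe base64 characters, so each counter value maps to a distinct 8-character name. The function name counter_to_name is hypothetical, and where the counter lives and how it is incremented safely is left out entirely:

```python
import base64

NAME_BITS = 48  # 6 bytes encode to exactly 8 base64 characters, with no padding


def counter_to_name(counter):
    """Map an integer in [0, 2**48) to a distinct 8-character URL-safe name."""
    if not 0 <= counter < 2 ** NAME_BITS:
        raise ValueError("counter does not fit in 48 bits")
    raw = counter.to_bytes(NAME_BITS // 8, "big")
    return base64.urlsafe_b64encode(raw).decode("ascii")
```

Note that names produced this way are sequential and therefore predictable, which may or may not matter for the obfuscation goal in the question.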
They're all safer, and simpler, and much more efficient than reading urandom and doing an md5.
If you really do want to generate random names, ''.join(random.choice(my_charset) for _ in range(8)) is also going to be simpler than what you're doing, and more efficient. Even urlsafe_b64encode(os.urandom(6)) is just as random as the MD5 hash, and simpler and more efficient.
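To put a number on "improbable", here is a sketch of the os.urandom(6) variant together with the standard birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n names drawn uniformly from N possibilities. It assumes Python 3, where urlsafe_b64encode returns bytes and needs decoding; the function names are illustrative only:

```python
import base64
import math
import os


def random_name():
    """Return 8 URL-safe base64 characters built from 48 random bits."""
    return base64.urlsafe_b64encode(os.urandom(6)).decode("ascii")


def collision_probability(n, bits=48):
    """Birthday approximation: chance of at least one collision among n names."""
    space = 2 ** bits
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-n * (n - 1) / (2 * space))
```

Under these assumptions, collision_probability(10_000) works out to a bit under 2 in 10 million, which is the kind of figure the question is asking about.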
The only benefit of the cryptographic randomness and/or cryptographic hash function is in avoiding predictability. If that's not an issue for you, why pay for it? And if you do need to avoid predictability, you almost certainly need to avoid races and other much simpler attacks, so avoiding mkstemp or NamedTemporaryFile is a very bad idea.
Not to mention that, as Root points out in a comment, if you need security, MD5 doesn't actually provide it.
Answer 2:
Your current method should be safe enough, but you could also take a look at the uuid module, e.g.:
import uuid
print(str(uuid.uuid4())[:8])
Output:
ef21b9ad
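One caveat worth checking for your own numbers: eight hexadecimal characters carry only 32 bits, while eight URL-safe base64 characters carry 48. By the usual birthday approximation, 32 bits gives roughly a 1% chance of at least one collision among 10,000 names. A possible variation, assuming Python 3, keeps the uuid module but base64-encodes the first six bytes of the UUID instead; the function name is hypothetical:

```python
import base64
import uuid

HEX_NAME_SPACE = 16 ** 8     # 8 hex characters: 2**32 possible names
BASE64_NAME_SPACE = 64 ** 8  # 8 base64 characters: 2**48 possible names


def short_name_from_uuid():
    """Return 8 URL-safe characters taken from a random (version 4) UUID."""
    # In a version 4 UUID the first six bytes are fully random; the fixed
    # version and variant bits sit in later bytes.
    raw = uuid.uuid4().bytes[:6]
    return base64.urlsafe_b64encode(raw).decode("ascii")
```

The name is still eight characters long, but uses upper and lower case letters, digits, '-' and '_' rather than hex digits only.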
Answer 3:
You can try this:
import random
uid_chars = ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
             'v', 'w', 'x', 'y', 'z', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0')
uid_length = 8
def short_uid():
    # randint is inclusive at both ends, so the upper bound is len - 1
    count = len(uid_chars) - 1
    c = ''
    for i in range(0, uid_length):
        c += uid_chars[random.randint(0, count)]
    return c
For example:
print(short_uid())
nogbomcv
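Since the question mentions random() producing the same sequence in different processes, here is a variant of the same idea that draws from the operating system's random source instead of the seeded generator in the random module. It is only a sketch and assumes Python 3.6 or later, where the secrets module is available; ALPHABET and the length parameter are illustrative choices:

```python
import secrets
import string

# 26 lowercase letters plus 10 digits: 36 possible characters per position.
ALPHABET = string.ascii_lowercase + string.digits


def short_uid(length=8):
    """Return `length` characters chosen independently from ALPHABET."""
    # secrets.choice reads from the OS random source, so processes that
    # were forked from a common parent do not share generator state.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

With 36 characters per position, an 8-character name has 36**8 (about 2.8e12) possibilities, somewhat fewer than the 2**48 of the base64 approach.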
Answer 4:
I am using hashids to convert a timestamp into a unique id. (You can even convert it back to a timestamp if you want).
The drawback with this is if you create ids too fast, you will get a duplicate. But, if you are generating them with time in-between, then this is an option.
Here is an example:
from hashids import Hashids
from datetime import datetime
hashids = Hashids(salt = "lorem ipsum dolor sit amet", alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
print(hashids.encode(int(datetime.today().timestamp()))) #'QJW60PJ1' when I ran it