I want a random string of characters only (uppercase or lowercase), no numbers, in Go. What is the fastest and simplest way to do this?
相关问题
- how to split a list into a given number of sub-lis
- Generate string from integer with arbitrary base i
- Golang mongodb aggregation
- Unity - Get Random Color at Spawning
- Converting a string array to a byte array
相关文章
- Can I run a single test in a suite?
- JSP String formatting Truncate
- Selecting only the first few characters in a strin
- How to check if a request was cancelled
- why 48 bit seed in util Random class?
- Is it possible to implement an interface with unex
- Python: print in two columns
- How to access value of first index of array in Go
You can just write code for it. This code can be a little simpler if you want to rely on the letters all being single bytes when encoded in UTF-8.
Two possible options (there might be more of course):
You can use the
crypto/rand
package that supports reading random byte arrays (from /dev/urandom) and is geared towards cryptographic random generation. see http://golang.org/pkg/crypto/rand/#example_Read . It might be slower than normal pseudo-random number generation though.Take a random number and hash it using md5 or something like this.
Following
icza's
wonderfully explained solution, here is a modification of it that usescrypto/rand
instead ofmath/rand
.If you want a more generic solution, that allows you to pass in the slice of character bytes to create the string out of, you can try using this:
If you want to pass in your own source of randomness, it would be trivial to modify the above to accept an
io.Reader
instead of usingcrypto/rand
.Use package uniuri, which generates cryptographically secure uniform (unbiased) strings.
If you are willing to add a few characters to your pool of allowed characters, you can make the code work with anything which provides random bytes through a io.Reader. Here we are using
crypto/rand
.Paul's solution provides a simple, general solution.
The question asks for the "the fastest and simplest way". Let's address this. We'll arrive at our final, fastest code in an iterative manner. Benchmarking each iteration can be found at the end of the answer.
All the solutions and the benchmarking code can be found on the Go Playground. The code on the Playground is a test file, not an executable. You have to save it into a file named
XX_test.go
and run it withgo test -bench .
.I. Improvements
1. Genesis (Runes)
As a reminder, the original, general solution we're improving is this:
2. Bytes
If the characters to choose from and assemble the random string contains only the uppercase and lowercase letters of the English alphabet, we can work with bytes only because the English alphabet letters map to bytes 1-to-1 in the UTF-8 encoding (which is how Go stores strings).
So instead of:
we can use:
Or even better:
Now this is already a big improvement: we could achieve it to be a
const
(there arestring
constants but there are no slice constants). As an extra gain, the expressionlen(letters)
will also be aconst
! (The expressionlen(s)
is constant ifs
is a string constant.)And at what cost? Nothing at all.
string
s can be indexed which indexes its bytes, perfect, exactly what we want.Our next destination looks like this:
3. Remainder
Previous solutions get a random number to designate a random letter by calling
rand.Intn()
which delegates toRand.Intn()
which delegates toRand.Int31n()
.This is much slower compared to
rand.Int63()
which produces a random number with 63 random bits.So we could simply call
rand.Int63()
and use the remainder after dividing bylen(letterBytes)
:This works and is significantly faster, the disadvantage is that the probability of all the letters will not be exactly the same (assuming
rand.Int63()
produces all 63-bit numbers with equal probability). Although the distortion is extremely small as the number of letters52
is much-much smaller than1<<63 - 1
, so in practice this is perfectly fine.To make this understand easier: let's say you want a random number in the range of
0..5
. Using 3 random bits, this would produce the numbers0..1
with double probability than from the range2..5
. Using 5 random bits, numbers in range0..1
would occur with6/32
probability and numbers in range2..5
with5/32
probability which is now closer to the desired. Increasing the number of bits makes this less significant, when reaching 63 bits, it is negligible.4. Masking
Building on the previous solution, we can maintain the equal distribution of letters by using only as many of the lowest bits of the random number as many is required to represent the number of letters. So for example if we have 52 letters, it requires 6 bits to represent it:
52 = 110100b
. So we will only use the lowest 6 bits of the number returned byrand.Int63()
. And to maintain equal distribution of letters, we only "accept" the number if it falls in the range0..len(letterBytes)-1
. If the lowest bits are greater, we discard it and query a new random number.Note that the chance of the lowest bits to be greater than or equal to
len(letterBytes)
is less than0.5
in general (0.25
on average), which means that even if this would be the case, repeating this "rare" case decreases the chance of not finding a good number. Aftern
repetition, the chance that we sill don't have a good index is much less thanpow(0.5, n)
, and this is just an upper estimation. In case of 52 letters the chance that the 6 lowest bits are not good is only(64-52)/64 = 0.19
; which means for example that chances to not have a good number after 10 repetition is1e-8
.So here is the solution:
5. Masking Improved
The previous solution only uses the lowest 6 bits of the 63 random bits returned by
rand.Int63()
. This is a waste as getting the random bits is the slowest part of our algorithm.If we have 52 letters, that means 6 bits code a letter index. So 63 random bits can designate
63/6 = 10
different letter indices. Let's use all those 10:6. Source
The Masking Improved is pretty good, not much we can improve on it. We could, but not worth the complexity.
Now let's find something else to improve. The source of random numbers.
There is a
crypto/rand
package which provides aRead(b []byte)
function, so we could use that to get as many bytes with a single call as many we need. This wouldn't help in terms of performance ascrypto/rand
implements a cryptographically secure pseudorandom number generator so it's much slower.So let's stick to the
math/rand
package. Therand.Rand
uses arand.Source
as the source of random bits.rand.Source
is an interface which specifies aInt63() int64
method: exactly and the only thing we needed and used in our latest solution.So we don't really need a
rand.Rand
(either explicit or the global, shared one of therand
package), arand.Source
is perfectly enough for us:Also note that this last solution doesn't require you to initialize (seed) the global
Rand
of themath/rand
package as that is not used (and ourrand.Source
is properly initialized / seeded).One more thing to note here: package doc of
math/rand
states:So the default source is slower than a
Source
that may be obtained byrand.NewSource()
, because the default source has to provide safety under concurrent access / use, whilerand.NewSource()
does not offer this (and thus theSource
returned by it is more likely to be faster).(7. Using
rand.Read()
)Go 1.7 added a
rand.Read()
function and aRand.Read()
method. We should be tempted to use these to read as many bytes as we need in one step, in order to achieve better performance.There is one small "problem" with this: how many bytes do we need? We could say: as many as the number of output letters. We would think this is an upper estimation, as a letter index uses less than 8 bits (1 byte). But at this point we are already doing worse (as getting the random bits is the "hard part"), and we're getting more than needed.
Also note that to maintain equal distribution of all letter indices, there might be some "garbage" random data that we won't be able to use, so we would end up skipping some data, and thus end up short when we go through all the byte slice. We would need to further get more random bytes, "recursively". And now we're even losing the "single call to
rand
package" advantage...We could "somewhat" optimize the usage of the random data we acquire from
math.Rand()
. We may estimate how many bytes (bits) we'll need. 1 letter requiresletterIdxBits
bits, and we needn
letters, so we needn * letterIdxBits / 8.0
bytes rounding up. We can calculate the probability of a random index not being usable (see above), so we could request more that will "more likely" be enough (if it turns out it's not, we repeat the process). We can process the byte slice as a "bit stream" for example, for which we have a nice 3rd party lib:github.com/icza/bitio
(disclosure: I'm the author).But Benchmark code still shows we're not winning. Why is it so?
The answer to the last question is because
rand.Read()
uses a loop and keeps callingSource.Int63()
until it fills the passed slice. Exactly what theRandStringBytesMaskImprSrc()
solution does, without the intermediate buffer, and without the added complexity. That's whyRandStringBytesMaskImprSrc()
remains on the throne. Yes,RandStringBytesMaskImprSrc()
uses an unsynchronizedrand.Source
unlikerand.Read()
. But the reasoning still applies; and which is proven if we useRand.Read()
instead ofrand.Read()
(the former is also unsynchronzed).II. Benchmark
All right, let's benchmark the different solutions.
Just by switching from runes to bytes, we immediately have 22% performance gain.
Getting rid of
rand.Intn()
and usingrand.Int63()
instead gives another 24% boost.Masking (and repeating in case of big indices) slows down a little (due to repetition calls): -20%...
But when we make use of all (or most) of the 63 random bits (10 indices from one
rand.Int63()
call): that speeds up 3.4 times.And finally if we settle with a (non-default, new)
rand.Source
instead ofrand.Rand
, we again gain 23%.Comparing the final to the initial solution:
RandStringBytesMaskImprSrc()
is 5.6 times faster thanRandStringRunes()
.