Generating Random Strings

Posted 2019-02-11 20:08

Question:

I want to generate random strings of the form ABCDE1234E, i.e., each string contains 5 letters, then 4 digits, then 1 letter.

I figured out a way to create this using the following code.

library(random)
string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
                        loweralpha=FALSE, unique=TRUE, check=TRUE))
number_4 <- as.vector(randomNumbers(n=5000, min=1111, max=9999, col=5, base=10, check=TRUE))
string_1 <- as.vector(randomStrings(n=5000, len=1, digits=FALSE, upperalpha=TRUE,
                         loweralpha=FALSE, unique=FALSE, check=TRUE))
PAN.Number <- paste(string_5,number_4,string_1,sep = "")

But these functions take a long time, and the random package needs a network connection.

> system.time(string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
+                                                 loweralpha=FALSE, unique=TRUE, check=TRUE)))
   user  system elapsed 
   0.07    0.00    3.18 

Is there any method that I could try to reduce the execution time? I also tried using sample() but I couldn't figure it out.

Answer 1:

Using "stringi" as suggested by @akrun will be faster, but the following is also very fast and does not require any additional packages:

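myFun <- function(n = 5000) {
  # 5-letter prefix: paste together 5 independent vectors of n random letters
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  # Append a zero-padded 4-digit number (0000 to 9999) and one final letter
  paste0(a, sprintf("%04d", sample(0:9999, n, TRUE)), sample(LETTERS, n, TRUE))
}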

Example output:

myFun(10)
##  [1] "BZHOF3737P" "EPOWI0674X" "YYWEB2825M" "HQIXJ5187K" "IYIMB2578R"
##  [6] "YSGBG6609I" "OBLBL6409Q" "PUMAL5632D" "ABRAT4481L" "FNVEN7870Q"
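
A quick way to confirm that generated strings follow the required pattern is a regular-expression check. The sketch below is self-contained and base R only; `gen_ids` and `is_valid_id` are hypothetical names chosen for illustration, and the generator is merely modeled on the approach in this answer.

```r
# Hypothetical generator, modeled on the base-R approach in this answer.
gen_ids <- function(n = 5000) {
  head5 <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  digits4 <- sprintf("%04d", sample(0:9999, n, TRUE))  # zero-padded, 0000 allowed
  paste0(head5, digits4, sample(LETTERS, n, TRUE))
}

# Hypothetical validator: 5 capital letters, 4 digits, 1 capital letter.
is_valid_id <- function(x) grepl("^[A-Z]{5}[0-9]{4}[A-Z]$", x)

ids <- gen_ids(1000)
all(is_valid_id(ids))  # TRUE if every string matches the pattern
anyDuplicated(ids)     # 0 means no duplicates were drawn
```

Because the sampling is independent, duplicates are unlikely but not impossible; if uniqueness matters (the question used unique=TRUE), a check such as anyDuplicated() is one way to detect them.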


Answer 2:

We can use stri_rand_strings from the stringi package:

library(stringi)
sprintf("%s%s%s", stri_rand_strings(5, 5, '[A-Z]'),
      stri_rand_strings(5, 4, '[0-9]'), stri_rand_strings(5, 1, '[A-Z]'))

Or, more compactly:

do.call(paste0, Map(stri_rand_strings, n=5, length=c(5, 4, 1),
            pattern = c('[A-Z]', '[0-9]', '[A-Z]')))

Benchmarks

system.time({
    do.call(paste0, Map(stri_rand_strings, n=5000, length=c(5, 4, 1),
            pattern = c('[A-Z]', '[0-9]', '[A-Z]')))
    })
#  user  system elapsed 
#   0      0      0

I was able to reproduce the OP's slow timings using the OP's method, even for just one part of the expected output:

system.time(string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
                                              loweralpha=FALSE, unique=TRUE, check=TRUE)))
#  user  system elapsed 
#   0.86    0.24    5.52 


Answer 3:

You can directly perform what you want:
- Sample 5 random capital letters
- Sample 4 digits
- Sample 1 random capital letter

digits = 0:9
createRandString<- function() {
  v = c(sample(LETTERS, 5, replace = TRUE),
        sample(digits, 4, replace = TRUE),
        sample(LETTERS, 1, replace = TRUE))
  return(paste0(v,collapse = ""))
}

This will be more easily controlled, and won't take as long.
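
Note that createRandString() returns a single string per call. One way to get many strings, assuming the function as defined above, is to call it repeatedly with vapply(); the wrapper name `make_rand_strings` below is hypothetical.

```r
digits <- 0:9

# One string: 5 capital letters, 4 digits, 1 capital letter.
createRandString <- function() {
  v <- c(sample(LETTERS, 5, replace = TRUE),
         sample(digits, 4, replace = TRUE),
         sample(LETTERS, 1, replace = TRUE))
  paste0(v, collapse = "")
}

# Hypothetical wrapper: call the generator n times and collect the results
# in a character vector.
make_rand_strings <- function(n) {
  vapply(seq_len(n), function(i) createRandString(), character(1))
}

x <- make_rand_strings(5000)
length(x)         # 5000
unique(nchar(x))  # 10
```

Calling the generator once per string is easy to follow, but it is likely slower than sampling all n values in one vectorised call.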



Answer 4:

Your performance problem comes from using the random package in the first place. It is understandable to find random::randomStrings() in an internet search and assume it is a good way to generate random strings in a program, but the random package is not intended for general-purpose programming. It works by querying the RANDOM.ORG server, which is inherently slower than R's built-in pseudo-random number generators.

From one of the vignettes from the random package:

There are a number of situations in which it is desirable to use non-deterministically determined random numbers. Examples include
- to seed distributed computing on different nodes with truly independent seeds;
- to obtain portable initializations for RNGs that do not depend on particular operating system or hardware features;
- to validate simulation results using non-deterministic random numbers;
- to provide indeterministic seeds used for lottery drawings or games ...

Note that most of these examples are about seeding or initializing (these are synonyms) R's built-in pseudo-random number generators, rather than replacing them ...
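
As a small illustration of what seeding means for R's built-in pseudo-random number generator: set.seed() fixes the generator's starting state, so the same seed yields the same draws. The seed value 42 below is arbitrary and chosen only for the example.

```r
# Seed the generator, then draw 5 random capital letters.
set.seed(42)
first <- paste0(sample(LETTERS, 5, replace = TRUE), collapse = "")

# Re-seeding with the same value restarts the same sequence.
set.seed(42)
second <- paste0(sample(LETTERS, 5, replace = TRUE), collapse = "")

identical(first, second)  # TRUE: same seed, same sequence
```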



Tags: r random