Improve PowerShell Performance to Generate a Random Text File

Published 2019-02-16 01:34

Question:

I'd like to use PowerShell to create a random text file for use in basic system testing (upload, download, checksum, etc.). I've used the following articles to come up with my own code snippet for creating a random text file, but the performance is terrible.

  • Generating Random Files in Windows (stackoverflow.com)
  • PowerShell – Creating Dummy files (verboon.info)
  • Create large files with Powershell (chris-nullpayload.rhcloud.com based on verboon code above)

Here is my code sample, which takes approximately 227 seconds to generate a 1MB random text file on a modern Windows 7 Dell laptop. Run time was determined using the Measure-Command cmdlet. I repeated the test several times under different system loads with similarly long runtimes.

# select characters from 0-9, A-Z, and a-z
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
# write file using 128 byte lines each with 126 random characters
1..(1mb/128) | %{-join (1..126 | %{get-random -InputObject $chars }) } `
  | out-file test.txt -Encoding ASCII

I am looking for answers that discuss why this code has poor performance and suggestions for simple changes I can make to improve the runtime for generating a similar random text file (ASCII text lines of 126 random alphanumeric characters - 128 bytes per line including the "\r\n" EOL, with the output file an even number of megabytes, such as the 1MB sample above). I would like the file output to be written in pieces (one or more lines at a time) so that a string the size of the entire output file never needs to be held in memory.
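
For reference, a minimal check along these lines (nothing beyond Get-Item and Measure-Object) can confirm that a generated test.txt meets those constraints:

# total size in MB, and min/max line length (every line should be exactly 126 characters)
(Get-Item .\test.txt).Length / 1mb
Get-Content .\test.txt | Measure-Object -Property Length -Minimum -Maximum | Select-Object Minimum, Maximum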

Answer 1:

Agree with @dugas that the bottleneck is calling Get-Random for every character.

You should be able to achieve nearly the same randomness if you enlarge your character array and use the -Count parameter of Get-Random.

If you have PowerShell v4, the .foreach() method is considerably faster than ForEach-Object.

I also traded Out-File for Add-Content, which should help as well.

# select characters from 0-9, A-Z, and a-z
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
$chars = $chars * 126
# write file using 128 byte lines each with 126 random characters
(1..(1mb/128)).foreach({-join (Get-Random $chars -Count 126) | add-content testfile.txt }) 

That finished in about 32 seconds on my system.

Edit: Set-Content vs Out-File, using the generated test file:

$x = Get-Content testfile.txt

(Measure-Command {$x | out-file testfile1.txt}).totalmilliseconds
(Measure-Command {$x | Set-Content testfile1.txt}).totalmilliseconds

Out-File:    504.0069 ms
Set-Content: 159.0842 ms


Answer 2:

If you are OK with punctuation, you can use this:

Add-Type -AssemblyName System.Web
#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#set number of iterations
$count = 1mb/128
do{
  #write the 126 chars plus EOL
  [System.Web.Security.Membership]::GeneratePassword(126,0) | Out-File $fn -Append -Encoding ascii
  #decrement the counter
  $count--
}while($count -gt 0)

Which gets you to around 7 seconds. Sample Output:

0b5rc@EXV|e{kftc+1+Xn$-c%-*9q_9L}p=I=k@zrDg@HaJDcl}B(38i&m{lV@vlq%5h/a?m2X!yo]qs0=pEw:Tn4wb5F$k$O85$8F.QLvUzA{@X2-w%5(3k;BE2Qi

Using a StreamWriter instead of Out-File -Append avoids the open/close cycles and drops the runtime to 62 milliseconds.

Add-Type -AssemblyName System.Web
#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#set number of iterations
$count = 1mb/128
#create a filestream
$fs = New-Object System.IO.FileStream($fn,[System.IO.FileMode]::CreateNew)
#create a streamwriter
$sw = New-Object System.IO.StreamWriter($fs,[System.Text.Encoding]::ASCII,128)
do{
     #write the 126 chars plus EOL
     $sw.WriteLine([System.Web.Security.Membership]::GeneratePassword(126,0))
     #decrement the counter
     $count--
}while($count -gt 0)
#close the streamwriter
$sw.Close()
#close the filestream
$fs.Close()

You could also use a StringBuilder and GUIDs, which gives you pseudorandom lowercase hex characters (0-9 and a-f).

#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#set number of iterations
$count = 1mb/128
#create a filestream
$fs = New-Object System.IO.FileStream($fn,[System.IO.FileMode]::CreateNew)
#create a streamwriter
$sw = New-Object System.IO.StreamWriter($fs,[System.Text.Encoding]::ASCII,128)
do{
    #StringBuilder capped at 126 chars; the fourth 32-char GUID overruns the cap,
    #and the resulting error is discarded with 2> $null, leaving a 126-character line
    $sb = New-Object System.Text.StringBuilder 126,126
    0..3 | %{$sb.Append([GUID]::NewGuid().ToString("N"))} 2> $null
    $sw.WriteLine($sb.ToString())
    #decrement the counter
    $count--
}while($count -gt 0)
#close the streamwriter
$sw.Close()
#close the filestream
$fs.Close()

This takes about 4 seconds and generates the following sample:

1fef6ccabc624e4dbe13a0415764fd2c58aa873377c7465eaecabdf6ba6fdf71c55496600a374c4c8cff75be46b1fe474230231ffccc4e3aa2753391afb32c

If you are hell-bent on using the same characters as in your sample, you can do so with the following:

#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#array of valid chars
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
#create a random object
$rand = New-Object System.Random
#set number of iterations
$count = 1mb/128
#get length of valid character array
$charslength = $chars.length
#create a filestream
$fs = New-Object System.IO.FileStream($fn,[System.IO.FileMode]::CreateNew)
#create a streamwriter
$sw = New-Object System.IO.StreamWriter($fs,[System.Text.Encoding]::ASCII,128)
do{
    #get 126 random chars - this is the major slowdown
    $randchars = 1..126 | %{$chars[$rand.Next(0,$charslength)]}
    #write the 126 chars plus EOL
    $sw.WriteLine([System.Text.Encoding]::ASCII.GetString($randchars))
    #decrement the counter
    $count--
}while($count -gt 0)
#close the streamwriter
$sw.Close()
#close the filestream
$fs.Close()

This takes ~27 seconds and generates the following sample:

Fev31lweOXaYKELzWOo1YJn8LpZoxonWjxQYhgZbR62EmgjHit5J1LrvqniBB7hZj4pNonIpoCZSHYLf5H63iUUN6UhtyOQKPSViqMTvbGUomPeIR36t1drEZSHJ6O

Indexing the char array one character at a time, and Out-File -Append opening and closing the file on every write, are the major slowdowns.
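
As a rough sketch of avoiding that first slowdown, the per-character pipeline in the last loop can be replaced with a plain for loop over a char buffer (this reuses the $chars, $rand, $charslength, $sw and $count setup from the snippet above):

do{
    #fill a 126-element char buffer without spinning up a pipeline per character
    $buffer = New-Object 'char[]' 126
    for($i = 0; $i -lt 126; $i++){
        $buffer[$i] = $chars[$rand.Next(0,$charslength)]
    }
    #join the buffer into a 126-character string; WriteLine adds the EOL
    $sw.WriteLine((-join $buffer))
    #decrement the counter
    $count--
}while($count -gt 0)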



Answer 3:

One of the bottlenecks is calling the Get-Random cmdlet for every character inside the loop. On my machine that join takes ~40ms. If you change it to something like:

%{ -join ((get-random -InputObject $chars -Count 62) + (get-random -InputObject $chars -Count 62) + (get-random -InputObject $chars -Count 2)) }

it is reduced to ~1ms.
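
To make that concrete, here is a sketch of the question's one-liner with that fragment dropped in (it assumes the $chars array from the question and writes the same test.txt):

# three Get-Random -Count calls per line instead of 126 single-character calls
1..(1mb/128) | %{ -join ((get-random -InputObject $chars -Count 62) + (get-random -InputObject $chars -Count 62) + (get-random -InputObject $chars -Count 2)) } `
  | out-file test.txt -Encoding ASCII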



Answer 4:

Instead of using Get-Random to generate the text as per mjolinor's suggestion, I improved the speed by using GUIDs.

Function New-RandomFile {
    Param(
        $Path = '.', 
        $FileSize = 1kb, 
        $FileName = [guid]::NewGuid().Guid + '.txt'
        ) 
    (1..($FileSize/128)).foreach({-join ([guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid -Replace "-").SubString(1, 126) }) | set-content "$Path\$FileName"
}

I've run both versions with Measure-Command. The original code took 1.36 seconds.

This one took 491 milliseconds, running:

New-RandomFile -FileSize 1mb

UPDATE:

I've updated my function to use a ScriptBlock, so you can replace the 'NewGuid()' method with anything you want.

In this scenario, I make 1kb chunks, since I know I'm never creating smaller files. This improved the speed of my function drastically!

Set-Content forces a newline at the end, which is why you need to remove 2 characters each time you write to the file. I've replaced it with [io.file]::WriteAllText() instead.

Function New-RandomFile_1kChunks {
    Param(
        $Path = (Resolve-Path '.').Path, 
        $FileSize = 1kb, 
        $FileName = [guid]::NewGuid().Guid + '.txt'
        ) 

    $Chunk = { [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid -Replace "-" }

    $Chunks = [math]::Ceiling($FileSize/1kb)

    [io.file]::WriteAllText("$Path\$FileName","$(-Join (1..($Chunks)).foreach({ $Chunk.Invoke() }))")

    Write-Warning "New-RandomFile: $Path\$FileName"

}

If you don't care that all chunks are random, you can simply Invoke() the generation of the 1kb chunk once. This improves the speed drastically, but won't make the entire file random.

Function New-RandomFile_Fast {
    Param(
        $Path = (Resolve-Path '.').Path, 
        $FileSize = 1kb, 
        $FileName = [guid]::NewGuid().Guid + '.txt'
        ) 

    $Chunk = { [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid -Replace "-" }
    $Chunks = [math]::Ceiling($FileSize/1kb)
    $ChunkString = $Chunk.Invoke()

    [io.file]::WriteAllText("$Path\$FileName","$(-Join (1..($Chunks)).foreach({ $ChunkString }))")

    Write-Warning "New-RandomFile: $Path\$FileName"

}

Measure-Command results for all of these variants, each generating a 10mb file:

Executing New-RandomFile: 35.7688241 seconds.

Executing New-RandomFile_1kChunks: 25.1463777 seconds.

Executing New-RandomFile_Fast: 1.1626236 seconds.