I'd like to use PowerShell to create a random text file for use in basic system testing (upload, download, checksum, etc.). I've used the following articles and come up with my own code snippet to create a random text file, but the performance is terrible.
- Generating Random Files in Windows (stackoverflow.com)
- PowerShell – Creating Dummy files (verboon.info)
- Create large files with Powershell (chris-nullpayload.rhcloud.com based on verboon code above)
Here is my code sample, which takes approximately 227 seconds to generate a 1MB random text file on a modern Windows 7 Dell laptop. Run time was determined using the Measure-Command cmdlet. I repeated the test several times under different system loads, with similarly long runtimes.
# select characters from 0-9, A-Z, and a-z
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
# write file using 128 byte lines each with 126 random characters
1..(1mb/128) | %{-join (1..126 | %{get-random -InputObject $chars }) } `
| out-file test.txt -Encoding ASCII
I am looking for answers that discuss why this code has poor performance and suggestions for simple changes I can make to improve the runtime for generating a similar random text file (ASCII text lines of 126 random alphanumeric characters - 128 bytes with "\r\n" EOL, output file an even number of megabytes such as the above 1MB sample). I would like file output to be written in pieces (one or more lines at a time) so that we never need a string the size of the output file stored in memory.
One of the bottlenecks is calling the Get-Random cmdlet in the loop. On my machine, the -join for a single line takes ~40ms. If you change to something like:
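(The answer's original snippet wasn't preserved; a reconstruction of that kind of change is to reuse a single System.Random instance instead of invoking the Get-Random cmdlet once per character:)

```powershell
# One reusable System.Random replaces 126 Get-Random cmdlet invocations
# per line. Hypothetical reconstruction, not the answer's original code.
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
$rng   = New-Object System.Random
$line  = -join (1..126 | ForEach-Object { $chars[$rng.Next(0, $chars.Length)] })
```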
it is reduced to ~1ms.
If you are OK with punctuation, you can use this:
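(Again the original snippet is missing; one variant of this idea draws code points straight from the printable ASCII range, which includes punctuation, so no character array is needed at all:)

```powershell
# Draw raw code points from the printable ASCII range 33..126 (includes
# punctuation). Hypothetical reconstruction of the missing snippet.
$rng  = New-Object System.Random
$line = -join ([char[]] (1..126 | ForEach-Object { $rng.Next(33, 127) }))
```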
Which gets you to around 7 seconds.
Using a stream writer instead of Out-File -Append avoids the open/close cycles and drops the same to 62 milliseconds.
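A minimal sketch of the StreamWriter version (my reconstruction; the answer's code isn't shown): open test.txt once, write every 126-character line through the same writer, and close it at the end. The demo size is scaled down here; use `1mb / 128` lines for the full file.

```powershell
# Open the file once with a StreamWriter and reuse it, instead of letting
# Out-File -Append open and close the file on every write.
$chars  = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
$rng    = New-Object System.Random
$path   = Join-Path (Get-Location) 'test.txt'
$writer = New-Object System.IO.StreamWriter ($path, $false, [System.Text.Encoding]::ASCII)
$writer.NewLine = "`r`n"                       # keep every line at 128 bytes on any platform
$buf = New-Object char[] 126
try {
    for ($i = 0; $i -lt (128kb / 128); $i++) { # use 1mb / 128 for the full-size file
        for ($j = 0; $j -lt 126; $j++) { $buf[$j] = $chars[$rng.Next(0, $chars.Length)] }
        $writer.WriteLine($buf)
    }
}
finally { $writer.Close() }
```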
You could also use a StringBuilder, with GUIDs as the source of pseudorandom characters (digits and lowercase hex letters).
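(The sample code isn't preserved; the core of a StringBuilder/GUID approach might look like this, with the alphabet limited to 0-9 and a-f:)

```powershell
# Build one 126-character line from GUID strings: each NewGuid().ToString('N')
# yields 32 cheap pseudorandom hex characters (digits and lowercase a-f).
$sb = New-Object System.Text.StringBuilder
while ($sb.Length -lt 126) {
    [void]$sb.Append([guid]::NewGuid().ToString('N'))
}
$line = $sb.ToString(0, 126)
```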
This takes about 4 seconds.
If you are hell-bent on using the same characters as in your sample, you can do so with the following:
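(That code isn't shown above either; one way to keep the original 0-9/A-Z/a-z alphabet, sketched under my own assumptions, is to map random bytes onto it:)

```powershell
# Map 126 random bytes onto the 62-character alphabet from the question.
# (256 % 62 != 0, so there is a small modulo bias -- fine for test files.)
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
$rng   = New-Object System.Random
$bytes = New-Object byte[] 126
$rng.NextBytes($bytes)
$line  = -join ($bytes | ForEach-Object { $chars[$_ % $chars.Length] })
```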
This takes ~27 seconds.
Indexing into the char array one character at a time, and Out-File -Append opening and closing the file on every write, are the major slowdowns.
Agree with @dugas that the bottleneck is calling Get-Random for every character. You should be able to achieve nearly the same randomness if you increase your character array set and use the -Count parameter of Get-Random. If you have V4, the .foreach method is considerably faster than ForEach-Object. I also traded Out-File for Add-Content, which should also help. That finished in about 32 seconds on my system.
Edit: I also compared Set-Content vs Out-File using the generated test file.
Following mjolinor's suggestion, I improved the speed by using GUIDs instead of Get-Random to generate the text. I ran both versions with Measure-Command: the original code took 1.36 seconds; this one took 491 milliseconds.
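(None of this answer's function bodies are preserved; here is a guess at the shape of the first version, where the function signature and parameter names are mine: each iteration turns four GUIDs into a 126-character line and appends it with Add-Content.)

```powershell
# Guess at the GUID-based generator the timings below refer to.
function New-RandomFile {
    param ([string]$Path = (Join-Path (Get-Location) 'random.txt'), [int]$SizeInKb = 1024)
    Remove-Item $Path -ErrorAction SilentlyContinue
    $lines = ($SizeInKb * 1kb) / 128
    for ($i = 0; $i -lt $lines; $i++) {
        # 4 GUIDs x 32 hex chars = 128; keep 126 so the appended newline
        # brings each line to 128 bytes
        $line = (-join (1..4 | ForEach-Object { [guid]::NewGuid().ToString('N') })).Substring(0, 126)
        Add-Content -Path $Path -Value $line -Encoding Ascii
    }
}
New-RandomFile -SizeInKb 8    # small demo; the timings below used 10mb files
```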
UPDATE:
I've updated my function to use a ScriptBlock, so you can replace the 'NewGuid()' method with anything you want.
In this scenario, I make 1kb chunks, since I know I'm never creating smaller files. This improved the speed of my function drastically!
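(A sketch of what the ScriptBlock/1kb-chunk version might look like; the parameter names and the assumption that the generator emits 32-character strings are mine:)

```powershell
# The generator ScriptBlock is swappable; output is built in 1kb chunks
# (8 lines x 126 chars + "\r\n") and written once with WriteAllText().
function New-RandomFile_1kChunks {
    param (
        [string]$Path = (Join-Path (Get-Location) 'random1k.txt'),
        [int]$SizeInKb = 1024,
        [scriptblock]$Generator = { [guid]::NewGuid().ToString('N') }  # 32 chars per call
    )
    $out = New-Object System.Text.StringBuilder
    for ($chunk = 0; $chunk -lt $SizeInKb; $chunk++) {
        # 32 x 32-char generator outputs cover one 1kb chunk
        $buf = -join (1..32 | ForEach-Object { & $Generator })
        for ($line = 0; $line -lt 8; $line++) {
            [void]$out.Append($buf.Substring($line * 126, 126)).Append("`r`n")
        }
    }
    [io.file]::WriteAllText($Path, $out.ToString())
}
New-RandomFile_1kChunks -SizeInKb 4   # 4kb demo
```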
Set-Content forces a newline at the end, which is why you need to remove two characters each time you write to the file. I've replaced it with [io.file]::WriteAllText() instead.
If you don't care that all chunks are random, you can simply Invoke() the generation of the 1kb chunk once; this improves the speed drastically, but the entire file won't be random.
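(A sketch of that fast variant, again with names of my own choosing: generate one 1kb chunk, then repeat it for the whole file.)

```powershell
# Pay the generation cost once: build a single 1kb chunk (8 lines of
# 126 chars + "\r\n") and replicate it, trading randomness for speed.
function New-RandomFile_Fast {
    param ([string]$Path = (Join-Path (Get-Location) 'randomfast.txt'), [int]$SizeInKb = 1024)
    $buf   = -join (1..32 | ForEach-Object { [guid]::NewGuid().ToString('N') })
    $chunk = -join (0..7 | ForEach-Object { $buf.Substring($_ * 126, 126) + "`r`n" })
    [io.file]::WriteAllText($Path, $chunk * $SizeInKb)
}
New-RandomFile_Fast -SizeInKb 64      # 64kb demo
```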
Measure-Command results for these variants, each generating a 10mb file:
Executing New-RandomFile: 35.7688241 seconds.
Executing New-RandomFile_1kChunks: 25.1463777 seconds.
Executing New-RandomFile_Fast: 1.1626236 seconds.