What's an easy way to read a random line from a file on the Unix command line?
You can use shuf:
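A minimal sketch of the shuf approach. `shuf -n 1 FILE` is the GNU coreutils invocation; the wrapper function name `random_line_shuf` is hypothetical and only exists so the command can be reused with any filename.

```bash
# Print one line chosen at random from the file given as $1 (GNU coreutils shuf).
# An empty file produces no output.
random_line_shuf() {
  shuf -n 1 "$1"
}
```

On macOS, shuf is not installed by default; GNU coreutils from Homebrew typically provides it under the name gshuf.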
There is also a utility called rl. In Debian it's in the randomize-lines package, which does exactly what you want, though it is not available in all distros. Its home page actually recommends using shuf instead (which, I believe, didn't exist when rl was created). shuf is part of GNU coreutils; rl is not.

Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient, and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:
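A sketch of this two-pass idea, assuming POSIX awk and sed; the function name `random_line_sed_awk` is hypothetical. awk reads the file once to learn the line count (NR in the END block) and draws a single random line number; sed then prints only that line.

```bash
# Pass 1 (awk): count lines and print a random line number in [1, NR].
#   rand() is in [0, 1), so int(rand() * NR) is in [0, NR - 1].
#   srand() with no argument seeds from the time of day.
# Pass 2 (sed): print only that line number.
random_line_sed_awk() {
  sed -n "$(awk 'BEGIN { srand() } END { print int(rand() * NR) + 1 }' "$1")p" "$1"
}
```

For an empty file NR is 0, so awk prints 1 and `sed -n 1p` emits nothing. No lines are stored in memory, at the cost of reading the file twice.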
(This works even if FILENAME is empty, in which case no line is emitted.)
One possible advantage of this approach is that it only calls rand() once.
As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:
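A sketch of the per-line variant, which is reservoir sampling with a reservoir of one line: line number NR replaces the current pick when `rand() * NR < 1`, that is, with probability 1/NR. The function name `random_line_awk` is hypothetical.

```bash
# Single pass, constant memory: keep line NR with probability 1/NR.
# The first line is always taken (rand() * 1 < 1), later lines may replace it.
# Nothing is printed for an empty file.
random_line_awk() {
  awk 'BEGIN { srand() } rand() * NR < 1 { line = $0 } END { if (NR) print line }' "$1"
}
```

Unlike the two-pass version, this reads the file only once, so it also works on a pipe, but it calls rand() once per line.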
(A simple proof of correctness can be given based on induction.)
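The induction step for the per-line variant can be written out as follows. Let pick_n be the line held after n lines have been read, and assume each of the first n − 1 lines is held with probability 1/(n − 1) after n − 1 lines (the base case n = 1 holds because the first line is always taken). Line n replaces the pick with probability 1/n, so:

```latex
\Pr[\mathrm{pick}_n = i] =
\begin{cases}
\dfrac{1}{n}, & i = n,\\[6pt]
\Pr[\mathrm{pick}_{n-1} = i]\left(1 - \dfrac{1}{n}\right)
  = \dfrac{1}{n-1}\cdot\dfrac{n-1}{n} = \dfrac{1}{n}, & i < n.
\end{cases}
```

Every line therefore ends up selected with the same probability 1/n.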
Caveat about rand():
"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."
-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html
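Calling srand() with no argument seeds from the time of day (as POSIX specifies), so runs started within the same second can still return the same value. Passing an explicit seed instead makes rand() reproducible within a given awk implementation. A small sketch; the helper name `awk_rand_with_seed` is hypothetical.

```bash
# Print the first rand() value after seeding awk's generator with $1.
# The same seed gives the same value on every run with the same awk.
awk_rand_with_seed() {
  awk -v seed="$1" 'BEGIN { srand(seed); print rand() }'
}
```

Usage: `awk_rand_with_seed 42` prints the same number each time, which is handy for repeatable tests but is exactly what you don't want when picking a random line.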
Here is what I came up with, since my Mac OS doesn't support all the easy answers. I used the jot command to generate the number, because the $RANDOM-based solutions didn't seem very random in my tests. When testing my solution, the selected line varied widely from run to run.
The echo of the variable is only there to show the generated random number.
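A sketch of the jot approach, assuming BSD jot as shipped with macOS, where `jot -r 1 LOW HIGH` prints one random integer between LOW and HIGH. The function name `random_line_jot` is hypothetical, and since the count comes from wc -l, a final line without a trailing newline is never chosen.

```bash
# Pick a random line number with BSD jot, then print that line with sed.
random_line_jot() {
  local file=$1 n r
  # $(( )) strips the leading spaces that BSD wc -l prints
  n=$(( $(wc -l < "$file") ))
  # Nothing to pick from an empty file
  [ "$n" -gt 0 ] || return 0
  # One random integer between 1 and n
  r=$(jot -r 1 1 "$n")
  # Show the generated number on stderr so stdout holds only the line
  echo "$r" >&2
  sed -n "${r}p" "$file"
}
```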
perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:
This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
Single bash line:
Slight problem: the filename appears twice in the command.
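A sketch of a $RANDOM-based approach, wrapped in a hypothetical bash function so the filename is typed only once. Note that $RANDOM ranges from 0 to 32767, so in files longer than 32768 lines the later lines can never be chosen, and the modulo slightly favors lower line numbers whenever the line count does not divide 32768 evenly.

```bash
# Requires bash ($RANDOM, local, (( ))).
random_line_bash() {
  local f=$1 n
  # Line count; $(( )) also tolerates leading spaces in wc output
  n=$(( $(wc -l < "$f") ))
  # Avoid division by zero on an empty file
  (( n > 0 )) || return 0
  # Random line number in [1, n], then print just that line
  sed -n "$(( RANDOM % n + 1 ))p" "$f"
}
```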