How can I shuffle the lines of a text file on the-第2页回答

2楼-- · 2019-01-03 04:50

This bash function has the minimal dependency(only sort and bash):

shuf() {
while read -r x;do
    echo $RANDOM$'\x1f'$x
done | sort |
while IFS=$'\x1f' read -r x y;do
    echo $y
done
}

0人赞添加讨论(0) 举报

走好不送

3楼-- · 2019-01-03 04:53

I use a tiny perl script, which I call "unsort":

#!/usr/bin/perl
use List::Util 'shuffle';
@list = <STDIN>;
print shuffle(@list);

I've also got a NULL-delimited version, called "unsort0" ... handy for use with find -print0 and so on.

PS: Voted up 'shuf' too, I had no idea that was there in coreutils these days ... the above may still be useful if your systems doesn't have 'shuf'.

0人赞添加讨论(0) 举报

仙女界的扛把子

4楼-- · 2019-01-03 04:54

This is a python script that I saved as rand.py in my home folder:

#!/bin/python

import sys
import random

if __name__ == '__main__':
  with open(sys.argv[1], 'r') as f:
    flist = f.readlines()
    random.shuffle(flist)

    for line in flist:
      print line.strip()

On Mac OSX sort -R and shuf are not available so you can alias this in your bash_profile as:

alias shuf='python rand.py'

0人赞添加讨论(0) 举报

太酷不给撩

5楼-- · 2019-01-03 04:55

Not mentioned as of yet:

The unsort util. Syntax (somewhat playlist oriented):

unsort [-hvrpncmMsz0l] [--help] [--version] [--random] [--heuristic]
       [--identity] [--filenames[=profile]] [--separator sep] [--concatenate] 
       [--merge] [--merge-random] [--seed integer] [--zero-terminated] [--null] 
       [--linefeed] [file ...]

msort can shuffle by line, but it's usually overkill:
```
seq 10 | msort -jq -b -l -n 1 -c r
```

0人赞添加讨论(0) 举报

The star\"

6楼-- · 2019-01-03 04:57

here's an awk script

awk 'BEGIN{srand() }
{ lines[++d]=$0 }
END{
    while (1){
    if (e==d) {break}
        RANDOM = int(1 + rand() * d)
        if ( RANDOM in lines  ){
            print lines[RANDOM]
            delete lines[RANDOM]
            ++e
        }
    }
}' file

output

$ cat file
1
2
3
4
5
6
7
8
9
10

$ ./shell.sh
7
5
10
9
6
8
2
1
3
4

0人赞添加讨论(0) 举报

Evening l夕情丶

7楼-- · 2019-01-03 04:58

This answer complements the many great existing answers in the following ways:

The existing answers are packaged into flexible shell functions:
- The functions take not only stdin input, but alternatively also filename arguments
- The functions take extra steps to handle SIGPIPE in the usual way (quiet termination with exit code 141), as opposed to breaking noisily. This is important when piping the function output to a pipe that is closed early, such as when piping to head.
A performance comparison is made.

POSIX-compliant function based on awk, sort, and cut, adapted from the OP's own answer:

shuf() { awk 'BEGIN {srand(); OFMT="%.17f"} {print rand(), $0}' "$@" |
               sort -k1,1n | cut -d ' ' -f2-; }

Perl-based function - adapted from Moonyoung Kang's answer:

shuf() { perl -MList::Util=shuffle -e 'print shuffle(<>);' "$@"; }

Python-based function, adapted from scai's answer:

shuf() { python -c '
import sys, random, fileinput; from signal import signal, SIGPIPE, SIG_DFL;    
signal(SIGPIPE, SIG_DFL); lines=[line for line in fileinput.input()];   
random.shuffle(lines); sys.stdout.write("".join(lines))
' "$@"; }

Ruby-based function, adapted from hoffmanc's answer:

shuf() { ruby -e 'Signal.trap("SIGPIPE", "SYSTEM_DEFAULT");
                     puts ARGF.readlines.shuffle' "$@"; }

Performance comparison:

Note: These numbers were obtained on a late-2012 iMac with 3.2 GHz Intel Core i5 and a Fusion Drive, running OSX 10.10.3. While timings will vary with OS used, machine specs, awk implementation used (e.g., the BSD awk version used on OSX is usually slower than GNU awk and especially mawk), this should provide a general sense of relative performance.

Input file is a 1-million-lines file produced with seq -f 'line %.0f' 1000000.
Times are listed in ascending order (fastest first):

shuf
- 0.090s
Ruby 2.0.0
- 0.289s
Perl 5.18.2
- 0.589s
Python
- 1.342s with Python 2.7.6; 2.407s(!) with Python 3.4.2
awk + sort + cut
- 3.003s with BSD awk; 2.388s with GNU awk (4.1.1); 1.811s with mawk (1.3.4);

For further comparison, the solutions not packaged as functions above:

sort -R (not a true shuffle if there are duplicate input lines)
- 10.661s - allocating more memory doesn't seem to make a difference
Scala
- 24.229s
bash loops + sort
- 32.593s

Conclusions:

Use shuf, if you can - it's the fastest by far.
Ruby does well, followed by Perl.
Python is noticeably slower than Ruby and Perl, and, comparing Python versions, 2.7.6 is quite a bit faster than 3.4.1
Use the POSIX-compliant awk + sort + cut combo as a last resort; which awk implementation you use matters (mawk is faster than GNU awk, BSD awk is slowest).
Stay away from sort -R, bash loops, and Scala.

0人赞添加讨论(0) 举报

How can I shuffle the lines of a text file on the

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间