Randomly mix the lines of a 3-million-line file

Posted 2019-02-04 10:58

Everything is in the title. Does anyone know a quick way, with reasonable memory demands, to randomly mix all the lines of a 3-million-line file? I guess it is not possible with a simple vim command, so any simple Python script would do. I tried in Python with a random number generator, but did not manage to find a simple solution.

9 answers
欢心
#2 · 2019-02-04 11:35

Takes only a few seconds in Python:

>>> import random
>>> lines = open('3mil.txt').readlines()
>>> random.shuffle(lines)
>>> open('3mil.txt', 'w').writelines(lines)
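If holding all 3 million lines in memory at once is a concern, one option is to shuffle the byte offsets of the lines rather than the lines themselves, then copy each line across by seeking. This is only a sketch under that assumption: the function name `shuffle_file_by_offsets` and its parameters are made up for illustration and are not part of the answer above.

```python
import random


def shuffle_file_by_offsets(src_path, dst_path, seed=None):
    """Write the lines of src_path to dst_path in random order.

    Only one integer per line is kept in memory, not the line text.
    """
    offsets = []
    with open(src_path, "rb") as src:
        # First pass: record the byte offset at which each line starts.
        pos = 0
        for line in src:
            offsets.append(pos)
            pos += len(line)

        # Shuffle the offsets (seed is optional, for reproducible runs).
        random.Random(seed).shuffle(offsets)

        # Second pass: seek to each offset and copy that line out.
        with open(dst_path, "wb") as dst:
            for off in offsets:
                src.seek(off)
                line = src.readline()
                # A final line without a newline must not be glued
                # to whichever line happens to follow it in the output.
                if not line.endswith(b"\n"):
                    line += b"\n"
                dst.write(line)
```

The trade-off is a lot of random seeks, so this will likely be slower than the plain `random.shuffle(lines)` version unless the file sits in the OS cache or on an SSD.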
贼婆χ
#3 · 2019-02-04 11:36

This is the same as Mr. Kugelman's, but using vim's built-in python interface:

:py import vim, random as r; cb = vim.current.buffer ; l = cb[:] ; r.shuffle(l) ; cb[:] = l
爷、活的狠高调
#4 · 2019-02-04 11:37

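Here is another way, using random.choice. It may relieve memory gradually, since each line is dropped from the list once it has been written, but the Big-O is worse: lines.remove() is a linear scan, so the whole loop is quadratic in the number of lines :)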

from random import choice

with open('data.txt', 'r') as r:
    lines = r.readlines()

with open('shuffled_data.txt', 'w') as w:
    while lines:
        l = choice(lines)
        lines.remove(l)
        w.write(l)
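
The quadratic cost comes from `lines.remove(l)`. A possible way around it, keeping the same "shrink the list as you go" idea, is to pick a random index, swap that item to the end and pop it, which is constant time per line. A minimal sketch; the helper name `drain_in_random_order` is hypothetical:

```python
import random


def drain_in_random_order(lines):
    """Yield every item of `lines` exactly once, in random order.

    The list is emptied as a side effect, one item per step.
    """
    while lines:
        i = random.randrange(len(lines))
        # Swap the chosen item to the end, then pop it: O(1) per item,
        # instead of the O(n) scan done by list.remove().
        lines[i], lines[-1] = lines[-1], lines[i]
        yield lines.pop()


# Usage, with the same file names as above:
#
#     with open('data.txt', 'r') as r:
#         lines = r.readlines()
#     with open('shuffled_data.txt', 'w') as w:
#         w.writelines(drain_in_random_order(lines))
```

Each step draws uniformly from the lines that are left, so the result is a uniform shuffle, the same as what `random.shuffle` produces.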
劳资没心,怎么记你
#5 · 2019-02-04 11:39

The following Vimscript can be used to swap lines:

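function! Random()
  " Number of random moves to perform; raise it for a better mix
  let nswaps = 100
  " Range of lines to mix; set lastline to line('$') for the whole buffer
  let firstline = 1
  let lastline = 10
  let i = 0
  while i < nswaps
    " Pick a random line number with shuf and delete that line
    " ([:-2] strips the trailing newline from shuf's output)
    exe "let line = system('shuf -i ".firstline."-".lastline." -n 1')[:-2]"
    exe line.'d'
    " Pick another random line and paste the deleted line below it
    exe "let line = system('shuf -i ".firstline."-".lastline." -n 1')[:-2]"
    exe "normal! " . line . 'Gp'
    let i += 1
  endwhile
endfunction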

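To load the function, select it in visual mode, yank it with y, and type :@" to execute the yanked text. Then run it with :call Random(). Note that it moves nswaps randomly chosen lines rather than producing a full shuffle, and it calls the external shuf command twice per move, so it is slow on a large file.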

爷、活的狠高调
#6 · 2019-02-04 11:45

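On many systems the sort shell command takes -R to randomize its input. Be aware that GNU sort -R orders lines by a random hash of each line, so identical lines end up next to each other; if the file contains duplicates, the shuf command gives a true shuffle instead.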

疯言疯语
#7 · 2019-02-04 11:46

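Here's another version, using decorate-sort-undecorate: prefix each line with a random key, sort on it, then strip the key again. Both scripts read standard input and write standard output, so redirect your file into the first one and the result out of the last.

At the shell, use this.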

python decorate.py | sort | python undecorate.py

decorate.py

import sys
import random
for line in sys.stdin:
    sys.stdout.write( "{0}|{1}".format( random.random(), line ) )

undecorate.py

import sys
for line in sys.stdin:
    # Drop the random key and the "|" separator, keep the original line
    _, _, data = line.partition("|")
    sys.stdout.write(data)

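The two Python scripts stream one line at a time, so they use almost no memory. The sort in the middle does the heavy lifting; GNU sort, at least, spills to temporary files when the input does not fit in memory.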
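
For comparison, the same decorate, sort, undecorate idea can be written in a single Python process. Unlike the pipeline, this sketch holds everything in memory, so it is only meant to show the technique; the function names `decorate`, `undecorate` and `shuffled` are illustrative, not taken from any library.

```python
import random


def decorate(lines, rng=random):
    """Pair each line with a random sort key."""
    for line in lines:
        yield (rng.random(), line)


def undecorate(pairs):
    """Drop the random key and keep only the line."""
    for _key, line in pairs:
        yield line


def shuffled(lines, rng=random):
    """Return the lines in random order by sorting on the random keys."""
    # Sort on the key only, so equal keys never fall back to comparing lines.
    ordered = sorted(decorate(lines, rng), key=lambda pair: pair[0])
    return list(undecorate(ordered))
```

Because the keys are independent random numbers, every ordering of the lines is equally likely, which is what makes the sort act as a shuffle.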
