Everything is in the title. I'm wondering if anyone knows a quick way, with reasonable memory demands, to randomly shuffle all the lines of a 3-million-line file. I guess it is not possible with a simple vim command, so any simple Python script would do. I tried in Python using a random number generator, but did not manage to find a simple solution.
If you do not want to load everything into memory and sort it there, you have to store the lines on disk while doing random sorting. That will be very slow.
Here is a very simple, stupid and slow version. Note that this may take a surprising amount of disk space, and it will be very slow. I ran it with 300,000 lines, and it took several minutes. 3 million lines could very well take an hour. So: do it in memory. Really. It's not that big.
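The answer's original code is not reproduced here. As an illustration only, and not the author's version, here is a minimal sketch of one way to shuffle on disk: scatter the lines at random into temporary bucket files, then shuffle each bucket in memory and concatenate them. The function name `shuffle_file_on_disk` and the `num_buckets` and `seed` parameters are made up for this example, and it assumes UTF-8 text where each bucket fits in memory.

```python
import os
import random
import tempfile


def shuffle_file_on_disk(src_path, dst_path, num_buckets=16, seed=None):
    """Shuffle the lines of src_path into dst_path, holding only one
    bucket (roughly 1/num_buckets of the file) in memory at a time."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp_dir:
        bucket_paths = [
            os.path.join(tmp_dir, f"bucket_{i}.txt") for i in range(num_buckets)
        ]
        buckets = [open(p, "w", encoding="utf-8") for p in bucket_paths]
        try:
            with open(src_path, "r", encoding="utf-8") as src:
                for line in src:
                    # Make sure a final line without a newline does not
                    # get glued to whatever line ends up after it.
                    if not line.endswith("\n"):
                        line += "\n"
                    # Pass 1: send each line to a randomly chosen bucket.
                    rng.choice(buckets).write(line)
        finally:
            for bucket in buckets:
                bucket.close()

        # Pass 2: shuffle each bucket in memory and append it to the output.
        with open(dst_path, "w", encoding="utf-8") as dst:
            for path in bucket_paths:
                with open(path, "r", encoding="utf-8") as bucket:
                    chunk = bucket.readlines()
                rng.shuffle(chunk)
                dst.writelines(chunk)
```

Raising `num_buckets` lowers peak memory at the cost of more open temporary files, so it trades one resource for the other.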
Another option would be to store the lines in an SQLite database and pull them out of that database in random order. That is probably going to be faster than the simple on-disk version.
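A rough sketch of the SQLite idea, using Python's standard `sqlite3` module. The function name `shuffle_file_with_sqlite`, the table name `lines`, and the use of a temporary database file are assumptions made for this example, and it has not been benchmarked against the other approaches. The random ordering is left to SQLite via `ORDER BY RANDOM()`.

```python
import os
import sqlite3
import tempfile


def shuffle_file_with_sqlite(src_path, dst_path):
    """Load the lines of src_path into a temporary SQLite database and
    write them to dst_path in random order."""
    # Create a throwaway database file; an empty file is a valid new database.
    fd, db_path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE lines (text TEXT)")
            with open(src_path, "r", encoding="utf-8") as src:
                # Stream the file into the table one line at a time.
                conn.executemany(
                    "INSERT INTO lines (text) VALUES (?)",
                    ((line.rstrip("\n"),) for line in src),
                )
            conn.commit()

            with open(dst_path, "w", encoding="utf-8") as dst:
                # Let SQLite produce the rows in random order.
                query = "SELECT text FROM lines ORDER BY RANDOM()"
                for (text,) in conn.execute(query):
                    dst.write(text + "\n")
        finally:
            conn.close()
    finally:
        os.remove(db_path)
```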
Read all the lines into a list, shuffle the list, and write it back out. That should do it. 3 million lines will fit into most machines' memory unless the lines are HUGE (over 512 characters).
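A minimal sketch of the in-memory approach, using `random.shuffle` from the standard library. The function name `shuffle_file_in_memory` and the optional `seed` parameter are made up for this example, and it assumes the file is UTF-8 text.

```python
import random


def shuffle_file_in_memory(src_path, dst_path, seed=None):
    """Read every line of src_path, shuffle them, and write them to dst_path."""
    rng = random.Random(seed)
    with open(src_path, "r", encoding="utf-8") as src:
        lines = src.readlines()

    # If the file does not end with a newline, add one so the last line
    # is not merged with another line after shuffling.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"

    rng.shuffle(lines)  # in-place shuffle of the list

    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(lines)
```

Passing the same path for `src_path` and `dst_path` overwrites the file in place, since the input is fully read and closed before the output is opened.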
I just tried this on a file with 4.3M lines, and the fastest thing was the 'shuf' command on Linux. Use it like this (the file names are placeholders): `shuf input.txt > shuffled.txt`
It took 2-3 seconds to finish.