I have a 7GB CSV file which I'd like to split into smaller chunks, so that it is readable and faster to analyze in Python in a notebook. I would like to grab a small subset from it, maybe 250MB. How can I do this?
I agree with @jonrsharpe: `readline` should be able to read one line at a time, even for big files.
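For example, a minimal sketch that copies roughly the first 250MB into a smaller file, one line at a time (the file names are placeholders):

```python
# Copy roughly the first 250MB of a large CSV, line by line.
max_bytes = 250 * 1024 * 1024  # ~250MB cutoff

with open("data.csv", "r") as src, open("data_sample.csv", "w") as dst:
    header = src.readline()   # keep the header row
    dst.write(header)
    written = len(header)
    for line in src:          # iterating reads one line at a time
        dst.write(line)
        written += len(line)
        if written >= max_bytes:
            break
```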
If you are dealing with big CSV files, might I suggest using `pandas.read_csv`? I often use it for the same purpose and always find it awesome (and fast). It takes a bit of time to get used to the idea of DataFrames, but once you get over that it speeds up large operations like yours massively; see the sketch below.

Hope it helps.
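As a rough sketch (the row counts are only illustrative; pick whatever gets you near 250MB):

```python
import pandas as pd

# Read only the first million rows instead of the whole 7GB file.
df = pd.read_csv("data.csv", nrows=1_000_000)

# Or stream the file in chunks and stop once you have enough.
chunks = pd.read_csv("data.csv", chunksize=100_000)
first_chunk = next(chunks)
```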
See the Python docs on `file` objects (the object returned by `open(filename)`) - you can choose to `read` a specified number of bytes, or use `readline` to work through one line at a time.

You don't need Python to split a CSV file. Using your shell:
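Something along these lines should work, assuming a Unix-like shell with the standard `split` utility (the filename and line count follow the description below):

```sh
# Split data.csv into pieces of 100 lines each (output files xaa, xab, ...)
split -l 100 data.csv
```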
This would split `data.csv` into chunks of 100 lines.

Maybe something like this?
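A rough sketch in plain Python (the chunk size and output file names are placeholders, not fixed requirements):

```python
# Split a big CSV into numbered chunk files of N lines each,
# repeating the header row in every output file.
chunk_size = 1_000_000  # lines per output file; tune to hit ~250MB

with open("data.csv", "r") as src:
    header = src.readline()
    chunk_index = 0
    lines = []
    for line in src:
        lines.append(line)
        if len(lines) >= chunk_size:
            with open(f"data_part_{chunk_index}.csv", "w") as dst:
                dst.write(header)
                dst.writelines(lines)
            chunk_index += 1
            lines = []
    if lines:  # write whatever is left over
        with open(f"data_part_{chunk_index}.csv", "w") as dst:
            dst.write(header)
            dst.writelines(lines)
```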
I had to do a similar task, and used the pandas package:
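A sketch of that approach with `read_csv` and `chunksize` (the chunk size and output names here are assumptions, not the original values):

```python
import pandas as pd

# Read the big CSV lazily in chunks and write each chunk to its own file.
reader = pd.read_csv("data.csv", chunksize=500_000)

for i, chunk in enumerate(reader):
    chunk.to_csv(f"data_chunk_{i}.csv", index=False)
```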