I have a CSV import file with 33 million lines that need to be imported into my database. I can import it with a C# console app, but then the stored procedures that run after the import time out. Consequently, I want to split the file into 10 smaller files.
I could do it in C# but I suspect there's a much better approach using shell utilities. I have cygwin installed and can use all the common Linux shell utilities. Is there a neat little combination of commands I could use to split the file?
Use split.
For example, to split a file every 3.4 million lines (which should give you 10 files from your 33 million):
split -l 3400000 <file_name>
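A fuller sketch, assuming the input is named import.csv (a stand-in for your real file name) and you want the pieces prefixed part_:
split -l 3400000 import.csv part_
This writes part_aa, part_ab, ... part_aj: the first nine hold 3.4 million lines each and the last holds the remaining 2.4 million.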
$ man split
Splitting by line is good; however, you can also split by size.
This creates 1 MB files from the original:
split -b 1024k <file_name>
And this creates 1 GB files from the original:
split -b 1024m <file_name>
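One caveat for a CSV import: a pure byte split can cut a row in half mid-line. GNU split (which Cygwin ships) also has a -C (--line-bytes) option that caps the size while keeping whole lines intact:
split -C 1024m <file_name>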
The version of split in coreutils 8.8 (not yet released at the time of writing) will support splitting into a given number of chunks directly:
split -n l/10 <file_name>
For now you'll need to specify a particular number of lines per file yourself.
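A minimal sketch of computing that count by hand, again assuming a hypothetical import.csv and a shell with POSIX arithmetic:
lines=$(wc -l < import.csv)
split -l $(( (lines + 9) / 10 )) import.csv part_
Rounding the division up ensures you get at most 10 output files.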
If your CSV file has 500 rows and you want to split it into two parts (250 + 250):
Download and install "Cygwin Terminal", then run the command:
split -l 250 filename.csv
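For reference, with a hypothetical 500-row filename.csv, split's default output prefix is x, so you can check the result like this:
$ split -l 250 filename.csv
$ wc -l xaa xab
  250 xaa
  250 xab
  500 total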