I have a very large .csv file (>500 MB) and I wish to break it up into smaller .csv files from the command prompt. (Basically, I am trying to find an equivalent of the Linux "split" command on Windows.)
This has to be a batch script, as my machine only has Windows installed and requesting software is a pain. I came across some sample code (http://forums.techguy.org/software-development/1023949-split-100000-line-csv-into.html); however, it does not work when I run the batch file. All I get is one output file of only 125 KB, even though I asked it to split every 20,000 lines.
Has anyone come across a similar problem, and how did you resolve it?
This will give you lines 1 to 20000 in newfile1.csv and lines 20001 to the end in file newfile2.csv
It overcomes the 8K character limit per line too.
This uses a helper batch file called findrepl.bat from - https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat
Place findrepl.bat in the same folder as the batch file or on the path.
It's more robust than a plain batch file, and quicker too.
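For readers who can run Python (an assumption; the asker's machine may not have it), the two-file split described above can be sketched as follows. This is not the findrepl.bat approach, only an illustration of the same result. The function name `split_at_line` and its parameters are hypothetical. It counts physical lines, so a quoted CSV field containing a line break would be counted as more than one line.

```python
from itertools import islice


def split_at_line(src_path, first_path, rest_path, line_count=20000):
    """Copy the first `line_count` lines to first_path and the remainder to rest_path."""
    # Binary mode avoids guessing the encoding and keeps CR/LF endings untouched.
    with open(src_path, "rb") as src:
        with open(first_path, "wb") as first:
            # islice stops after `line_count` lines without reading the rest.
            first.writelines(islice(src, line_count))
        with open(rest_path, "wb") as rest:
            # The file iterator continues where islice stopped.
            rest.writelines(src)
```

A call such as `split_at_line("bigfile.csv", "newfile1.csv", "newfile2.csv")` would mirror the description above; `bigfile.csv` is a placeholder name. The file is streamed line by line, so line length and file size are limited only by available disk space.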
Use the Cygwin command split. Sample, to split a file every 500 lines: split -l 500 [filename.ext]
For more: split --help
Try this out:
As shown in the code above, it will split the original CSV file into multiple CSV files with a limit of 20,000 lines each. All you have to do is change the !file! and !limit! variables accordingly. Hope it helps.
I found this question while looking for a similar solution. I modified the answer that @Dale gave to suit my purposes. I wanted something a little more flexible, with some error trapping. Just thought I would put it here for anyone looking for the same thing.
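The line-limit approach discussed above, with a little error trapping, can be sketched in Python for anyone who has it available (an assumption, and not the batch code from these answers). The parameters `file` and `limit` mirror the batch variables; the function name `split_by_lines` and the `name_1.csv`, `name_2.csv` output naming are hypothetical choices.

```python
import os


def split_by_lines(file, limit=20000):
    """Split `file` into numbered parts of at most `limit` lines; return the part paths."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if not os.path.isfile(file):
        raise FileNotFoundError(file)

    base, ext = os.path.splitext(file)
    parts = []
    out = None
    try:
        # Binary mode keeps the original line endings and needs no encoding guess.
        with open(file, "rb") as src:
            for index, line in enumerate(src):
                if index % limit == 0:
                    # Start a new part every `limit` lines.
                    if out is not None:
                        out.close()
                    path = "%s_%d%s" % (base, len(parts) + 1, ext)
                    parts.append(path)
                    out = open(path, "wb")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

This sketch does not repeat the header row in each part. If every part needs the header, one option would be to remember the first line and write it at the start of each new part.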
A free Windows app that does this:
http://www.addictivetips.com/windows-tips/csv-splitter-for-windows/
If splitting very large files, the solution I found is an adaptation from this, with PowerShell "embedded" in a batch file. This works fast, as opposed to many other things I tried (I wouldn't know about other options posted here).
The way to use mysplit.bat below is
Note: The script was intended to use the first argument as the split size. It is currently hardcoded at 100 MB. It should not be difficult to fix this.
Note 2: The filename should be enclosed in single quotes. Other ways of quoting apparently do not work.
Note 3: It splits the file at a given number of bytes, not at a given number of lines. For me this was good enough. Some lines of code could probably be added to extend each chunk read up to the next CR/LF. This would split on full lines (though not a constant number of them), with no sacrifice in processing time.
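The idea in Note 3, reading a fixed number of bytes and then extending the chunk to the next line ending, can be sketched as follows. This is a Python illustration under the assumption that Python is available, not a modification of mysplit.bat. The function name `split_by_size`, the `chunk_bytes` parameter, and the `name_001.csv` output naming are hypothetical.

```python
import os


def split_by_size(path, chunk_bytes=100 * 1024 * 1024):
    """Split `path` into parts of roughly `chunk_bytes` bytes, each ending on a full line."""
    if chunk_bytes < 1:
        raise ValueError("chunk_bytes must be at least 1")

    base, ext = os.path.splitext(path)
    parts = []
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break  # end of file
            if not chunk.endswith(b"\n"):
                # Complete the partial last line; readline() returns b"" at end of file.
                chunk += src.readline()
            part = "%s_%03d%s" % (base, len(parts) + 1, ext)
            with open(part, "wb") as out:
                out.write(chunk)
            parts.append(part)
    return parts
```

Each part is slightly larger than `chunk_bytes` at most by one line, and the parts concatenated in order reproduce the original file byte for byte. A whole chunk is held in memory at once, so `chunk_bytes` should stay well below the available RAM.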
Script mysplit.bat: