I have a very large tab-delimited text file. Many lines share the same value in one of the columns (call it column k). I want to split the file into multiple files so that all lines with the same value of k end up in the same file. How can I do this? For example:
a foo
1 bar
c foo
2 bar
d foo
should be split into a file named "foo" containing the lines "a foo", "c foo" and "d foo", and a file named "bar" containing the lines "1 bar" and "2 bar".
How can I do this in either a shell script or in Python?
Thanks.
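For the Python side of the question, here is a minimal sketch, assuming the file really is tab-delimited, every non-blank line has the key column, and the key values are safe to use as file names. `split_by_column` is a hypothetical helper name. The column index is 0-based, so column k from the question is `k - 1`. It streams the input line by line and keeps one open output file per distinct key, so memory use does not grow with the file size. It does assume the number of distinct keys stays below the OS limit on open files.

```python
import os


def split_by_column(input_path, key_index, out_dir="."):
    """Copy each line of a tab-delimited file into a file named after its key.

    key_index is 0-based. Assumes key values are valid, safe file names and
    that the number of distinct keys fits under the open-file limit.
    Returns the sorted list of keys (i.e. output file names) written.
    """
    handles = {}  # key -> open output file
    try:
        with open(input_path) as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                key = line.rstrip("\n").split("\t")[key_index]
                out = handles.get(key)
                if out is None:
                    # First time this key is seen: create (or truncate) its file.
                    out = open(os.path.join(out_dir, key), "w")
                    handles[key] = out
                out.write(line)  # write the line unmodified
    finally:
        for out in handles.values():
            out.close()
    return sorted(handles)
```

For the example above (with tabs between the columns), `split_by_column("input.tsv", 1)` would produce the files `foo` and `bar` in the current directory.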
After running both versions of the awk commands above (and having awk error out) and seeing the request for a Python version, I embarked on a short and not particularly arduous journey of writing a utility to easily split files based on keys.
GitHub repo: https://github.com/gstaubli/split_file_by_key
Background info: http://garrens.com/blog/2015/04/02/split-file-by-keys/
Awk error:
This should work per your spec
Hope this helps.
I'm not sure how efficient it is, but the quick and easy way is to take advantage of the way file redirection works in awk:
That will append each line (unmodified) to a file named after column 5. Adjust as necessary.
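With the redirection approach, each output file stays open after its first write, which is why later lines append instead of overwriting. Depending on the awk implementation, that can fail with a too-many-open-files error when the key column has many distinct values. In awk itself the usual remedy is to `close()` files you no longer need. The sketch below applies the same idea in Python by capping the number of open handles. It evicts the least recently used file and later reopens it in append mode. It is an illustration under the same assumptions as a plain split (tab-delimited input, keys usable as file names). `split_by_column_bounded` and `max_open` are hypothetical names.

```python
import os
from collections import OrderedDict


def split_by_column_bounded(input_path, key_index, out_dir=".", max_open=64):
    """Split a tab-delimited file by one column, keeping at most max_open files open.

    Each output file is truncated the first time its key is seen; after being
    evicted from the cache it is reopened in append mode, so no lines are lost.
    Returns the sorted list of keys (output file names) written.
    """
    open_files = OrderedDict()  # key -> handle, least recently used first
    seen = set()  # keys whose output file has already been created
    try:
        with open(input_path) as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                key = line.rstrip("\n").split("\t")[key_index]
                out = open_files.get(key)
                if out is not None:
                    open_files.move_to_end(key)  # mark as most recently used
                else:
                    if len(open_files) >= max_open:
                        # Close the least recently used file to free a descriptor.
                        _, oldest = open_files.popitem(last=False)
                        oldest.close()
                    mode = "a" if key in seen else "w"
                    out = open(os.path.join(out_dir, key), mode)
                    open_files[key] = out
                    seen.add(key)
                out.write(line)
    finally:
        for out in open_files.values():
            out.close()
    return sorted(seen)
```

Even with `max_open=1` the output is the same as the unbounded version. Lower limits simply cost more open/close calls, so it is worth choosing the largest value your system comfortably allows.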