Let's assume there is a file which looks like this:
xxxx aa whatever
yyyy bb whatever
zzzz aa whatever
I'd like to split it into 2 files, containing:
first:
xxxx aa whatever
zzzz aa whatever
second:
yyyy bb whatever
I.e. I want to group the lines by some value they contain (the rule can be: the 2nd space-separated word), without reordering the lines within each group.
Of course I could write a program to do it, but I'm wondering whether there is a ready-made tool that can do something like this.
Sorry, I didn't mention this because I assumed it was obvious: the number of distinct "words" is huge, at least 10000 of them. So any solution that relies on enumerating the words beforehand will not work.
Also, I'd rather avoid a multi-pass split, since the files in question are usually pretty big.
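One possible single-pass approach is awk, which can redirect each line to a file whose name is built from that line's second field, so the words never have to be enumerated in advance. This is a minimal sketch, assuming the key is the second whitespace-separated field; input.txt and the output.<word> naming are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Create the sample input from the question (input.txt is a hypothetical name).
printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' > input.txt

# Single pass: send each line to "output.<second field>".
# awk's default field splitting treats runs of blanks as one separator.
# The first write to a given file truncates it; later writes append while
# the file stays open, so each group keeps the original line order.
awk '{ out = "output." $2; print > out }' input.txt
```

Because awk reads the input sequentially, lines keep their relative order inside each output file. Some awk implementations limit how many output files can be open at once, which matters with 10000+ distinct words; a portable but slower fallback is { f = "output." $2; print >> f; close(f) }, which appends, so any old output.* files should be removed first.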
This will create files named output.aa, output.bb, etc.

Well, you could do a grep to get the lines that match, and a grep -v to get the lines that don't match.
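For the two-group example in the question, the grep / grep -v idea could look like the sketch below. It assumes the key is exactly the second space-separated word and that "aa" is the group being extracted; input.txt, first.txt and second.txt are hypothetical names. Note that it reads the file twice and needs one pattern per group, so it does not fit the single-pass, 10000-word case.

```shell
#!/bin/sh
# Sample input from the question (input.txt is a hypothetical name).
printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' > input.txt

# Match lines whose second space-separated word is exactly "aa":
# a first word of non-space characters, one space, "aa", then a space or end of line.
grep -E '^[^ ]+ aa( |$)' input.txt > first.txt
# -v inverts the match, giving every remaining line in its original order.
grep -vE '^[^ ]+ aa( |$)' input.txt > second.txt
```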
Hm, you could do

sort -t' ' -s -k 2,2

but that's O(n log n).
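The sort idea can be turned into a full split by piping the sorted stream into awk, which starts a new output file whenever the key changes and closes the previous one, so only one output file is open at a time. This is a sketch assuming single-space-separated fields; input.txt and the output.<word> names are hypothetical. The -s flag keeps the original order within each group.

```shell
#!/bin/sh
# Sample input from the question (input.txt is a hypothetical name).
printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' > input.txt

# -t ' '   : fields are separated by single spaces
# -s       : stable sort, so lines with equal keys keep their input order
# -k 2,2   : the sort key is field 2 only
# LC_ALL=C : plain byte-wise comparison instead of locale collation
LC_ALL=C sort -t ' ' -s -k 2,2 input.txt |
awk -F '[ ]' '
    # Key changed: close the previous group file and start a new one.
    NR == 1 || $2 != prev {
        if (NR > 1) close(out)
        out = "output." $2
        prev = $2
    }
    # Each key forms one contiguous run, so each file is opened only once.
    { print > out }
'
```

LC_ALL=C usually makes sort faster than locale-aware collation, and GNU sort spills to temporary files, so this also copes with inputs larger than memory.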