Split input into multiple outputs based on content

Published 2019-05-10 02:38

Question:

Let's assume there is a file which looks like this:

xxxx aa whatever
yyyy bb whatever
zzzz aa whatever

I'd like to split it into 2 files, containing:

first:

xxxx aa whatever
zzzz aa whatever

second:

yyyy bb whatever

I.e. I want to group the lines by some value they contain (the rule could be: the 2nd word, with words separated by spaces), but without reordering the lines within each group.

Of course I could write a program to do it, but is there a ready-made tool that can do something like this?

Sorry, I didn't mention this because I assumed it was pretty obvious: the number of different "words" is huge, at least 10,000 of them. So any solution that enumerates the words beforehand will not work.

Also, I'd rather not do a multi-pass split, since the files in question are usually pretty big.

Answer 1:

This will create files named output.aa, output.bb, etc.:

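awk '{print >> ("output." $2)}' input.file

The parentheses around "output." $2 matter: without them, some awk implementations reject or misparse the concatenation after >>. Also note that >> appends, so output files left over from an earlier run should be deleted first.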
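With 10,000+ distinct keys, the one-liner keeps one output file open per key, and depending on the awk implementation and the process's open-file limit (`ulimit -n`), it may fail once it runs out of file descriptors. A minimal portable sketch, assuming any POSIX awk, closes each file right after writing to it. This is slower, because the file is reopened for every line, but at most one output file is open at any time. The sample `input.file` is recreated only to make the snippet self-contained.

```shell
#!/bin/sh
# Self-contained demo: recreate the sample input from the question.
printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' > input.file

# ">>" appends, so remove output files left over from an earlier run.
rm -f output.*

# Close each file right after writing to it, so only one output file is open
# at a time regardless of how many distinct keys there are. Because the file
# is reopened for every line, ">>" (append) is required; ">" would truncate it.
awk '{
    f = "output." $2
    print >> f
    close(f)
}' input.file
```

Lines keep their original relative order within each `output.*` file, because awk processes the input strictly top to bottom.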


Answer 2:

Well, you could do a grep to get the lines that match, and a grep -v to get the lines that don't match.
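For the two-group example in the question, a sketch of this approach, assuming single spaces between words and that `aa` is the key to split on, anchors the pattern to the second word so that an `aa` elsewhere in a line does not match. Each `grep` reads the whole file, so this is the multi-pass approach the question wants to avoid, and it is only practical for a handful of keys. The output names `first` and `second` are taken from the question.

```shell
#!/bin/sh
# Sample input from the question.
printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' > input.file

# Match "aa" only as the second space-separated word:
# start of line, one run of non-spaces, a space, "aa", then a space or end of line.
pattern='^[^ ]+ aa( |$)'

grep -E  "$pattern" input.file > first    # lines whose 2nd word is "aa"
grep -vE "$pattern" input.file > second   # every other line
```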

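Hm, you could do sort -t" " -s -k 2,2 (-t sets the field separator, -k 2,2 sorts on the second field only, -s keeps lines with equal keys in their original order), but that's O(n log n).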
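Sorting only groups the lines; they still have to be written to separate files. One way to finish the job, sketched here assuming a `sort` that supports `-s` (GNU and BSD `sort` do) and single-space separators, is to pipe the stably sorted stream into awk. Since all lines for a key then arrive together, awk only needs one output file open at a time and can close it whenever the key changes, which avoids the open-file limit for 10,000+ keys at the cost of the O(n log n) sort. `LC_ALL=C` is set so that `sort` compares keys byte by byte rather than by locale collation.

```shell
#!/bin/sh
# Sample input from the question.
printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' > input.file

# -t ' ' : fields are separated by a single space
# -k2,2  : sort on the second field only
# -s     : stable, so lines with the same key keep their original order
LC_ALL=C sort -t ' ' -s -k2,2 input.file |
awk '
    # Lines arrive grouped by $2. On the first line, or whenever the key
    # changes, close the previous file and start writing to the next one.
    NR == 1 || $2 != key {
        if (out != "") close(out)
        key = $2
        out = "output." key
    }
    { print > out }
'
```

Because each key's file is opened exactly once in this run, `>` is safe here: it truncates any stale file from an earlier run on first open and then keeps appending while the file stays open.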