Split input into multiple outputs based on content

Posted 2019-05-10 02:28

Let's assume there is a file which looks like this:

xxxx aa whatever
yyyy bb whatever
zzzz aa whatever

I'd like to split it into 2 files, containing:

first:

xxxx aa whatever
zzzz aa whatever

second:

yyyy bb whatever

I.e., I want to group the rows based on some value in each line (the rule can be: the 2nd space-separated word), without reordering the lines within each group.

Of course I can write a program to do it, but I'm wondering if there is any ready tool that can do something like this?

Sorry, I didn't mention it, as I assumed it was pretty obvious: the number of different "words" is huge. We are talking about at least 10000 of them. I.e., any solution based on enumerating the words beforehand will not work.

Also, I'd rather avoid a multi-pass split, since the files in question are usually pretty big.

2 answers
虎瘦雄心在
#2 · 2019-05-10 03:03

This will create files named output.aa, output.bb, etc.:

awk '{print >> ("output." $2)}' input.file
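With 10000 or more distinct words, some awk implementations may hit the limit on simultaneously open files. A minimal sketch of one way around that, still in a single pass: close the previous output whenever the key changes, so at most one file is open at a time. The function name split_by_field and the prefix argument are made up for illustration, not part of any tool.

```shell
# Hypothetical helper: split_by_field FILE [PREFIX]
# Appends each line of FILE to PREFIX<2nd field>, in one pass,
# keeping the original line order within each output file.
split_by_field() {
    awk -v prefix="${2:-output.}" '
    {
        out = prefix $2
        # Keep only one output file open: close the previous one
        # when the key changes. ">>" reopens it in append mode later.
        if (out != prev) {
            if (prev != "") close(prev)
            prev = out
        }
        print >> out
    }' "$1"
}
```

Called as split_by_field input.file, this produces output.aa, output.bb, and so on. Because it appends, output files left over from an earlier run should be removed first. Reopening files costs time when keys alternate often, so this trades speed for staying under the open-file limit.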
Bombasti
#3 · 2019-05-10 03:06

Well, you could do a grep to get the lines that match, and a grep -v to get the lines that don't match.

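Hm, you could do sort -t" " -s -k 2,2, which groups the lines by the 2nd field and (being a stable sort) keeps their original order within each group, but that's O(n log n).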
