Split input into multiple outputs based on content

Posted 2019-05-10 02:28

Let's assume there is a file which looks like this:

xxxx aa whatever
yyyy bb whatever
zzzz aa whatever

I'd like to split it into 2 files, containing:

first:

xxxx aa whatever
zzzz aa whatever

second:

yyyy bb whatever

I.e. I want to group the rows based on some value in the lines (rule can be: 2nd word separated by spaces), but do not reorder the lines within groups.

Of course I can write a program to do it, but I'm wondering if there is any ready tool that can do something like this?

Sorry, I didn't mention it, as I assumed it was pretty obvious: the number of different "words" is huge. We are talking about at least 10000 of them, i.e. any solution based on enumerating the words beforehand will not work.

Also, I'd rather avoid a multi-pass split: the files in question are usually pretty big.

Tags: bash unix shell text

2 answers

虎瘦雄心在

Reply #2 · 2019-05-10 03:03

This will create files named output.aa, output.bb, etc.:

awk '{print >> "output." $2}' input.file
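With 10000 or more distinct words, one thing to watch for is that some awk implementations limit how many files can be open at once (gawk works around this internally; others may fail). A minimal sketch of a variant that closes each file right after writing, so only one output file is open at any time. The scratch directory and sample data are only for illustration:

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' > input.file

# Append each line to output.<2nd field>, then close that file immediately.
# ">>" is required here: after close(), ">" would truncate on the next open.
awk '{ f = "output." $2; print >> f; close(f) }' input.file
```

This is still a single pass and keeps the original line order within each group. Reopening a file for every line costs some speed, and because `>>` appends, any output.* files left over from an earlier run should be removed first.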

Bombasti

Reply #3 · 2019-05-10 03:06

Well, you could do a grep to get the lines that match, and a grep -v to get the lines that don't match.

Hm, you could do a stable sort on the second field, sort -t" " -s -k 2,2, but that's O(n log n).
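A small sketch of the stable-sort idea on the sample data, assuming a sort that supports -s (GNU and BSD sort do; it is not in POSIX). Here -t sets the field separator, -k 2,2 restricts the key to the second field, and -s keeps lines with equal keys in their input order. Note that this only groups the lines together; it does not write separate files:

```shell
# Group the sample lines by their 2nd space-separated field, preserving
# the original relative order inside each group (stable sort).
sorted=$(printf '%s\n' 'xxxx aa whatever' 'yyyy bb whatever' 'zzzz aa whatever' |
  sort -t ' ' -s -k 2,2)

printf '%s\n' "$sorted"
```

The two "aa" lines come out first, in their original order, followed by the "bb" line.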
