Grep whole paragraphs of a text containing a specific keyword

My goal is to extract the paragraphs of a text that contain a specific keyword: not just the lines that contain the keyword, but the whole paragraph. The rule imposed on my text files is that every paragraph starts with a certain pattern (e.g. Pa0), which appears in the text only at the start of a paragraph. Each paragraph ends with an empty line (i.e. an extra newline character).

For example, imagine I have the following text:

Pa0 
This is the first paragraph bla bla bla
This is another line in the same paragraph bla bla 
This is a third line bla bla 

Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla 
bla 

Pa0
Hey, third paragraph bla bla bla!
bla bla 

Pa0
keyword keyword
keyword
Another line! bla

My goal is to extract the paragraphs that contain the word "keyword". For the example above, the desired output is:

Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla 
bla 

Pa0
keyword keyword
keyword
Another line! bla

I can grep for the keyword and use the -A, -B or -C option to get a fixed number of lines after, before or around each matching line, but that is not enough: the start and end of each block are determined by the delimiters "Pa0" and "\n" (the empty line), not by a fixed number of lines.

Any suggestion for grep or another tool (e.g. awk, sed, perl) would be helpful.

Tags: text awk grep paragraph

3 answers

乱世女痞

Post #2 · 2020-03-26 07:14

Hope this helps:

sed -n '/Pa0/,/^$/p' filename

cat filename | sed -n '/Pa0/,/^$/p'

-n, suppress automatic printing of the pattern space

p, print the current pattern space

/Pa0/, start of the range: a line matching the Pa0 pattern

/^$/, end of the range: a blank line

^, start of line

$, end of line

Reference: http://www.cyberciti.biz/faq/sed-display-text/
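As written, the range /Pa0/,/^$/ prints every paragraph that starts with Pa0, whether or not it contains the keyword. Below is a minimal sketch (not the answerer's code) that adds the keyword test by collecting each range in the hold space and printing it only if it matches. It assumes a POSIX sed that accepts a multi-line script with a label; the function name para_grep_range is hypothetical, and the keyword is treated as a basic regular expression that must not contain "/".

```shell
#!/bin/sh
# para_grep_range KEYWORD FILE  (hypothetical helper name)
# For each line inside a /^Pa0/,/^$/ range:
#   H             append the line to the hold space
#   /^$/b flush   a blank line ends the paragraph: go test and print it
#   $b flush      so does the end of the input
#   d             otherwise start the next cycle without printing
# Lines outside any range are deleted. At :flush, x swaps the collected
# paragraph into the pattern space and leaves the current (blank) line in
# the hold space, which resets the buffer for the next paragraph.
# s/^\n// drops the newline the first H put in front of "Pa0", and the
# paragraph is printed only if it matches KEYWORD.
para_grep_range() {
  sed -n '
/^Pa0/,/^$/{
H
/^$/b flush
$b flush
d
}
d
:flush
x
s/^\n//
/'"$1"'/p
' "$2"
}
```

With the sample file, para_grep_range keyword text.txt prints the second and fourth paragraphs, separated by an empty line.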


chillily

Post #3 · 2020-03-26 07:15

If text.txt contains the sample text above, then:

$ sed -e '/./{H;$!d;}' -e 'x;/keyword/!d;' text.txt
Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla
bla

Pa0
keyword keyword
keyword
Another line! bla
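The one-liner relies on sed's hold space. The sketch below is the same two commands with the keyword passed as a parameter and each step annotated; the function name para_grep_hold is hypothetical, and the keyword is a basic regular expression that must not contain "/".

```shell
#!/bin/sh
# para_grep_hold KEYWORD FILE  (hypothetical helper name)
# Command 1, for every non-empty line (/./):
#   H     append a newline plus the line to the hold space
#   $!d   unless this is the last line, delete it and read the next one,
#         so nothing is printed while a paragraph is still being collected
# Command 2 is reached only on an empty line or on the last line:
#   x            swap: the collected paragraph moves into the pattern space
#                and the current line becomes the new hold space
#   /KEYWORD/!d  discard the paragraph unless it matches KEYWORD;
#                otherwise sed's automatic print outputs it
# Because of the newline added by the first H, every printed paragraph is
# preceded by an empty line.
para_grep_hold() {
  sed -e '/./{H;$!d;}' -e 'x;/'"$1"'/!d;' "$2"
}
```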


Melony?

Post #4 · 2020-03-26 07:18

It is simple with awk:

awk '/keyword/' RS="\n\n" ORS="\n\n" input.txt

Explanation:

By default awk operates on a per-line basis, because the default value of the record separator RS is \n (a single newline). Setting RS to two newlines in sequence (i.e. an empty line) makes it easy to operate on a per-paragraph basis instead.

/keyword/ is a pattern, a regex condition. Since no action follows it, awk performs the default action and prints the unchanged record (the paragraph) if it contains keyword.

Setting the output record separator ORS to \n\n will separate the paragraphs of output with an empty line, just like in the input.
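POSIX leaves the behaviour of a multi-character RS unspecified; gawk and mawk treat "\n\n" as a regular expression, but other awk implementations may use only its first character. A sketch of the same filter using POSIX paragraph mode (RS set to the empty string), which also treats a run of several blank lines as a single separator. The function name para_grep_awk is hypothetical, and the keyword is used as an extended regular expression.

```shell
#!/bin/sh
# para_grep_awk KEYWORD FILE  (hypothetical helper name)
para_grep_awk() {
  # RS= (empty) selects paragraph mode: records are separated by one or
  # more blank lines, and leading/trailing newlines are not part of a record.
  # ORS='\n\n' writes an empty line after each printed paragraph
  # (-v assignments process escape sequences such as \n, so a backslash
  # in KEYWORD is interpreted too).
  # $0 ~ kw uses the keyword as a dynamic regular expression; with no
  # action, matching records are printed unchanged.
  awk -v RS= -v ORS='\n\n' -v kw="$1" '$0 ~ kw' "$2"
}
```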

