how to use sed, awk, or gawk to print only what is

I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk.

But in my case, I have a regular expression that I want to run against a text file to extract a specific value. I don't want to do search-and-replace. This is being called from bash. Let's use an example:

Example regular expression:

.*abc([0-9]+)xyz.*

Example input file:

a
b
c
abc12345xyz
a
b
c

As simple as this sounds, I cannot figure out how to call sed/awk/gawk correctly. What I was hoping to have in my bash script is:

myvalue=$( sed <...something...> input.txt )

Things I've tried include:

sed -e 's/.*([0-9]).*/\\1/g' example.txt # extracts the entire input file
sed -n 's/.*([0-9]).*/\\1/g' example.txt # extracts nothing

Tags: regex unix sed awk gawk

11 answers

爷、活的狠高调

#2 · 2020-05-12 04:45

My sed (Mac OS X) did not accept +, so I used * instead and added the p flag to print only the lines where the substitution matched:

sed -n 's/^.*abc\([0-9]*\)xyz.*$/\1/p' example.txt

For matching at least one numeric character without +, I would use:

sed -n 's/^.*abc\([0-9][0-9]*\)xyz.*$/\1/p' example.txt
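To get the value into a variable, as the question asks, the command above can be wrapped in a command substitution. This is a minimal sketch, assuming the sample input from the question; the temporary file is only there to make the example self-contained, and myvalue is the variable name used in the question.

```shell
# Recreate the question's sample input in a temporary file (illustration only).
input=$(mktemp)
printf '%s\n' a b c abc12345xyz a b c > "$input"

# -n suppresses the default output; the p flag prints only the lines where
# the substitution succeeded. [0-9][0-9]* stands in for [0-9]+ in basic
# regular expression syntax.
myvalue=$(sed -n 's/^.*abc\([0-9][0-9]*\)xyz.*$/\1/p' "$input")

echo "$myvalue"   # prints 12345
rm -f "$input"
```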

聊天终结者

#3 · 2020-05-12 04:45

I use perl to make this easier for myself. e.g.

perl -ne 'print $1 if /.*abc([0-9]+)xyz.*/'

This runs Perl. The -n option tells Perl to read the input one line at a time and execute the code for each line; with no file arguments it reads from STDIN. The -e option specifies the code to run.

The code matches the regexp against the current line and, if it matches, prints the contents of the first set of parentheses ($1). Note that print does not add a trailing newline.

You can also put multiple file names on the end, e.g.:

perl -ne 'print $1 if /.*abc([0-9]+)xyz.*/' example1.txt example2.txt

对你真心纯属浪费

#4 · 2020-05-12 04:45

If your version of grep supports it you could use the -o option to print only the portion of any line that matches your regexp.

If not then here's the best sed I could come up with:

sed -e '/[0-9]/!d' -e 's/^[^0-9]*//' -e 's/[^0-9]*$//'

... which deletes/skips lines with no digits and, for the remaining lines, removes all leading and trailing non-digit characters. (I'm only guessing that your intention is to extract the number from each line that contains one.)

The problem with something like:

sed -e 's/.*\([0-9]*\).*/&/'

.... or

sed -e 's/.*\([0-9]*\).*/\1/'

... is that sed only supports "greedy" matching, so the first .* will match the rest of the line and leave nothing for the group. Unless we can use a negated character class to achieve a non-greedy match, or a version of sed with Perl-compatible or other extensions to its regexes, we can't extract a precise pattern match from within the pattern space (a line).
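The greedy behaviour is easy to see on the sample line. The following is a small sketch, not part of the original answer; the variable names are made up for illustration. It compares the unanchored pattern with one that puts the literal abc and xyz around the group.

```shell
line='abc12345xyz'

# The first .* swallows the whole line, so the group matches the empty string.
greedy=$(printf '%s\n' "$line" | sed -e 's/.*\([0-9]*\).*/\1/')

# Requiring at least one digit still leaves only the last digit for the group.
last_digit=$(printf '%s\n' "$line" | sed -e 's/.*\([0-9][0-9]*\).*/\1/')

# Literal text around the group pins down where the capture starts and ends.
anchored=$(printf '%s\n' "$line" | sed -e 's/.*abc\([0-9][0-9]*\)xyz.*/\1/')

printf 'greedy=[%s] last_digit=[%s] anchored=[%s]\n' \
    "$greedy" "$last_digit" "$anchored"
# prints: greedy=[] last_digit=[5] anchored=[12345]
```

So a negated character class is not the only workaround: fixed text on both sides of the group has the same effect whenever the surrounding text is known.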

神经病院院长

#5 · 2020-05-12 04:46

Perl has the cleanest syntax, but if you don't have Perl (it's not always there, I understand), then one way to use the components of a regex in gawk is the gensub function.

gawk '/abc[0-9]+xyz/ { print gensub(/.*abc([0-9]+)xyz.*/, "\\1", "g"); }' < file

The output for the sample input file will be

12345

Note: gensub replaces everything the regex matches (the part between the //), so you need the .* before and after abc([0-9]+)xyz to get rid of the text around the number in the substitution. The literal abc and xyz matter: with just .*([0-9]+).* the greedy leading .* would leave only the last digit for the group.

再贱就再见

#6 · 2020-05-12 04:47

You can use sed to do this

 sed -rn 's/.*abc([0-9]+)xyz.*/\1/gp'

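-n suppresses the default printing of every line
-r enables extended regular expressions, so you don't have to escape the capture group parens (GNU sed; -E is the more portable spelling)
\1 is the text matched by the capture group
/g replaces every match on the line
/p prints the line if a substitution was made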
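The interplay of -n and /p also explains the two failed attempts in the question. Here is a hedged sketch, assuming a sed that accepts -E (both GNU and BSD sed do); the variable names and the temporary file are for illustration only.

```shell
# Recreate the question's sample input in a temporary file (illustration only).
input=$(mktemp)
printf '%s\n' a b c abc12345xyz a b c > "$input"

# Without -n every line is printed, changed or not: all 7 lines come back.
without_n=$(sed -E 's/.*abc([0-9]+)xyz.*/\1/' "$input" | wc -l | tr -d ' ')

# With -n but no p flag nothing is printed at all.
without_p=$(sed -nE 's/.*abc([0-9]+)xyz.*/\1/' "$input")

# With both, only the line where the substitution happened is printed.
with_both=$(sed -nE 's/.*abc([0-9]+)xyz.*/\1/p' "$input")

printf 'without_n=%s lines, without_p=[%s], with_both=[%s]\n' \
    "$without_n" "$without_p" "$with_both"
rm -f "$input"
```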

I wrote a tool for myself that makes this easier

rip 'abc(\d+)xyz' '$1'

三岁会撩人

#7 · 2020-05-12 04:54

You can use awk with match() to access the captured group (the three-argument form with an array is a gawk extension):

$ awk 'match($0, /abc([0-9]+)xyz/, matches) {print matches[1]}' file
12345

This tries to match the pattern abc[0-9]+xyz. If it matches, it stores the captured groups in the array matches, where matches[1] holds the text matched by ([0-9]+). Since match() returns the character position where the match begins (1 if it starts at the beginning of the string, 0 if there is no match), a successful match is truthy and triggers the print action.
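Where only a POSIX awk is available, the two-argument match() can still be used, because it sets RSTART and RLENGTH for the whole match. The sketch below is not from the original answer and assumes the fixed three-character prefix abc and suffix xyz from the question; the variable name digits is made up.

```shell
# Recreate the question's sample input in a temporary file (illustration only).
input=$(mktemp)
printf '%s\n' a b c abc12345xyz a b c > "$input"

# match() sets RSTART and RLENGTH for the whole match; skip the 3-character
# prefix "abc" and drop the 3-character suffix "xyz" to leave the digits.
digits=$(awk 'match($0, /abc[0-9]+xyz/) {
    print substr($0, RSTART + 3, RLENGTH - 6)
}' "$input")

echo "$digits"   # prints 12345
rm -f "$input"
```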

With grep you can use a look-behind and look-ahead:

$ grep -oP '(?<=abc)[0-9]+(?=xyz)' file
12345

$ grep -oP 'abc\K[0-9]+(?=xyz)' file
12345

This matches [0-9]+ only when it occurs between abc and xyz, and prints just the digits. The -P option (Perl-compatible regexes) is specific to GNU grep.
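If -P is not available but -o is, two passes with extended regexes can give the same result for this input. This is a rough sketch under that assumption, not part of the original answer. It relies on the matched abc...xyz text containing no digits other than the ones wanted.

```shell
# Recreate the question's sample input in a temporary file (illustration only).
input=$(mktemp)
printf '%s\n' a b c abc12345xyz a b c > "$input"

# First pass keeps the full abc...xyz match; second pass keeps only its digits.
digits=$(grep -oE 'abc[0-9]+xyz' "$input" | grep -oE '[0-9]+')

echo "$digits"   # prints 12345
rm -f "$input"
```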
