Question:
I have a text file containing both text and numbers, and I want to use grep to extract only the numbers I need. For example, given a file like this:
miss rate 0.21
ipc 222
stalls n shdmem 112
So, say I only want to extract the value for miss rate, which is 0.21. How do I do that with grep or sed? I also need more than one number, not only the one after miss rate; for example, I may want to get both 0.21 and 112. A sample output might look like this:
0.21 222 112
I need the data for plotting later.
Answer 1:
Use awk instead:
awk '/^miss rate/ { print $3 }' yourfile
To do it with grep alone, you need non-standard extensions, for example GNU grep's PCRE mode (-P) combined with a positive lookbehind (?<=...) and only-matching output (-o):
grep -Po '(?<=miss rate ).*' yourfile
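Since the question also asks for several numbers on one line, the awk approach can be extended to match several labels and collect their last fields. This is a minimal sketch, assuming the three labels from the sample file are the ones you want and that the value is always the last field; `extract_numbers` is a hypothetical helper name that reads from standard input.

```shell
# Hypothetical helper: print the value after each wanted label on ONE line,
# space-separated. Assumes the value is the last field ($NF) of each line.
extract_numbers() {
    awk '/^(miss rate|ipc|stalls n shdmem) / {
        # n is 0 (false) on the first match, so no leading space is added
        out = (n++ ? out " " : "") $NF
    }
    END { print out }'
}

# Demo input reproducing the file from the question
printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | extract_numbers
# prints: 0.21 222 112
```

With a real file you would call it as `extract_numbers < yourfile`; adding or removing labels only means editing the alternation in the pattern.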
Answer 2:
If you really want to use only grep for this, then you can try:
grep "miss rate" file | grep -oe '\([0-9.]*\)'
It first finds the matching line and then outputs only the digits (and decimal points).
Sed might be a bit more readable, though:
sed -n 's#miss rate ##p' file
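The sed substitution can be repeated once per label and the results joined onto one line with POSIX `paste -s`, which gives the space-separated output the question asks for. A minimal sketch, assuming the label text is exactly as in the sample file; `extract_numbers` is a hypothetical name.

```shell
# Hypothetical helper: one -e expression per label; each prints the line with
# its label stripped. paste -s then joins all printed values with spaces.
extract_numbers() {
    sed -n -e 's/^miss rate //p' \
           -e 's/^ipc //p' \
           -e 's/^stalls n shdmem //p' |
        paste -s -d ' ' -
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | extract_numbers
# prints: 0.21 222 112
```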
Answer 3:
Using the special \K regex trick of the PCRE engine with grep (\K drops everything matched so far from the reported match, so it acts like a variable-length lookbehind):
grep -oP 'miss rate \K.*' file.txt
or with Perl:
perl -lne 'print $& if /miss rate \K.*/' file.txt
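The same \K trick works for several labels at once if they are put in a non-capturing alternation; `\S+` then keeps only the value. A minimal sketch, assuming GNU grep built with PCRE support (for -P) and the labels from the sample file; `extract_numbers` is a hypothetical name.

```shell
# Hypothetical helper (needs GNU grep with -P): \K discards the label, so -o
# prints only the value; paste -s joins the values onto one line.
extract_numbers() {
    grep -oP '^(?:miss rate|ipc|stalls n shdmem) \K\S+' | paste -s -d ' ' -
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | extract_numbers
# prints: 0.21 222 112
```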
Answer 4:
The grep-and-cut solution would look like this. To get the 3rd field of every matching line, use:
grep "^miss rate " yourfile | cut -d ' ' -f 3
Or, to get the 3rd field and everything after it, use:
grep "^miss rate " yourfile | cut -d ' ' -f 3-
Or, if you use bash and "miss rate" occurs only once in your file, you can also just do:
a=( $(grep -m 1 "miss rate" yourfile) )
echo "${a[2]}"
where ${a[2]} is your result.
If "miss rate" occurs more than once, you can loop over the grep output in bash and read only the fields you need.
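Such a loop could look like the following minimal sketch: `read` splits each matching line into words, and only the third one is kept. It runs in bash and also in a plain POSIX sh. `miss_rates` is a hypothetical name, and the second "miss rate" line in the demo is made-up input to show the repetition.

```shell
# Hypothetical helper: for every "miss rate" line, read splits the words;
# the first two (the label) are discarded and the third is printed.
miss_rates() {
    grep '^miss rate ' | while read -r _label _word value _rest; do
        printf '%s\n' "$value"
    done
}

# Made-up input with two "miss rate" lines
printf 'miss rate 0.21\nipc 222\nmiss rate 0.35\n' | miss_rates
# prints:
# 0.21
# 0.35
```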
Answer 5:
You can use:
grep -P "miss rate \d+(\.\d+)?" file.txt
or:
grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt
Both of those commands print the whole matching line, miss rate 0.21. If you want to extract only the number, why not use Perl, sed, or awk?
If you really want to avoid those, maybe this will work:
grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt | xargs -n 1 | tail -n 1
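Another way to stay with grep only is two -o passes: the first keeps just "miss rate <number>", the second keeps just the trailing number. A minimal sketch; note that -o is not in POSIX, although GNU and BSD grep both support it. `miss_rate` is a hypothetical name.

```shell
# Hypothetical helper: pass 1 isolates "miss rate <number>",
# pass 2 keeps only the number at the end of that match.
miss_rate() {
    grep -oE '^miss rate [0-9]+(\.[0-9]+)?' | grep -oE '[0-9]+(\.[0-9]+)?$'
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | miss_rate
# prints: 0.21
```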
Answer 6:
I believe
sed 's|[^0-9]*\([0-9.]*\)|\1 |g' filename
will do the trick. However, every entry will be on its own line, which may or may not be OK for you. I am sure there is a way for sed to produce a comma- or space-delimited list, but I am not a master of all things sed.
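sed can build the single-line list itself through its hold space: `H` appends each extracted number to the hold space, and on the last line (`$`) `x` swaps it back so the embedded newlines can be replaced with spaces. A minimal sketch, assuming the labels contain no digits and each line carries at most one number of interest; `numbers_on_one_line` is a hypothetical name.

```shell
# Hypothetical helper: keep the first number on every line that has a digit,
# accumulate the numbers in the hold space, print them joined on the last line.
numbers_on_one_line() {
    sed -n '
        /[0-9]/{
            s/^[^0-9]*\([0-9.]*\).*$/\1/
            H
        }
        ${
            x
            s/\n/ /g
            s/^ //
            p
        }'
}

printf 'miss rate 0.21\nipc 222\nstalls n shdmem 112\n' | numbers_on_one_line
# prints: 0.21 222 112
```

For a comma-delimited list, replace the space in `s/\n/ /g` with a comma and the leading space in `s/^ //` with a comma as well.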