Question:
What is the correct way to parse a string using regular expressions in a linux shell script? I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy - I'm trying to learn some shell scripting and regex before switching to linux).
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed s/,//
But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit. But apparently it prints the whole line whenever a match is found - I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl).
{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}
I guess my questions are:
- What is the correct way to parse a string using regular expressions in a linux shell script?
- Is sed the right thing to use here?
- Could this be done using grep?
- Is there any other command that's easier/more appropriate?
Answer 1:
The grep command will select the desired line(s) from many, but it will not directly manipulate the line. For that, you use sed in a pipeline:
someCommand | grep 'Amarghosh' | sed -e 's/foo/bar/g'
Alternatively, awk (or perl if available) can be used. It's a far more powerful text processing tool than sed in my opinion.
someCommand | awk '/Amarghosh/ { do something }'
For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move on up to awk or perl.
My first thought is to just use:
echo '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"' |
sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'
which keeps the number of sed processes to one (you can give multiple commands with -e).
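As a rough illustration of the awk form mentioned above, here is a minimal sketch that does the line selection and the extraction in a single process. The function name rep_with_awk and the shortened sample string are made up for this example, and it assumes the reputation value contains only digits and commas.

```shell
# Hypothetical helper: print the reputation from a flair JSON line read on stdin.
rep_with_awk() {
    awk '/Amarghosh/ {
        # match() sets RSTART and RLENGTH to the position and length of the hit
        if (match($0, /"reputation":"[0-9,]+"/)) {
            value = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", value)   # keep only the digits
            print value
        }
    }'
}

# Shortened sample of the string from the question (an assumption for this sketch)
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'
printf '%s\n' "$sample" | rep_with_awk
```

Lines that do not contain Amarghosh produce no output, which mirrors the grep step of the pipeline.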
Answer 2:
You may be interested in using Perl for such tasks. As a demonstration, here is a Perl script which prints the number you want:
#!/usr/local/bin/perl
use warnings;
use strict;
use LWP::Simple;
use JSON;
my $url = "http://stackoverflow.com/users/flair/165297.json";
my $flair = get ($url);
my $parsed = from_json ($flair);
print "$parsed->{reputation}\n";
This script requires you to install the JSON module, which you can do with just the command cpan JSON.
Answer 3:
For working with JSON in a shell script, use jsawk, which is like awk but for JSON.
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | jsawk 'return this.reputation' # 2,747
Answer 4:
My suggestion:
$ echo $json | sed 's/,//g;s/^.*reputation...\([0-9]*\).*$/\1/'
I put two commands in the sed argument:
- s/,//g removes all commas, in particular the ones that are present in the reputation value.
- s/^.*reputation...\([0-9]*\).*$/\1/ locates the reputation value in the line and replaces the whole line with that value.
In this particular case, I find that sed provides the most compact command without loss of readability.
Other tools for manipulating strings (not only with regex) include:
- grep, awk, perl: mentioned in most of the other answers
- tr: for replacing characters
- cut, paste: for handling multicolumn inputs
- bash itself, with its rich ${...} syntax for accessing variables
- tail, head: for keeping the last or first lines of a file
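To illustrate two entries from that list, here is a minimal sketch that extracts the value with the shell's own parameter expansion and then uses tr to delete the commas, with no regex at all. The function name rep_with_expansion and the shortened sample string are assumptions made for this example. It has no error handling: it assumes the "reputation" key is present.

```shell
# Hypothetical helper: extract the reputation using parameter expansion and tr.
rep_with_expansion() {
    json=$1
    rest=${json#*\"reputation\":\"}     # drop everything up to and including "reputation":"
    value=${rest%%\"*}                  # drop everything from the next double quote on
    printf '%s\n' "$value" | tr -d ','  # delete the thousands separators
}

# Shortened sample of the string from the question (an assumption for this sketch)
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'
rep_with_expansion "$sample"
```

The # and %% operators are part of POSIX shell parameter expansion, so this form should not depend on bash specifically.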
Answer 5:
sed is appropriate, but you'll spawn a new process for every sed you use (which may be too heavyweight in more complex scenarios). grep is not really appropriate. It's a search tool that uses regexps to find lines of interest.
Perl is one appropriate solution here, being a scripting language with powerful regexp features. It'll do most everything you need without spawning out to separate processes (unlike normal Unix shell scripting) and has a huge library of additional functions.
Answer 6:
You can do it with grep. The -o switch in grep extracts only the matching string, not the whole line.
$ echo $json | grep -o '"reputation":"[0-9,]\+"' | grep -o '[0-9,]\+'
2,747
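Note that \+ in a basic regular expression is a GNU extension, so the command above may behave differently with other grep implementations. A minimal sketch of a variant that uses extended regular expressions (-E) and also strips the comma with tr follows. It assumes a grep that supports -o (GNU and BSD grep do). The function name rep_with_grep and the shortened sample string are made up for this example.

```shell
# Hypothetical helper: print the reputation from a flair JSON line read on stdin.
rep_with_grep() {
    grep -oE '"reputation":"[0-9,]+"' |   # keep only the key/value pair
        grep -oE '[0-9,]+' |              # keep only the digits and commas
        tr -d ','                         # drop the thousands separators
}

# Shortened sample of the string from the question (an assumption for this sketch)
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'
printf '%s\n' "$sample" | rep_with_grep
```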
Answer 7:
1) What is the correct way to parse a string using regular expressions in a linux shell script?
Tools with regular expression capabilities include sed, grep, awk, Perl, and Python, to mention a few. Even newer versions of Bash have regex capabilities. All you need to do is look up the docs on how to use them.
2) Is sed the right thing to use here?
It can be, but it is not necessary.
3) Could this be done using grep?
Yes, it can. You would construct a regex similar to the one you would use with sed or the other tools. Note that grep only searches; if you want to modify any files, it will not do that for you.
4) Is there any other command that's easier/more appropriate?
Of course. Regex can be powerful, but it's not necessarily the best tool to use every time. It also depends on what you mean by "easier/appropriate".
Another method, with minimal fuss over regex, is the fields/delimiter approach: you look for patterns that the input can be split on. For example, in your case (I downloaded the 165297.json file instead of using curl, but it's the same):
awk 'BEGIN{
FS="reputation" # split each line on the word "reputation"
}
{
m=split($2,a,"\",\"") # field 2 holds the value you want plus the rest;
# split it on "," (quote, comma, quote) and save the pieces to array "a"
gsub(/[:\",]/,"",a[1]) # now get rid of the leftover colon, quotes and comma
print a[1]
}' 165297.json
output:
$ ./shell.sh
2747
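Point 1 above mentions that newer versions of Bash have regex capabilities. A minimal sketch of that, using the [[ ... =~ ... ]] test and the BASH_REMATCH array, might look like the following. It requires bash and will not work in a plain POSIX sh. The function name rep_with_bash_regex and the shortened sample string are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the reputation with Bash's built-in regex matching.
rep_with_bash_regex() {
    # Keeping the pattern in a variable avoids quoting problems inside [[ ]]
    local re='"reputation":"([0-9,]+)"'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] holds the first parenthesised group; //,/ deletes the commas
        printf '%s\n' "${BASH_REMATCH[1]//,/}"
    else
        return 1   # no reputation field found
    fi
}

# Shortened sample of the string from the question (an assumption for this sketch)
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'
rep_with_bash_regex "$sample"
```

No external process is spawned here, which may matter when the extraction runs many times in a loop.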
Answer 8:
sed is a perfectly valid command for your task, but it may not be the only one.
grep may be useful too, but as you say it prints the whole line. It's most useful for filtering the lines of a multi-line file, and discarding the lines you don't want.
Efficient shell scripts can use a combination of commands (not just the two you mentioned), exploiting the talents of each.
Answer 9:
Blindly:
echo $json | awk -F\" '{print $8}'
Similar (the field separator can be a regex):
awk -F'{"|":"|","|"}' '{print $5}'
Smarter (look for the key and print its value):
awk -F'{"|":"|","|"}' '{for(i=2; i<=NF; i+=2) if ($i == "reputation") print $(i+1)}'
Answer 10:
You can use a proper library (as others noted):
E:\Home> perl -MLWP::Simple -MJSON -e "print from_json(get 'http://stackoverflow.com/users/flair/165297.json')->{reputation}"
or
$ perl -MLWP::Simple -MJSON -e 'print from_json(get "http://stackoverflow.com/users/flair/165297.json")->{reputation}, "\n"'
depending on OS/shell combination.
Answer 11:
Simple RegEx via Shell
Disregarding the specific code in question, there may be times when you want to do a quick regex replace-all from stdin to stdout using shell, in a simple way, using a string syntax similar to JavaScript.
Below are some examples for anyone looking for a way to do this. Perl is a better bet on Mac since it lacks some sed options. If you want to get stdin as a variable you can use MY_VAR=$(cat).
echo 'text' | perl -pe 's/search/replace/g'; # using perl
echo 'text' | sed -e 's/search/replace/g'; # using sed
And here's an example of a custom, reusable regex function. Arguments are source string (or -- for stdin), search, replace, and options.
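regex() {
# Use return, not exit, so a bad call does not terminate the calling shell.
case "$#" in
( '0' ) return 1 ;; ( '1' ) echo "$1"; return 0 ;;
( '2' ) REP=''; OPT='' ;; ( '3' ) REP="$3"; OPT='' ;;
( * ) REP="$3"; OPT="$4" ;;
esac
TXT="$1"; SRCH="$2";
# With "--" as the source, read one line from stdin when it is not a terminal.
if [ "$1" = "--" ]; then [ ! -t 0 ] && read -r TXT; fi
echo "$TXT" | perl -pe 's/'"$SRCH"'/'"$REP"'/'"$OPT";
}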
echo 'text' | regex -- search replace g;