Using regular expressions in shell script

Posted 2020-02-23 06:07

What is the correct way to parse a string using regular expressions in a linux shell script? I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy - I'm trying to learn some shell scripting and regex before switching to linux).

json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed s/,//

But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit. But apparently it prints the whole line whenever a match is found - I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl).

{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}

I guess my questions are:

  • What is the correct way to parse a string using regular expressions in a linux shell script?
  • Is sed the right thing to use here?
  • Could this be done using grep?
  • Is there any other command that's easier or more appropriate?

11 answers
倾城 Initia
#2 · 2020-02-23 06:33

You can use a proper library (as others noted):

E:\Home> perl -MLWP::Simple -MJSON -e "print from_json(get 'http://stackoverflow.com/users/flair/165297.json')->{reputation}"

or

$ perl -MLWP::Simple -MJSON -e 'print from_json(get "http://stackoverflow.com/users/flair/165297.json")->{reputation}, "\n"'

depending on OS/shell combination.

迷人小祖宗
#3 · 2020-02-23 06:35

The grep command will select the desired line(s) from many but it will not directly manipulate the line. For that, you use sed in a pipeline:

someCommand | grep 'Amarghosh' | sed -e 's/foo/bar/g'

Alternatively, awk (or perl if available) can be used. It's a far more powerful text processing tool than sed in my opinion.

someCommand | awk '/Amarghosh/ { do something }'

For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move on up to awk or perl.
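As a rough sketch of what the "do something" part of the awk command could look like for the sample string from the question: the snippet below splits the line on double quotes, looks for a field that is exactly reputation, and prints the value two fields later with the commas removed. The shortened json value is an assumption made for illustration, and the field-splitting trick only holds while no escaped quotes appear before the reputation key; it is not a general JSON parser.

```shell
# Shortened sample input (an assumption; the real response has a longer badgeHtml value).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# Split on double quotes: a key sits in field i, its string value in field i + 2.
rep=$(printf '%s\n' "$json" | awk -F'"' '
    /Amarghosh/ {
        for (i = 1; i < NF - 1; i++) {
            if ($i == "reputation") {
                value = $(i + 2)
                gsub(/,/, "", value)   # drop thousands separators
                print value
                exit
            }
        }
    }')

echo "$rep"
```

Keeping the selection (/Amarghosh/) and the manipulation in one awk program replaces the grep and sed stages with a single process.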

My first thought is to just use:

echo '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"' \
    | sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'

which keeps the number of sed processes to one (you can give multiple commands with -e).
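One possible refinement, still with a single sed process, is to anchor on the full "reputation" key and print only when it is actually present, so that unexpected input produces no output instead of a mangled line. This is a sketch under assumptions: extract_rep is a hypothetical helper name and the json value is a shortened copy of the sample from the question.

```shell
# extract_rep is a hypothetical helper, not part of the original answer.
# It reads text on stdin and prints the reputation digits, or nothing at all.
extract_rep() {
    sed -n '
    /"reputation":"[0-9,]*"/ {
        s/.*"reputation":"\([0-9,]*\)".*/\1/
        s/,//g
        p
    }'
}

# Shortened sample input (an assumption made for illustration).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'
printf '%s\n' "$json" | extract_rep
```

The -n option suppresses the default printing, and the p command inside the braces runs only for lines that matched the address pattern.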

Juvenile、少年°
#4 · 2020-02-23 06:35

sed is appropriate, but you'll spawn a new process for every sed you use (which may be too heavyweight in more complex scenarios). grep is not really appropriate. It's a search tool that uses regexps to find lines of interest.
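If the script runs under bash specifically (an assumption; plain sh does not support this), the match can be done without spawning any external process by using the shell's own =~ operator and the BASH_REMATCH array. A minimal sketch, with a shortened copy of the sample string:

```shell
# Requires bash; [[ ... =~ ... ]] and BASH_REMATCH are not available in POSIX sh.
# Shortened sample input (an assumption made for illustration).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# Keeping the pattern in a variable avoids quoting surprises on the right-hand side of =~.
re='"reputation":"([0-9,]+)"'

if [[ $json =~ $re ]]; then
    # BASH_REMATCH[1] holds the first capture group; //,/ strips every comma.
    rep=${BASH_REMATCH[1]//,/}
    echo "$rep"
fi
```

The pattern here is a POSIX extended regular expression, so the group and the + need no backslashes, unlike the basic-regex syntax that sed uses by default.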

Perl is one appropriate solution here, being a scripting language with powerful regexp features. It'll do almost everything you need without spawning separate processes (unlike normal Unix shell scripting) and has a huge library of additional modules.

孤傲高冷的网名
#5 · 2020-02-23 06:36

Simple RegEx via Shell

Disregarding the specific code in question, there may be times when you want to do a quick regex replace-all from stdin to stdout using shell, in a simple way, using a string syntax similar to JavaScript.

Below are some examples for anyone looking for a way to do this. Perl is a better bet on a Mac, since the sed that ships with macOS lacks some of the options found in GNU sed. If you want to get stdin as a variable you can use MY_VAR=$(cat);.

echo 'text' | perl -pe 's/search/replace/g'; # using perl
echo 'text' | sed -e 's/search/replace/g'; # using sed

And here's an example of a custom, reusable regex function. Arguments are source string (or -- for stdin), search, replace, and options.

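regex() {
    # Usage: regex SOURCE SEARCH [REPLACE [OPTIONS]]; pass -- as SOURCE to read stdin.
    # Use return, not exit, so that calling the function does not end the calling shell.
    case "$#" in
        ( '0' ) return 1 ;; ( '1' ) echo "$1"; return 0 ;;
        ( '2' ) REP=''; OPT='' ;; ( '3' ) REP="$3"; OPT='' ;;
        ( * ) REP="$3"; OPT="$4" ;;
    esac
    TXT="$1"; SRCH="$2";
    # Read all of stdin, not just its first line, when the source is "--".
    if [ "$1" = "--" ]; then [ ! -t 0 ] && TXT=$(cat); fi
    echo "$TXT" | perl -pe 's/'"$SRCH"'/'"$REP"'/'"$OPT";
}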

echo 'text' | regex -- search replace g;

仙女界的扛把子
#6 · 2020-02-23 06:41

For working with JSON in a shell script, use jsawk, which is like awk but for JSON.

json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | jsawk 'return this.reputation' # 2,747
啃猪蹄的小仙女
#7 · 2020-02-23 06:42

You can do it with grep. The -o switch makes grep print only the matching part of each line rather than the whole line.

$ echo $json | grep -o '"reputation":"[0-9,]\+"' | grep -o '[0-9,]\+'
2,747
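The \+ in the pattern above is a GNU extension to basic regular expressions, so it may not behave the same with every grep. A slightly more portable sketch uses -E (extended regular expressions), where + needs no backslash, and pipes the result through tr to remove the comma. The -o switch itself is assumed to be available; it is supported by GNU and BSD grep but is not required by POSIX. The json value is a shortened copy of the sample from the question.

```shell
# Shortened sample input (an assumption made for illustration).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# First grep isolates the key/value pair, second grep keeps only digits and commas,
# and tr deletes the thousands separators.
rep=$(printf '%s\n' "$json" \
    | grep -Eo '"reputation":"[0-9,]+"' \
    | grep -Eo '[0-9,]+' \
    | tr -d ',')

echo "$rep"
```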