What is the correct way to parse a string using regular expressions in a Linux shell script?
I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy; I'm trying to learn some shell scripting and regex before switching to Linux).
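# Fetch the flair JSON for user 165297
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
# Keep only the quoted "reputation" value, then strip every comma from it
echo "$json" | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed 's/,//g'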
But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit, but apparently it prints the whole line whenever a match is found, and I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl):
{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}
I guess my questions are:
- What is the correct way to parse a string using regular expressions in a Linux shell script?
- Is sed the right thing to use here?
- Could this be done using grep?
- Is there any other command that's easier or more appropriate?
You can use a proper library (as others noted):
E:\Home> perl -MLWP::Simple -MJSON -e "print from_json(get 'http://stackoverflow.com/users/flair/165297.json')->{reputation}"
or
$ perl -MLWP::Simple -MJSON -e 'print from_json(get "http://stackoverflow.com/users/flair/165297.json")->{reputation}, "\n"'
depending on your OS/shell combination (the first form is for the Windows cmd.exe prompt, the second for a Unix shell).
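If Perl's JSON module isn't installed, the same "use a real parser" idea can be sketched with Python's standard-library json module instead, assuming python3 is on the PATH. This is a swapped-in tool, not what the answer above used; the json_reputation function name is hypothetical, and the input is the downsized sample from the question rather than a live curl call.

```shell
#!/bin/sh
# Downsized sample from the question; in a real script this would be
# json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
json='{"displayName":"Amarghosh","reputation":"2,737"}'

# Hypothetical helper: read JSON on stdin and print its "reputation" field
# using Python's json module (a real parser, not a regex).
json_reputation() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["reputation"])'
}

printf '%s' "$json" | json_reputation   # prints 2,737
```

Because the field is looked up by name after a full parse, it keeps working if the key order or whitespace in the JSON changes, which a regex would not guarantee.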
The grep command will select the desired line(s) from many, but it will not directly manipulate the line. For that, you pipe its output into sed.
Alternatively, awk (or perl, if available) can be used. In my opinion, awk is a far more powerful text processing tool than sed.
For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move up to awk or perl.
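A minimal sketch of both approaches on the sample string from the question, assuming the JSON arrives as one line on stdin. The function names rep_grep_sed and rep_awk are hypothetical; grep only selects the matching line, sed or awk does the actual extraction, and both strip the thousands separators.

```shell
#!/bin/sh
json='{"displayName":"Amarghosh","reputation":"2,737"}'

# grep picks the line(s) containing "reputation"; sed then rewrites the
# line to just the captured number and deletes every comma.
rep_grep_sed() {
  grep '"reputation"' |
    sed 's/.*"reputation":"\([0-9,]*\)".*/\1/; s/,//g'
}

# awk alternative: split each line on the literal text "reputation":" and
# keep what follows, up to the next double quote, then drop the commas.
rep_awk() {
  awk -F'"reputation":"' 'NF > 1 { split($2, parts, "\""); gsub(/,/, "", parts[1]); print parts[1] }'
}

printf '%s\n' "$json" | rep_grep_sed   # prints 2737
printf '%s\n' "$json" | rep_awk        # prints 2737
```

The awk version does the selection (NF > 1 only holds on lines that contain the field) and the extraction in a single process.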
My first thought is to combine both substitutions into a single sed call, which keeps the number of sed processes to one (you can give multiple commands with -e).
sed is appropriate, but you'll spawn a new process for every sed you use, which may be too heavyweight in more complex scenarios. grep is not really appropriate: it's a search tool that uses regexps to find lines of interest.
Perl is one appropriate solution here, being a scripting language with powerful regexp features. It'll do almost everything you need without spawning separate processes (unlike normal Unix shell scripting) and has a huge library of additional functions.
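For the single-process sed idea, here is a sketch of the question's pipeline with both substitutions in one sed invocation, each passed with its own -e. The function name rep_sed is hypothetical and the input is again the question's sample string.

```shell
#!/bin/sh
json='{"displayName":"Amarghosh","reputation":"2,737"}'

# One sed process, two scripts: the first -e keeps only the captured
# reputation value, the second -e removes all thousands separators.
rep_sed() {
  sed -e 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' -e 's/,//g'
}

printf '%s\n' "$json" | rep_sed   # prints 2737
```

sed applies the -e scripts in order to each input line, so the second substitution sees the output of the first.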
Simple RegEx via Shell
Disregarding the specific code in question, there may be times when you want to do a quick regex replace-all from stdin to stdout using shell, in a simple way, using a string syntax similar to JavaScript.
Below are some examples for anyone looking for a way to do this. Perl is a better bet on macOS, because its BSD sed lacks some GNU sed options. If you want to capture stdin in a variable, you can use MY_VAR=$(cat).
echo 'text' | perl -pe 's/search/replace/g'; # using perl
echo 'text' | sed -e 's/search/replace/g'; # using sed
And here's an example of a custom, reusable regex function. Its arguments are the source string (or -- to read from stdin), the search pattern, the replacement, and options.
echo 'text' | regex -- search replace g;
For working with JSON in a shell script, use jsawk, which is like awk, but for JSON.
You can do it with grep. grep has a -o switch, which prints only the matching part of the line instead of the whole line.
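A sketch of the grep -o approach on the question's sample string. -o is supported by GNU and BSD grep but is not required by POSIX; -E switches to extended regexps so that + can be used. The first grep keeps only the "reputation":"2,737" pair, the second keeps only the digits and commas from it, and tr deletes the commas. The function name rep_grep is hypothetical.

```shell
#!/bin/sh
json='{"displayName":"Amarghosh","reputation":"2,737"}'

# grep -o prints only the matched text, not the whole line:
#   first pass  -> "reputation":"2,737"
#   second pass -> 2,737
# tr -d ',' then removes the thousands separators.
rep_grep() {
  grep -oE '"reputation":"[0-9,]+"' | grep -oE '[0-9,]+' | tr -d ','
}

printf '%s\n' "$json" | rep_grep   # prints 2737
```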