How to match until the last occurrence of a charac

Posted 2019-04-06 06:45

Question:

I am using curl and cut on output like the one below.

var=$(curl https://avc.com/actuator/info | tr '"' '\n' | grep - | head -n1 | cut -d'-' -f -1, -3)

The variable var gets one of two kinds of values (one at a time).

HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b
HIX-R1-1-3b5126629f67892110165c524gbc5d5g1808c9b5

I am actually trying to get everything up to the last '-', i.e. HIX_MAIN or HIX-R1-1.

The command shown works fine to get HIX-R1-1.

But I figured out this is the wrong way to do it when the variable contains only one '-': it gets me the entire variable value (e.g. HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b).

How do I go about getting everything up to the last '-' into the variable var?

Answer 1:

This removes everything from the last - to the end:

sed 's/\(.*\)-.*/\1/'

As examples:

$ echo HIX_MAIN-7ae52 | sed 's/\(.*\)-.*/\1/'
HIX_MAIN
$ echo HIX-R1-1-3b5126629f67 | sed 's/\(.*\)-.*/\1/'
HIX-R1-1

How it works

The sed substitute command has the form s/old/new/ where old is a regular expression. In this case, the regex is \(.*\)-.*. This works because .* is greedy: \(.*\)- will match everything up to and including the last -. Because of the escaped parens, \(...\), everything before the last - will be saved in group 1, which we can refer to as \1. The final .* matches everything after the last -. Thus, as long as the line contains a -, this regex matches the whole line and the substitute command replaces the whole line with \1.
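To see the greedy match in action, including the case where the line has no - at all, here is a small sketch. The function name strip_last_dash is made up for illustration and is not part of the answer above.

```shell
# strip_last_dash is a hypothetical helper name used only for this example.
strip_last_dash() {
  # Same substitution as above: keep group 1, drop the last '-' and what follows.
  printf '%s\n' "$1" | sed 's/\(.*\)-.*/\1/'
}

strip_last_dash 'HIX_MAIN-7ae52'          # one dash       -> HIX_MAIN
strip_last_dash 'HIX-R1-1-3b5126629f67'   # several dashes -> HIX-R1-1
strip_last_dash 'nodash'                  # no dash: the regex does not match,
                                          # so the line is printed unchanged
```

If a line without any - should produce nothing instead of passing through, that needs extra handling (for example sed -n with the p flag on the substitution).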



Answer 2:

You can use bash string manipulation:

$ foo=a-b-c-def-ghi
$ echo "${foo%-*}"
a-b-c-def

The operators, # and %, are on either side of $ on a QWERTY keyboard, which helps to remember how they modify the variable:

  • #pattern trims off the shortest prefix matching "pattern".
  • ##pattern trims off the longest prefix matching "pattern".
  • %pattern trims off the shortest suffix matching "pattern".
  • %%pattern trims off the longest suffix matching "pattern".

where pattern follows the bash pattern matching rules, including ? (one character) and * (zero or more characters).

Here, we're trimming off the shortest suffix matching the pattern -*, so ${foo%-*} will get you what you want.

Of course, there are many ways to do this using awk or sed, possibly reusing the sed command you're already running. Variable manipulation, however, can be done natively in bash without launching another process.
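For comparison, here are all four operators applied to the same value. This is only a sketch; the variable names on the left are invented for the example.

```shell
foo=a-b-c-def-ghi

# Variable names below are illustrative only.
short_prefix=${foo#*-}    # shortest prefix matching '*-' removed -> b-c-def-ghi
long_prefix=${foo##*-}    # longest prefix matching '*-' removed  -> ghi
short_suffix=${foo%-*}    # shortest suffix matching '-*' removed -> a-b-c-def
long_suffix=${foo%%-*}    # longest suffix matching '-*' removed  -> a

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
```

For the question as asked, only the short_suffix form (%-*) is needed, since it removes the last - and everything after it.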



Answer 3:

You can reverse the string with rev, cut from the second field and then rev again:

rev <<< "$VARIABLE" | cut -d"-" -f2- | rev

For HIX-R1-1----3b5126629f67892110165c524gbc5d5g1808c9b5, prints:

HIX-R1-1---
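The same pipeline also handles the single-dash value from the question. A minimal sketch, wrapped in a function whose name (strip_last_field) is hypothetical, and using printf instead of a here-string so it does not depend on bash:

```shell
# strip_last_field is a hypothetical helper name used only for this example.
strip_last_field() {
  # Reverse, drop the first '-'-separated field (the original last one), reverse back.
  printf '%s\n' "$1" | rev | cut -d- -f2- | rev
}

strip_last_field 'HIX_MAIN-7ae526629f69'   # -> HIX_MAIN
strip_last_field 'HIX-R1-1-3b5126629f67'   # -> HIX-R1-1
```

This assumes rev is installed, which is the case on most Linux and macOS systems.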


Answer 4:

I think you should be using sed, at least after the tr:

var=$(curl https://avc.com/actuator/info | tr '"' '\n' | sed -n '/-/{s/-[^-]*$//;p;q}')

The -n means "don't print by default". The /-/ looks for a line containing a dash; it then executes s/-[^-]*$// to delete the last dash and everything after it, followed by p to print and q to quit (so it only prints the first such line).
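To try the sed script without hitting the real endpoint, you can feed it a made-up response. The sample string below is an assumption about what the output might look like, since the question does not show it. I have also added a ; before the closing brace, because some sed implementations (BSD sed, for one) complain without it.

```shell
# Hypothetical stand-in for the curl output; the real response is not shown in the question.
sample='{"git":{"branch":"HIX_MAIN-7ae526629f69"},"x":"a-b"}'

# Split on double quotes, then take the first line containing a dash,
# strip its last dash and everything after it, print it, and quit.
var=$(printf '%s\n' "$sample" | tr '"' '\n' | sed -n '/-/{s/-[^-]*$//;p;q;}')

printf '%s\n' "$var"   # -> HIX_MAIN
```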


I'm assuming that the output from curl intrinsically contains multiple lines, some of them with unwanted double quotes in them, and that you need to match only the first line that contains a dash at all (which might very well not be the first line). Once you've whittled the input down to the sole interesting line, you could use pure shell techniques to get the result that's desired, but getting the sole interesting line is not as trivial as some of the answers seem to be assuming.
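As a sketch of the "pure shell" route mentioned above: pick out the first line with a dash, then trim it with parameter expansion. The sample input is invented for illustration, as are the variable names other than var.

```shell
# Hypothetical stand-in for the curl output; the real response is not shown in the question.
sample='{"build":{"version":"HIX-R1-1-3b5126629f67"},"other":"x-y"}'

# Step 1: split on double quotes and keep only the first line containing a dash.
line=$(printf '%s\n' "$sample" | tr '"' '\n' | grep -e - | head -n 1)

# Step 2: remove the last dash and everything after it, without another process.
var=${line%-*}

printf '%s\n' "$var"   # -> HIX-R1-1
```

Note that if no line contains a dash, line is empty and so is var, which is easy to test for afterwards.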