Extract all Strings between two characters to array

Posted 2019-08-03 03:58

Question:

I searched for hours but can't find a solution to extract all strings between two characters into an array using Bash.

I found

sed -n 's/.*\[\(.*\)\].*/\1/p'

But this only shows me the last entry.

My string looks like:

var="[a1] [b1] [123] [Text text] [0x0]"

I want an array like this:

arr[0]="a1"
arr[1]="b1"
arr[2]="123"
arr[3]="Text text"
arr[4]="0x0"

So I search for strings between [ and ] and load them into an array without the [ and ].

Thank you for helping!

Answer 1:

There are already a lot of suggestions here that may work for you, but whether they do depends on your data. For example, replacing your current field separator of ] [ with a comma works unless you have commas embedded in your fields. Your sample data does not have any, but one never knows. :)

An ideal solution would be to use something as a field separator that is guaranteed never to be part of your field, like a null. But that's hard to do in a portable way (i.e. without knowing what tools are available). So a less extreme stance might be to use a newline as a separator:

var="[a1] [b1] [123] [Text text] [0x0]"

mapfile -t arr < <(sed $'s/^\[//;s/] \[/\\\n/g;s/]$//' <<<"$var")

declare -p arr

which would result in:

declare -a arr='([0]="a1" [1]="b1" [2]="123" [3]="Text text" [4]="0x0")'

This is functionally equivalent to the awk solution that Inian provided. Note that mapfile requires bash version 4 or above.
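To make the three substitutions in that sed script easier to follow, here is a sketch that spells them out one per expression. The variable names nl and sed_args are only illustrative, and it assumes a sed that accepts a backslash followed by a literal newline in the replacement text:

```shell
var="[a1] [b1] [123] [Text text] [0x0]"
nl=$'\n'   # a literal newline, kept in a variable for readability

sed_args=(
  -e 's/^\[//'            # 1. drop the leading [
  -e "s/] \[/\\${nl}/g"   # 2. turn every "] [" into backslash-newline, i.e. a line break
  -e 's/]$//'             # 3. drop the trailing ]
)

# One field per line, so mapfile -t can load them directly.
mapfile -t arr < <(sed "${sed_args[@]}" <<< "$var")
declare -p arr
```

As with the one-liner, this only holds as long as no field contains a newline or the sequence "] [".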

That said, you could also do this exclusively within bash, without relying on any external tools like sed:

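# Split on whitespace. read -a is used instead of arr=( $var ) because
# an unquoted [a1] is a glob pattern and could expand to a filename.
read -r -a arr <<< "$var"

last=0
for i in "${!arr[@]}"; do
  # A word that does not start with [ continues the previous field.
  if [[ ${arr[$i]} != \[* ]]; then
    arr[$last]="${arr[$last]} ${arr[$i]}"
    unset "arr[$i]"
    continue
  fi
  last=$i
done

# Strip the first and last character (the brackets) from every field.
for i in "${!arr[@]}"; do
  arr[$i]="${arr[$i]:1:$((${#arr[$i]}-2))}"
done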

At this point, declare -p arr results in:

declare -a arr='([0]="a1" [1]="b1" [2]="123" [3]="Text text" [5]="0x0")'

This sucks your $var into the array $arr[] with fields separated by whitespace, then it collapses the fields based on whether they begin with a square bracket. It then goes through the fields and replaces them with the substring that eliminates the first and last character. It may be a little less resilient and harder to read, but it's all within bash. :)
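Note that the declare -p output above jumps from index 3 to index 5: the element that held "text]" was unset, which leaves a gap. If contiguous indices matter, the array can be repacked by reassigning it from its own values. A minimal sketch, starting from the state shown above:

```shell
# State of the array after the two loops (note the missing index 4).
arr=([0]="a1" [1]="b1" [2]="123" [3]="Text text" [5]="0x0")

# Expanding all values and assigning them back renumbers from 0.
arr=("${arr[@]}")

declare -p arr
echo "indices: ${!arr[*]}"
```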



Answer 2:

There's no simple way to do it. I would use a loop to extract them one at a time:

var="[a1] [b1] [123] [Text text] [0x0]"
regex='\[([^]]*)\](.*)'
while [[ $var =~ $regex ]]; do
  arr+=("${BASH_REMATCH[1]}")
  var=${BASH_REMATCH[2]}
done

In the regular expression, \[([^]]*)\] captures everything after the first [ up to (but not including) the next ]. (.*) captures everything after that for the next iteration.
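To see what the two capture groups hold, it may help to run a single match outside the loop. This is only an illustration; the variable names first and rest are not part of the answer:

```shell
var="[a1] [b1] [123] [Text text] [0x0]"
regex='\[([^]]*)\](.*)'

if [[ $var =~ $regex ]]; then
  first=${BASH_REMATCH[1]}   # text inside the first pair of brackets
  rest=${BASH_REMATCH[2]}    # everything after the first closing bracket
  printf 'captured:  <%s>\n' "$first"
  printf 'remaining: <%s>\n' "$rest"
fi
```

Because the loop assigns the remainder back to var, the original string is consumed as the loop runs; work on a copy if you still need it afterwards.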

You can use declare -n in bash 4.3 or later to make this look a little less intimidating.

declare -n m1=BASH_REMATCH[1] m2=BASH_REMATCH[2]
regex='\[([^]]*)\](.*)'

var="[a1] [b1] [123] [Text text] [0x0]"
while [[ $var =~ $regex ]]; do
  arr+=("$m1")
  var=$m2
done
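The same loop could also be wrapped in a function so that the input string is left untouched and the target array starts out empty. The following is a sketch under a few assumptions: extract_bracketed, _out and fields are hypothetical names, and local -n needs bash 4.3 or later, like declare -n above.

```shell
# Hypothetical helper: extract_bracketed STRING ARRAY_NAME
# Fills the caller's array ARRAY_NAME with every [...] group found in STRING.
extract_bracketed() {
  local rest=$1                     # work on a copy of the input
  local -n _out=$2                  # nameref to the caller's array
  local regex='\[([^]]*)\](.*)'
  _out=()                           # start from an empty array
  while [[ $rest =~ $regex ]]; do
    _out+=("${BASH_REMATCH[1]}")
    rest=${BASH_REMATCH[2]}
  done
}

var="[a1] [b1] [123] [Text text] [0x0]"
fields=()
extract_bracketed "$var" fields
declare -p fields
```

The caller's array must not itself be named _out, since a nameref cannot usefully point at its own name.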


Answer 3:

$ IFS=, arr=($(sed 's/\] \[/","/g;s/\]/"/;s/\[/"/' <<< "$var")); echo "${arr[3]}"

"Text text"

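Two things about the one-liner above are easy to miss: the double quotes end up as literal characters in each element, and IFS stays set to a comma for the rest of the shell session. A possible variant, assuming (as before) that no field contains a comma, limits the IFS change to a single read and leaves the quotes out:

```shell
var="[a1] [b1] [123] [Text text] [0x0]"

# IFS=, applies only to this read; sed strips the outer brackets
# and turns each "] [" into a comma.
IFS=, read -r -a arr < <(sed 's/^\[//; s/]$//; s/] \[/,/g' <<< "$var")

echo "${arr[3]}"
```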

Answer 4:

With GNU awk for multi-char RS and RT and newer versions of bash for mapfile:

$ mapfile -t arr < <(echo "$var" | awk -v RS='[^][]+' 'NR%2{print RT}')

$ declare -p arr
declare -a arr=([0]="a1" [1]="b1" [2]="123" [3]="Text text" [4]="0x0")
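Where GNU awk is not available, something similar should be possible with the standard match() function, which sets RSTART and RLENGTH. This is a sketch rather than a tested replacement for every awk implementation:

```shell
var="[a1] [b1] [123] [Text text] [0x0]"

mapfile -t arr < <(
  awk '{
    s = $0
    # match() finds the leftmost "[...]" group in s and sets RSTART/RLENGTH
    while (match(s, /\[[^]]*\]/)) {
      print substr(s, RSTART + 1, RLENGTH - 2)   # the text without its brackets
      s = substr(s, RSTART + RLENGTH)            # continue after this group
    }
  }' <<< "$var"
)

declare -p arr
```

It still depends on mapfile, so bash 4 or later is needed here as well.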