i searched ours but can't find a solution to extract all Strings between two characters to array using Bash.
I find
sed -n 's/.*\[\(.*\)\].*/\1/p'
But this only show me the last entry.
My String looks like:
var="[a1] [b1] [123] [Text text] [0x0]"
I want a Array like this:
arr[0]="a1"
arr[1]="b1"
arr[2]="123"
arr[3]="Text text"
arr[4]="0x0"
So i search for Stings between [ and ] and load it into an Array without [ and ].
Thank you for helping!
There are a lot of suggestions that may work for you here already, but may not depending on your data. For example, substituting your current field separator of ] [
for a comma works unless you have commas embedded in your fields. Which your sample data does not have, but one never knows. :)
An ideal solution would be to use something as a field separator that is guaranteed never to be part of your field, like a null. But that's hard to do in a portable way (i.e. without knowing what tools are available). So a less extreme stance might be to use a newline as a separator:
var="[a1] [b1] [123] [Text text] [0x0]"
mapfile -t arr < <(sed $'s/^\[//;s/] \[/\\\n/g;s/]$//' <<<"$var")
declare -p arr
which would result in:
declare -a arr='([0]="a1" [1]="b1" [2]="123" [3]="Text text" [4]="0x0")'
This is functionally equivalent to the awk
solution that Inian provided. Note that mapfile
requires bash version 4 or above.
That said, you could also this exclusively within bash, without relying on any external tools like sed:
arr=( $var )
last=0
for i in "${!arr[@]}"; do
if [[ ${arr[$i]} != \[* ]]; then
arr[$last]="${arr[$last]} ${arr[$i]}"
unset arr[$i]
continue
fi
last=$i
done
for i in "${!arr[@]}"; do
arr[$i]="${arr[$i]:1:$((${#arr[$i]}-2))}"
done
At this point, declare -p arr
results in:
declare -a arr='([0]="a1" [1]="b1" [2]="123" [3]="Text text" [5]="0x0")'
This sucks your $var
into the array $arr[]
with fields separated by whitespace, then it collapses the fields based on whether they begin with a square bracket. It then goes through the fields and replaces them with the substring that eliminates the first and last character. It may be a little less resilient and harder to read, but it's all within bash. :)
There's no simple way to do it. I would use a loop to extract them one at a time:
var="[a1] [b1] [123] [Text text] [0x0]"
regex='\[([^]]*)\](.*)'
while [[ $var =~ $regex ]]; do
arr+=("${BASH_REMATCH[1]}")
var=${BASH_REMATCH[2]}
done
In the regular expression, \[([^]]*)\]
captures everything after the first [
up to (but not including) the next ]
. (.*)
captures everything after that for the next iteration.
You can use declare -n
in bash
4.3 or later to make this look a little less intimidating.
declare -n m1=BASH_REMATCH[1] m2=BASH_REMATCH[2]
regex='\[([^]]*)\](.*)'
var="[a1] [b1] [123] [Text text] [0x0]"
while [[ $var =~ $regex ]]; do
arr+=("$m1")
var=$m2
done
$ IFS=, arr=($(sed 's/\] \[/","/g;s/\]/"/;s/\[/"/' <<< "$var")); echo "${arr[3]}"
"Text text"
With GNU awk for multi-char RS and RT and newer versions of bash for mapfile:
$ mapfile -t arr < <(echo "$var" | awk -v RS='[^][]+' 'NR%2{print RT}')
$ declare -p arr
declare -a arr=([0]="a1" [1]="b1" [2]="123" [3]="Text text" [4]="0x0")