I am looking for a way to split a string in bash over a delimiter string, and place the parts in an array.
Simple case:
#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
echo "simple string: $b"
IFS='/' b_split=($b)
echo ;
echo "split"
for i in ${b_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output
simple string: aaaaa/bbbbb/ddd/ffffff
split
------ new part ------
aaaaa
------ new part ------
bbbbb
------ new part ------
ddd
------ new part ------
ffffff
More complex case:
#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
for i in ${c_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA
------ new part ------
A
B
------ new part ------
BB
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
C
------ new part ------
------ new part ------
CC
DD
------ new part ------
D
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
EEE
FF
I would like the second output to be like
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
I.e. to split the string on a sequence of characters, instead of one. How can I do this?
I am looking for an answer that would only modify this line in the second script:
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
IFS disambiguation
IFS stands for Internal Field Separator: a list of characters, each of which can be used as a separator.
By default, it is set to $' \t\n' (space, tab and newline), meaning that any non-empty run of spaces, tabs and/or newlines counts as one separator.
So in the string:
" blah foo=bar
baz "
leading and trailing separators are ignored, and the string contains only 3 parts: blah, foo=bar and baz.
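As a quick check of the two points above, here is a minimal sketch. The variable names (s, words, t, fields) are made up for the illustration. It shows the default whitespace splitting, and then that IFS is treated as a set of characters, so a multi-character value such as '==' behaves like '=' and cannot act as a string delimiter.

```shell
#!/bin/bash
# Default IFS (space, tab, newline): runs of whitespace act as one separator,
# and leading/trailing whitespace is ignored.
s=$' blah  foo=bar\nbaz '
words=($s)                      # unquoted on purpose: word splitting applies
echo "${#words[@]}"             # prints 3
printf '<%s>\n' "${words[@]}"   # <blah>, <foo=bar>, <baz>, one per line

# IFS is a set of characters, not a string: '==' behaves exactly like '='.
old_ifs=$IFS
IFS='=='
t='a==b'
fields=($t)                     # splits at each '=', giving: a, empty, b
IFS=$old_ifs
echo "${#fields[@]}"            # prints 3
```

This is why IFS='=======' in the question splits at every single = sign.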
Splitting a string using IFS is possible if you know a valid field separator that is not used in your string.
OIFS="$IFS"
IFS='§'
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
c_split=(${c//=======/§})
IFS="$OIFS"
printf -- "------ new part ------\n%s\n" "${c_split[@]}"
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
But this works only as long as the string does not contain §.
You could use another character, like IFS=$'\026';c_split=(${c//=======/$'\026'}), but this may still lead to further bugs.
You could scan the character set to find a character that does not occur in your string:
myIfs=""
for i in {1..255};do
printf -v char "$(printf "\\\%03o" $i)"
[ "$c" == "${c#*$char}" ] && myIfs="$char" && break
done
if ! [ "$myIfs" ] ;then
echo no split char found, could not do the job, sorry.
exit 1
fi
but I find this solution a bit overkill.
Splitting on spaces (or without modifying IFS)
Under bash, we could use this bashism:
b="aaaaa/bbbbb/ddd/ffffff"
b_split=(${b//// })
In fact, the syntax ${varname// starts a substitution (delimited by /) that replaces all occurrences of / by a space, before the result is assigned to the array b_split.
Of course, this still uses IFS and splits the array on spaces.
This is not the best way, but it can work in specific cases.
You could even drop unwanted spaces before splitting:
b='12 34 / 1 3 5 7 / ab'
b1=${b// }
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]}" ;echo
<1234>, <1357>, <ab>,
or swap them temporarily...
b1=${b// /§}
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]//§/ }" ;echo
<12 34 >, < 1 3 5 7 >, < ab>,
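To see why replacing the delimiter by a space only suits specific cases, here is a small sketch with made-up sample data. It compares that trick with read -a, which is not used in the answer above and is shown only as one possible alternative for single-character delimiters.

```shell
#!/bin/bash
# Replacing the delimiter by a space and relying on word splitting:
# fine for "clean" fields, wrong as soon as a field contains whitespace.
b='aaaaa/bb bbb/ddd'
naive=(${b//// })               # "bb bbb" is cut in two
echo "naive: ${#naive[@]} elements"      # naive: 4 elements

# Splitting on the real delimiter keeps the embedded space.
# IFS is changed for the read command only.
IFS='/' read -r -a parts <<< "$b"
echo "read -a: ${#parts[@]} elements"    # read -a: 3 elements
printf '<%s>\n' "${parts[@]}"
```

Note that read -a only handles single-line input and single-character delimiters, so it does not solve the multi-character case from the question.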
Splitting a line on strings:
So you cannot use IFS for your purpose, but bash does have nice features:
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep='======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
echo "${c%%$mySep*}"
c="${c#*$mySep}"
done
echo "------ last part ------"
echo "$c"
Let's see:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Note: Leading and trailing newlines are not deleted. If this is needed, you could use:
mySep=$'\n=======\n'
instead of simply =======.
Or you could rewrite the split loop to strip them explicitly:
mySep=$'======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
part="${c%%$mySep*}"
part="${part##$'\n'}"
echo "${part%%$'\n'}"
c="${c#*$mySep}"
done
echo "------ last part ------"
c=${c##$'\n'}
echo "${c%%$'\n'}"
In any case, this matches what the SO question asked for (and its sample):
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Finally, creating an array
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep=$'======='
export -a c_split
while [ "$c" != "${c#*$mySep}" ];do
part="${c%%$mySep*}"
part="${part##$'\n'}"
c_split+=("${part%%$'\n'}")
c="${c#*$mySep}"
done
c=${c##$'\n'}
c_split+=("${c%%$'\n'}")
for i in "${c_split[@]}"
do
echo "------ new part ------"
echo "$i"
done
This works fine:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
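Some explanations:
export -a var declares var as an array and marks it for export (note that bash does not actually pass array variables to child processes; declare -a var is the more usual spelling).
${variablename%string*} and ${variablename%%string*} return the left part of variablename, up to but not including string. A single % removes the shortest matching suffix (cutting at the last occurrence of string), and %% removes the longest one (cutting at the first occurrence). The full variablename is returned if string is not found.
${variablename#*string} and ${variablename##*string} do the same in the reverse direction: they return the right part of variablename, after string. A single # cuts after the first occurrence and ## after the last occurrence.
Note: in these patterns, the character * is a wildcard meaning any number of any characters.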
The command echo "${c%%$'\n'}" echoes variable c without one trailing newline (the pattern contains no wildcard, so at most a single newline is removed).
So if variable contains Hello WorldZorGluBHello youZorGluBI'm happy:
variable="Hello WorldZorGluBHello youZorGluBI'm happy"
$ echo ${variable#*ZorGluB}
Hello youZorGluBI'm happy
$ echo ${variable##*ZorGluB}
I'm happy
$ echo ${variable%ZorGluB*}
Hello WorldZorGluBHello you
$ echo ${variable%%ZorGluB*}
Hello World
$ echo ${variable%%ZorGluB}
Hello WorldZorGluBHello youZorGluBI'm happy
$ echo ${variable%happy}
Hello WorldZorGluBHello youZorGluBI'm
$ echo ${variable##* }
happy
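One detail of the newline trimming used in the loops above may be worth checking: a pattern without a wildcard, such as $'\n', removes at most one newline. The sketch below is not part of the original answer. It assumes bash's extglob option is acceptable and uses made-up variable names (nl, one, all) to show how a run of newlines could be stripped instead.

```shell
#!/bin/bash
shopt -s extglob                # enables the +(...) "one or more" pattern
nl=$'\n'
c=$'\n\nEEE\nFF\n\n'            # two leading and two trailing newlines

one=${c%%$nl}                   # strips a single trailing newline only
all=${c%%+($nl)}                # strips every trailing newline
all=${all##+($nl)}              # ...and every leading one

echo "${#c} ${#one} ${#all}"    # prints: 10 9 6
```

With the sample data from the question there is only one newline on each side of the separator, so the simpler form is sufficient there.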
All this is explained in the manpage:
$ man -Len -Pless\ +/##word bash
$ man -Len -Pless\ +/%%word bash
$ man -Len -Pless\ +/^\\\ *export\\\ .*word bash
Step by step, the splitting loop:
The separator:
mySep=$'======='
Declare c_split as an array (export -a also marks it for export, although bash does not pass arrays to child processes)
export -a c_split
While variable c contains at least one occurrence of mySep
while [ "$c" != "${c#*$mySep}" ];do
Truncate c from the first mySep to the end of the string and assign the result to part.
part="${c%%$mySep*}"
Remove a leading newline
part="${part##$'\n'}"
Remove a trailing newline and add the result as a new array element to c_split.
c_split+=("${part%%$'\n'}")
Reassign c with the rest of the string, once everything up to and including the first mySep is removed
c="${c#*$mySep}"
Done ;-)
done
Remove a leading newline
c=${c##$'\n'}
Remove a trailing newline and add the result as a new array element to c_split.
c_split+=("${c%%$'\n'}")
Into a function:
ssplit() {
local string="$1" array=${2:-ssplited_array} delim="${3:- }" pos=0
while [ "$string" != "${string#*$delim}" ];do
printf -v $array[pos++] "%s" "${string%%$delim*}"
string="${string#*$delim}"
done
printf -v $array[pos] "%s" "$string"
}
Usage:
ssplit "<quoted string>" [array name] [delimiter string]
where array name is ssplited_array by default and the delimiter defaults to a single space.
You could use:
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
--- part ----
AA=A
B=BB
--- part ----
C==CC
DD=D
--- part ----
EEE
FF
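One caveat with the function above: $delim is used unquoted inside the expansions, so a delimiter containing glob characters such as * or ? is interpreted as a pattern rather than as literal text. Below is a minimal sketch of a variant that quotes the delimiter. The name split_literal and its argument order are hypothetical, not part of the original answer, and it copies the result back with eval so that it does not depend on namerefs.

```shell
#!/bin/bash
# split_literal STRING ARRAY_NAME DELIMITER   (hypothetical helper)
# Splits STRING on every literal occurrence of DELIMITER and stores the
# parts in the array named ARRAY_NAME.
split_literal() {
    local _rest=$1 _name=$2 _delim=$3
    local -a _out=()
    # Refuse an empty delimiter (it would loop forever) and any name
    # that is not a plain identifier (it is passed to eval below).
    [[ -n $_delim ]] || return 1
    [[ $_name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || return 1
    # Quoting "$_delim" inside the expansions makes it match literally.
    while [[ $_rest == *"$_delim"* ]]; do
        _out+=("${_rest%%"$_delim"*}")   # text before the first delimiter
        _rest=${_rest#*"$_delim"}        # drop that text and the delimiter
    done
    _out+=("$_rest")                     # whatever remains is the last part
    eval "$_name"'=("${_out[@]}")'       # copy to the caller's array
}

split_literal 'a*b*c' parts '*'
printf '<%s>\n' "${parts[@]}"            # <a>, <b>, <c>, one per line

c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
split_literal "$c" c_parts $'\n=======\n'
echo "${#c_parts[@]}"                    # prints 3
```

The underscore-prefixed locals are only there to reduce the chance of clashing with the caller's array name.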
The following script was tested in bash:
kent@7pLaptop:/tmp/test$ bash --version
GNU bash, version 4.2.42(2)-release (i686-pc-linux-gnu)
the script (named t.sh):
#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c"
echo "split now"
c_split=($(echo "$c"|awk -vRS="\n=*\n" '{gsub(/\n/,"\\n");printf "%s ", $0}'))
for i in ${c_split[@]}
do
echo "---- new part ----"
echo -e "$i"
done
output:
kent@7pLaptop:/tmp/test$ ./t.sh
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split now
---- new part ----
AA=A
B=BB
---- new part ----
C==CC
DD=D
---- new part ----
EEE
FF
Note the echo statement in that for loop: if you remove the option -e, you will see:
---- new part ----
AA=A\nB=BB
---- new part ----
C==CC\nDD=D
---- new part ----
EEE\nFF\n
Whether to use -e or not depends on your requirements.
Here's an approach that doesn't fumble when the data contains literal backslash sequences, spaces and the like:
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";
c_split=()
while IFS= read -r -d '' part
do
c_split+=( "$part" )
done < <(printf "%s" "$c" | sed -e 's/=======/\x00/g')
c_split+=( "$part" )
for i in "${c_split[@]}"
do
echo "------ new part ------"
echo "$i"
done
Note that the string is actually split on "=======" as requested, so the line feeds become part of the data (causing extra blank lines when "echo" adds its own).
I added some lines to the example text because of this comment:
This breaks if you replace AA=A with AA =A or with AA=\nA – that other guy
EDIT: I added a suggestion that isn't sensitive to delimiter characters in the text. However, it doesn't use the "one line split" that the OP was asking for; it is how I would do it in bash if I wanted the result in an array.
script.sh (NEW):
#!/bin/bash
text=$(
echo "AA=A"; echo "AA =A"; echo "AA=\nA"; echo "B=BB"; echo "=======";
echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";
)
echo "more complex string"
echo "$text"
echo "split now"
c_split[0]=""
current=""
del=""
ind=0
# newline
newl=$'\n'
# Save IFS (not necessary when run as sub shell)
saveIFS="$IFS"
IFS="$newl"
for row in $text; do
if [[ $row =~ ^=+$ ]]; then
c_split[$ind]="$current"
((ind++))
current=""
# Avoid preceding newline
del=""
continue
fi
current+="$del$row"
del="$newl"
done
# Restore IFS
IFS="$saveIFS"
# If there is a last poor part of the text
if [[ -n $current ]]; then
c_split[$ind]="$current"
fi
# The result is an array
for i in "${c_split[@]}"
do
echo "---- new part ----"
echo "$i"
done
script.sh (OLD, with "one line split"):
(I stole the idea with awk from @Kent and adjusted it a bit)
#!/bin/bash
c=$(
echo "AA=A"; echo "AA =A"; echo "AA=\nA"; echo "B=BB"; echo "=======";
echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";
)
echo "more complex string"
echo "$c"
echo "split now"
# Now, this will be almost absolute secure,
# perhaps except a direct hit by lightning.
del=""
for ch in $'\1' $'\2' $'\3' $'\4' $'\5' $'\6' $'\7'; do
if [ -z "`echo "$c" | grep "$ch"`" ]; then
del="$ch"
break
fi
done
if [ -z "$del" ]; then
echo "Sorry, all this testing but no delimiter to use..."
exit 1
fi
IFS="$del" c_split=($(echo "$c" | awk -vRS="\n=+\n" -vORS="$del" '1'))
for i in ${c_split[@]}
do
echo "---- new part ----"
echo "$i"
done
Output:
[244an]$ bash --version
GNU bash, version 4.2.24(1)-release (x86_64-pc-linux-gnu)
[244an]$ ./script.sh
more complex string
AA=A
AA =A
AA=\nA
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split now
---- new part ----
AA=A
AA =A
AA=\nA
B=BB
---- new part ----
C==CC
DD=D
---- new part ----
EEE
FF
I'm not using -e for echo, so that AA=\nA does not produce a newline.