I want to use the bash shell to split strings like:
Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]
Aspirin - DBL Aspirin 100mg [1] tablet
I want to get the brand names "Daivonex Cream" and "DBL Aspirin",
that is, the name in front of the pattern ***mg, ***mcg or ***g.
How can I do it?
In Bash you can do:
while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" =~ ^([[:alpha:]]+)[[:space:][:punct:]]+([[:alpha:][:space:]]+)[[:space:]](.*)$ ]]
    then
        printf "1:'%s' 2:'%s' 3:'%s'\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    fi
done <<<"Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]
Aspirin - DBL Aspirin 100mg [1] tablet"
Prints:
1:'Calcipotriol' 2:'Daivonex Cream' 3:'50mcg/1g 30 g [1]'
1:'Aspirin' 2:'DBL Aspirin' 3:'100mg [1] tablet'
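The regex above ends the brand name at the last space before a non-letter, so it never looks at the mg/mcg/g unit the question mentions. A minimal sketch that anchors the capture on the dose unit instead, assuming every line has the form "<generic> - <brand> <number><unit>..." (brand_name is a hypothetical helper name, not part of the answer above):

```shell
#!/usr/bin/env bash
# brand_name (hypothetical helper): print the text between " - " and the
# first " <digits>mcg", " <digits>mg" or " <digits>g"; return 1 if no match.
# Assumes lines look like "<generic> - <brand> <number><unit>...".
brand_name() {
    # Keep the regex in a variable and leave it unquoted after =~
    # so bash treats it as an extended regular expression.
    local re='^[^-]+ - (.+) [0-9]+(mcg|mg|g)'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Usage: prints "Daivonex Cream" and then "DBL Aspirin".
while IFS= read -r line || [[ -n "$line" ]]; do
    brand_name "$line"
done <<'EOF'
Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]
Aspirin - DBL Aspirin 100mg [1] tablet
EOF
```

Note that "30 g" (with a space) is not treated as a dose here, because the unit must follow the digits directly.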
If your sample input is representative, awk may offer the simplest solution:
awk -F'- | [0-9]+(mc?)?g' '{ print $2 }' <<'EOF'
Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]
Aspirin - DBL Aspirin 100mg [1] tablet
Foo - Foo Bar 22g [1] other
EOF
yields:
Daivonex Cream
DBL Aspirin
Foo Bar
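The field-separator trick prints whatever sits between the first two separators. An alternative sketch (a different technique from the FS approach above) uses awk's match() function, which sets RSTART and RLENGTH, to locate the first dose and keep only the text before it. brand_names_awk is a hypothetical name, and skipping lines without " - " is an assumption:

```shell
#!/usr/bin/env bash
# brand_names_awk (hypothetical helper): for each input line, print the
# text between " - " and the first " <digits>mcg|mg|g" dose.
brand_names_awk() {
    awk '{
        sep = index($0, " - ")          # position of the " - " separator
        if (sep == 0) next              # skip lines without a generic name
        rest = substr($0, sep + 3)      # text after " - "
        # match() sets RSTART to where the dose starts (0 if none)
        if (match(rest, / [0-9]+(mcg|mg|g)/))
            print substr(rest, 1, RSTART - 1)
    }'
}

brand_names_awk <<'EOF'
Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]
Aspirin - DBL Aspirin 100mg [1] tablet
Foo - Foo Bar 22g [1] other
EOF
```

Unlike the FS version, this does not depend on how many separators appear later in the line.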
You can use sed this way:
sed -E 's/^[[:alpha:]]+ - ([[:alpha:] ]+) [[:digit:]]+.*/\1/' <<< "Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]"
=> Daivonex Cream
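The command above handles a single string. A sketch for running the same substitution over several lines, assuming lines that do not match should be dropped: -n turns off sed's automatic printing, and the p flag prints only lines where the substitution succeeded. The (mcg|mg|g) alternation is an addition that ties the match to the dose units from the question; brand_names_sed is a hypothetical name:

```shell
#!/usr/bin/env bash
# brand_names_sed (hypothetical helper): print the brand name of every
# input line that has the form "<generic> - <brand> <digits><unit>...",
# where <unit> is mcg, mg or g. Other lines are not printed.
brand_names_sed() {
    sed -nE 's/^[[:alpha:]]+ - ([[:alpha:] ]+) [[:digit:]]+(mcg|mg|g).*/\1/p'
}

brand_names_sed <<'EOF'
Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]
Aspirin - DBL Aspirin 100mg [1] tablet
EOF
```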
^[[:alpha:]]+ -
=> matches the generic name and the " - " separator, i.e. everything before the part we need to extract
([[:alpha:] ]+)
=> this is the part we want to extract
[[:digit:]]+.*
=> this is everything that comes after; we assume this part starts with a space and one or more digits, followed by any number of characters
\1
=> the part extracted by the (...) expression above; we replace the entire string with the matched part
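To make the replacement step visible, the sketch below (brand_debug is a hypothetical name) puts & in the replacement, which sed expands to the whole match, next to \1. Because the pattern runs from ^ to .*, the whole match is the entire line, which is why replacing it with \1 alone leaves only the brand name:

```shell
#!/usr/bin/env bash
# brand_debug (hypothetical helper): show the whole match (&) and the
# first capture group (\1) side by side for each input line.
brand_debug() {
    sed -E 's/^[[:alpha:]]+ - ([[:alpha:] ]+) [[:digit:]]+.*/matched=[&] captured=[\1]/'
}

brand_debug <<'EOF'
Calcipotriol - Daivonex Cream 50mcg/1g 30 g [1]
Aspirin - DBL Aspirin 100mg [1] tablet
EOF
```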
You can check out this site to learn more about regexes: http://regexr.com/