Bash: parsing XML block into an array

Posted 2019-07-19 17:14

I have an XML-like text file that I would like to parse into an array. The input file looks like this:

<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>

I want the output to be like (meaning one whole AA-block per array element):

ARRAY[0]=<AA><BB>content 1</BB></AA>
ARRAY[1]=<AA><BB>content 2</BB></AA>

I tried

ARRAY=(`cat input.txt | grep -A 3 \<AA\>`)

but this gives me only one line per array element. Does anyone have an idea?

4 answers
萌系小妹纸
#2 · 2019-07-19 17:27

Assuming <AA> and </AA> are fixed names, here's a pure bash solution

#!/bin/bash
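declare -a ARRAY
re='^<BB>'    # keep the pattern in a variable so < and > need no escaping
while read -r line; do
    # read strips the leading indentation; =~ only works inside [[ ]], not [ ]
    [[ $line =~ $re ]] && ARRAY+=("<AA>$line</AA>")
done < file.xml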
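
The loop above rebuilds each element from a single <BB> line, so it assumes exactly one such line per block. A minimal sketch of a more general variant is shown here, which collects every line between <AA> and </AA>. It assumes the opening and closing tags each sit on their own line, and it feeds the sample input through a here-document purely so the snippet runs on its own; the names block and in_block are illustrative.

```shell
#!/bin/bash
# Sketch: collect each <AA>...</AA> block into one array element.
# Assumption: <AA> and </AA> each appear alone on a line.
declare -a ARRAY=()
block=""
in_block=0

while read -r line; do            # read trims leading/trailing whitespace
    case $line in
        "<AA>")  in_block=1; block=$line ;;
        "</AA>") if (( in_block )); then
                     ARRAY+=("$block$line")   # finished block becomes one element
                     in_block=0
                 fi ;;
        *)       (( in_block )) && block+=$line ;;
    esac
done <<'EOF'
<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>
EOF

printf '%s\n' "${ARRAY[@]}"
```

To read from a real file instead, the here-document can be replaced with a redirection such as done < file.xml.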
贪生不怕死
#3 · 2019-07-19 17:31
sed '/^<AA>$/,/^<[/]AA>$/{H;/<[/]AA>/{s:.*::g;x;s:\n::g;s:[ ]*<B:<B:g;b};d}' FILE
Anthone
#4 · 2019-07-19 17:44

XML and shell scripts don't mix very well. If you can, consider using a different file format or a different scripting language.

(
    IFS=$'\n'
    ARRAY=($(grep -A 3 '<AA>' test.xml | awk '{printf "%s",$0} $0~"</AA>" {print ""}'))

    for MATCH in "${ARRAY[@]}"; do
        echo "$MATCH"
    done
)

Explanation:

  1. Setting IFS to \n controls how array elements are split apart. We don't want them split on spaces or tabs, just newlines.
  2. ARRAY=($(COMMAND)) captures COMMAND's output and takes each line as an array element (since we set IFS to \n).
  3. {printf "%s",$0} prints each line without the trailing newline (leading indentation is kept).
  4. $0~"</AA>" {print ""} prints just a newline whenever we see a closing tag </AA>. (A bare print would output the </AA> line a second time.)
  5. The whole thing is in parentheses to limit the scope of the $IFS change. We don't want that change to be permanent; better to limit it to a sub-shell.
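
Steps 1, 2 and 5 exist only to make the unquoted command substitution split on newlines. A possible alternative, sketched here under the assumption that bash 4 or later is available, is the mapfile builtin: it reads one line per element, so IFS stays untouched, no sub-shell is needed, and the output is not subject to pathname expansion. The awk range pattern stands in for grep -A 3, and the temporary file only makes the snippet self-contained.

```shell
#!/bin/bash
# Sketch: join each <AA>...</AA> block onto one line, then let mapfile
# (bash 4+) turn every output line into one array element.
tmp=$(mktemp)                     # throwaway sample input for this demo
cat > "$tmp" <<'EOF'
<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>
EOF

mapfile -t ARRAY < <(
    awk '/<AA>/,/<\/AA>/ {
             sub(/^[ \t]+/, "")            # drop leading indentation
             printf "%s", $0               # print the line without a newline
             if ($0 ~ /<\/AA>/) print ""   # end the element after the closing tag
         }' "$tmp"
)
rm -f "$tmp"

printf '%s\n' "${ARRAY[@]}"
```

Unlike the pipeline above, this sketch also strips the indentation, so each element matches the format requested in the question.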
虎瘦雄心在
#5 · 2019-07-19 17:44

If your XML were well-formed (a single root element wrapping the AA blocks), the following example demonstrates how it could be parsed properly using XPath:

#!/bin/bash

XML="
<doc>
<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>
</doc>
"

CONTENT1=$(xmllint --xpath "string((/doc/AA/BB)[1])" - <<< "$XML")
CONTENT2=$(xmllint --xpath "string((/doc/AA/BB)[2])" - <<< "$XML")

echo "$CONTENT1"
echo "$CONTENT2"