Bash: parsing XML block into an array

I have an XML-like text file that I would like to parse into an array. The input file looks like this:

<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>

I want the output to look like this (one whole AA block per array element):

ARRAY[0]=<AA><BB>content 1</BB></AA>
ARRAY[1]=<AA><BB>content 2</BB></AA>

I tried

ARRAY=(`cat input.txt | grep -A 3 \<AA\>`)

but this only gives me one line per array element. Does anyone have an idea?

Tags: xml arrays bash parsing

4 Answers

萌系小妹纸

Reply #2 · 2019-07-19 17:27

Assuming <AA> and </AA> are fixed names and each <BB> element sits on a single line, here's a pure bash solution:

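#!/bin/bash
declare -a ARRAY
# Matches a complete <BB>...</BB> line; kept in a variable so < and > need no escaping.
re='^<BB>.*</BB>$'
# read (with the default IFS) strips the leading indentation from each line.
while read -r line; do
    [[ $line =~ $re ]] && ARRAY+=("<AA>$line</AA>")
done < file.xml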
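
The loop above rebuilds each element from a single <BB> line. If a block may span several lines, a variation could accumulate everything between <AA> and </AA> instead. This is only a minimal sketch: the temporary file and the names input, block and in_block are made up for illustration, and it assumes the opening and closing AA tags each sit alone on their own line, as in the question.

```bash
#!/bin/bash
# Sample input from the question, written to a temporary file (hypothetical path).
input=$(mktemp)
cat > "$input" <<'EOF'
<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>
EOF

ARRAY=()
block=""
in_block=0
# read with the default IFS strips the indentation around each line;
# the "|| [[ -n $line ]]" part keeps a last line that lacks a trailing newline.
while read -r line || [[ -n $line ]]; do
    # An opening tag starts a fresh block.
    if [[ $line == "<AA>" ]]; then
        in_block=1
        block=""
    fi
    # Ignore anything outside an <AA>...</AA> block.
    (( in_block )) || continue
    block+=$line
    # A closing tag completes the block and stores it as one array element.
    if [[ $line == "</AA>" ]]; then
        ARRAY+=("$block")
        in_block=0
    fi
done < "$input"
rm -f "$input"

printf '%s\n' "${ARRAY[@]}"
```

Each element then holds one whole block with the line breaks and indentation removed, which is the shape the question asks for.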


贪生不怕死

Reply #3 · 2019-07-19 17:31

sed '/^<AA>$/,/^<[/]AA>$/{H;/<[/]AA>/{s:.*::g;x;s:\n::g;s:[ ]*<B:<B:g;b};d}' FILE


Anthone

Reply #4 · 2019-07-19 17:44

XML and shell scripts don't mix very well. If you can, consider using a different file format or a different scripting language.

(
    IFS=$'\n'
    ARRAY=($(grep -A 3 '<AA>' test.xml | awk '{printf "%s",$0} $0~"</AA>" {print}'))

    for MATCH in "${ARRAY[@]}"; do
        echo "$MATCH"
    done
)

Explanation:

Setting IFS to \n controls how array elements are split apart. We don't want them split on spaces or tabs, just new lines.
ARRAY=($(COMMAND)) captures COMMAND's output and takes each line as an array element (since we set IFS to \n).
{printf "%s",$0} prints each line without the trailing newline.
$0~"</AA>" {print} prints a newline whenever we see a closing tag </AA>.
The whole thing is in parentheses to limit the scope of the $IFS change. We don't want that change to be permanent; better to limit it to a sub-shell.
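
If changing IFS feels fragile, the same line-per-element idea might also be written with the mapfile builtin, which reads one line per array element without any word splitting or globbing. This is a sketch under assumptions: it needs bash 4.0 or later (mapfile is not in older versions), the temporary file is only there to make the example self-contained, and the sub() call that strips indentation is an addition that the pipeline above does not have.

```bash
#!/bin/bash
# Sample input from the question, written to a temporary file (hypothetical path).
input=$(mktemp)
cat > "$input" <<'EOF'
<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>
EOF

# awk strips leading whitespace, prints each line without a newline,
# and emits a newline only after a closing </AA> tag.
# mapfile -t then stores each resulting line as one array element.
mapfile -t ARRAY < <(
    awk '{ sub(/^[ \t]+/, ""); printf "%s", $0 } /<\/AA>/ { print "" }' "$input"
)
rm -f "$input"

for MATCH in "${ARRAY[@]}"; do
    echo "$MATCH"
done
```

Because IFS is never touched here, no sub-shell is needed, and ARRAY stays available to the rest of the script.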


虎瘦雄心在

Reply #5 · 2019-07-19 17:44

If your XML were well-formed (with a single root element, here <doc>), the following example demonstrates how it could be parsed properly using XPath with xmllint:

#!/bin/bash

XML="
<doc>
<AA>
  <BB>content 1</BB>
</AA>
<AA>
  <BB>content 2</BB>
</AA>
</doc>
"

# Quote "$XML" so the shell does not word-split or glob the document.
CONTENT1=$(printf '%s\n' "$XML" | xmllint --xpath "string((/doc/AA/BB)[1])" -)
CONTENT2=$(printf '%s\n' "$XML" | xmllint --xpath "string((/doc/AA/BB)[2])" -)

echo "$CONTENT1"
echo "$CONTENT2"
