I have an XML-like text file which I would like to parse into an array. The input file looks like this:
<AA>
<BB>content 1</BB>
</AA>
<AA>
<BB>content 2</BB>
</AA>
I want the output to be like (meaning one whole AA-block per array element):
ARRAY[0]=<AA><BB>content 1</BB></AA>
ARRAY[1]=<AA><BB>content 2</BB></AA>
I tried
ARRAY=(`cat input.txt | grep -A 3 \<AA\>`)
but this only gives me one line per array element. Does anyone have an idea?
Assuming <AA> and </AA> are fixed names, here's a pure bash solution.

XML and shell scripts don't mix very well. If you can, consider using a different file format or a different scripting language.
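A minimal sketch of the awk-based approach that the explanation below walks through, pieced together from the fragments it quotes. The function name print_blocks and the loop that prints the elements are assumptions added for illustration, and the awk action uses print "" so that only a newline is emitted after the closing tag.

```shell
#!/usr/bin/env bash
# Sketch only: print_blocks is a hypothetical helper name.
# It joins the lines of every <AA>...</AA> block in the given file and
# stores one whole block per array element.
print_blocks() {
  (
    # Split command output on newlines only, not on spaces or tabs.
    # The parentheses start a sub-shell, so the IFS change does not leak out.
    IFS=$'\n'
    # awk prints every line without its newline, then a newline after </AA>.
    ARRAY=($(awk '{printf "%s", $0} $0 ~ "</AA>" {print ""}' "$1"))
    for i in "${!ARRAY[@]}"; do
      printf 'ARRAY[%d]=%s\n' "$i" "${ARRAY[$i]}"
    done
  )
}
```

Called as print_blocks input.txt, this should print one ARRAY[n]=... line per AA block. Because the array lives inside the sub-shell, any work on it has to happen there too. The unquoted $(...) is also subject to filename expansion, so content containing * or ? may need extra care.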
Explanation:
- Setting IFS to \n controls how array elements are split apart. We don't want them split on spaces or tabs, just new lines.
- ARRAY=($(COMMAND)) captures COMMAND's output and takes each line as an array element (since we set IFS to \n).
- {printf "%s",$0} prints each line without the trailing newline.
- $0~"</AA>" {print ""} prints a newline whenever we see a closing tag </AA>.
- Everything runs in a sub-shell because of the $IFS change. We don't want that change to be permanent; better to limit it to a sub-shell.

If your XML were well-formed, the following example demonstrates how it could be properly parsed using xpath:
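A minimal sketch of such an XPath-based approach, under stated assumptions: it uses xmllint from libxml2 as one tool that can evaluate XPath expressions (the tool is not named above), and it assumes the AA blocks are wrapped in a single root element, hypothetically called root here, since well-formed XML needs exactly one. The function name print_blocks_xpath is likewise made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes xmllint (libxml2) is installed and that the AA blocks
# sit inside a single root element, hypothetically named <root>.
print_blocks_xpath() {
  local file=$1 count i block
  local -a ARRAY=()
  # count(...) yields the number of AA elements directly under the root.
  count=$(xmllint --xpath 'count(/root/AA)' "$file") || return 1
  for ((i = 1; i <= count; i++)); do
    # Select the i-th AA element and drop the newlines inside it.
    block=$(xmllint --xpath "/root/AA[$i]" "$file" | tr -d '\n')
    ARRAY+=("$block")
  done
  for i in "${!ARRAY[@]}"; do
    printf 'ARRAY[%d]=%s\n' "$i" "${ARRAY[$i]}"
  done
}
```

Querying one element at a time avoids any dependence on how xmllint separates multiple matches, at the cost of one xmllint call per block. Only newlines are stripped, so indentation inside the blocks would be kept.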