I have an XML file that looks like this:
<root>
    <G>
        <G1>1</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
            <GP1>1</GP1>
            <GP2>a</GP2>
            <GP3>a</GP3>
        </GP>
        <GP>
            <GP1>2</GP1>
            <GP2>b</GP2>
            <GP3>b</GP3>
        </GP>
        <GP>
            <GP1>3</GP1>
            <GP2>c</GP2>
            <GP3>c</GP3>
        </GP>
    </G>
    <G>
        <G1>2</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
            <GP1>1</GP1>
            <GP2>aa</GP2>
            <GP3>aa</GP3>
        </GP>
        <GP>
            <GP1>2</GP1>
            <GP2>bb</GP2>
            <GP3>bb</GP3>
        </GP>
        <GP>
            <GP1>3</GP1>
            <GP2>cc</GP2>
            <GP3>cc</GP3>
        </GP>
    </G>
    <G>
        <G1>3</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
            <GP1>1</GP1>
            <GP2>aaa</GP2>
            <GP3>aaa</GP3>
        </GP>
        <GP>
            <GP1>2</GP1>
            <GP2>bbb</GP2>
            <GP3>bbb</GP3>
        </GP>
        <GP>
            <GP1>3</GP1>
            <GP2>ccc</GP2>
            <GP3>ccc</GP3>
        </GP>
    </G>
</root>
I'm trying to transform this XML into a nested dictionary called "G":
{ 1: {G1: 1,
      G2: some text,
      G3: some text,
      GP: { 1: {GP1: 1,
                GP2: a,
                GP3: a},
            2: {GP1: 2,
                GP2: b,
                GP3: b},
            3: {GP1: 3,
                GP2: c,
                GP3: c}}
     },
  2: {G1: 2,
      G2: some text,
      G3: some text,
      GP: { 1: {GP1: 1,
                GP2: aa,
                GP3: aa},
            2: {GP1: 2,
                GP2: bb,
                GP3: bb},
            3: {GP1: 3,
                GP2: cc,
                GP3: cc}}
     },
  3: {G1: 3,
      G2: some text,
      G3: some text,
      GP: { 1: {GP1: 1,
                GP2: aaa,
                GP3: aaa},
            2: {GP1: 2,
                GP2: bbb,
                GP3: bbb},
            3: {GP1: 3,
                GP2: ccc,
                GP3: ccc}}
     }
}
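A minimal sketch of one way to build that structure with the standard-library ElementTree, assuming the whole file fits in memory and that the outer keys are simply the 1-based position of each <G> (not the value of its G1 child). The helper names leaf_dict and xml_to_nested are hypothetical. Values stay as strings, exactly as ElementTree returns them; wrap them in int() if you need numbers as in the dict above.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample XML from the question.
SAMPLE = """<root>
  <G><G1>1</G1><G2>some text</G2><G3>some text</G3>
    <GP><GP1>1</GP1><GP2>a</GP2><GP3>a</GP3></GP>
    <GP><GP1>2</GP1><GP2>b</GP2><GP3>b</GP3></GP>
    <GP><GP1>3</GP1><GP2>c</GP2><GP3>c</GP3></GP>
  </G>
  <G><G1>2</G1><G2>some text</G2><G3>some text</G3>
    <GP><GP1>1</GP1><GP2>aa</GP2><GP3>aa</GP3></GP>
    <GP><GP1>2</GP1><GP2>bb</GP2><GP3>bb</GP3></GP>
    <GP><GP1>3</GP1><GP2>cc</GP2><GP3>cc</GP3></GP>
  </G>
  <G><G1>3</G1><G2>some text</G2><G3>some text</G3>
    <GP><GP1>1</GP1><GP2>aaa</GP2><GP3>aaa</GP3></GP>
    <GP><GP1>2</GP1><GP2>bbb</GP2><GP3>bbb</GP3></GP>
    <GP><GP1>3</GP1><GP2>ccc</GP2><GP3>ccc</GP3></GP>
  </G>
</root>"""


def leaf_dict(elem):
    """Hypothetical helper: map each direct child's tag to its text."""
    return {child.tag: child.text for child in elem}


def xml_to_nested(root):
    """Hypothetical helper: {n: {G1, G2, G3, 'GP': {m: {GP1, GP2, GP3}}}}."""
    g = {}
    for i, g_elem in enumerate(root.findall("G"), start=1):
        d = {}
        gp = {}  # fresh GP dict for every G, so records never mix
        for child in g_elem:
            if child.tag == "GP":
                # Number the GP records 1, 2, 3 within this G only.
                gp[len(gp) + 1] = leaf_dict(child)
            else:
                d[child.tag] = child.text
        d["GP"] = gp
        g[i] = d
    return g


if __name__ == "__main__":
    from pprint import pprint
    pprint(xml_to_nested(ET.fromstring(SAMPLE)))
```

For a file on disk, ET.parse(path).getroot() gives the same root element that ET.fromstring returns here.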
My code works fine for the elements directly under "G" (G1, G2, G3), but for GP I get one of three wrong results: only a single record, all the records but with the same entries duplicated several times, or all 9 GP elements lumped together under a single "GP" key in the dictionary. Here is my code:
import xml.etree.ElementTree as ET

f = 'path to file'
tree = ET.parse(f)
root = tree.getroot()
self.tree = tree
self.root = root
# Note: getiterator() was removed in Python 3.9; newer versions need iter().
gs = len(self.tree.getiterator('G'))
g = {}
for i in range(0, gs):
    d = {}
    for elem in self.tree.getiterator('G')[i]:
        if elem.text == "\n " and elem.tag not in ['GP']:
            dd = {}
            for parent in elem:
                if parent.text == "\n ":
                    ddd = {}
                    for child in parent:
                        ddd[child.tag] = child.text
                    dd[parent.tag] = ddd
                else:
                    dd[parent.tag] = parent.text
            d[elem.tag] = dd
        else:
            d[elem.tag] = elem.text
    g[i+1] = d
# Build GP
count = 0
gp = {}
for elem in self.tree.getiterator('GP'):
    d = {}
    for parent in elem:
        if parent.text == "\n ":
            dd = {}
            for child in parent:
                dd[child.tag] = child.text
            d[parent.tag] = dd
        else:
            d[parent.tag] = parent.text
    count += 1
    gp[count] = d
g["GP"] = gp
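A short sketch of where all 9 records come from, using a shortened, hypothetical two-G sample. iter() (like the old getiterator()) searches every descendant, so calling it on the whole tree returns every GP in the document; findall("GP") on a single G element returns only that G's direct GP children. Building each GP dictionary inside the per-G loop, from g.findall("GP"), keeps each G's records separate.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-G sample, shortened from the question's XML.
SAMPLE = """<root>
  <G><G1>1</G1>
    <GP><GP1>1</GP1></GP><GP><GP1>2</GP1></GP><GP><GP1>3</GP1></GP>
  </G>
  <G><G1>2</G1>
    <GP><GP1>1</GP1></GP><GP><GP1>2</GP1></GP><GP><GP1>3</GP1></GP>
  </G>
</root>"""

root = ET.fromstring(SAMPLE)

# iter() walks the whole subtree, so from the root it finds every GP.
all_gp = list(root.iter("GP"))
print(len(all_gp))  # 6

# findall("GP") only looks at direct children of the element it is called on.
per_g = [len(g.findall("GP")) for g in root.findall("G")]
print(per_g)  # [3, 3]

# Per-G GP dictionaries, numbered from 1 within each G.
nested = {
    i: {j: {c.tag: c.text for c in gp}
        for j, gp in enumerate(g.findall("GP"), start=1)}
    for i, g in enumerate(root.findall("G"), start=1)
}
print(nested[2][3])  # {'GP1': '3'}
```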