Writing nested dictionary (forest) of a huge depth

Continued to my older question: Writing nested dictionary (forest) of a huge depth to a text file

Now I want to write the forest traversal in the BFS style: I have a huge depth dictionary that represents forest (many non-binary trees) which I want to process the forest and create a text file with the sequences of (father, son) relations from the forest, i.e. given the dictionary:

{'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}},
 't': {'r': {'o': {}}, 'y': {}}}

the generated text file will look like:

(ROOT,b) (ROOT,g) (ROOT,f) (b,c) (b,d) (c,x) (d,p) \n
(ROOT,r) (ROOT,y) (r,o) \n

Note that I replaces all the roots in the forest with the word "ROOT".

Here is a simple visualization of the forest:

The nested dictionary is big and iterating over it recursively makes a memory run-time error, so a "Generator style" solution as in the link at the beginning of this question would be the best.

标签： python dictionary tree nested breadth-first-search

3条回答

走好不送

2楼-- · 2019-07-13 13:07

It is easiest to traverse the structure recursively with a generator:

def flatten_forest(forest, write=True):
  def flatten(d, seen = None):
    for a, b in d.items():
      if seen is None:
       yield ('ROOT', a)
      else:
        yield (seen, a)
      if b:
        yield from flatten(b, a)
  if write:
    with open('full_flattened_tree.txt', 'a') as f:
      f.write(' '.join(map(str, flatten(forest)))+'\n')

data = {'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}}, 't': {'r': {'o': {}}, 'y': {}}}
for i in data.values():
  flatten_forest(i)

File output:

('ROOT', 'b') ('b', 'c') ('c', 'x') ('b', 'd') ('d', 'p') ('ROOT', 'g') ('ROOT', 'f')
('ROOT', 'r') ('r', 'o') ('ROOT', 'y')

This will work on large dictionaries:

import random, string, time
def create_structure(_len, _depth = 5, _count = 0):
 return {string.ascii_lowercase[i]:{} if _depth == _count else create_structure(random.randint(1, 26), _count = _count + 1) for i in range(_len)}

d = create_structure(26)
c = time.time()
flatten_forest(d, write=True)
print(time.time()-c)

Output:

11.871491193771362

0人赞添加讨论(0) 举报

聊天终结者

3楼-- · 2019-07-13 13:14

d = {'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}}, 't': {'r': {'o': {}}, 'y': {}}}
with open('file', 'w') as f:
    for r, s in d.items():
        q = []
        p = r
        while True:
            for k, v in s.items():
                f.write('(%s,%s) ' % ('ROOT' if p == r else p, k))
                if v:
                    q.append((k, v))
            if not q:
                break
            p, s = q.pop(0)
        f.write('\n')

This outputs:

(ROOT,b) (ROOT,g) (ROOT,f) (b,c) (b,d) (c,x) (d,p) 
(ROOT,r) (ROOT,y) (r,o)

0人赞添加讨论(0) 举报

Anthone

4楼-- · 2019-07-13 13:14

To perform breadth-first-search, we must keep a list of the current working nodes and the trees below them - I chose to store these in a tuple.

For instance, when we are working at the depth of the c and d nodes, this list of trees will be:

[('c': {'x': {}}), ('d': {'p': {}})]

Now while there are still trees below us (while len(trees):), we need to step down to the layer below in our tree.

The fist step is obviously to reset the trees list since we will be generating the next layer.

We then iterate over our list of trees and, for each tree, we iterate over its children.

So taking the above example, on the first iteration, the node would be 'c' and the children would be: {'x': {}} and we now want to iterate over the children. So on the first iteration of that children loop, the first child node will be 'x' and its children (c's child's children) are empty: {}.

Now, in this scope (the node's child's), if the child has children we want to add the child and its children (again, as a tuple) to the list of trees.

So to give an example where are there are children, when the current node is b, then one of its children is c and since c has children, a tuple of (c, c's children) is appended to the trees list for the next layer.

Finally, no matter whether this child had children or not, we want to the current line in the file the link between us and them. This is (node, child_node).

And that's almost it. Of course, when we finish a whole tree, we need to write a new-line to the file.

The only annoying detail is the issue of spaces between the tuples written to the file. If we always concatenated a space to the end of each tuple, we would end up with a stray space on the end of each line, as seen below which isn't ideal.

(ROOT, a)S(a,b)S

(where S represents a space)

So to redeem this, we will always concatenate a space before each tuple so long as we are not the first on a new line (line_first). For this to work, at the start of each tree (line), we set the line_first flag to True, but then in the code, we immediately set it to False on the first iteration (but skip writing the space), otherwise (future tuples) we write a space before.

And that's it. Here's the completed code:

the_tree = {'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}},
            't': {'r': {'o': {}}, 'y': {}}}

with open('the_file', 'w') as file:
    for tree in the_tree.values():
        line_first = True
        trees = [('ROOT', tree)]
        while len(trees):
            new_trees = []
            for node, children in trees:
                for child_node, child_children in children.items():
                    if child_children:
                        new_trees.append((child_node, child_children))
                    if line_first: line_first = False
                    else: file.write(' ')
                    file.write(f'({node}, {child_node})')
            trees = new_trees
        file.write('\n')

warning: uses f-strings which were introduced in version 3.6!

And it produces the intended output:

(ROOT, b) (ROOT, g) (ROOT, f) (b, c) (b, d) (c, x) (d, p)
(ROOT, r) (ROOT, y) (r, o)

0人赞添加讨论(0) 举报

Writing nested dictionary (forest) of a huge depth

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间