How to flatten a parse tree and store it in a string for further string operations (Python NLTK)

Posted 2019-10-22 01:46

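I am trying to flatten the tree structure given below.

I want this entire tree as a single string, without getting a "bad tree detected" error: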

( (S (NP-SBJ (NP (DT The) (JJ high) (JJ seven-day) )(PP (IN of) (NP (DT the) (CD 400) (NNS money) )))(VP (VBD was) (NP-PRD (CD 8.12) (NN %) )(, ,) (ADVP (RB down) (PP (IN from) (NP (CD 8.14) (NN %) ))))(. .) ))
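The question does not say where the "bad tree detected" error comes from. Assuming it is caused by malformed bracketing (an assumption, not something the asker confirms), a quick standard-library check can locate the problem before the string is handed to a tree reader. `check_brackets` is a hypothetical helper, not an NLTK function:

```python
# Hypothetical helper (not part of NLTK): locate unbalanced parentheses.
def check_brackets(s: str):
    """Return None if the parentheses in `s` are balanced; otherwise return the
    index of an offending character: the first ')' with no matching '(', or,
    if every ')' matched, the outermost '(' that was never closed."""
    open_positions = []  # stack of indices of '(' not yet closed
    for i, ch in enumerate(s):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                return i  # stray closing bracket
            open_positions.pop()
    return open_positions[0] if open_positions else None


print(check_brackets("(S (NP x))"))  # None: balanced
print(check_brackets("(S (NP x)"))   # 0: the outer '(' is never closed
print(check_brackets("(S x))"))      # 5: extra ')'
```

The tree string in the question passes this check, so if the error persists it likely has another cause.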

Answer 1:

Python's NLTK provides functions for manipulating trees and extracting nodes:

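from nltk.tree import Tree

# `trees` must already hold an iterable of nltk.Tree objects
for tr in trees:
    tr1 = str(tr)                 # bracketed string form of the tree
    s1 = Tree.fromstring(tr1)     # parse the string back into a Tree
    s2 = s1.productions()         # list of CFG productions, e.g. S -> NP VP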
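For comparison, the pre-order "LHS -> RHS" listing that `Tree.productions()` returns can be sketched with the standard library alone. `parse_bracketed` and `productions` are hypothetical helpers written for illustration; their output mimics, but is not guaranteed to match, NLTK's `Production` objects:

```python
import re

# Hypothetical helpers for illustration; they are not NLTK functions.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_bracketed(s):
    """Parse a bracketed tree string into (label, children) tuples.
    Leaves are plain strings; a missing root label becomes ''."""
    tokens = TOKEN.findall(s)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word leaf
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced brackets") from None
    if pos != len(tokens):
        raise ValueError("extra tokens after the tree")
    return tree


def productions(tree):
    """Pre-order list of 'LHS -> RHS' rules; word leaves are quoted."""
    label, children = tree
    rhs = " ".join(c[0] if isinstance(c, tuple) else repr(c) for c in children)
    rules = [f"{label} -> {rhs}"]
    for c in children:
        if isinstance(c, tuple):
            rules.extend(productions(c))
    return rules


for rule in productions(parse_bracketed("(S (NP (PRP I)) (VP (VBD ran)))")):
    print(rule)
```

Note that productions are a list of grammar rules, not a flattened string; for a single-line string, see the other answers.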


Answer 2:

You can convert the tree into a string with str(), then split it and join it back together, as follows:

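# str() gives the (possibly multi-line) bracketed form; split() breaks it on
# any run of whitespace and ' '.join() rejoins the pieces with single spaces
parse_string = ' '.join(str(tree).split())

print(parse_string)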
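Why the one-liner works can be seen on a hard-coded multi-line string, so no NLTK installation is needed. The input is a hypothetical stand-in for str(tree), and the function names are illustrative:

```python
import re


def flatten_whitespace(text: str) -> str:
    """Put a bracketed tree string on one line with single spaces."""
    # str.split() with no argument splits on every run of whitespace
    # (spaces, tabs, newlines) and drops leading/trailing whitespace,
    # so rejoining with ' ' leaves exactly one space between tokens.
    return " ".join(text.split())


def flatten_whitespace_re(text: str) -> str:
    """The same idea written as a regular-expression substitution."""
    return re.sub(r"\s+", " ", text).strip()


pretty = "(S\n  (NP (PRP I))\n  (VP (VBP 'm) (VBG looking))\n  (. .))"
print(flatten_whitespace(pretty))
# (S (NP (PRP I)) (VP (VBP 'm) (VBG looking)) (. .))
```

The split/join form needs no regular expression and trims leading and trailing whitespace in the same step.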


Answer 3:

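The documentation describes a pprint() method that renders the tree as a bracketed string. The tree is put on a single line only if it fits within the margin (70 characters by default); otherwise it is indented across several lines. In NLTK 3 this string is returned by pformat(), while pprint() prints it.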

Parse this sentence:

string = "My name is Ross and I am cool. What's going on world? I'm looking for friends."

Then calling pprint() produces the following:

u"(NP+SBAR+S\n  (S\n    (NP (PRP$ my) (NN name))\n    (VP\n      (VBZ is)\n      (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n      (SBAR\n        (WHNP (WP What))\n        (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n    (. ?))\n  (S\n    (NP (PRP I))\n    (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n    (. .)))"
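The line breaks in that output come from margin-based layout. Below is a simplified sketch of the idea, not NLTK's source: a subtree stays on one line when it fits within the margin, otherwise each child moves to its own line, indented two spaces deeper. Trees are hypothetical (label, children) tuples, and to_flat and pretty are illustrative names:

```python
def to_flat(tree):
    """Single-line bracketed form of a (label, children) tuple tree."""
    label, children = tree
    parts = [label] + [to_flat(c) if isinstance(c, tuple) else c for c in children]
    return "(" + " ".join(parts) + ")"


def pretty(tree, margin=70, indent=0):
    """Simplified margin-based layout (illustrative, not NLTK's code)."""
    flat = to_flat(tree)
    if indent + len(flat) < margin:
        return flat  # the whole subtree fits on the current line
    label, children = tree
    lines = ["(" + label]
    for c in children:
        # Nested subtrees are laid out recursively, two columns deeper.
        child = pretty(c, margin, indent + 2) if isinstance(c, tuple) else c
        lines.append(" " * (indent + 2) + child)
    return "\n".join(lines) + ")"


tree = ("S", [("NP", [("PRP", ["I"])]), ("VP", [("VBD", ["ran"])])])
print(pretty(tree))             # fits in 70 columns: one line
print(pretty(tree, margin=20))  # too wide for 20 columns: one child per line
```

A long sentence such as the one above exceeds the default margin, which is why its rendering spans several lines.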

From there, if you want to remove the indentation and newlines, you can use the following split/join (see here):

# In NLTK 3, pprint() prints and returns None; pformat() returns the string
splitted = tree.pformat().split()
flat_tree = ' '.join(splitted)

Running this gives me:

u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))"
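Once the tree is on a single line, ordinary string operations apply. As one example, assuming that words contain no parentheses (Penn Treebank-style data writes them as -LRB- and -RRB-), a regular expression can pull out the (tag, word) pairs. tagged_words is an illustrative name, not an NLTK function:

```python
import re

# Hypothetical helper: a preterminal is "(TAG word)" with no brackets inside.
PRETERMINAL = re.compile(r"\(([^()\s]+) ([^()\s]+)\)")


def tagged_words(flat_tree: str):
    """Return (tag, word) pairs for every preterminal, left to right."""
    return PRETERMINAL.findall(flat_tree)


flat = "(S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross))) (. .))"
for tag, word in tagged_words(flat):
    print(tag, word)
```

If the nltk.Tree object is still at hand, tree.pos() returns the same information as (word, tag) pairs without any regex.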


Source: How to flatten the parse tree and store in a string for further string operations python nltk