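I am trying to get a flattened tree from tree structures like the one given below.
I want to get this entire tree as a string, like the following, without the "Bad tree detected" error: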
( (S (NP-SBJ (NP (DT The) (JJ high) (JJ seven-day) )(PP (IN of) (NP (DT the) (CD 400) (NNS money) )))(VP (VBD was) (NP-PRD (CD 8.12) (NN %) )(, ,) (ADVP (RB down) (PP (IN from) (NP (CD 8.14) (NN %) ))))(. .) ))
Python's NLTK provides functions for tree manipulation and node extraction:
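from nltk.tree import Tree

# `trees` is an iterable of parse trees defined elsewhere.
for tr in trees:
    tr1 = str(tr)              # bracketed string form of the tree
    s1 = Tree.fromstring(tr1)  # parse the string back into a Tree
    s2 = s1.productions()      # grammar productions used in the tree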
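As a minimal sketch of parsing the exact string from the question, the snippet below assumes NLTK 3, where `Tree.fromstring` accepts a `remove_empty_top_bracketing` argument that drops the unlabeled outer bracket. The variable names `raw`, `tree`, and `flat` are illustrative only, and this is one possible approach rather than the asker's own code.

```python
from nltk.tree import Tree

# Bracketed parse copied from the question; note the empty outer bracket.
raw = (
    "( (S (NP-SBJ (NP (DT The) (JJ high) (JJ seven-day) )"
    "(PP (IN of) (NP (DT the) (CD 400) (NNS money) )))"
    "(VP (VBD was) (NP-PRD (CD 8.12) (NN %) )(, ,) "
    "(ADVP (RB down) (PP (IN from) (NP (CD 8.14) (NN %) ))))(. .) ))"
)

# Assumption: NLTK 3 signature. The flag strips the unlabeled top-level node.
tree = Tree.fromstring(raw, remove_empty_top_bracketing=True)

# Collapse the pretty-printed, multi-line form into a single line.
flat = " ".join(str(tree).split())
print(flat)
```

The resulting `flat` value is an ordinary Python string, so it can be passed to any further string operation.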
You can use the str function, then split and join, to convert the tree into a string as follows:
parse_string = ' '.join(str(tree).split())
print(parse_string)
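To see what the split and join idiom does independently of NLTK, here is a small sketch on a hand-written multi-line bracketed string. The helper name `flatten_whitespace` and the sample value `pretty` are hypothetical and chosen only for illustration.

```python
def flatten_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so join() rebuilds a single line.
    return " ".join(text.split())


# A hand-written stand-in for the multi-line output of str(tree).
pretty = "(S\n  (NP (PRP I))\n  (VP (VBP 'm) (VBG looking))\n  (. .))"

flat = flatten_whitespace(pretty)
print(flat)  # (S (NP (PRP I)) (VP (VBP 'm) (VBG looking)) (. .))
```

Because only whitespace is touched, the bracketing and the labels are left exactly as they were.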
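The documentation provides a pprint() method, which renders the tree as a bracketed string that can then be flattened onto one line.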
Parsing this sentence:
string = "My name is Ross and I am cool. What's going on world? I'm looking for friends."
and then calling pprint() produces the following:
u"(NP+SBAR+S\n (S\n (NP (PRP$ my) (NN name))\n (VP\n (VBZ is)\n (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n (SBAR\n (WHNP (WP What))\n (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n (. ?))\n (S\n (NP (PRP I))\n (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n (. .)))"
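From this point, if you want to remove the tabs and newlines, you can use split and join. Note that in NLTK 3 and later pprint() prints the tree and returns None, so pformat() is used here to obtain the string:
splitted = tree.pformat().split()
flat_tree = ' '.join(splitted)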
Running this gives me:
u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))"
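As an alternative sketch, assuming NLTK 3 where `Tree.pformat` takes a `margin` argument, a very large margin should make the formatter keep the whole tree on one line, with no split and join step. The sample tree and the names `tree` and `one_line` are illustrative only.

```python
from nltk.tree import Tree

# A small tree whose one-line form is longer than the default 70-column margin.
tree = Tree.fromstring(
    "(S (NP (PRP I)) (VP (VBP 'm) (VBG looking) "
    "(PP (IN for) (NP (NNS friends)))) (. .))"
)

# Assumption: NLTK 3's pformat(margin=...). With a margin wider than the
# tree's flat form, the formatter has no reason to break lines.
one_line = tree.pformat(margin=10**6)
print(one_line)
```

Both routes end with the same single-line string; the margin variant simply avoids building the multi-line form first.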
Source: How to flatten the parse tree and store in a string for further string operations python nltk