Highlight verb phrases using spacy and html

I have written code that colors verb phrases red and outputs the result as HTML.

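from __future__ import unicode_literals
import spacy, en_core_web_sm
import textacy
import codecs
nlp = en_core_web_sm.load()
sentence = 'The author is writing a new book. The dog is barking.'
# Optional verb, any number of adverbs, then one or more verbs
pattern = r'<VERB>?<ADV>*<VERB>+'
# textacy.Doc and pos_regex_matches come from older textacy releases
doc = textacy.Doc(sentence, lang='en_core_web_sm')
lists = textacy.extract.pos_regex_matches(doc, pattern)
with open("my.html","w") as fp:
    for span in lists:  # "span" instead of "list" to avoid shadowing the builtin
        search_word = span.text
        # Writes the whole sentence once per match, with only that match highlighted
        fp.write(sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))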

Current output:

The author **is writing** a new book. The dog is barking.The author is writing a new book. The dog **is barking.**

The sentence is written twice: in the first copy only "is writing" is highlighted, and in the second copy only "is barking" is highlighted.
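The duplication comes from calling fp.write inside the loop, once per match, each time on the original sentence. A minimal sketch of the direct fix, assuming the matched phrases are already available as plain strings (matched_phrases below is a hypothetical stand-in for the span.text values from textacy): apply every replacement to the same string, then write it once after the loop.

```python
# Hypothetical stand-ins for span.text of each textacy match.
sentence = 'The author is writing a new book. The dog is barking.'
matched_phrases = ['is writing', 'is barking']

highlighted = sentence
# dict.fromkeys removes duplicate phrases while keeping their order,
# so a phrase matched twice is not wrapped in two <span> tags.
for phrase in dict.fromkeys(matched_phrases):
    highlighted = highlighted.replace(
        phrase, '<span style="color: red">{}</span>'.format(phrase))

with open("my.html", "w") as fp:
    fp.write(highlighted)  # written once, after all replacements
```

Note that str.replace highlights every occurrence of a phrase in the text, not only the occurrence textacy matched.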

Expected output:

The author **is writing** a new book. The dog **is barking.**

Do I need to split the text into sentences before looping over the matches? Any help is appreciated.
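Sentence splitting is not strictly required if each match carries its character offsets. Assuming the matches are spaCy Span objects (which expose start_char and end_char), the offsets could be collected with something like spans = [(m.start_char, m.end_char) for m in lists]. The helper below, highlight_spans (a hypothetical name), is a sketch that builds the HTML in a single pass over the original text, HTML-escapes the plain text, and skips ranges that overlap an earlier one.

```python
import html

def highlight_spans(text, spans, style="color: red"):
    """Wrap each (start, end) character range of text in a <span> tag."""
    out = []
    pos = 0  # end of the last emitted range
    for start, end in sorted(spans):
        if start < pos:  # overlaps a range already highlighted; skip it
            continue
        out.append(html.escape(text[pos:start]))
        out.append('<span style="{}">{}</span>'.format(
            style, html.escape(text[start:end])))
        pos = end
    out.append(html.escape(text[pos:]))  # text after the last match
    return ''.join(out)

text = 'The author is writing a new book. The dog is barking.'
# Offsets computed by hand here; with spaCy they would come from start_char/end_char.
spans = [(text.find(p), text.find(p) + len(p)) for p in ('is writing', 'is barking')]
print(highlight_spans(text, spans))
```

Because the output is assembled from offsets rather than str.replace, each sentence appears exactly once and only the matched occurrences are highlighted.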

Tags: html beautifulsoup nltk spacy

1 answer

Answer by chillily · 2020-07-22 05:01

I found an alternative, more logical approach: instead of replacing within the whole text, replace only within the sentence that contains the matched phrase.

with open("my.html","w") as fp:
    for _list in lists:
        search_word = _list.text
        # First sentence (splitting on '.') that contains the matched phrase
        containing_sentence = [i for i in sentence.split('.') if search_word in i][0]
        fp.write(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))
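Splitting on '.' removes the periods, and indexing with [0] raises IndexError when no sentence contains the phrase. A small sketch of a more forgiving lookup, using hypothetical helper names split_sentences and find_containing_sentence and a naive regex splitter (an assumption, not part of the answer): it keeps terminal punctuation and returns None when nothing matches. It still mis-splits abbreviations such as "Dr. Smith"; spaCy's doc.sents is a better source of sentence boundaries when a parsed Doc is available.

```python
import re

def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?,
    # keeping the punctuation attached to its sentence.
    return re.split(r'(?<=[.!?])\s+', text.strip())

def find_containing_sentence(text, phrase):
    # next() with a default returns None instead of raising IndexError.
    return next((s for s in split_sentences(text) if phrase in s), None)

sentence = 'The author is writing a new book. The dog is barking.'
print(split_sentences(sentence))
print(find_containing_sentence(sentence, 'is barking'))
```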

The code above writes each matching sentence separately, and split('.') drops the periods. To produce a single piece of text instead, append each modified sentence to a list and join the list before writing it to the file:

mod_sentence = []
# If lists is a generator already consumed by the loop above,
# re-run textacy.extract.pos_regex_matches(doc, pattern) first.
for _list in lists:
    search_word = _list.text
    # Add back the '.' that split('.') removed
    containing_sentence = [i for i in sentence.split('.') if search_word in i][0] + '.'
    mod_sentence.append(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))
with open("my.html","w") as fp:
    fp.write(''.join(mod_sentence))
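The list-and-join version emits one entry per match, so a sentence containing two verb phrases would appear twice, and sentences with no match are dropped. A sketch that loops over sentences instead of matches avoids both, assuming the matched phrases are plain strings and using a hypothetical helper name highlight_by_sentence with the same naive regex splitter as a stand-in for real sentence segmentation:

```python
import re

def split_sentences(text):
    # Naive splitter: break on whitespace after ., ! or ?, keeping the punctuation.
    return re.split(r'(?<=[.!?])\s+', text.strip())

def highlight_by_sentence(text, phrases, style="color: red"):
    unique = list(dict.fromkeys(phrases))  # drop duplicate phrases, keep order
    out = []
    for sent in split_sentences(text):
        # Apply every phrase to this sentence, so it is emitted only once.
        for phrase in unique:
            sent = sent.replace(
                phrase, '<span style="{}">{}</span>'.format(style, phrase))
        out.append(sent)  # sentences without a match are kept unchanged
    return ' '.join(out)

text = 'The author is writing a new book. It is long. The dog is barking and is running.'
print(highlight_by_sentence(text, ['is writing', 'is barking', 'is running']))
```

Chained str.replace calls can still touch markup inserted earlier if a phrase happens to occur inside it; building the output from character offsets avoids that.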

Hope this helps! Cheers!
