How to extract information between two unique words

I have about 150 text files filled with character information. Each file contains two unique words (alpha and bravo), and I want to extract the text between these two words and write it to a different file.

Manually, I can Ctrl+F for the two words and copy the text between them; I just want to know how to do this with a program (preferably Python) for many files.

Tags: python parsing search text batch-file

4 answers

何必那么认真

Reply #2 · 2020-01-28 08:46

You can use regular expressions for that.

>>> st = "alpha here is my text bravo"
>>> import re
>>> re.findall(r'alpha(.*?)bravo',st)
[' here is my text ']

My test.txt file

alpha here is my line
yipee
bravo

Now use open to read the file and then apply the regular expression. The re.DOTALL flag lets "." match newlines, so the match can span several lines.

>>> f = open('test.txt','r')
>>> data = f.read()
>>> x = re.findall(r'alpha(.*?)bravo',data,re.DOTALL)
>>> x
[' here is my line\nyipee\n']
>>> "".join(x).replace('\n',' ')
' here is my line yipee '
>>>
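Since the question is about roughly 150 files, here is a minimal sketch of how the same pattern might be applied to a whole folder. The function names (extract_between, process_folder), the "*.txt" glob, the UTF-8 encoding and the choice to skip files without both markers are all assumptions made for illustration, not part of the answer above.

```python
import re
from pathlib import Path

# Non-greedy match of everything between the two marker words;
# re.DOTALL lets "." span line breaks.
PATTERN = re.compile(r"alpha(.*?)bravo", re.DOTALL)


def extract_between(text):
    """Return the text between the first 'alpha' and the next 'bravo', or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


def process_folder(src_dir, dst_dir):
    """Write the extracted span of every *.txt file in src_dir to dst_dir.

    Hypothetical helper: assumes UTF-8 files and distinct source/target folders.
    Returns the names of the files that were written.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).glob("*.txt")):
        found = extract_between(path.read_text(encoding="utf-8"))
        if found is None:
            continue  # skip files that lack one of the markers
        out_path = dst / path.name
        out_path.write_text(found.strip(), encoding="utf-8")
        written.append(out_path.name)
    return written
```

Calling process_folder("input_files", "extracted") (both folder names are placeholders) would leave one output file per input file that contains both words.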


老娘就宠你

Reply #3 · 2020-01-28 09:00

a = 'alpha'
b = 'bravo'
text = 'from alpha all the way to bravo and beyond.'

text.split(a)[-1].split(b)[0]
# ' all the way to '
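One thing to watch with the split one-liner: if either word is missing it still returns a string (the whole text, or a prefix of it) rather than signalling a problem. A possible variant, sketched here with str.partition, reports a missing marker as None. The function name between is hypothetical.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # partition returns (head, separator, tail); the separator is empty
    # when it was not found in the string.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return middle


print(repr(between('from alpha all the way to bravo and beyond.', 'alpha', 'bravo')))
# ' all the way to '
```

Note that this takes the first occurrence of the start word, whereas split(a)[-1] takes the last one; with truly unique words the two agree.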


Evening l夕情丶

Reply #4 · 2020-01-28 09:03

Instead of a regular expression, you can use Python's str.find method.

>>> unique_word_a = 'alpha'
>>> unique_word_b = 'bravo'
>>> s = 'blah blah alpha i am a good boy bravo blah blah'
>>> your_string = s[s.find(unique_word_a)+len(unique_word_a):s.find(unique_word_b)].strip()
>>> your_string
'i am a good boy'


家丑人穷心不美

Reply #5 · 2020-01-28 09:06

str.find and its sibling rfind have start and end args.

alpha = 'qawsed'
bravo = 'azsxdc'
startpos = text.find(alpha) + len(alpha)
endpos = text.find(bravo, startpos)
do_something_with(text[startpos:endpos])

This is the fastest way if the contained text is short and near the front.

If the contained text is relatively large, use:

startpos = text.find(alpha) + len(alpha)
endpos = text.rfind(bravo)

If the contained text is short and near the end, use:

endpos = text.rfind(bravo)
startpos = text.rfind(alpha, 0, endpos) + len(alpha)

The first method is in any case better than the naive method of starting the second search from the start of the text; use it if your contained text has no dominant pattern.
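As a rough illustration, the first and third strategies could be wrapped as functions that also check for a missing marker (str.find and str.rfind return -1 in that case). The function names and the sample text are made up for this sketch; only the find/rfind calls come from the answer above.

```python
def between_forward(text, start, end):
    """Search for `start` from the front, then for `end` after it."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)  # begin the second search where the first one ended
    return None if j == -1 else text[i:j]


def between_backward(text, start, end):
    """Search for `end` from the back, then for `start` before it."""
    j = text.rfind(end)
    if j == -1:
        return None
    i = text.rfind(start, 0, j)  # `start` must lie entirely before position j
    return None if i == -1 else text[i + len(start):j]


sample = 'head qawsed payload azsxdc tail'
print(repr(between_forward(sample, 'qawsed', 'azsxdc')))
# ' payload '
```

Both functions return the same span when each marker occurs exactly once; they differ only in which end of the text is scanned first.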

