I am trying to extract a selected number of words surrounding a given word. I will give an example to make it clear:
string = "Education shall be directed to the full development of the human personality and to the strengthening of respect for human rights and fundamental freedoms."
1) The selected word is development and I need to get the 6 words surrounding it: [to, the, full, of, the, human]
2) But if the selected word is at the beginning or in second position, I still need to get 6 words, e.g.:
The selected word is shall, so I should get: [Education, be, directed, to, the, full]
I should use the 're' module. What I have managed to find so far is:
    import re

    def search(text, n):
        '''Searches text for the word 'place' and retrieves n words on either side of it, which are returned separately'''
        # One optional run of non-word characters followed by one captured word
        word = r"\W*([\w]+)"
        groups = re.search(r'{}\W*{}{}'.format(word*n, 'place', word*n), text).groups()
        return groups[:n], groups[n:]
But it only helps me with the first case. Can someone help me out with this? I would be really grateful. Thank you in advance!
I don't think regular expressions are necessary here. Assuming the text is well-constructed, just split it up into an array of words, and write a couple of if-else statements to make sure it retrieves the necessary number of surrounding words:
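A minimal sketch of that idea. The function name surrounding_words and the total parameter are illustrative assumptions, and the text is split on whitespace only, so punctuation stays attached to the words:

```python
def surrounding_words(text, target, total=6):
    """Return (before, after): `total` words around the first occurrence of `target`."""
    words = text.split()
    position = words.index(target)  # raises ValueError if target is absent
    half = total // 2
    available_after = len(words) - 1 - position

    if position < half:
        # Too close to the start: take what exists on the left,
        # and make up the difference on the right.
        left = position
        right = total - left
    elif available_after < total - half:
        # Too close to the end: the mirror image of the case above.
        right = available_after
        left = total - right
    else:
        left = half
        right = total - half

    before = words[max(0, position - left):position]
    after = words[position + 1:position + 1 + right]
    return before, after


text = ("Education shall be directed to the full development of the human "
        "personality and to the strengthening of respect for human rights "
        "and fundamental freedoms.")
print(surrounding_words(text, "development"))
# (['to', 'the', 'full'], ['of', 'the', 'human'])
print(surrounding_words(text, "shall"))
# (['Education'], ['be', 'directed', 'to', 'the', 'full'])
```

If the text has fewer than total + 1 words, the slices simply return whatever is there.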
In your example you didn't have the target word included in the output, so I kept it out as well. If you'd like the target word included, simply combine the two arrays the function returns (join them at position). Hope this helped!
Tricky, with potential for off-by-one errors, but I think this meets your spec. I have left out punctuation removal; it is probably best to strip it before sending the string for analysis. I assumed case was not important.
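One way this could look, as a sketch rather than a definitive version. The name context_window is hypothetical; the idea is to clamp a window of n + 1 words (the target plus n neighbours) to the bounds of the text and then drop the target from it. Matching ignores case, and punctuation is left attached:

```python
def context_window(text, target, n=6):
    """Return the n words around the first case-insensitive match of `target`."""
    words = text.split()
    lowered = [w.lower() for w in words]
    i = lowered.index(target.lower())  # raises ValueError if target is absent

    # Ideal start is n // 2 words before the target; clamp it so the
    # window of n + 1 words never runs off either end of the text.
    start = max(0, min(i - n // 2, len(words) - (n + 1)))
    window = words[start:start + n + 1]

    # Drop the target itself, which sits at offset i - start in the window.
    offset = i - start
    return window[:offset] + window[offset + 1:]


text = ("Education shall be directed to the full development of the human "
        "personality and to the strengthening of respect for human rights "
        "and fundamental freedoms.")
print(context_window(text, "Development"))
# ['to', 'the', 'full', 'of', 'the', 'human']
print(context_window(text, "shall"))
# ['Education', 'be', 'directed', 'to', 'the', 'full']
```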
A simple approach to your problem: first separate all the words, then select words from the left and right.
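A rough sketch of that approach, assuming words are runs of \w characters so that re.findall also discards the punctuation. The function name neighbours is only illustrative:

```python
import re


def neighbours(text, target, n=6):
    """Return n words around the first occurrence of `target`."""
    words = re.findall(r"\w+", text)   # separate all the words
    idx = words.index(target)          # raises ValueError if target is absent

    # Take up to n // 2 words from the left, then fill the rest from the right.
    left = words[max(0, idx - n // 2):idx]
    right = words[idx + 1:idx + 1 + n - len(left)]

    # If the right side ran out of words, top up from the left instead.
    missing = n - len(left) - len(right)
    if missing > 0:
        left = words[max(0, idx - len(left) - missing):idx]

    return left + right


text = ("Education shall be directed to the full development of the human "
        "personality and to the strengthening of respect for human rights "
        "and fundamental freedoms.")
print(neighbours(text, "development"))
# ['to', 'the', 'full', 'of', 'the', 'human']
print(neighbours(text, "shall"))
# ['Education', 'be', 'directed', 'to', 'the', 'full']
```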
This will extract all occurrences of the target word in your text, with context:
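For instance, something along these lines. It is a sketch under my own assumptions: the generator name all_contexts is made up, words are taken to be runs of \w characters, and matching is case-sensitive:

```python
import re


def all_contexts(text, target, n=6):
    """Yield (word_index, neighbours) for every occurrence of `target`."""
    words = re.findall(r"\w+", text)
    for i, word in enumerate(words):
        if word != target:
            continue
        # Clamp a window of n + 1 words around this occurrence,
        # then drop the target itself from the window.
        start = max(0, min(i - n // 2, len(words) - (n + 1)))
        window = words[start:start + n + 1]
        offset = i - start
        yield i, window[:offset] + window[offset + 1:]


text = ("Education shall be directed to the full development of the human "
        "personality and to the strengthening of respect for human rights "
        "and fundamental freedoms.")
for index, context in all_contexts(text, "human"):
    print(index, context)
# 10 ['development', 'of', 'the', 'personality', 'and', 'to']
# 19 ['of', 'respect', 'for', 'rights', 'and', 'fundamental']
```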