I am trying to remove words from a string if they match a list.
x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012"
tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']
print x
for tag in tags:
    if tag in x:
        print x.replace(tag, '')
It produces this output:
How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (-LOL) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (HDTV-) [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 (HDTV-LOL) [] - Mon, 20 Feb 2012
I want it to remove all the words matching the list.
You are not keeping the result of x.replace(). Try the following instead:
for tag in tags:
    x = x.replace(tag, '')
print x
Note that your approach matches any substring, not just whole words. For example, it would remove the LOL in RUN LOLA RUN.
One way to address this is to enclose each tag in a pair of r'\b' strings and search for the resulting regular expression. The r'\b' matches only at word boundaries:
import re
for tag in tags:
    # re.escape() keeps any regex metacharacters in a tag literal
    x = re.sub(r'\b' + re.escape(tag) + r'\b', '', x)
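As a rough sketch of the same word-boundary idea, the tags can also be joined into a single alternation and compiled once, so the string is scanned in one pass. This is written for Python 3 (hence print() as a function); the names tag_pattern and strip_tags are made up for illustration and are not part of the question.

```python
import re

tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']

# Build one pattern of the form \b(?:HDTV|LOL|...)\b.
# re.escape() guards against tags that contain regex metacharacters.
tag_pattern = re.compile(r'\b(?:' + '|'.join(re.escape(t) for t in tags) + r')\b')

def strip_tags(text):
    """Return text with every whole-word occurrence of a tag removed."""
    return tag_pattern.sub('', text)

x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012"
print(strip_tags(x))               # How I Met Your Mother 7x17 (-) [] - Mon, 20 Feb 2012
print(strip_tags("RUN LOLA RUN"))  # RUN LOLA RUN (LOL inside LOLA is left alone)
```

The empty brackets and the stray hyphen are left behind because only the tags themselves are removed; cleaning those up would need a separate step.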
The method str.replace() does not change the string in place; strings are immutable in Python. You have to bind x to the new string returned by replace() in each iteration:
for tag in tags:
    x = x.replace(tag, "")
Note that the if statement is redundant; str.replace() won't do anything if it doesn't find a match.
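A small check of both points, written for Python 3 and using a shortened version of the title from the question. The variable names result and unchanged are just for illustration.

```python
x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV]"

# replace() returns a new string; x itself is not modified.
result = x.replace('HDTV', '')
print(x)       # How I Met Your Mother 7x17 (HDTV-LOL) [VTV]
print(result)  # How I Met Your Mother 7x17 (-LOL) [VTV]

# With no match, replace() returns an equal string, so an
# "if tag in x" guard in front of it adds nothing.
unchanged = x.replace('x264', '')
print(unchanged == x)  # True
```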
Using your variables tags and x, you can use this:
output = reduce(lambda a,b: a.replace(b, ''), tags, x)
This returns:
'How I Met Your Mother 7x17 (-) [] - Mon, 20 Feb 2012'
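Note that reduce is a builtin only in Python 2. A sketch of the same one-liner for Python 3, assuming nothing beyond the standard library, imports it from functools:

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

x = "How I Met Your Mother 7x17 (HDTV-LOL) [VTV] - Mon, 20 Feb 2012"
tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']

# Fold over tags, starting from x and feeding each intermediate
# string into the next replace() call.
output = reduce(lambda text, tag: text.replace(tag, ''), tags, x)
print(output)  # How I Met Your Mother 7x17 (-) [] - Mon, 20 Feb 2012
```

This does the same work as the plain for loop; which one reads better is largely a matter of taste.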
(1) x.replace(tag, '') does not modify x, but rather returns a new string with the replacement.
(2) Why are you printing on each iteration?
The simplest modification would be:
for tag in tags:
    x = x.replace(tag, '')