Replace single instances of a character that is so

Posted 2019-06-14 22:11

I have a string with each character being separated by a pipe character (including the "|"s themselves), for example:

"f|u|n|n|y||b|o|y||a||c|a|t"

I would like to replace all "|"s which are not next to another "|" with nothing, to get the result:

"funny|boy|a|cat"

I tried using mytext.replace("|", ""), but that removes everything and makes one long word.

7 answers
Deceive 欺骗
#2 · 2019-06-14 22:27

This can be achieved with a relatively simple regex without having to chain str.replace:

>>> import re
>>> s = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> re.sub(r'\|(?!\|)', '', s)
'funny|boy|a|cat'

Explanation: \|(?!\|) will look for a | character which is not followed by another | character. (?!foo) means negative lookahead, ensuring that whatever you are matching is not followed by foo.
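
As a minimal sketch of the same idea, the pattern can be compiled once and wrapped in a small function. The name drop_single_pipes is hypothetical and not part of the answer; a raw string is used so the backslashes reach the regex engine unchanged. The second call shows that a trailing pipe is removed as well, since nothing follows it.

```python
import re

# Same pattern as in the answer: a pipe that is not followed by another pipe.
SINGLE_PIPE = re.compile(r'\|(?!\|)')

def drop_single_pipes(text):
    """Remove every '|' that is not immediately followed by another '|'."""
    return SINGLE_PIPE.sub('', text)

print(drop_single_pipes("f|u|n|n|y||b|o|y||a||c|a|t"))  # funny|boy|a|cat
print(drop_single_pipes("c|a|t|"))                      # cat
```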

兄弟一词,经得起流年.
#3 · 2019-06-14 22:28

You can use a positive lookahead regex to remove the pipes that are followed by a lowercase letter or by the end of the string:

>>> import re
>>> st = "f|u|n|n|y||b|o|y||a||c|a|t" 
>>> re.sub(r'\|(?=[a-z]|$)',r'',st)
'funny|boy|a|cat'
仙女界的扛把子
#4 · 2019-06-14 22:39

If you are going to use a regex, the fastest method is to split and join:

In [18]: r = re.compile(r"\|(?!\|)")

In [19]: timeit "".join(r.split(s))
100000 loops, best of 3: 2.65 µs per loop
In [20]: "".join(r.split(s))
Out[20]: 'funny|boy|a|cat'

In [30]: r1 = re.compile(r'\|(?!\|)')

In [31]: timeit r1.sub("", s)
100000 loops, best of 3: 3.20 µs per loop

In [33]: r2 = re.compile(r"(?!\|\|)(\|)")
In [34]: timeit r2.sub("",s)
100000 loops, best of 3: 3.96 µs per loop

The str.split and str.replace methods are still faster:

In [38]: timeit '|'.join([ch.replace('|', '') for ch in s.split('||')])
The slowest run took 11.18 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 1.71 µs per loop

In [39]: timeit s.replace('||','|||')[::2]
1000000 loops, best of 3: 536 ns per loop

In [40]: timeit s.replace('||','~').replace('|','').replace('~','|')
1000000 loops, best of 3: 881 ns per loop

Which str.replace approach is safe depends on what the string can contain, but the str.split method works no matter what characters are in the string.
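
To illustrate that caveat, here is a small sketch comparing the three string-method approaches on inputs outside the question's example. The function names via_slice, via_placeholder and via_split are hypothetical labels for the snippets timed above. The slice version assumes the string strictly alternates one character and one separator, and the placeholder version assumes '~' never occurs in the text.

```python
def via_slice(s):
    # Turn each "||" into "|||", then keep every second character.
    return s.replace('||', '|||')[::2]

def via_placeholder(s, placeholder='~'):
    # Park the double pipes on a placeholder while single pipes are removed.
    return s.replace('||', placeholder).replace('|', '').replace(placeholder, '|')

def via_split(s):
    # Split on the double pipes, strip single pipes from each piece, rejoin.
    return '|'.join(part.replace('|', '') for part in s.split('||'))

s = "f|u|n|n|y||b|o|y||a||c|a|t"
print(via_slice(s), via_placeholder(s), via_split(s))  # all three: funny|boy|a|cat

# A literal "~" in the text collides with the placeholder.
print(via_placeholder("a|~|b||c"))  # a|b|c
print(via_split("a|~|b||c"))        # a~b|c

# Two characters without a separator break the alternation the slice needs.
print(via_slice("ab|c"))  # a|
print(via_split("ab|c"))  # abc
```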

做个烂人
#5 · 2019-06-14 22:41

Use regular expressions.

import re

line = "f|u|n|n|y||b|o|y||a||c|a|t" 
line = re.sub(r"(?!\|\|)(\|)", "", line)

print(line)

Output :

funny|boy|a|cat
戒情不戒烟
#6 · 2019-06-14 22:49

Another regex option, using a capturing group:

>>> import re
>>> re.sub(r'\|(\|?)', r'\1', "f|u|n|n|y||b|o|y||a||c|a|t")
'funny|boy|a|cat'

Explanation:

\| - Matches a pipe character. (\|?) - Captures the following pipe character, if present. Replacing the match with \1 then inserts the content of the first capturing group: for a single pipe that is an empty string, and for || it is the second pipe character.

Another trick through word and non-word boundaries...

>>> re.sub(r'\b\|\b|\b\|\B', '', "f|u|n|n|y||b|o|y||a||c|a|t|")
'funny|boy|a|cat'

Yet another one using negative lookbehind..

>>> re.sub(r'(?<!\|)\|', '', "f|u|n|n|y||b|o|y||a||c|a|t|")
'funny|boy|a|cat'

Bonus...

>>> re.sub(r'\|(\|)|\|', lambda m: m.group(1) if m.group(1) else '', "f|u|n|n|y||b|o|y||a||c|a|t")
'funny|boy|a|cat'
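
One difference between these variants that the question's input does not exercise: on a run of three pipes, the lookaround patterns remove only one pipe, while the capturing-group pattern consumes the pipes in pairs. The sketch below assumes such an input, which is not a case from the question, and the variable names are illustrative only.

```python
import re

run = "a|||b"  # three pipes in a row; an assumed edge case

lookahead = re.sub(r'\|(?!\|)', '', run)     # drops only the last pipe
lookbehind = re.sub(r'(?<!\|)\|', '', run)   # drops only the first pipe
captured = re.sub(r'\|(\|?)', r'\1', run)    # "||" -> "|", then "|" -> ""

print(lookahead)   # a||b
print(lookbehind)  # a||b
print(captured)    # a|b
```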
男人必须洒脱
#7 · 2019-06-14 22:53

You could replace the double pipe by something else first to make sure that you can still recognize them after removing the single pipes. And then you replace those back to a pipe:

>>> t = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> t.replace('||', '|-|').replace('|', '').replace('-', '|')
'funny|boy|a|cat'

You should try to choose a replacement value that is a safe temporary value and does not naturally appear in your text. Otherwise you will run into conflicts where that character is replaced even though it wasn’t a double pipe originally. So don’t use a dash as above if your text may contain a dash. You can also use multiple characters at once, for example: '<THIS IS A TEMPORARY PIPE>'.
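
One way to pick such a value programmatically is sketched below. The helper names pick_placeholder and collapse_pipes are hypothetical, and drawing candidates from the Unicode private-use area is just one possible choice; the point is only that the marker is checked against the text before it is used.

```python
def pick_placeholder(text):
    """Return a single character that does not occur in text."""
    # Candidates come from the Unicode private-use area (U+E000 to U+F8FF),
    # which rarely shows up in ordinary text.
    for code in range(0xE000, 0xF900):
        candidate = chr(code)
        if candidate not in text:
            return candidate
    raise ValueError("no unused placeholder character found")

def collapse_pipes(text):
    marker = pick_placeholder(text)
    return text.replace('||', marker).replace('|', '').replace(marker, '|')

print(collapse_pipes("f|u|n|n|y||b|o|y||a||c|a|t"))  # funny|boy|a|cat
# With a fixed '-' marker this input would come out as a|b|c instead.
print(collapse_pipes("a|-|b||c"))                    # a-b|c
```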

If you want to avoid this conflict completely, you could also solve this in an entirely different way. For example, you could split the string by the double pipes first and perform a replacement on each substring, ultimately joining them back together:

>>> '|'.join([s.replace('|', '') for s in t.split('||')])
'funny|boy|a|cat'

And of course, you could also use regular expressions to replace those pipes that are not followed by another pipe:

>>> import re
>>> re.sub(r'\|(?!\|)', '', t)
'funny|boy|a|cat'