Replace single instances of a character that is so

I have a string with each character being separated by a pipe character (including the "|"s themselves), for example:

"f|u|n|n|y||b|o|y||a||c|a|t"

I would like to replace all "|"s which are not next to another "|" with nothing, to get the result:

"funny|boy|a|cat"

I tried using mytext.replace("|", ""), but that removes everything and makes one long word.

Tags: python regex string replace

7 answers

Deceive 欺骗

Reply #2 · 2019-06-14 22:27

This can be achieved with a relatively simple regex without having to chain str.replace:

>>> import re
>>> s = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> re.sub(r'\|(?!\|)', '', s)
'funny|boy|a|cat'

Explanation: \|(?!\|) will look for a | character which is not followed by another | character. (?!foo) means negative lookahead, ensuring that whatever you are matching is not followed by foo.
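To see how the negative lookahead behaves away from the sample string, here is a minimal sketch. The helper name strip_single_pipes and the two extra inputs are made up for illustration; they are not from the question.

```python
import re

# Matches a pipe only when the next character is not another pipe.
SINGLE_PIPE = re.compile(r'\|(?!\|)')

def strip_single_pipes(text):
    """Remove every pipe that is not followed by another pipe."""
    return SINGLE_PIPE.sub('', text)

print(strip_single_pipes("f|u|n|n|y||b|o|y||a||c|a|t"))  # funny|boy|a|cat
print(strip_single_pipes("a|b|"))   # ab   (a trailing pipe has nothing after it, so it is removed)
print(strip_single_pipes("a|||b"))  # a||b (in a run of pipes only the last one is removed)
```

The last two calls show the edge cases worth checking against your own data: trailing pipes disappear, and a run of three pipes shrinks to two.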

兄弟一词,经得起流年.

Reply #3 · 2019-06-14 22:28

You can use a positive lookahead regex to remove the pipes that are followed by a lowercase letter or by the end of the string:

>>> import re
>>> st = "f|u|n|n|y||b|o|y||a||c|a|t" 
>>> re.sub(r'\|(?=[a-z]|$)',r'',st)
'funny|boy|a|cat'
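Note that [a-z] only covers lowercase ASCII letters, so this pattern depends on what the text contains. A small sketch comparing it with a negative lookahead; the string in mixed is a hypothetical input, not from the question:

```python
import re

narrow = re.compile(r'\|(?=[a-z]|$)')  # pattern from this answer
general = re.compile(r'\|(?!\|)')      # negative lookahead, no assumption about the text

sample = "f|u|n|n|y||b|o|y||a||c|a|t"
mixed = "T|O|P||1|0"                   # hypothetical input with uppercase letters and digits

print(narrow.sub('', sample))   # funny|boy|a|cat
print(narrow.sub('', mixed))    # T|O|P||1|0  (nothing matches, so nothing is removed)
print(general.sub('', mixed))   # TOP|10
```

If the text is known to be lowercase only, both patterns give the same result; otherwise the character class would need to be widened.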

仙女界的扛把子

Reply #4 · 2019-06-14 22:39

If you are going to use a regex, the fastest method is to split and join:

In [18]: r = re.compile(r"\|(?!\|)")

In [19]: timeit "".join(r.split(s))
100000 loops, best of 3: 2.65 µs per loop
In [20]: "".join(r.split(s))
Out[20]: 'funny|boy|a|cat'
In [30]: r1 = re.compile(r'\|(?!\|)')

In [31]: timeit r1.sub("", s)
100000 loops, best of 3: 3.20 µs per loop

In [33]: r2 = re.compile(r"(?!\|\|)(\|)")
In [34]: timeit r2.sub("",s)
100000 loops, best of 3: 3.96 µs per loop

The str.split and str.replace methods are still faster:

In [38]: timeit '|'.join([ch.replace('|', '') for ch in s.split('||')])
The slowest run took 11.18 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 1.71 µs per loop

In [39]: timeit s.replace('||','|||')[::2]
1000000 loops, best of 3: 536 ns per loop

In [40]: timeit s.replace('||','~').replace('|','').replace('~','|')
1000000 loops, best of 3: 881 ns per loop

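Which str.replace approach is safe depends on what can appear in the string (the '~' placeholder must not occur in the text, and the slicing trick relies on a pipe sitting between every pair of characters), but the str.split method will work no matter what characters are in the string.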
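For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The dictionary layout and the repeat count are arbitrary choices, and the lambda wrappers add a little call overhead, so absolute numbers will not match the figures quoted above.

```python
import re
import timeit

s = "f|u|n|n|y||b|o|y||a||c|a|t"
r = re.compile(r'\|(?!\|)')

# The approaches timed above, wrapped so they can be called uniformly.
candidates = {
    "regex split + join": lambda: "".join(r.split(s)),
    "regex sub": lambda: r.sub("", s),
    "str.split + str.replace": lambda: "|".join([ch.replace("|", "") for ch in s.split("||")]),
    "replace + slice": lambda: s.replace("||", "|||")[::2],
    "placeholder replace": lambda: s.replace("||", "~").replace("|", "").replace("~", "|"),
}

# Check that every approach agrees before timing anything.
results = {name: func() for name, func in candidates.items()}

runs = 10_000
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=runs)
    print(f"{name:<25} {seconds / runs * 1e9:8.0f} ns per call")
```

Timings vary by machine and Python version, so treat the ranking as something to measure rather than assume.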

做个烂人

Reply #5 · 2019-06-14 22:41

Use regular expressions.

import re

line = "f|u|n|n|y||b|o|y||a||c|a|t" 
line = re.sub(r"(?!\|\|)(\|)", "", line)

print(line)

Output:

funny|boy|a|cat

戒情不戒烟

Reply #6 · 2019-06-14 22:49

Another regex option, using a capturing group:

>>> import re
>>> re.sub(r'\|(\|?)', r'\1', "f|u|n|n|y||b|o|y||a||c|a|t")
'funny|boy|a|cat'

Explanation:

\| - Matches a pipe character.
(\|?) - Captures the following pipe character, if present.
Replacing the match with \1 substitutes the contents of the first capturing group. For a single pipe that is an empty string, and for || it is the second pipe character.
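A short sketch that lists what each match consumes and what group 1 holds may make this easier to follow. The short inputs here are made up for illustration:

```python
import re

pattern = re.compile(r'\|(\|?)')
text = "a|b||c"

# Each tuple is (whole match, contents of group 1).
pairs = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]
print(pairs)                        # [('|', ''), ('||', '|')]

print(pattern.sub(r'\1', text))     # ab|c
print(pattern.sub(r'\1', "a|||b"))  # a|b (pipes are consumed in pairs from the left)
```

The last line shows that this pattern treats a run of three pipes differently from a lookahead such as \|(?!\|), which would leave two pipes behind.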

Another trick uses word and non-word boundaries (this relies on the characters between the pipes being word characters):

>>> re.sub(r'\b\|\b|\b\|\B', '', "f|u|n|n|y||b|o|y||a||c|a|t|")
'funny|boy|a|cat'

Yet another one, using a negative lookbehind:

>>> re.sub(r'(?<!\|)\|', '', "f|u|n|n|y||b|o|y||a||c|a|t|")
'funny|boy|a|cat'

Bonus...

>>> re.sub(r'\|(\|)|\|', lambda m: m.group(1) if m.group(1) else '', "f|u|n|n|y||b|o|y||a||c|a|t")
'funny|boy|a|cat'

男人必须洒脱

Reply #7 · 2019-06-14 22:53

You could replace the double pipe by something else first to make sure that you can still recognize them after removing the single pipes. And then you replace those back to a pipe:

>>> t = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> t.replace('||', '|-|').replace('|', '').replace('-', '|')
'funny|boy|a|cat'

You should choose a temporary replacement value that does not naturally appear in your text. Otherwise you will run into conflicts where that character is replaced even though it wasn't a double pipe originally. So don't use a dash as above if your text may contain a dash. You can also use multiple characters at once, for example: '<THIS IS A TEMPORARY PIPE>'.
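To make the conflict concrete, here is a small sketch. The function names and the dashed input are hypothetical, chosen only to show the failure:

```python
def collapse_with_placeholder(text, placeholder='-'):
    """Mark double pipes, drop the remaining pipes, then restore the marks."""
    marked = text.replace('||', '|' + placeholder + '|')
    return marked.replace('|', '').replace(placeholder, '|')

def collapse_with_split(text):
    """Split on double pipes so no placeholder is needed."""
    return '|'.join(part.replace('|', '') for part in text.split('||'))

clean = "f|u|n|n|y||b|o|y"
dashed = "c|o|-|o|p||t|e|a|m"  # hypothetical input that already contains a dash

print(collapse_with_placeholder(clean))   # funny|boy
print(collapse_with_placeholder(dashed))  # co|op|team (the original dash became a pipe)
print(collapse_with_split(dashed))        # co-op|team
```

The split-based version never needs a placeholder, so it cannot collide with anything in the text.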

If you want to avoid this conflict completely, you could also solve this in an entirely different way. For example, you could split the string by the double pipes first and perform a replacement on each substring, ultimately joining them back together:

>>> '|'.join([s.replace('|', '') for s in t.split('||')])
'funny|boy|a|cat'

And of course, you could also use regular expressions to replace those pipes that are not followed by another pipe:

>>> import re
>>> re.sub(r'\|(?!\|)', '', t)
'funny|boy|a|cat'
