I have a string with each character being separated by a pipe character (including the "|"s themselves), for example:
"f|u|n|n|y||b|o|y||a||c|a|t"
I would like to replace all "|"s which are not next to another "|" with nothing, to get the result:
"funny|boy|a|cat"
I tried using mytext.replace("|", ""), but that removes everything and makes one long word.
This can be achieved with a relatively simple regex, without having to chain str.replace.
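A minimal sketch of that regex in use, assuming the sample string from the question is stored in a variable named mytext (the variable names here are illustrative):

```python
import re

mytext = "f|u|n|n|y||b|o|y||a||c|a|t"

# Remove every pipe that is not immediately followed by another pipe.
result = re.sub(r"\|(?!\|)", "", mytext)
print(result)  # funny|boy|a|cat
```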
Explanation: \|(?!\|) will look for a | character which is not followed by another | character. (?!foo) is a negative lookahead, ensuring that whatever you are matching is not followed by foo.
You can use a positive lookahead regex to replace the pipes that are followed by an alphabetical character:
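A possible version of the positive lookahead, assuming "alphabetical character" means an ASCII letter; the character class and variable names are assumptions, and a pipe at the very end of the string would not be removed by this pattern:

```python
import re

s = "f|u|n|n|y||b|o|y||a||c|a|t"

# Remove a pipe only when the next character is an ASCII letter.
# In "||" the first pipe is followed by a pipe, so it is kept.
result = re.sub(r"\|(?=[A-Za-z])", "", s)
print(result)  # funny|boy|a|cat
```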
If you are going to use a regex, the fastest method is to split and join:
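One way the split-and-join idea could look with a regex, as a sketch only: the compiled pattern and the names below are assumptions, and no timing is claimed here.

```python
import re

s = "f|u|n|n|y||b|o|y||a||c|a|t"

# Split on every pipe that is not followed by another pipe.
# For "||" the split happens at the second pipe, so the first one
# stays attached to the preceding piece (for example "y|").
single_pipe = re.compile(r"\|(?!\|)")
pieces = single_pipe.split(s)

# Gluing the pieces back together without a separator drops the
# single pipes and keeps one pipe per original "||".
result = "".join(pieces)
print(result)  # funny|boy|a|cat
```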
The str.split and str.replace methods are still faster. Which str.replace approach you can use depends on what can be in the string, but the str.split method will work no matter what characters are in the string.
Use regular expressions.
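As an illustration of a regular-expression solution, here is a sketch that passes a function to re.sub. The pattern and the helper name collapse_run are assumptions chosen for this example:

```python
import re

text = "f|u|n|n|y||b|o|y||a||c|a|t"

def collapse_run(match):
    # A lone pipe is a separator and is dropped; a run of two or
    # more pipes is collapsed into a single literal pipe.
    return "|" if len(match.group()) > 1 else ""

result = re.sub(r"\|+", collapse_run, text)
print(result)  # funny|boy|a|cat
```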
Output: funny|boy|a|cat
Another regex option, using a capturing group.
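A short sketch of the capturing-group variant, built from the \| and (\|?) pieces and the \1 replacement that the explanation refers to (the variable names are illustrative):

```python
import re

s = "f|u|n|n|y||b|o|y||a||c|a|t"

# Match a pipe plus an optional second pipe (group 1), then put back
# only what group 1 captured: "" for a single pipe, "|" for "||".
result = re.sub(r"\|(\|?)", r"\1", s)
print(result)  # funny|boy|a|cat
```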
Explanation:
\| - Matches a pipe character.
(\|?) - Captures the following pipe character, if present.
Replacing the match with \1 then gives you the content of the first capturing group. So in place of a single pipe it gives an empty string, and in place of || it gives the second pipe character.
Another trick through word and non-word boundaries...
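A sketch of how the boundary trick might be written. The exact pattern is an assumption, and it relies on the separated characters being word characters (letters, digits or underscore):

```python
import re

s = "f|u|n|n|y||b|o|y||a||c|a|t"

# \b\|\b : a pipe sitting between two word characters.
# \b\|\B : a pipe preceded by a word character and followed by a
#          non-word character (such as another pipe) or the end.
# The second pipe of "||" is preceded by a pipe, so it is kept.
result = re.sub(r"\b\|\b|\b\|\B", "", s)
print(result)  # funny|boy|a|cat
```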
Yet another one, using a negative lookbehind:
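A minimal sketch of the lookbehind version (the variable names are illustrative). It removes the first pipe of each pair and keeps the second, the mirror image of the lookahead approach:

```python
import re

s = "f|u|n|n|y||b|o|y||a||c|a|t"

# Remove every pipe that is not immediately preceded by another pipe.
result = re.sub(r"(?<!\|)\|", "", s)
print(result)  # funny|boy|a|cat
```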
Bonus...
You could replace the double pipe by something else first to make sure that you can still recognize them after removing the single pipes. And then you replace those back to a pipe:
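For example, a sketch using a dash as the temporary value, which assumes the text itself contains no dash:

```python
mytext = "f|u|n|n|y||b|o|y||a||c|a|t"

# 1. Protect the double pipes with a placeholder.
# 2. Remove the remaining single pipes.
# 3. Turn the placeholder back into a pipe.
result = mytext.replace("||", "-").replace("|", "").replace("-", "|")
print(result)  # funny|boy|a|cat
```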
You should try to choose a replacement value that is a safe temporary value and does not naturally appear in your text. Otherwise you will run into conflicts where that character is replaced even though it wasn’t a double pipe originally. So don’t use a dash as above if your text may contain a dash. You can also use multiple characters at once, for example: '<THIS IS A TEMPORARY PIPE>'.
If you want to avoid this conflict completely, you could also solve this in an entirely different way. For example, you could split the string by the double pipes first and perform a replacement on each substring, ultimately joining them back together:
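A sketch of that split, replace and join sequence using only string methods (the variable names are illustrative):

```python
mytext = "f|u|n|n|y||b|o|y||a||c|a|t"

# Split into words at the double pipes, strip the single pipes
# inside each word, then join the words with a single pipe.
words = [part.replace("|", "") for part in mytext.split("||")]
result = "|".join(words)
print(result)  # funny|boy|a|cat
```

Because no placeholder is involved, nothing in the text can collide with a temporary value.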
And of course, you could also use regular expressions to replace those pipes that are not followed by another pipe:
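A sketch of the regex route wrapped in a small helper; the name strip_single_pipes is hypothetical. The second call uses text that contains a dash, the case where a dash placeholder would cause a conflict:

```python
import re

# A pipe that is not followed by another pipe.
SINGLE_PIPE = re.compile(r"\|(?!\|)")

def strip_single_pipes(text):
    """Remove every pipe that is not followed by another pipe."""
    return SINGLE_PIPE.sub("", text)

print(strip_single_pipes("f|u|n|n|y||b|o|y||a||c|a|t"))  # funny|boy|a|cat
print(strip_single_pipes("a|-|b||c"))                    # a-b|c
```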