How can I split a string by a delimiter, but not if it is escaped? For example, I have a string:
1|2\|2|3\\|4\\\|4
The delimiter is | and an escaped delimiter is \|. Furthermore I want to ignore escaped backslashes, so in \\| the | would still be a delimiter.
So with the above string the result should be:
[0] => 1
[1] => 2\|2
[2] => 3\\
[3] => 4\\\|4
Use dark magic:
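A minimal sketch of such a split, assuming PHP's PCRE functions. The helper name splitUnescaped is hypothetical and only used for illustration:

```php
<?php
// Hypothetical helper: split on "|" unless it was consumed by a backslash escape.
function splitUnescaped(string $subject)
{
    // In a single-quoted PHP string, \\\\. is the regex \\. (a backslash plus any character).
    // (*SKIP)(*FAIL) rejects that match and resumes scanning after it,
    // so only a "|" that was not consumed by an escape can split the string.
    return preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $subject);
}

// Nowdoc keeps every backslash literal, so this is exactly the string from the question.
$input = <<<'TXT'
1|2\|2|3\\|4\\\|4
TXT;

print_r(splitUnescaped($input));
```

The escape sequences are left in the resulting fields, which is what the question asks for.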
\\\\. (the regex \\. as written inside a PHP string literal) matches a backslash followed by any character, (*SKIP)(*FAIL) skips it, and \| matches your delimiter.
Instead of split(...), it's IMO more intuitive to use some sort of "scan" function that operates like a lexical tokenizer. In PHP that would be the preg_match_all function. You simply say you want to match, one or more times, either a character other than \ or |, or a \ followed by a \ or |.
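A rough sketch of that scan approach, assuming \ as the escape character and | as the delimiter. The helper name scanFields is hypothetical:

```php
<?php
// Hypothetical helper: collect fields by scanning for tokens instead of splitting.
function scanFields(string $subject): array
{
    // One or more of: an escaped backslash or escaped pipe,
    // or any single character that is neither a backslash nor a pipe.
    preg_match_all('/(?:\\\\[\\\\|]|[^\\\\|])+/', $subject, $matches);
    return $matches[0];
}

// Nowdoc keeps every backslash literal.
$input = <<<'TXT'
1|2\|2|3\\|4\\\|4
TXT;

print_r(scanFields($input));
```

One thing to keep in mind with this sketch: because every token needs at least one character, empty fields are dropped, so a||b yields only a and b.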
The following demo:
will print:
Regex is painfully slow. A better method is to remove the escaped characters from the string before splitting, then put them back in:
This splits on ',' but not if it is escaped with "|". It also supports double escaping, so "||" becomes a single "|" after the split happens:
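One way this placeholder idea could look, as a sketch rather than the exact code of this answer. The function name explodeEscaped and the two placeholder bytes are assumptions, and the input is assumed never to contain those bytes. Turning an escaped delimiter back into a plain delimiter is also a choice made here:

```php
<?php
// Hypothetical helper: split without regex by masking escaped sequences first.
function explodeEscaped(string $str, string $delim = ',', string $esc = '|'): array
{
    // Assumption: these control bytes never occur in the input.
    $escMark = "\x00";
    $delimMark = "\x01";

    // strtr scans left to right and never re-reads its own output,
    // so "||" and "|," are masked in a single pass.
    $masked = strtr($str, [$esc . $esc => $escMark, $esc . $delim => $delimMark]);

    // Only unescaped delimiters are left to split on.
    $parts = explode($delim, $masked);

    // Put the masked characters back, now unescaped.
    foreach ($parts as $i => $part) {
        $parts[$i] = strtr($part, [$escMark => $esc, $delimMark => $delim]);
    }
    return $parts;
}

print_r(explodeEscaped('a,b|,c,d||,e'));
```

For the input above this gives the fields a, b,c, d| and e.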
For future readers, here is a universal solution. It is based on NikiC's idea with (*SKIP)(*FAIL):
Try it:
Output:
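A generalised version along these lines might be sketched as follows. This is not necessarily this answer's code: the name splitEscaped and its parameters are hypothetical, and both the delimiter and the escaper are assumed to be non-empty ASCII strings:

```php
<?php
// Hypothetical generalisation: any delimiter and any escape string.
function splitEscaped(string $subject, string $delimiter = '|', string $escaper = '\\'): array
{
    // Quote both strings so regex metacharacters in them are taken literally.
    $d = preg_quote($delimiter, '~');
    $e = preg_quote($escaper, '~');

    // The escaper plus any one character is skipped; only a remaining delimiter splits.
    $parts = preg_split('~' . $e . '.(*SKIP)(*FAIL)|' . $d . '~s', $subject);
    if ($parts === false) {
        throw new RuntimeException('preg_split failed');
    }
    return $parts;
}

print_r(splitEscaped('a,b|,c,d', ',', '|'));
```

As with the original pattern, the escape sequences stay in the fields, so the second field here is still b|,c.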
Note: There is a problem at the theoretical level: implode('::', ['a:', ':b']) and implode('::', ['a', '', 'b']) result in the same string: 'a::::b'. Imploding can also be an interesting problem.
Recently I devised a solution:
But the black magic solution is still three times faster.