Question:
Basically I want to replace certain words (e.g. the word "tree" with the word "pizza") in sentences. Restriction: when the word to be replaced is between double quotes, the replacement should not be performed.
Example:
The tree is green. -> REPLACE tree WITH pizza
"The" tree is "green". -> REPLACE tree WITH pizza
"The tree" is green. -> DONT REPLACE
"The tree is" green. -> DONT REPLACE
The ""tree is green. -> REPLACE tree WITH pizza
Is it possible to do this with regular expressions? I would count the number of double quotes before the word and check whether it is odd or even. But is this possible using preg_replace in PHP?
Thanks!
//EDIT:
At the moment my code looks like the following:
preg_replace("/tree/", "pizza", $sentence)
But the problem here is to implement the logic with the double quotes. I tried things like:
preg_replace('/[^"]tree/', "pizza", $sentence)
But this does not work, because it checks only if a double quote is in front of the word. But there are examples above where this check fails.
It is important that I solve this problem with regex only.
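To make the failure of the [^"]tree attempt concrete, here is a minimal sketch. The helper name naiveReplace is only an illustration and not part of the original code. The character class consumes the character in front of the word, and it looks at only one character, so it cannot tell whether the word sits inside a quoted span.

```php
<?php
// Illustrative only: the naive attempt from the question. The pattern is
// single-quoted so the inner double quote needs no escaping.
function naiveReplace(string $sentence): string
{
    // [^"] matches (and consumes) exactly one non-quote character before "tree".
    return preg_replace('/[^"]tree/', 'pizza', $sentence);
}

echo naiveReplace('The tree is green.'), "\n";   // Thepizza is green.
echo naiveReplace('"The tree" is green.'), "\n"; // "Thepizza" is green.
```

The space before the word is swallowed in both cases, and the quoted occurrence is replaced even though it should be left alone.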
Answer 1:
Regular expressions are not the right tool for every job. You can use them for this to a certain extent, but covering every case, such as nested quotes, gets progressively more complicated.
You could use a Negative Lookahead here.
$text = preg_replace('/\btree\b(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/i', 'pizza', $text);
See Working demo
Regular expression:
\b              the boundary between a word char (\w) and not a word char
tree            'tree'
\b              the boundary between a word char (\w) and not a word char
(?!             look ahead to see if there is not:
  [^"]*           any character except: '"' (0 or more times)
  "               '"'
  (?:             group, but do not capture (0 or more times):
    (?:             group, but do not capture (2 times):
      [^"]*           any character except: '"' (0 or more times)
      "               '"'
    ){2}            end of grouping
  )*              end of grouping
  [^"]*           any character except: '"' (0 or more times)
  $               before an optional \n, and the end of the string
)               end of look-ahead
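As a quick check, the negative-lookahead pattern can be run against the five sentences from the question. This is a sketch only, and replaceUnquoted is a hypothetical helper name introduced for the example.

```php
<?php
// Sketch: apply the negative-lookahead pattern to each example sentence.
// replaceUnquoted() is a hypothetical helper name, not part of the answer.
function replaceUnquoted(string $text): string
{
    // Skip "tree" when an odd number of double quotes follows it,
    // i.e. when the word sits inside a quoted span.
    $pattern = '/\btree\b(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/i';
    return preg_replace($pattern, 'pizza', $text);
}

$examples = [
    'The tree is green.',      // The pizza is green.
    '"The" tree is "green".',  // "The" pizza is "green".
    '"The tree" is green.',    // unchanged
    '"The tree is" green.',    // unchanged
    'The ""tree is green.',    // The ""pizza is green.
];

foreach ($examples as $sentence) {
    echo replaceUnquoted($sentence), "\n";
}
```

Note that the lookahead counts quotes up to the end of the whole subject string, so the sketch feeds it one sentence at a time.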
Another option is to use controlled backtracking, since you're able to do this in PHP:
$text = preg_replace('/"[^"]*"(*SKIP)(*FAIL)|\btree\b/i', 'pizza', $text);
See Working demo
The idea is to skip content in quotations. I first match a quotation mark, followed by any characters except ", followed by a closing quotation mark. I then make the subpattern fail and force the regular expression engine not to retry the substring with another alternative, using the (*SKIP) and (*FAIL) backtracking control verbs.
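A small sketch of the skip-and-fail idea, using the optional count argument of preg_replace to show how many occurrences were actually replaced. The function name skipQuotedReplace and the sample sentence are assumptions made for this example.

```php
<?php
// Sketch: quoted spans are matched first and discarded via (*SKIP)(*FAIL),
// so only occurrences outside quotes reach the second alternative.
// skipQuotedReplace() is a hypothetical helper name.
function skipQuotedReplace(string $text, &$count = null): string
{
    // Limit -1 means "no limit"; $count receives the number of replacements.
    return preg_replace('/"[^"]*"(*SKIP)(*FAIL)|\btree\b/i', 'pizza', $text, -1, $count);
}

$result = skipQuotedReplace('A tree, "a tree", and another Tree.', $count);
echo $result, "\n"; // A pizza, "a tree", and another pizza.
echo $count, "\n";  // 2
```

Because of the i modifier, Tree is replaced as well, and the replacement text is inserted as written (lowercase).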
Answer 2:
There is a handy trick using some hidden regex powers:
~".*?"(*SKIP)(*FAIL)|\btree\b~s
Explanation:
~ # start delimiter (we could have used /, #, @ etc...)
" # match a double quote
.*? # match anything ungreedy until ...
" # match a double quote
(*SKIP)(*FAIL) # make it fail
| # or
\btree\b # match a tree with wordboundaries
~ # end delimiter
s # setting the s modifier to match newlines with dots .
In actual PHP code, you would want to use preg_quote() to escape regex characters. Here's a little snippet:
$search = 'tree';
$replace = 'plant';
$input = 'The tree is green.
"The" tree is "green".
"The tree" is green.
"The tree is" green.
The ""tree is green.';
$regex = '~".*?"(*SKIP)(*FAIL)|\b' . preg_quote($search, '~') . '\b~s';
$output = preg_replace($regex, $replace, $input);
echo $output;
Online regex demo Online PHP demo
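To illustrate why preg_quote() matters here, the snippet can be wrapped in a small function and called with a search term that contains a regex metacharacter. The function name replaceOutsideQuotes and the sample input are assumptions for this sketch.

```php
<?php
// Hypothetical helper (name and signature are assumptions for illustration).
function replaceOutsideQuotes(string $search, string $replace, string $input): string
{
    // preg_quote() escapes regex metacharacters in $search, and also the
    // "~" delimiter because it is passed as the second argument.
    $regex = '~".*?"(*SKIP)(*FAIL)|\b' . preg_quote($search, '~') . '\b~s';
    return preg_replace($regex, $replace, $input);
}

// The dot in "a.b" is matched literally, so "axb" is left alone.
echo replaceOutsideQuotes('a.b', 'X', 'a.b axb "a.b"'), "\n"; // X axb "a.b"
```

Without preg_quote(), the dot would match any character and axb would be replaced too.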
Answer 3:
This one matches tree using a lookahead:
$pattern = '~\btree\b(?=([^"]|("[^"]*"))*$)~im';
$str = '
The tree is green. -> REPLACE tree WITH pizza
"The" tree is "green". -> REPLACE tree WITH pizza
"The tree" is green. -> DONT REPLACE
"The tree is" green. -> DONT REPLACE
The ""tree is green. -> REPLACE tree WITH pizza';
echo "<pre>".preg_replace($pattern,"pizza",$str)."</pre>";
It looks for tree and, if found, matches it only if it is followed by characters that are not double quotes ([^"]) or by quoted groups ("[^"]*") until the end of the line, using the modifiers i (PCRE_CASELESS) and m (PCRE_MULTILINE).
I don't want a green pizza! Merry Xmas :-)
Answer 4:
Use the pattern tree(?=(?:(?:[^"]*"){2})*[^"]*$) with the g and m options. Demo
This is how it is constructed from the ground up:
tree(?=[^"]*")
"tree" that sees any amount of non-quote characters followed by a quote
tree(?=([^"]*"){2})
~ twice
tree(?=(([^"]*"){2})*)
~ as many times as possible
tree(?=(([^"]*"){2})*[^"]*)
~ then optional non-quote characters
tree(?=(([^"]*"){2})*[^"]*$)
~ to the end
tree(?=(?:(?:[^"]*"){2})*[^"]*$)
add non-capturing groups
php demo
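One thing to keep in mind with this pattern is that it has no word boundaries, so it also matches tree inside longer words. The sketch below compares it with a variant that adds \b on both sides. The variable names and the sample sentence are made up for illustration, and the g option is dropped because preg_replace already replaces every match.

```php
<?php
// Sketch: the same even-number-of-quotes lookahead, with and without \b.
$withoutBoundary = '/tree(?=(?:(?:[^"]*"){2})*[^"]*$)/m';
$withBoundary    = '/\btree\b(?=(?:(?:[^"]*"){2})*[^"]*$)/m';

$input = 'A street tree by "the tree".';

// "street" contains "tree", so the unbounded pattern rewrites it as well.
$loose  = preg_replace($withoutBoundary, 'pizza', $input);
$strict = preg_replace($withBoundary, 'pizza', $input);

echo $loose, "\n";  // A spizzat pizza by "the tree".
echo $strict, "\n"; // A street pizza by "the tree".
```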
Answer 5:
I'm building a JS minimizer and this page helped me a lot in getting to the right regular expressions for it. What this page has not answered so far is what to do when a quoted string contains escaped quotes. I bookmarked this page so I could come back once I had found the recipe.
/*
    Regular expression group 'NotBetween'.
*/
function rgxgNotBetween($chars, $sep="|")
{
    $chars = explode($sep, $chars);
    $NB = [];
    foreach($chars as $CHR){
        //(*PRUNE) steps over $CHR when it is escaped; that is, preceded by a backslash.
        $NB[] = "(?:$CHR(?:\\\\$CHR(*PRUNE)|.)*?$CHR)";
    }
    $NB = join("|", $NB);
    return "(?:(?:$NB)(*SKIP)(*FAIL))";
}
function jsIdReplace($search, $replace, $source)
{
    //the "~" delimiters are required; without them preg_replace() rejects the pattern
    $search = "~"
        //SKIP further matching when between...
        //double or single quotes or js regular expression slashes
        .rgxgNotBetween("\x22|\x27|\/")
        //match when NO preceding '.' and no ending ':' (object properties)
        ."|(?:(?<!\.)\b$search\b(?!:))"
        //but do match when preceding '?' or ':' AND ending ':' (ternary statements)
        ."|(?:(?<=\?|:)\b$search\b(?=:))"
        ."~";
    return preg_replace($search, $replace, $source);
}
function jsNoComments($source)
{
    //js comment markers NOT between quotes
    $NBQ = rgxgNotBetween("\x22|\x27");
    //block comments
    $source = preg_replace("#$NBQ|/\*.*?\*/#s", "", $source);
    //line comments; not preceded by backslash
    $source = preg_replace("#$NBQ|\h*(?<!\\\\)//.*\n?#", "", $source);
    return $source;
}
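For the original tree/pizza task, escaped quotes can also be handled with a different, commonly used construction than the (*PRUNE) approach above. Inside the quotes, the pattern accepts either a backslash followed by any character, or any character that is neither a quote nor a backslash. This is only a sketch under that assumption, and the function name replaceOutsideEscapedQuotes is hypothetical.

```php
<?php
// Sketch of an alternative technique (not the code above): treat \" inside
// a quoted string as part of the string rather than as its end.
function replaceOutsideEscapedQuotes(string $text): string
{
    // In this single-quoted PHP string, \\\\ yields the regex \\ (a literal backslash).
    // Regex: "(?:\\.|[^"\\])*"(*SKIP)(*FAIL)|\btree\b
    $pattern = '~"(?:\\\\.|[^"\\\\])*"(*SKIP)(*FAIL)|\btree\b~s';
    return preg_replace($pattern, 'pizza', $text);
}

// In a single-quoted PHP string, \" stays as a backslash followed by a quote.
$input = 'He said "a \"tree\" here" near a tree.';
echo replaceOutsideEscapedQuotes($input), "\n";
// He said "a \"tree\" here" near a pizza.
```

With the simpler "[^"]*" form, the quoted span would end at the first escaped quote and the inner tree would be replaced.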