Regex is absolutely my weak point and this one has me completely stumped. I am building a fairly basic search functionality and I need to be able to alter my user input based on the following pattern:
Subject:
%22first set%22 %22second set%22-drupal -wordpress
Desired output:
+"first set" +"second set" -drupal -wordpress
I wish I could be more help as I normally like to at least post the solution I have so far, but on this one I'm at a loss.
Any help is appreciated. Thank you.
It seems your data is URL-encoded. If you apply urldecode, you will get the quoted phrases back, with each %22 turned into a double quote (I assume you have a space before -drupal).
Now you have to add the + signs. Again, I assume you have to add one before every term that does not already start with a - and that is not inside quotes.
Update: If you cannot use urldecode, you could just use str_replace to replace each %22 with a double quote.
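The two steps described above (decode, then prefix the unsigned terms) could be sketched in PHP roughly as follows. This is only an illustration, not the answer's original code: the function name toBooleanQuery is hypothetical, and the tokenizing pattern is an assumption about what counts as a "term" (a quoted phrase, or a run of non-space characters).

```php
<?php
// Hypothetical helper: the name and the tokenizing rule are assumptions
// made for illustration only.
function toBooleanQuery(string $raw): string
{
    // Step 1: turn %22 back into double quotes.
    // If urldecode is not an option, str_replace('%22', '"', $raw)
    // would do the same job for this particular input.
    $decoded = urldecode($raw);

    // Step 2: split into terms. A term is either a quoted phrase
    // (optionally preceded by -) or a run of non-whitespace characters.
    preg_match_all('/-?"[^"]*"|\S+/', $decoded, $matches);

    // Step 3: prefix + to every term that does not already carry a sign.
    $terms = array_map(
        function (string $term): string {
            return ($term[0] === '-' || $term[0] === '+') ? $term : '+' . $term;
        },
        $matches[0]
    );

    return implode(' ', $terms);
}

echo toBooleanQuery('%22first set%22 %22second set%22 -drupal -wordpress'), "\n";
// prints: +"first set" +"second set" -drupal -wordpress
```

Because the terms are re-joined with single spaces, this sketch should also cope with the variant of the input that has no space before -drupal.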
Explanation: The $1 in the replacement is a backreference to the first parenthesized group of the regular expression, in this case ((?:[^%]|%[^2]|%2[^2])*). The [^%] and the alternatives (...|...|...) after it prevent a %22 in between from being swallowed by the match due to greediness. See http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification.
I found that technique in a JavaCC example of matching block comments (/* */), and I can't find any other web pages explaining it, so here is a cleaner example. To match a block of text between two 12345 delimiters, 12345........12345, with no 12345 in between:
/12345([^1]|1[^2]|12[^3]|123[^4]|1234[^5])*12345/
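As a rough PHP sketch of the technique just described: the first call applies the %22 pattern with a '+"$1"' replacement (an assumption about how the pattern was meant to be used, since the original snippet is not shown), and the second applies the 12345 pattern to a made-up sample string, with an extra capturing group added so the enclosed text can be inspected.

```php
<?php
// The group consumes anything that cannot be the start of a full %22,
// so the match stops at the first closing %22.
$pattern = '/%22((?:[^%]|%[^2]|%2[^2])*)%22/';
$subject = '%22first set%22 %22second set%22 -drupal -wordpress';

// $1 is replaced with the text captured between the two %22 markers.
$result = preg_replace($pattern, '+"$1"', $subject);
echo $result, "\n";
// prints: +"first set" +"second set" -drupal -wordpress

// Same idea with 12345 as the delimiter; the sample text is invented.
$sample = '12345 abc 123 def 12345 tail 12345';
preg_match('/12345((?:[^1]|1[^2]|12[^3]|123[^4]|1234[^5])*)12345/', $sample, $m);
echo $m[1], "\n";
// prints the text up to the first closing 12345: " abc 123 def "
```

One caveat worth checking against your own data: this kind of pattern can step over a closing delimiter that directly follows a copy of its own first character (for example %%22 or 112345). In engines that support lazy quantifiers, something like (.*?) between the delimiters is a simpler alternative.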
Is this what you're looking for?