Regex is absolutely my weak point, and this one has me completely stumped. I am building a fairly basic search feature, and I need to be able to alter my user input based on the following pattern:
Subject:
%22first set%22 %22second set%22-drupal -wordpress
Desired output:
+"first set" +"second set" -drupal -wordpress
I wish I could be more help, as I normally like to at least post the solution I have so far, but on this one I'm at a loss.
Any help is appreciated. Thank you.
preg_replace('/%22((?:[^%]|%[^2]|%2[^2])*)%22/', '+"$1"', $str);
Explanation: The $1 in the replacement is a backreference to the first capturing group (the first ()-section) in the regular expression, in this case ((?:[^%]|%[^2]|%2[^2])*). The [^%] and the alternations (...|...|...) after it prevent the group from matching across a %22 in between, which a greedy .* would otherwise do. See http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification; a lazy quantifier, as in %22(.*?)%22, is the usual alternative.
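A quick way to see what the alternation buys you is to run it next to a lazy quantifier and a plain greedy .* on the question's input. This is only a comparison sketch; the variable names are made up for the example.

```php
<?php
$str = '%22first set%22 %22second set%22-drupal -wordpress';

// Alternation from the answer: the group can never contain %22.
$alternation = preg_replace('/%22((?:[^%]|%[^2]|%2[^2])*)%22/', '+"$1"', $str);

// Lazy quantifier: stops at the first closing %22 it can find.
$lazy = preg_replace('/%22(.*?)%22/', '+"$1"', $str);

// Greedy .* runs to the LAST %22, swallowing both phrases into one group.
$greedy = preg_replace('/%22(.*)%22/', '+"$1"', $str);

echo $alternation, "\n"; // +"first set" +"second set"-drupal -wordpress
echo $lazy, "\n";        // +"first set" +"second set"-drupal -wordpress
echo $greedy, "\n";      // +"first set%22 %22second set"-drupal -wordpress
```

Neither the alternation nor the lazy version inserts the missing space before -drupal; they only handle the quoted phrases.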
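I found that technique in a JavaCC example for matching block comments (/* */), and I can't find any other web pages explaining it, so here is a cleaner example. To match a block of text delimited by 12345, as in 12345........12345, with no 12345 in between:
/12345([^1]|1[^2]|12[^3]|123[^4]|1234[^5])*12345/
One caveat: each alternative consumes the character after a partial delimiter, so the pattern can miss a match when a partial delimiter sits right before the real one (in 12345a112345, the 11 is eaten by 1[^2] and the closing 12345 is lost). For the %22 version this cannot happen with valid URL encoding, where every % is followed by two hex digits.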
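The 12345 pattern can be exercised the same way. This hypothetical sketch runs it next to a lazy /12345(.*?)12345/s on a normal input and on one with a stray 1 right before the closing delimiter.

```php
<?php
// Hand-rolled "no 12345 inside" pattern, and a lazy-quantifier equivalent.
$handRolled = '/12345([^1]|1[^2]|12[^3]|123[^4]|1234[^5])*12345/';
$lazy       = '/12345(.*?)12345/s';

// Two delimited blocks: both patterns find the same two matches.
$text = '12345abc12345 xyz 12345def12345';
preg_match_all($handRolled, $text, $m1);
preg_match_all($lazy, $text, $m2);
echo implode(', ', $m1[0]), "\n"; // 12345abc12345, 12345def12345
echo implode(', ', $m2[0]), "\n"; // 12345abc12345, 12345def12345

// Partial delimiter right before the real one: "11" is consumed by 1[^2],
// so the hand-rolled pattern never sees the closing 12345.
$tricky = '12345a112345';
var_dump(preg_match($handRolled, $tricky)); // int(0)
var_dump(preg_match($lazy, $tricky));       // int(1)
```

The lazy form is also shorter to write for longer delimiters, since it does not need one alternative per delimiter prefix.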
It seems your data is URL-encoded. If you apply urldecode(), you will get
"first set" "second set" -drupal -wordpress
(I assume you have a space before -drupal).
Now you have to add the +. Again, I assume you want it before every word that does not start with - and is not inside quotes:
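$str = '"first set" "second set" -drupal -wordpress foo';
// Add "+" after a space (or at the start) unless followed by "-", a space, or a
// phrase's closing word (\w+"). Phrases of 3+ words would get "+" on middle words.
echo preg_replace('#( |^)(?!(?:\w+"|-| ))#', '\1+', $str);
// prints +"first set" +"second set" -drupal -wordpress +foo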
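If quoted phrases can have more than two words, a different technique (swapped in here, not the lookahead above) is to split the decoded string into quoted phrases and bare words and prefix every token that does not already start with - or +. A minimal sketch, assuming PHP 7.4+ for the arrow function; addPlusOperators is a hypothetical helper name:

```php
<?php
// Hypothetical helper: prefix "+" to every token not already marked with - or +.
function addPlusOperators(string $query): string
{
    // A token is a "quoted phrase" (optionally preceded by + or -) or a run of
    // non-space characters. The quoted alternative is tried first.
    preg_match_all('/[+-]?"[^"]*"|\S+/', $query, $m);

    $tokens = array_map(
        fn (string $t): string => ($t[0] === '-' || $t[0] === '+') ? $t : '+' . $t,
        $m[0]
    );

    // Rejoin with single spaces.
    return implode(' ', $tokens);
}

$raw = '%22first set%22 %22second set%22-drupal -wordpress';
echo addPlusOperators(urldecode($raw)), "\n"; // +"first set" +"second set" -drupal -wordpress
```

Because tokens are rejoined with single spaces, this also supplies the missing space before -drupal and collapses runs of whitespace.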
Update: If you cannot use urldecode(), you could just use str_replace() to replace %22 with a literal " character.
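Under the same assumption as before (a space before -drupal), the str_replace() route might look like this sketch. It only decodes %22, so any other escapes such as %20 would stay encoded.

```php
<?php
// Assumes a space before -drupal, as in the urldecode() example.
$raw = '%22first set%22 %22second set%22 -drupal -wordpress foo';

// Decode only the double quotes.
$decoded = str_replace('%22', '"', $raw);

// Same lookahead pattern as in the answer: add "+" after a space (or at the
// start) unless followed by "-", a space, or the last word of a quoted phrase.
$result = preg_replace('#( |^)(?!(?:\w+"|-| ))#', '\1+', $decoded);
echo $result, "\n"; // +"first set" +"second set" -drupal -wordpress +foo
```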
Is this what you're looking for?
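<?php
$input = "%22first set%22 %22second set%22-drupal -wordpress";
// Lazy (.+?) stops at the first closing %22; \s* also eats any space after it,
// and the replacement adds exactly one space, so "-drupal" gets separated.
$res = rtrim(preg_replace('/%22(.+?)%22\s*/', '+"$1" ', $input));
print $res; // +"first set" +"second set" -drupal -wordpress
?>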