I have a boolean search string for a third-party index search service: `Germany or (Indian, Tech*)`
After processing, I want the result to be `Germany[45] or (Indian[45], Tech*[45])`, where 45 is the weight required by the search service.
After a lot of googling I was able to get this result: `Germany[45] or (Indian[45], Tech[45]*)`. Here the `*` comes after `[45]`, which is not what I need.
The output should be `Germany[45] or (Indian[45], Tech*[45])`, with the `*` before `[45]`.
Code:
preg_replace('/([a-z0-9\*\.])+(\b(?<!or|and|not))/i', '$0'."[45]", $term);
The idea is simple: apply the weight to every word except the boolean operators (or/and/not and so on). Please help me fine-tune the regex, or suggest a new one, to get the required result.
Using a lookahead works like a charm:
You can try it HERE
Edit: There is also no need to escape "*" and "." inside a character class.
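The snippet that belonged to this answer is not shown above, so the following is only a minimal sketch of what a lookahead-based version could look like, not necessarily the exact pattern that was posted. It assumes terms are made of letters, digits and periods with an optional trailing `*`, and that or/and/not are the only operators; `$term` and the weight 45 come from the question.

```php
<?php
// Sample input from the question.
$term = 'Germany or (Indian, Tech*)';

// \b                      start at a word boundary
// (?!(?:or|and|not)\b)    negative lookahead: skip whole-word operators
// [a-z0-9.]+\*?           the term itself, plus an optional trailing wildcard
$pattern = '/\b(?!(?:or|and|not)\b)[a-z0-9.]+\*?/i';

$weighted = preg_replace($pattern, '$0[45]', $term);

echo $weighted, PHP_EOL; // Germany[45] or (Indian[45], Tech*[45])
```

Because the `*` is consumed as part of the match, the weight always lands after it.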
The problem was that your matches could only end at a `\b`, a word boundary. Since an asterisk is a non-word character, the boundary falls just before it, so the asterisk was left out of the match. The solution is to let the match end at either an asterisk or a word boundary, `(\*|\b)`.
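The code that followed this explanation is missing here, so this is a hedged reconstruction of the idea rather than the exact expression: the original pattern with `\b` swapped for an asterisk-or-boundary alternation, and the `*` moved out of the character class so the alternation can consume it.

```php
<?php
// Sample input from the question.
$term = 'Germany or (Indian, Tech*)';

// [a-z0-9.]+         the word itself
// (?:\*|\b)          end at a trailing asterisk or at a word boundary
// (?<!or|and|not)    lookbehind: the match must not end in an operator
$pattern = '/[a-z0-9.]+(?:\*|\b)(?<!or|and|not)/i';

$weighted = preg_replace($pattern, '$0[45]', $term);
echo $weighted, PHP_EOL; // Germany[45] or (Indian[45], Tech*[45])

// Limitation: "Poland" ends in "and", so the lookbehind rejects it too.
$skipped = preg_replace($pattern, '$0[45]', 'Poland or Tech*');
echo $skipped, PHP_EOL; // Poland or Tech*[45]
```

The second call shows a side effect of keeping the lookbehind: it only inspects the last few characters of the match, so any word that merely ends in an operator loses its weight.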
However, it's simpler to do it with a negative lookahead, which rejects the operators before the word is matched instead of checking for them afterwards.
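As a rough illustration of the negative-lookahead approach (the original expression is not reproduced here), the sketch below wraps it in a small helper. The function name `addWeights`, its parameters and the default operator list are all made up for this example.

```php
<?php
/**
 * Hypothetical helper: append "[$weight]" to every term except the operators.
 */
function addWeights(string $term, int $weight, array $operators = ['or', 'and', 'not']): string
{
    // Escape each operator so it is safe to embed in the pattern.
    $escaped = array_map(function ($op) {
        return preg_quote($op, '/');
    }, $operators);

    // Whole-word operators are rejected by the lookahead before matching starts.
    $pattern = '/\b(?!(?:' . implode('|', $escaped) . ')\b)[a-z0-9.]+\*?/i';

    return preg_replace($pattern, '$0[' . $weight . ']', $term);
}

echo addWeights('Germany or (Indian, Tech*)', 45), PHP_EOL;
// Germany[45] or (Indian[45], Tech*[45])

echo addWeights('Poland AND (sector, NOT Tech*)', 45), PHP_EOL;
// Poland[45] AND (sector[45], NOT Tech*[45])
```

The `\b` inside the lookahead means only whole-word operators are excluded, so terms such as `Poland` or `sector` still receive a weight.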
Note: Within character classes, asterisks and periods are not metacharacters, so they don't need to be escaped as you did in your original expression, `[a-z0-9\*\.]+`; `[a-z0-9*.]+` matches exactly the same text.
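A quick way to convince yourself of this is to run both spellings of the class against the same input. The sample string below is invented for the check.

```php
<?php
// The same character class, with and without the redundant backslashes.
$escaped   = '/[a-z0-9\*\.]+/i';
$unescaped = '/[a-z0-9*.]+/i';

// Made-up input containing both an asterisk and a period.
$input = 'Tech* v1.2 (beta)';

preg_match_all($escaped, $input, $a);
preg_match_all($unescaped, $input, $b);

var_dump($a[0] === $b[0]); // bool(true)
print_r($b[0]);            // the matches are "Tech*", "v1.2" and "beta"
```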