I am working with some code in PHP that grabs the referrer data from a search engine, giving me the query that the user entered.
I would then like to remove certain stop words from that string if they exist. However, the word may or may not have a space at either end.
For example, I have been using str_replace to remove a word as follows:
$keywords = str_replace("for", "", $keywords);
$keywords = str_replace("sale", "", $keywords);
but if the $keywords value is "baby formula" it will change it to "baby mula" - removing the "for" part.
Without having to create further str_replace calls that account for " for" and "for ", is there a preg_replace type command I could use that would remove the given word only if it is found with a space at either end?
My idea would be to put all of the stop words into an array and step through them that way. I suspect that a preg_replace is going to be quicker than stepping through multiple str_replace lines.
UPDATE: Solved thanks to you guys using the following combination:
$keywords = "...";
$stopwords = array("for","each");
foreach ($stopwords as $stopWord)
{
    // preg_quote() keeps any regex metacharacters in a stop word literal
    $keywords = preg_replace("/\b" . preg_quote($stopWord, "/") . "\b/", "", $keywords);
}
While Armel's answer will work, it does not perform optimally. Yes, your desired output will require word boundaries and probably case-insensitive matching, but making a preg_match() call for each element in the blacklist array is not efficient. Doing so asks the regex engine to perform wave after wave of individual keyword checks on the full string.
I recommend building a single regex pattern that checks for all keywords in one pass over the string. To generate the single pattern dynamically, you only need to implode your blacklist array with | (pipes), which represent the "OR" operator in regex. By wrapping all of the pipe-delimited keywords in a non-capturing group ((?:...)), the word boundaries (\b) serve their purpose for all keywords in the blacklist array.
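As a minimal sketch of the single-pattern approach (the sample string and the names $escaped, $pattern and $result are assumptions for illustration, not the answerer's original code), the stop words can be escaped, joined with pipes, and wrapped in a non-capturing group between word boundaries. A second pattern, / \K +/, then collapses the runs of spaces left behind:

```php
<?php
// Hypothetical sample data; not taken from the original post.
$keywords  = "baby formula for sale";
$stopwords = array("for", "sale");

// Escape each word so regex metacharacters are treated literally.
$escaped = array_map(function ($word) {
    return preg_quote($word, '/');
}, $stopwords);

// Builds the pattern /\b(?:for|sale)\b/i
$pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

// The first pattern removes the stop words; the second keeps one space
// and deletes any extra spaces that follow it. trim() drops edge spaces.
$result = trim(preg_replace(array($pattern, '/ \K +/'), '', $keywords));

echo $result; // baby formula
```

Because both patterns are passed in one array, preg_replace() applies them in order to the same string, so no explicit loop over the stop words is needed.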
p.s. / \K +/ is the second pattern fed to preg_replace(), which means the input string will be read a second time to search for 2 or more consecutive spaces. \K means "restart the fullstring match from here"; effectively it releases the previously matched space. Then the one or more spaces that follow are matched and replaced with an empty string.
You can use word boundaries for this
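A short sketch of what that might look like for a single word (the sample string and the variable name $withoutFor are assumptions for illustration). \b matches only at a transition between a word character and a non-word character, so the "for" inside "formula" is left alone:

```php
<?php
// Hypothetical sample input.
$keywords = "baby formula for sale";

// Remove "for" only when it stands alone as a whole word.
$withoutFor = preg_replace('/\bfor\b/', '', $keywords);

// The spaces on both sides of the removed word are still there,
// so two consecutive spaces remain between "formula" and "sale".
var_dump($withoutFor === "baby formula  sale"); // bool(true)
```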
or with multiple words
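One possible way to handle several words, sketched under the same assumptions (the sample string and the names $patterns and $cleaned are illustrative only), is to pass an array of word-boundary patterns to preg_replace(), which applies each one in turn:

```php
<?php
// Hypothetical sample input.
$keywords = "baby formula for sale";

// One word-boundary pattern per stop word.
$patterns = array('/\bfor\b/', '/\bsale\b/');

// Every pattern is replaced with the same empty string.
$cleaned = preg_replace($patterns, '', $keywords);

// Leftover spaces are not removed by these patterns, so trim the result.
echo trim($cleaned); // baby formula
```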
Try it this way