Consider:
$a = 'How are you?';
if ($a contains 'are')
echo 'true';
Suppose I have the code above. What is the correct way to write the statement if ($a contains 'are')?
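For reference, here is a minimal sketch of the check the question is asking for. It assumes plain substring matching is what is wanted; the `function_exists` guard is only there so the snippet also runs on PHP versions before 8.0, where `str_contains()` is not available.

```php
<?php
$a = 'How are you?';

// strpos() returns the offset of the first match, or false when there is none.
// Compare with !== false, because a match at offset 0 would otherwise be falsy.
if (strpos($a, 'are') !== false) {
    echo "true\n";
}

// On PHP 8.0 and later, str_contains() expresses the same check directly.
if (function_exists('str_contains') && str_contains($a, 'are')) {
    echo "true\n";
}
```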
Peer to SamGoody and Lego Stormtroopr comments.
If you are looking for a PHP algorithm to rank search results based on the proximity/relevance of multiple words, here is a quick and easy way of generating search results with PHP only:
Issues with the other boolean search methods such as strpos(), preg_match(), strstr() or stristr()
PHP method based on Vector Space Model and tf-idf (term frequency–inverse document frequency):
It sounds difficult but is surprisingly easy.
If we want to search for multiple words in a string, the core problem is how to assign a weight to each one of them.
If we could weight the terms in a string based on how representative they are of the string as a whole, we could order our results by the ones that best match the query.
This is the idea of the vector space model, not far from how SQL full-text search works:
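As a rough illustration of the idea, here is a minimal tf-idf ranking sketch. It is not the original answer's code: the function names (`tokenize`, `build_index`, `rank_documents`), the sample documents, and the smoothed `log(1 + N/df)` weighting are all assumptions chosen for brevity, and it skips document-length normalisation.

```php
<?php
/** Split text into lowercase alphanumeric tokens (hypothetical helper). */
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Build an inverted index: term => [docId => term frequency]. */
function build_index(array $docs): array
{
    $postings = [];
    foreach ($docs as $id => $text) {
        foreach (tokenize($text) as $term) {
            $postings[$term][$id] = ($postings[$term][$id] ?? 0) + 1;
        }
    }
    return ['count' => count($docs), 'postings' => $postings];
}

/** Score documents by the summed tf-idf of the query terms, best match first. */
function rank_documents(string $query, array $index): array
{
    $scores = [];
    foreach (array_unique(tokenize($query)) as $term) {
        if (!isset($index['postings'][$term])) {
            continue; // term appears in no document
        }
        $docs = $index['postings'][$term];
        // Terms found in fewer documents get a higher weight.
        $idf = log(1 + $index['count'] / count($docs));
        foreach ($docs as $id => $tf) {
            $scores[$id] = ($scores[$id] ?? 0.0) + $tf * $idf;
        }
    }
    arsort($scores); // sort by score, descending, keeping document ids as keys
    return $scores;
}

$docs = ['How are you?', 'We are here', 'Where is everybody?'];
$ranking = rank_documents('are you', build_index($docs));

// Document 0 ranks first because it matches both "are" and "you";
// document 2 matches neither term and is left out.
print_r(array_keys($ranking));
```

Documents that match none of the query terms never receive a score, so the result doubles as a filter as well as a ranking.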
CASE 1
RESULT
CASE 2
RESULTS
CASE 3
RESULTS
There are plenty of improvements to be made, but the model provides a way of getting good results from natural queries, which don't have boolean operators such as strpos(), preg_match(), strstr() or stristr().
NOTA BENE
Optionally, eliminate redundancy before searching the words, thereby reducing the index size and resulting in:
less storage requirement
less disk I/O
faster indexing and, consequently, a faster search.
1. Normalisation
2. Stopword elimination
3. Dictionary substitution
Replace words with others which have an identical or similar meaning (e.g. replace instances of 'hungrily' and 'hungry' with 'hunger').
Further algorithmic measures (snowball) may be performed to further reduce words to their essential meaning.
Replacing colour names with their hexadecimal equivalents and reducing the precision of numeric values are other ways of normalising the text.
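The three steps listed above could be sketched roughly as follows. The stop word list, the substitution dictionary, and the function name `normalise_terms` are hypothetical, and treating "normalisation" as lowercasing plus stripping punctuation is an assumption, since the details of that step are not given here.

```php
<?php
// Hypothetical stop word list and substitution dictionary, for illustration only.
const STOPWORDS = ['a', 'an', 'and', 'are', 'is', 'of', 'the', 'to'];
const SUBSTITUTIONS = ['hungrily' => 'hunger', 'hungry' => 'hunger'];

function normalise_terms(string $text): array
{
    // 1. Normalisation (assumed): lowercase, then split on anything
    //    that is not an ASCII letter or digit.
    $terms = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // 2. Stopword elimination: drop words that carry little meaning.
    $terms = array_diff($terms, STOPWORDS);

    // 3. Dictionary substitution: map variants onto a single term.
    $terms = array_map(fn($t) => SUBSTITUTIONS[$t] ?? $t, $terms);

    return array_values($terms); // reindex from 0
}

print_r(normalise_terms('The dog is hungry and eats hungrily'));
// Resulting terms: dog, hunger, eats, hunger
```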
Maybe you could use something like this:
In order to find a 'word', rather than the occurrence of a series of letters that could in fact be a part of another word, the following would be a good solution.
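One way this could look, as a sketch only: split the string into words and check whether the target is among them. The helper name `has_word` is hypothetical, and splitting on anything that is not an ASCII letter is an assumption that will not suit every language or input.

```php
<?php
// Hypothetical helper: true if $word occurs in $text as a standalone word.
function has_word(string $text, string $word): bool
{
    // Split on runs of non-letters, so "are?" yields the word "are".
    $words = preg_split('/[^a-z]+/i', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Compare case-insensitively by lowercasing both sides.
    return in_array(strtolower($word), array_map('strtolower', $words), true);
}

var_dump(has_word('How are you?', 'are')); // bool(true)
var_dump(has_word('Do you care?', 'are')); // bool(false)
```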
A lot of answers that use substr_count check whether the result is > 0. But since the if statement considers zero the same as false, you can avoid that check and write directly:
To check if not present, add the ! operator:
While most of these answers will tell you if a substring appears in your string, that's usually not what you want if you're looking for a particular word, and not a substring.
What's the difference? Substrings can appear within other words:
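To make the difference concrete, here is a small sketch. It uses the substr_count truthiness shortcut described earlier and shows it reporting a match for sentences where 'are' only occurs inside a longer word. The sample sentences are made up for illustration.

```php
<?php
$needle = 'are';
$sentences = ['How are you?', 'Beware of the dog', 'I care a lot'];

foreach ($sentences as $sentence) {
    // substr_count() returns 0 when there is no match, and 0 is falsy,
    // so no explicit "> 0" comparison is needed.
    if (substr_count($sentence, $needle)) {
        echo "'$needle' found in: $sentence\n";
    }
}

// Negate with ! to test for absence.
if (!substr_count('Hello world', $needle)) {
    echo "'$needle' not found in: Hello world\n";
}
```

All three sentences are reported as matches, even though only the first one contains 'are' as a word of its own.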
One way to mitigate this would be to use a regular expression coupled with word boundaries (\b):
This method doesn't have the same false positives noted above, but it does have some edge cases of its own. Word boundaries match on non-word characters (\W), which are going to be anything that isn't a-z, A-Z, 0-9, or _. That means digits and underscores are going to be counted as word characters and scenarios like this will fail:
If you want anything more accurate than this, you'll have to start doing English language syntax parsing, and that's a pretty big can of worms (and assumes proper use of syntax, anyway, which isn't always a given).
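The word-boundary approach and its edge cases can be sketched like this. The helper name `contains_word` and the sample strings are hypothetical; `preg_quote()` is used so that a search word containing regex metacharacters is matched literally.

```php
<?php
// Hypothetical helper: whole-word, case-insensitive match using \b boundaries.
function contains_word(string $text, string $word): bool
{
    // preg_quote() escapes regex metacharacters (and the "/" delimiter) in $word.
    return preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text) === 1;
}

var_dump(contains_word('How are you?', 'are'));    // bool(true)
var_dump(contains_word('Do you care?', 'are'));    // bool(false)

// Edge cases: underscores and digits count as word characters, so there is
// no boundary between "are" and the character next to it.
var_dump(contains_word('What _are_ you?', 'are')); // bool(false)
var_dump(contains_word('How are2 you?', 'are'));   // bool(false)
```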