How do I check if a string contains a specific wor

2018-12-31 01:12发布

Consider:

$a = 'How are you?';

if ($a contains 'are')
    echo 'true';

Suppose I have the code above, what is the correct way to write the statement if ($a contains 'are')?

30条回答
何处买醉
2楼-- · 2018-12-31 01:29

Peer to SamGoody and Lego Stormtroopr comments.

If you are looking for a PHP algorithm to rank search results based on proximity/relevance of multiple words here comes a quick and easy way of generating search results with PHP only:

Issues with the other boolean search methods such as strpos(), preg_match(), strstr() or stristr()

  1. can't search for multiple words
  2. results are unranked

PHP method based on Vector Space Model and tf-idf (term frequency–inverse document frequency):

It sounds difficult but is surprisingly easy.

If we want to search for multiple words in a string the core problem is how we assign a weight to each one of them?

If we could weight the terms in a string based on how representative they are of the string as a whole, we could order our results by the ones that best match the query.

This is the idea of the vector space model, not far from how SQL full-text search works:

function get_corpus_index($corpus = array(), $separator=' ') {

    $dictionary = array();

    $doc_count = array();

    foreach($corpus as $doc_id => $doc) {

        $terms = explode($separator, $doc);

        $doc_count[$doc_id] = count($terms);

        // tf–idf, short for term frequency–inverse document frequency, 
        // according to wikipedia is a numerical statistic that is intended to reflect 
        // how important a word is to a document in a corpus

        foreach($terms as $term) {

            if(!isset($dictionary[$term])) {

                $dictionary[$term] = array('document_frequency' => 0, 'postings' => array());
            }
            if(!isset($dictionary[$term]['postings'][$doc_id])) {

                $dictionary[$term]['document_frequency']++;

                $dictionary[$term]['postings'][$doc_id] = array('term_frequency' => 0);
            }

            $dictionary[$term]['postings'][$doc_id]['term_frequency']++;
        }

        //from http://phpir.com/simple-search-the-vector-space-model/

    }

    return array('doc_count' => $doc_count, 'dictionary' => $dictionary);
}

function get_similar_documents($query='', $corpus=array(), $separator=' '){

    $similar_documents=array();

    if($query!=''&&!empty($corpus)){

        $words=explode($separator,$query);

        $corpus=get_corpus_index($corpus, $separator);

        $doc_count=count($corpus['doc_count']);

        foreach($words as $word) {

            if(isset($corpus['dictionary'][$word])){

                $entry = $corpus['dictionary'][$word];


                foreach($entry['postings'] as $doc_id => $posting) {

                    //get term frequency–inverse document frequency
                    $score=$posting['term_frequency'] * log($doc_count + 1 / $entry['document_frequency'] + 1, 2);

                    if(isset($similar_documents[$doc_id])){

                        $similar_documents[$doc_id]+=$score;

                    }
                    else{

                        $similar_documents[$doc_id]=$score;

                    }
                }
            }
        }

        // length normalise
        foreach($similar_documents as $doc_id => $score) {

            $similar_documents[$doc_id] = $score/$corpus['doc_count'][$doc_id];

        }

        // sort from  high to low

        arsort($similar_documents);

    }   

    return $similar_documents;
}

CASE 1

$query = 'are';

$corpus = array(
    1 => 'How are you?',
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULT

Array
(
    [1] => 0.52832083357372
)

CASE 2

$query = 'are';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [1] => 0.54248125036058
    [3] => 0.21699250014423
)

CASE 3

$query = 'we are done';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [3] => 0.6813781191217
    [1] => 0.54248125036058
)

There are plenty of improvements to be made but the model provides a way of getting good results from natural queries, which don't have boolean operators such as strpos(), preg_match(), strstr() or stristr().

NOTA BENE

Optionally eliminating redundancy prior to search the words

  • thereby reducing index size and resulting in less storage requirement

  • less disk I/O

  • faster indexing and a consequently faster search.

1. Normalisation

  • Convert all text to lower case

2. Stopword elimination

  • Eliminate words from the text which carry no real meaning (like 'and', 'or', 'the', 'for', etc.)

3. Dictionary substitution

  • Replace words with others which have an identical or similar meaning. (ex:replace instances of 'hungrily' and 'hungry' with 'hunger')

  • Further algorithmic measures (snowball) may be performed to further reduce words to their essential meaning.

  • The replacement of colour names with their hexadecimal equivalents

  • The reduction of numeric values by reducing precision are other ways of normalising the text.

RESOURCES

查看更多
十年一品温如言
3楼-- · 2018-12-31 01:29

Maybe you could use something like this:

<?php
    findWord('Test all OK');

    function findWord($text) {
        if (strstr($text, 'ok')) {
            echo 'Found a word';
        }
        else
        {
            echo 'Did not find a word';
        }
    }
?>
查看更多
宁负流年不负卿
4楼-- · 2018-12-31 01:30
if (preg_match('/(are)/', $a)) {
   echo 'true';
}
查看更多
路过你的时光
5楼-- · 2018-12-31 01:31

In order to find a 'word', rather than the occurrence of a series of letters that could in fact be a part of another word, the following would be a good solution.

$string = 'How are you?';
$array = explode(" ", $string);

if (in_array('are', $array) ) {
    echo 'Found the word';
}
查看更多
骚的不知所云
6楼-- · 2018-12-31 01:31

Lot of answers that use substr_count checks if the result is >0. But since the if statement considers zero the same as false, you can avoid that check and write directly:

if (substr_count($a, 'are')) {

To check if not present, add the ! operator:

if (!substr_count($a, 'are')) {
查看更多
还给你的自由
7楼-- · 2018-12-31 01:32

While most of these answers will tell you if a substring appears in your string, that's usually not what you want if you're looking for a particular word, and not a substring.

What's the difference? Substrings can appear within other words:

  • The "are" at the beginning of "area"
  • The "are" at the end of "hare"
  • The "are" in the middle of "fares"

One way to mitigate this would be to use a regular expression coupled with word boundaries (\b):

function containsWord($str, $word)
{
    return !!preg_match('#\\b' . preg_quote($word, '#') . '\\b#i', $str);
}

This method doesn't have the same false positives noted above, but it does have some edge cases of its own. Word boundaries match on non-word characters (\W), which are going to be anything that isn't a-z, A-Z, 0-9, or _. That means digits and underscores are going to be counted as word characters and scenarios like this will fail:

  • The "are" in "What _are_ you thinking?"
  • The "are" in "lol u dunno wut those are4?"

If you want anything more accurate than this, you'll have to start doing English language syntax parsing, and that's a pretty big can of worms (and assumes proper use of syntax, anyway, which isn't always a given).

查看更多
登录 后发表回答