Retrieving matched context of MySQL fulltext searc

2019-04-08 05:26发布

问题:

I'm doing a fulltext search on my MySQL table "pages". I'm displaying a list of pages that match the keyword in their "title" (plain text, VARCHAR, 255) or "content" (html, TEXT). When the match is found in the "content" field, I'd like to display the snippet in which the match was found. I have no idea how to go about this.

Can you put me in the right direction?

$query = '  SELECT 
                *, 
                MATCH(title, content) AGAINST("'.$keyword.'") AS score 
            FROM 
                page 
            WHERE 
                MATCH(title, content) AGAINST("'.$keyword.'")
            ORDER BY 
                score 
            DESC    ';
$result = mysql_query($query) or die (mysql_error());
if(mysql_num_rows($result) > 0) {   
    $output .= '<p>Your keyword matches the following pages:</p>';
    while($row = mysql_fetch_assoc($result)){

        $title      = htmlentities($row['title']);
        $content    = htmlentities(strip_tags($row['content']));
        $content    = limit_text($content, 250); // Cuts it down to 250 characters plus ...

        $output .= '<h2>'.$title.'</h2>';
        if(trim($content) != '') {
            $output .= '<p>'.$content.'</p>'; // I'd like to place a snippet here with the matched context
        }           
    }   
} else {
    $output .= '<p>Keyword not found...</p>';       
}

Also, I have a question regarding security. Right now I'm checking $keyword in three ways:

  • Not blank?
  • More than 2 characters?
  • Not dangerous? (see below)

I use a regular expression to match the following, to see if the user input is dangerous

<script|&lt;script|&gt;script|document.|alert|bcc:|cc:|x-mailer:|to:|recipient|truncate|drop table

This might be a little bit ridiculous and easy to work around, but it is at least a minimal form of protection against XSS exploits. What is the recommended way to secure filter a keyword intended for search? Is PHPIDS overkill?

回答1:

This should get you started on the "context" part...

// return the part of the content where the keyword was matched
function get_surrounding_text($keyword, $content, $padding)
{
    $position = strpos($content, $keyword);
    // starting at (where keyword was found - padding), retrieve
    // (padding + keyword length + padding) characters from the content
    $snippet = substr($content, $position - $padding, (strlen($keyword) + $padding * 2));
    return '...' . $snippet . '...';
}

$content = 'this is a really long string of characters with a magic word buried somewhere in it';
$keyword = 'magic';
echo get_surrounding_text($keyword, $content, 15); // echoes '... string with a magic word in it...'

This function does not account for cases where the padding boundaries would go outside the content string, like when the keyword is found near the beginning or end of the content. It also doesn't account for multiple matches, etc. But it should hopefully at least point you in the right direction.



回答2:

Rather than try to filter the $keywords variable yourself, you could simply use a prepared statement and never worry about missing a potential exploit:

<?php
$stmt = $dbh->prepare("INSERT INTO REGISTRY (name, value) VALUES (:name, :value)");
$stmt->bindParam(':name', $name);
$stmt->bindParam(':value', $value);

// insert one row
$name = 'one';
$value = 1;
$stmt->execute();

// insert another row with different values
$name = 'two';
$value = 2;
$stmt->execute();
?>


回答3:

I would probably get the $keyword to a function clean it up first if I were you. and for the record you would better put all words in $keyword into array so you can use boolean search if necessary.(like putting + in front of every word to get AND effect)