Full text search in HTML ignoring tags / &

Posted 2019-01-02 18:10

Question:

I've recently seen a lot of libraries for searching and highlighting terms within an HTML page. However, every library I saw has the same two problems: it can't find text that is partly enclosed in an HTML tag, and it fails to find special characters that are written as &-escaped entities.


Example a:

<span> This is a test. This is a <b>test</b> too</span>

Searching for "a test" would find the first instance but not the second.


Example b:

<span> Pencils in spanish are called l&aacute;pices</span>

Searching for "lápices" or "lapices" would fail to produce a result.


Is there a JS library that does this or at least a way to circumvent these obstacles?

Thanks in advance!

Bruno

Answer 1:

You can use window.find() in non-IE browsers and TextRange's findText() method in IE. Here's an example:

http://jsfiddle.net/xeSQb/6/

Unfortunately, Opera versions before 15 (when it switched to the Blink rendering engine) support neither window.find nor TextRange. If this is a concern for you, a rather heavyweight alternative is to combine the TextRange and CSS class applier modules of my Rangy library, as in the following demo: http://rangy.googlecode.com/svn/trunk/demos/textrange.html

Code:

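function doSearch(text) {
    if (window.find && window.getSelection) {
        // Non-IE browsers: design mode lets execCommand format the page
        document.designMode = "on";
        var sel = window.getSelection();
        // Start searching from the top of the document
        sel.collapse(document.body, 0);

        // window.find() selects the next match and returns true while one exists
        while (window.find(text)) {
            document.execCommand("HiliteColor", false, "yellow");
            // Move the caret past the match so the next search continues after it
            sel.collapseToEnd();
        }
        document.designMode = "off";
    } else if (document.body.createTextRange) {
        // IE: findText() moves the TextRange onto the next match
        var textRange = document.body.createTextRange();
        while (textRange.findText(text)) {
            textRange.execCommand("BackColor", false, "yellow");
            // Collapse to the end of the match before searching again
            textRange.collapse(false);
        }
    }
}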


Answer 2:

There are two problems here. One is the nested-content problem: search matches that span an element boundary. The other is HTML-escaped characters.

One way to handle the HTML-escaped characters, if you are using jQuery for example, is to call the .text() method and run the search on the result. The text it returns already has the escaped characters "translated" into the real characters.
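To make the "search the decoded text" idea concrete without a browser, here is a minimal sketch in plain JavaScript. It is not jQuery's implementation: decodeEntities and foldAccents are hypothetical helper names, and the entity table is a tiny illustrative subset (in a real page, .text() or textContent does the decoding for you). The accent folding is an extra assumption, added only so that "lapices" from Example b also matches.

```javascript
// Hypothetical helpers for illustration; the table is NOT a complete entity list.
const NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"',
  aacute: "\u00e1", eacute: "\u00e9", ntilde: "\u00f1",
};

// Decode numeric references (&#225; / &#xE1;) and the named entities above.
function decodeEntities(html) {
  return html.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z0-9]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match; // unknown entity: leave as-is
  });
}

// Strip combining accents and lowercase, so "lápices" and "lapices" compare equal.
function foldAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

const decoded = decodeEntities("Pencils in spanish are called l&aacute;pices");
console.log(decoded); // Pencils in spanish are called lápices
console.log(foldAccents(decoded).includes(foldAccents("lapices"))); // true
```

Folding both the page text and the search term the same way keeps the comparison symmetric, so either spelling of the term finds the word.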

Another way to handle those special characters would be to replace the actual character in the search string with its escaped version. Since a single character can be escaped in many different ways, however, that could be a lengthy search, depending on the implementation.

The same sort of "text" method can be used to find content matches that span element boundaries. It gets trickier because the extracted text has no notion of which elements its parts came from, but it gives you a smaller domain to search over if you drill in. Once you are close, you can switch to a character-by-character search rather than a word-based search.
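One way to recover where each part of the text came from is to remember the starting offset of every text node while concatenating them. The sketch below is only an illustration under assumptions: findAcrossChunks is a hypothetical name, and the chunks array stands in for the text nodes of a container in document order (in a browser, a TreeWalker with NodeFilter.SHOW_TEXT could supply them).

```javascript
// `chunks` is a stand-in for text nodes; each string is one node's content.
function findAcrossChunks(chunks, needle) {
  // Remember where each chunk starts inside the concatenated text.
  const starts = [];
  let text = "";
  for (const chunk of chunks) {
    starts.push(text.length);
    text += chunk;
  }

  // Map a global offset back to { chunk, offset }. An end offset that falls
  // on a boundary is attributed to the chunk that ends there.
  function locate(offset, isEnd) {
    for (let i = 0; i < chunks.length; i++) {
      const chunkEnd = starts[i] + chunks[i].length;
      const inside = isEnd
        ? offset > starts[i] && offset <= chunkEnd
        : offset >= starts[i] && offset < chunkEnd;
      if (inside) return { chunk: i, offset: offset - starts[i] };
    }
    return null;
  }

  const matches = [];
  if (needle === "") return matches;
  let from = 0;
  for (;;) {
    const index = text.indexOf(needle, from);
    if (index === -1) break;
    matches.push({
      start: locate(index, false),
      end: locate(index + needle.length, true),
    });
    from = index + needle.length; // continue after this match
  }
  return matches;
}

// Text nodes of: <span> This is a test. This is a <b>test</b> too</span>
const chunks = [" This is a test. This is a ", "test", " too"];
console.log(JSON.stringify(findAcrossChunks(chunks, "a test")));
```

For Example a this reports two matches: the first lies entirely in the first chunk, while the second starts in the first chunk and ends at the end of the chunk inside the b element. In a browser, each start/end pair could then be passed to Range.setStart and Range.setEnd with the corresponding text nodes to highlight the match.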

I don't know of any libraries that do this, however.



Answer 3:

Just press F3 to use the browser's built-in search, and use the <p> and </p> tags to tell visitors to your site about it. For example, since you know about the F3 search key, to put text on the screen that tells others about it you would type:

<p>If you're having trouble finding something, press F3 to highlight the text</p>

