How does Firefox's 'awesome' bar match

2019-01-21 05:03发布

The question is how the string matching is done to find matching entries by the firefox 3 url bar. Substring matching on every entry might be slow. What algorithm can be used to match at any location in a fast way?

4条回答
一夜七次
2楼-- · 2019-01-21 05:25

The normal way to do fast substring matching is to create a data structure which contain all the suffixes of all the strings you want search. Depending on the organization, this can be called the "suffix tree" or "suffix array". For example, if you have 1000 strings and every one is 50 characters long, you have 1.000 x 50 nontrivial suffixes, i.e. your suffix array would have 50.000 entries.

Then to do the matching, you do binary search (if array) or tree search (if tree) to find all those suffixes in the data structure whose beginning matches the string written in the search box. Because it is the beginning of the suffix you are matching, you can use standard search procedures (binary search, tree descent) to get to the result fast. Every suffix is linked to those strings in which it appears.

Example: you have two strings "CAT" and "DOT". Your suffix array looks like this (note lexiographic = alphabetic ordering):

#1 AT  --> CAT
#2 CAT --> CAT
#3 DOT --> DOT
#4 OT  --> DOT
#5 T   --> DOT, CAT

Note that there are six suffixes but two of the are the same (last "T" in "CAT" and "DOT") and the are both represented by the same entry (#5).

Now when the user types in search, e.g. "OT" (which should match "DOT"), you can do simple lexiographic ordering lookup in log-time as you are now searching for a beginning match in the suffix array.

This is the basic principle of fast text searching when the search pattern does not contain wildcards.

查看更多
The star\"
3楼-- · 2019-01-21 05:27

I think this is left to the underlying storage: the SQLite Database where Firefox stores the Pages you visited offers a fast function for substring comparison.

I think Firefox issues a SQL Query to the Database. This is quite fast because the database is cached in your memory.

查看更多
神经病院院长
4楼-- · 2019-01-21 05:42

The awesomebar suggests URLS by an algorithm called The Places frecency algorithm.

According the Mozilla developer website:

The word "frecency" itself is a combination of the words "frequency" and "recency."

查看更多
走好不送
5楼-- · 2019-01-21 05:49

The algorithm the location bar in Firefox 3.0 is bit complicated. It will get data from two (three for Firefox 3.5 and later) different queries:

  • For the first query, it checks the moz_inputhistory table to see if the current search string is stored in that table. These results are sorted by "rank" which is a number that determines how recently it is used. This number degrades once a day. This search is what makes the location bar adaptive to what you select over time (implemented in bug 95739).

  • In Firefox 3.5 and later, it then checks the moz_keyword table for bookmarks with a keyword that match the search text exactly.

  • The last query, it goes through every entry in moz_places, which will include all history visits and bookmarks. These results are ordered by frecency.

For all three of these cases, the following algorithm is used for matching against the tags, title, and the url (called "searchable text" below). This is a bit difficult to explain in words, so it might be easier to look at the source code.

  1. The search string is broken into tokens determined by whitespace (each non-whitespace "word" is a token).
  2. For each token, start comparing each character of the searchable text and the token in a Unicode, case-insensitive way until it matches completely. If one set of characters do not match, advance to the next "word boundary" in the searchable text and try again.
  3. If we match the any one of the searchable text, it will show up in the location bar.
  4. If we do not have enough results (default is 12), we will then redo the search of the query going through every bookmark and history visit, and test the searchable text in a Unicode, case-insensitive way for each token anywhere (not just at word boundaries).

Hopefully that explains it in an understandable way!

查看更多
登录 后发表回答