
Google search API versus MS SQL Server Full Text I

Posted 2020-07-30 03:00

Question:

We are building websites for our clients and want to adopt a search solution that can be easily reused. Which one should we go with? Should we use the Google Search API, or MS SQL Server Full Text Indexing with the CONTAINS and FREETEXT predicates?

Answer 1:

We use SQL Server full text indexing here on Stack Overflow and it works reasonably well -- but I can only recommend it for 2005 and 2008, the versions we use it on. I heard it's much worse in 2000. There are quirks (stopword lists, etc) but nothing serious. It's fast and does what it says on the tin, mostly.
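For readers who haven't used the two predicates, here is a minimal sketch of how a query against each might be built from Python. The `Posts` table, its `Body` column, and the helper names are hypothetical, and the `?` marker assumes an ODBC-style driver; treat it as an illustration, not as how Stack Overflow actually does it. CONTAINS takes a structured search condition (quoted terms joined by operators such as AND), while FREETEXT takes the user's text as-is and matches more loosely.

```python
from typing import Tuple

# Hypothetical table and column names; "?" is an ODBC-style parameter marker.
CONTAINS_SQL = "SELECT Id, Title FROM Posts WHERE CONTAINS(Body, ?)"
FREETEXT_SQL = "SELECT Id, Title FROM Posts WHERE FREETEXT(Body, ?)"


def contains_condition(user_input: str) -> str:
    """Turn free-form input into a CONTAINS search condition.

    Each word becomes a quoted term and the terms are joined with AND,
    so every word must match. Double quotes are stripped from the input
    because they delimit terms in the CONTAINS syntax.
    """
    words = [w.replace('"', "") for w in user_input.split()]
    return " AND ".join(f'"{w}"' for w in words if w)


def build_query(user_input: str, strict: bool = True) -> Tuple[str, str]:
    """Return (sql, parameter) for a strict CONTAINS or a looser FREETEXT search."""
    if strict:
        return CONTAINS_SQL, contains_condition(user_input)
    return FREETEXT_SQL, user_input
```

The returned pair can be handed to whatever database driver you use, with the search text passed as a parameter rather than concatenated into the SQL.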

The problem you run into with contains() and freetext() is that users often expect to search at the "whole page" level, à la Google, where anything that's written to the page or screen is searchable. That's not really how databases work, since a full-text index only covers the columns you index, but users don't care about that. They care about results and have (arguably reasonable) expectations based on years of web searching.

If you expect to need the "whole page" search level, I'd strongly recommend looking at the Google Search API, or Lucene.NET (assuming you're on a Microsoft stack, given your use of SQL Server).
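To make the "whole page" idea concrete, here is a rough sketch of what a page-level search has to see: every piece of text that ends up on the rendered page, gathered into one document. The `Page` fields and function names are made up for illustration, and the substring match is only a stand-in for a real index such as Lucene.NET or a crawler-built one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    # Hypothetical pieces of text that all appear on one rendered page.
    title: str
    body: str
    tags: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


def page_text(page: Page) -> str:
    """Flatten everything shown on the page into one searchable string."""
    parts = [page.title, page.body, *page.tags, *page.comments]
    return "\n".join(p.strip() for p in parts if p.strip())


def page_matches(page: Page, query: str) -> bool:
    """Naive check: every query word appears somewhere on the page."""
    text = page_text(page).lower()
    return all(word in text for word in query.lower().split())
```

A query like "lucene indexing" would match a page whose body mentions indexing and whose comments mention Lucene, which a full-text index on the body column alone would miss.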



Answer 2:

The good thing about SQL Server full text searching is that the barrier to entry is quite low (assuming you're already using SQL Server). Stack Overflow uses it for its search. The downside is that its effectiveness (or lack thereof) is one of the most frequently criticized aspects of SO. So much so that a lot of people (myself included) default to using "site:stackoverflow.com ..." in Google.
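The "site:" workaround is easy to wire into a site as a stopgap search box. A minimal sketch, assuming the standard public Google search URL and a hypothetical helper name:

```python
from urllib.parse import urlencode


def site_search_url(site: str, query: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {query}"})
```

This only sends the user off to Google's own results page; it gives you no control over ranking, presentation, or how fresh the index is.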

Google Custom Search also has a low barrier to entry, but you lose some control over how often your index is updated and how many search results you can return. Google Site Search is a better version that addresses some of these limitations (for example, with on-demand indexing).

At the top end you have the Google Search Appliance, which is really your only Google option if your data isn't public.

Which one is appropriate depends on how often your data needs to be re-indexed, how many requests you make, how much bandwidth you want to spend on being indexed, whether your data is public, and how good you need the search results to be. There is no one answer.