Question:
We are using Lucene for text search as part of Sitecore. Is there any way to ignore stop words (like a, an, the...) in Sitecore search?
Answer 1:
By default, Sitecore uses the Lucene standard analyzer, Lucene.Net.Analysis.Standard.StandardAnalyzer. You can see it defined in the /configuration/sitecore/search/analyzer element of the web.config file. One of the constructors of the StandardAnalyzer class accepts an array of strings to treat as stop words. By default, it uses a hard-coded list of stop words:
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
If you'd like to override this behavior, I think you should inherit from StandardAnalyzer and give your class a default (parameterless) constructor that passes stop words from another source to one of the base constructors, instead of relying on the hard-coded array. You have various options, including reading them from a text file. Don't forget to replace the standard class with yours in web.config.
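For example, here is a minimal sketch of reading the list from a text file. It assumes Lucene.Net 2.9, whose StandardAnalyzer has a constructor taking a Version and a Hashtable of stop words (the same constructor the CaseAnalyzer example in the next answer calls), and it follows that example's convention of storing each word as both key and value. The names StopWordList and FileStopWordAnalyzer and the App_Data\stopwords.txt location are hypothetical; the loader itself only uses the .NET base class library.

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical helper: turns a plain-text stop-word list (one word per line)
// into a Hashtable keyed by the word, with the word also stored as the value.
public static class StopWordList
{
    // Blank lines and lines starting with '#' are skipped. Words are trimmed and
    // lower-cased, because StandardAnalyzer lower-cases tokens before its stop
    // filter runs, so upper-case entries would never match.
    public static Hashtable Load(TextReader reader)
    {
        Hashtable stopWords = new Hashtable();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string word = line.Trim().ToLowerInvariant();
            if (word.Length == 0 || word[0] == '#')
            {
                continue;
            }
            // The indexer overwrites duplicates instead of throwing like Add() would.
            stopWords[word] = word;
        }
        return stopWords;
    }

    // Convenience overload that opens and closes the file.
    public static Hashtable Load(string path)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            return Load(reader);
        }
    }
}

// Assumed usage in an analyzer subclass (requires a Lucene.Net 2.9 reference;
// the class name and file location are hypothetical):
//
// public class FileStopWordAnalyzer : Lucene.Net.Analysis.Standard.StandardAnalyzer
// {
//     public FileStopWordAnalyzer()
//         : base(Lucene.Net.Util.Version.LUCENE_29,
//                StopWordList.Load(Path.Combine(AppDomain.CurrentDomain.BaseDirectory,
//                                               @"App_Data\stopwords.txt")))
//     {
//     }
// }
```

As written, the file is read every time the analyzer is constructed; if that happens often, loading the Hashtable once into a static field (as CaseAnalyzer does with its hard-coded set) avoids repeated disk reads.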
See the other constructors of the StandardAnalyzer class for more details. .NET Reflector is your friend here.
Answer 2:
An example based on Yan's post:
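using System.Collections;

public class CaseAnalyzer : Lucene.Net.Analysis.Standard.StandardAnalyzer
{
    // An empty set means no stop words: "a", "the", etc. are indexed and matched.
    // To make "by" a stop word that the analyzer drops, fill the set instead:
    // private static readonly Hashtable stopWords = new Hashtable { { "by", "by" } };
    private static readonly Hashtable stopWords = new Hashtable();

    public CaseAnalyzer() : base(Lucene.Net.Util.Version.LUCENE_29, stopWords)
    {
    }
}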
This analyzer should be registered in web.config under /configuration/sitecore/search, alongside the default analyzer element.
An example of the analyzer registration:
<caseanalyzer type="EBF.Business.Search.Analyzers.CaseAnalyzer, EBF.Business, Version=1.0.0.0, Culture=neutral"/>
Lastly, you just need to reference your analyzer in the search configuration like this:
<Analyzer ref="search/caseanalyzer" />