I have two table fields in a MySQL table.
One is VARCHAR and is a "headline" for a classified (classifieds website).
The other is a TEXT field which contains the "text" for the classified.
Two questions:
How should I determine how to index these two fields? (Which field type, which classes to use, etc.)
Currently I have an "ad_id" as a unique identifier for each ad, example "bmw_m3_82398292".
How can I make Solr return this identifier whenever a query matches?
(The first part of the identifier is actually the headline field's content; the second part is a randomly chosen number.)
Thanks
1. Schema
Your Solr schema is very much determined by your intended search behavior. In your schema.xml file, you'll see a bunch of choices like "text" and "string". They behave differently.
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
The string field type is a literal string match: the whole value is indexed as a single token. It operates like = in a SQL statement.
<fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldtype>
The text_ws field type tokenizes on whitespace and does nothing else. The big difference in the text field type is its filters for stop words, delimiters, and lower-casing. In the full schema.xml these filters are declared for both the Lucene index and the Solr query (the abridged snippet here shows only the index analyzer). So when searching a text field, Solr adapts the query terms using the same filters to help find a match.
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter ..... />
<filter ..... />
<filter ..... />
</analyzer>
</fieldtype>
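To make the difference between the three field types concrete, here is a rough Python emulation of how each one would turn a value into tokens. This is only a sketch, not Solr's actual analysis chain: the function names are made up for illustration, the stop-word set is a hypothetical stand-in for stopwords.txt, and the delimiter filter is left out entirely.

```python
# Rough emulation of three Solr field types; NOT Solr's actual code.
STOPWORDS = {"a", "an", "the", "for", "of"}  # hypothetical stand-in for stopwords.txt

def analyze_string(value):
    # "string": the whole value is one unmodified token
    return [value]

def analyze_text_ws(value):
    # "text_ws": split on whitespace only, case preserved
    return value.split()

def analyze_text(value):
    # "text": whitespace split, then lower-case and drop stop words
    return [t.lower() for t in value.split() if t.lower() not in STOPWORDS]

def matches(analyzer, indexed_value, query):
    # The same analyzer runs on the indexed value and on the query;
    # a document matches when every query token is among its indexed tokens.
    indexed = set(analyzer(indexed_value))
    return all(tok in indexed for tok in analyzer(query))

headline = "BMW M3 for sale"
print(matches(analyze_string, headline, "bmw"))   # False: only the full literal value matches
print(matches(analyze_text_ws, headline, "bmw"))  # False: tokenized, but case-sensitive
print(matches(analyze_text, headline, "bmw"))     # True: lower-cased on both sides
```

For a classified's headline and body, the last behavior is usually what people expect from a search box, which is why a text type tends to fit both fields better than string.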
When indexing things like news stories, for example, you probably want to search for company names and headlines differently.
<field name="headline" type="text" />
<field name="coname" type="string" indexed="true" multiValued="false" omitNorms="true" />
The above example would allow you to do a search like q=coname:Intel AND headline:(processor specifications) and retrieve only stories whose coname is exactly Intel.
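A fielded query like this has to be URL-encoded before it goes into the request. The following is a minimal sketch of building such a URL with Python's standard library; the host, port, and path in SOLR_SELECT are assumptions about a local install, and build_query_url is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed local Solr URL

def build_query_url(coname, headline_terms):
    # Exact match on the string field, tokenized match on the text field
    q = "coname:{} AND headline:({})".format(coname, " ".join(headline_terms))
    # urlencode escapes ':' and parentheses and turns spaces into '+'
    return SOLR_SELECT + "?" + urlencode({"q": q})

url = build_query_url("Intel", ["processor", "specifications"])
print(url)
```

Note that this sketch does not escape Lucene special characters inside the terms themselves, so it is only safe for plain words.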
If you wanted to search a range, use the field:[x TO *] syntax described at the end of section 2.
2. Result Fields
You can define a standard set of return fields in your RequestHandler, inside its defaults list:
<requestHandler name="mumble" class="solr.DisMaxRequestHandler" >
<lst name="defaults">
<str name="fl">
category,coname,headline
</str>
</lst>
</requestHandler>
You may also specify the desired fields in your query string, using the fl parameter:
/select?indent=on&version=2.2&q=coname%3AIn*&start=0&rows=10&fl=coname%2Cid&qt=standard
You can also select ranges in your query terms using the field:[x TO *] syntax. If you wanted to select certain ads by their date, you might build a query with
ad_date:[20100101 TO 20100201]
in your query terms. (There are many ways to search ranges; I'm presenting a method that uses integers instead of the Date class.)
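Putting the two pieces together, a request that filters by the integer date range and returns only the ad identifier might be assembled as below. This is a sketch under assumptions: it presumes ad_id and ad_date are defined in your schema (with ad_id stored, so it can be returned), the SOLR_SELECT URL is a guess at a local install, and date_range_clause is a hypothetical helper.

```python
from datetime import date
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed local Solr URL

def date_range_clause(field, start, end):
    # Dates are indexed as yyyymmdd integers, so format both bounds that way
    return "{}:[{} TO {}]".format(
        field, start.strftime("%Y%m%d"), end.strftime("%Y%m%d")
    )

clause = date_range_clause("ad_date", date(2010, 1, 1), date(2010, 2, 1))
# fl=ad_id asks Solr to return just the identifier for each matching ad
params = {"q": clause, "fl": "ad_id", "rows": 10}
url = SOLR_SELECT + "?" + urlencode(params)

print(clause)  # ad_date:[20100101 TO 20100201]
print(url)
```

Each document in the response would then carry only the ad_id value, which you can use to look up the full row in MySQL.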