Let me know if I am wrong, but I think Solr only accepts fields that are already defined in schema.xml. So, if I have a field called 'title', I need to declare it in the schema.
There is no mention of modifying schema.xml in Sunspot's documentation. I am wondering how Sunspot handles schema.xml so that custom fields can be added to the index.
I also know that Sunspot uses RSolr to do things. So if there is a way to modify the schema and reload data from DB to Solr using RSolr, please let me know.
Sunspot comes with a stock schema and configuration, lightly tuned to follow the principle of least surprise for the developer. For example, the stock solrconfig.xml turns autocommit off, even though in production you'll want to turn it on. The schema really has more to do with types than fields; see the link below for an example of how to create a new field type. Indexing a field is trivial if it fits into one of the existing types. For example:
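A minimal sketch of such a block, assuming a Rails app with the sunspot_rails gem. The `Post` model is a placeholder, and the `title` field is borrowed from the question:

```ruby
# Hypothetical model; assumes Rails with the sunspot_rails gem installed.
class Post < ActiveRecord::Base
  searchable do
    text :title   # fits the stock text type, so schema.xml stays untouched
  end
end
```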
And in the search process, you'd do something like this:
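Roughly along these lines, assuming a running Solr instance and a hypothetical `Post` model that has been made searchable with Sunspot (the query string is a placeholder too):

```ruby
# Hypothetical query; Post is assumed to declare a `searchable` block.
search = Post.search do
  fulltext 'pizza'   # keyword query against the indexed text fields
end

search.results.each { |post| puts post.title }
```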
Sunspot's wiki has a lot of additional documentation. Here's an example on adding a custom type to allow ngram searching:
https://github.com/outoftime/sunspot/wiki/Wildcard-searching-with-ngrams
As karmajunkie alludes to, Sunspot uses its own standard schema. I'll go into how that works in a bit more detail here.
Solr Schema 101
For the purposes of this discussion, Solr schemas mostly consist of two things: type definitions and field definitions.
A `type` definition sets up a type by specifying its name, the Java class for the type, and in the case of some types (notably text), a subordinate block of XML configuring how that type is handled.
A `field` definition allows you to define the name of a field, and the name of the type of value contained in that field. This allows Solr to correlate the name of a field in a document with its type, and a handful of other options, and thus how that field's value should be processed in your index.
Solr also supports a `dynamicField` definition, which, instead of a static field name, lets you specify a pattern with a glob in it. Incoming fields can have their names matched against these patterns in order to determine their types.
Sunspot's conventional schema
Sunspot's schema has a handful of `field` definitions for internally used fields, such as the ID and model name. Additionally, Sunspot makes liberal use of `dynamicField` definitions to establish naming conventions based on types.
This use of field naming conventions allows Sunspot to define a configuration DSL that creates a mapping from your model into an XML document ready to be indexed by Solr.
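To make the convention concrete, here is a rough model of it in plain Ruby. This is not Sunspot's actual code, and the constant and method names are made up for illustration. It only covers the text type, where a `text` field gets a `_text` suffix and a stored one gets `_texts`. It uses `File.fnmatch` as a stand-in for Solr's own pattern matching, which is an assumption that happens to work for simple `*_suffix` patterns.

```ruby
# Simplified model of the naming convention; NOT Sunspot's real implementation.
# Only the text type is modelled here.
TYPE_SUFFIXES = { text: 'text' }.freeze

# Patterns as they might appear in dynamicField definitions (pattern => Solr type).
DYNAMIC_FIELDS = {
  '*_text'  => 'text',
  '*_texts' => 'text'
}.freeze

# Build the indexed field name: <name>_<type suffix>, plus "s" when stored.
def indexed_name(name, type, stored: false)
  suffix = TYPE_SUFFIXES.fetch(type)
  suffix += 's' if stored
  "#{name}_#{suffix}"
end

# Resolve an incoming field name to a type by glob-matching the patterns.
# Returns nil when no pattern matches.
def solr_type_for(field_name)
  _pattern, type = DYNAMIC_FIELDS.find { |pattern, _| File.fnmatch(pattern, field_name) }
  type
end

puts indexed_name(:body, :text)                # body_text
puts indexed_name(:body, :text, stored: true)  # body_texts
puts solr_type_for('body_text')                # text
```

The point is that neither side needs to know about your specific field: the model side derives a name from the type, and the schema side recovers the type from the name.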
For example, a simple configuration block in your model that declares `body` as a `text` field will be used by Sunspot to create a field name of `body_text`. This field name is matched against the `*_text` pattern of a `dynamicField` definition in the schema, which maps any field with the suffix `_text` to Sunspot's definition of the `text` type. If you take a look at Sunspot's schema.xml, you'll see many other similar conventions for other types and options. The `:stored => true` option, for example, will typically add an `s` on that type's suffix (e.g., `_texts`).
Modifying Sunspot's schema in practice
In my experience with client projects and my own, there are two good cases for modifying Sunspot's schema. First, for making changes to the `text` field's analyzers based on the different features your application might need. And, second, for creating brand new types (usually based on the text type) for a more fine-grained application of Solr analyzers.
For example, widening search matches with "fuzzy" searches can be done with matches against a special text-based field that also uses linguistic stems, or NGrams. The tokens in the original `text` field may be used to populate spellcheck, or to boost exact matches. And the tokens in the custom `text_ngram` or `text_en` can serve to broaden search results when the stricter matching fails.
Sunspot's DSL provides one final feature for mapping your fields to these custom fields. Once you have set up the `type` and its corresponding `dynamicField` definition(s), you can use Sunspot's `:as` option to override the convention-based name generation.
For example, adding a custom `ngram` type for the above, we might end up processing the body again with NGrams with the following Ruby code: