I have a Solr install to query content on a Drupal site. Many of the title fields have punctuation at the start of the string, so when I sort by title the punctuation appears at the top of the list.
I would like to get Solr to ignore the leading punctuation when sorting by title, but none of the solutions I have tried work.
I am fairly new to solr and so it may be something really simple that I am doing wrong... I don't really understand much of what is going on in the schema.xml file!
The title field is called label in Solr, and I have tried various patterns with solr.PatternReplaceFilterFactory, none of which work.
<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<copyField source="label" dest="sort_label"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(^\p{Punct}+)" replacement="" replace="all"
    />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    …
  </analyzer>
My query is start=0&rows=25&q=education&fl=id%2Centity_id%2Centity_type%2Cbundle%2Cbundle_name%2Csort_label%2Css_language%2Cis_comment_count%2Cds_created%2Cds_changed%2Cscore%2Cpath%2Curl%2Cis_uid%2Ctos_name%2Czm_parent_entity%2Css_filemime%2Css_file_entity_title%2Css_file_entity_url&pf=content%5E2.0&&sort=sort_label%20asc
This is done with the WordDelimiterFilterFactory. Set generateWordParts=1.
Add this filter to your analyzer. After modifying schema.xml, restart the server and re-index the data.
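As a rough sketch of what that might look like in schema.xml: the field type name text_sort below is hypothetical, and the tokenizer choice simply mirrors the one in the question. Only generateWordParts="1" comes from the answer above. Note that inside an analyzer the filter elements are listed after the tokenizer.

```xml
<!-- Hypothetical field type; the name "text_sort" is an assumption for illustration -->
<fieldType name="text_sort" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- The tokenizer comes first; filters run on its output in the order listed -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits tokens at non-alphanumeric characters and emits the word parts,
         so delimiters at the start or end of a token are dropped -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The field you sort on (sort_label in the question) would need to be declared with a type like this for the filter to affect sort order, and the change only applies to documents indexed after the restart.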