Solr PatternReplaceCharFilterFactory not replacing

Posted 2019-04-07 09:26

Question:

I am very new to Solr, and I am trying to use the PatternReplaceCharFilterFactory to do some pre-processing on a phone number string that will be stored. Here is the configuration for the field type:

<fieldType name="phone_number" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\(?(\d{3})?\)?[-. ]?(\d{3})[-. ]?(\d{4})"
                replaceWith="$1-$2-$3"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

I have tested the regex, and it matches everything I would expect it to (e.g., 555.444.1234, (555) 444-1234, 5554441234, 4441234, 444-1234, etc.).
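For anyone who wants to repeat that check outside Solr, here is a minimal sketch. It uses Python's re module as a stand-in for the Java regex engine that Solr uses, so the replacement is written \1-\2-\3 rather than $1-$2-$3, and the normalize helper is a made-up name for illustration. It assumes Python 3.5 or later, where an unmatched optional group is replaced with an empty string; that is why the seven-digit inputs come out with a leading hyphen.

```python
import re

# Same pattern as in the fieldType configuration above.
PHONE = re.compile(r"\(?(\d{3})?\)?[-. ]?(\d{3})[-. ]?(\d{4})")


def normalize(raw):
    """Hypothetical helper: rewrite every phone-like match as area-prefix-line."""
    # Python writes group references as \1; Java and Solr write them as $1.
    # When the optional area code (group 1) is absent it becomes "".
    return PHONE.sub(r"\1-\2-\3", raw)


if __name__ == "__main__":
    samples = ["555.444.1234", "(555) 444-1234", "5554441234", "4441234", "444-1234"]
    for sample in samples:
        print(repr(sample), "->", repr(normalize(sample)))
```

If the leading hyphen on seven-digit numbers is unwanted, the pattern or the replacement would need to treat the area code separately.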

My understanding is that the regex should match whatever is passed to it and replace the match with the specified replacement. So if 555.123.4444 is passed in, I would expect 555-123-4444 to be handed to the StandardTokenizerFactory, which would then break it into the tokens 555, 123, and 4444.
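The expected flow (char filter first, tokenizer second) can be sketched roughly as follows. This is only a model of the idea in Python, not Solr code: char_filter, crude_tokenize, and analyze are hypothetical names, and the tokenizer here simply keeps runs of letters and digits, whereas the real StandardTokenizer follows Unicode word-break rules.

```python
import re

PHONE = re.compile(r"\(?(\d{3})?\)?[-. ]?(\d{3})[-. ]?(\d{4})")


def char_filter(text):
    # Stand-in for the charFilter step: rewrite the raw text before tokenizing.
    return PHONE.sub(r"\1-\2-\3", text)


def crude_tokenize(text):
    # Rough stand-in for a tokenizer: keep runs of ASCII letters and digits.
    return re.findall(r"[0-9A-Za-z]+", text)


def analyze(text):
    # The char filter sees the whole string first; the tokenizer sees its output.
    return crude_tokenize(char_filter(text))


if __name__ == "__main__":
    print(char_filter("555.123.4444"))
    print(analyze("555.123.4444"))
```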

Given how much time I have spent on this, I am sure I am missing a small configuration detail, but from the documentation I have been able to find, I have no clue what it is.

Thank you in advance.

Answer 1:

OK, so I figured it out. After one more 'lucky' Google search I came across the article "Solr filters: PatternReplaceCharFilter". At the very bottom it discusses Advanced Parameters, which I think explains better how the filter actually works:

CharFilter operates on a single character, and pattern matching requires an internal buffer to read more characters. maxBlockChars allows you to specify the size of the buffer.

My problem was that the filter was reading in a single character at a time, not the whole string. This was contrary to the examples I had seen posted. The solution was to add the maxBlockChars attribute to my charFilter, and voila, it works. There was no mention of this attribute on LucidImagination's site or on the Solr wiki (at least not that I came across).
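To illustrate the buffering idea from the quote, here is a toy model in Python. It is not Lucene's actual implementation; replace_in_blocks and its max_block_chars argument are hypothetical names that only mimic the concept of a buffer size. The point is that a pattern can only match text that sits inside the buffer at one time, so a buffer that is too small never sees a whole phone number.

```python
import re

PHONE = re.compile(r"\(?(\d{3})?\)?[-. ]?(\d{3})[-. ]?(\d{4})")


def replace_in_blocks(text, max_block_chars):
    """Toy model: apply the pattern to each fixed-size block independently."""
    pieces = []
    for start in range(0, len(text), max_block_chars):
        block = text[start:start + max_block_chars]
        # A match is only found when it fits entirely inside one block.
        pieces.append(PHONE.sub(r"\1-\2-\3", block))
    return "".join(pieces)


if __name__ == "__main__":
    number = "555.123.4444"
    for size in (1, 5, 100):
        print(size, "->", replace_in_blocks(number, size))
```

With a block size of 1 or 5 the number passes through unchanged, while a block large enough to hold the whole string gets rewritten.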



Tags: solr