In Solr, what is the maximum size of a “text” field?

Posted 2019-05-06 22:11

Question:

When using the Solr client in your app, what is the maximum size of a multi-line text field?

Can I send huge XML documents as text?

For example:

SolrInputDocument document = new SolrInputDocument();
document.addField("id", rec.getId());
document.addField("hugeTextFile_txt", hugeTextFile);
UpdateResponse response = solr.add(document);
solr.commit();

Answer 1:

Update

I ran the same unit test against a field of type text. Below is the declaration I used; note that I removed the analyzer section from it.

<fieldType name="text" class="solr.TextField"/>

I was able to add a value of 500,000,000 characters and index it successfully. Larger values caused a "Java heap space" OutOfMemoryError, which is a JVM memory limit rather than a Solr limit.
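Why the text field accepts so much more than the string field: Lucene rejects any single indexed term longer than 32,766 UTF-8 bytes, but the limit applies per term, not per field value. The sketch below assumes the text field's analyzer splits the value into many small tokens and approximates that with a plain whitespace split (real Solr tokenizers are more involved); the class `TermSizeDemo` and the helper `maxTermBytes` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class TermSizeDemo {
    // Lucene rejects any single indexed term longer than this many UTF-8 bytes.
    static final int MAX_TERM_BYTES = 32766;

    // Hypothetical helper: approximates a whitespace tokenizer by splitting on runs
    // of whitespace, and returns the UTF-8 size of the largest resulting term.
    static int maxTermBytes(String text) {
        int max = 0;
        for (String token : text.split("\\s+")) {
            max = Math.max(max, token.getBytes(StandardCharsets.UTF_8).length);
        }
        return max;
    }

    public static void main(String[] args) {
        // Build several hundred KB of text made of short words ("word0" .. "word999").
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            builder.append("word").append(i % 1000).append(' ');
        }
        String text = builder.toString();
        int totalBytes = text.getBytes(StandardCharsets.UTF_8).length;
        int largestTerm = maxTermBytes(text);

        System.out.println("whole value: " + totalBytes + " bytes");
        System.out.println("largest term: " + largestTerm + " bytes"); // 7 ("word999")
        // As one untokenized term (like a StrField) the value is far over the limit...
        System.out.println("fits as one term: " + (totalBytes <= MAX_TERM_BYTES));
        // ...but once split into words, every individual term is tiny.
        System.out.println("all tokens fit: " + (largestTerm <= MAX_TERM_BYTES));
    }
}
```

The practical ceiling for a tokenized field is then set by memory (heap on the client and on the Solr server), which matches the heap-space error described above.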


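I ran a simple test that adds a large value to a field. The limit I found is 32,766 bytes; beyond that, an IllegalArgumentException is thrown. The field type for email was string, and because a StrField is not tokenized, the whole value is indexed as a single Lucene term, which is what this limit applies to: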

<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

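import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;
import org.junit.Test;

// solrClient and TestConstants.PROFILE_ID are defined elsewhere in the test class.
@Test
public void test() throws IOException, SolrServerException {
  SolrInputDocument document = new SolrInputDocument();
  document.addField("profileId", TestConstants.PROFILE_ID);
  // Build a 32,767-character ASCII string "abc...zabc...", i.e. 32,767 UTF-8 bytes.
  StringBuilder builder = new StringBuilder();
  for (int i = 0; i < 32767; i++) {
    builder.append((char) ((i % 26) + 'a'));
  }
  // "email" is a StrField, so the whole value becomes one indexed term.
  document.addField("email", builder.toString());
  solrClient.add(document);
  solrClient.commit();
}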

Exception thrown by the test above for 32,767 characters or more:

Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="email" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 97, 98, 99, 100]...', original message: bytes can be at most 32766 in length; got 32767
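Because the limit is counted in UTF-8 bytes rather than characters, a string field holds fewer than 32,766 characters when the value contains non-ASCII text: 16,384 copies of `é` already take 32,768 bytes. A minimal client-side guard, sketched under the assumption that you want to check or truncate a value before sending it to a string field, might look like this. `StrFieldGuard`, `fitsInOneTerm` and `truncateToTermLimit` are hypothetical names, and the code needs Java 11+ for `String.repeat`.

```java
import java.nio.charset.StandardCharsets;

public class StrFieldGuard {
    // Lucene's per-term limit in UTF-8 bytes (the "max length 32766" in the exception).
    static final int MAX_TERM_BYTES = 32766;

    // True if the value can be indexed as a single untokenized term.
    static boolean fitsInOneTerm(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }

    // Hypothetical helper: keeps the longest prefix whose UTF-8 encoding fits the
    // limit, never cutting a character (or a surrogate pair) in half.
    static String truncateToTermLimit(String value) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int cpBytes = utf8Length(cp);
            if (bytes + cpBytes > MAX_TERM_BYTES) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }

    // Number of bytes UTF-8 uses to encode one code point.
    static int utf8Length(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(32767);         // 32,767 chars, 32,767 bytes
        String accented = "\u00e9".repeat(16384); // 16,384 chars, 32,768 bytes
        System.out.println(fitsInOneTerm(ascii));                   // false
        System.out.println(fitsInOneTerm(accented));                // false
        System.out.println(truncateToTermLimit(accented).length()); // 16383
    }
}
```

Truncating silently drops data, so for large free-text values such as whole XML documents, moving to a tokenized field type (as the other answer does with text_general) may be the better choice.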

I hope this helps.



Answer 2:

Changing the Solr field type to text_general and updating the Solr schema helped.

Commands to update the Solr schema with solrctl:

solrctl instancedir --update "directory that contains the schema file with the edited solr field"

solrctl collection --update "collection-name to update"



Tags: solr