Is there a size or term limit for a SOLR query str

Posted 2019-02-12 15:30

Question:

I'm using Java to query a SOLR server for results that have IDs within a set of known IDs that I am interested in.

The best way I could think of to get just these results was to build a long query string that looks something like this:

q=(item_id:XXX33-3333 OR item_id:YYY42-3445 OR item_id:JFDE-3838)

I generate this String, queryString, before making my request, and the request I would eventually like to make includes over 1500 such ids. I send the query with an HTTP POST like this:

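        // url and queryString are defined elsewhere; queryString is the form body ("q=(item_id:... OR ...)").
        HttpPost post = new HttpPost(url);
        post.setHeader("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");

        // Send the query in the POST body rather than in the URL.
        StringEntity entity = new StringEntity(queryString, "UTF-8");
        entity.setContentType("application/x-www-form-urlencoded; charset=utf-8");
        post.setEntity(entity);

        // Execute the request; a 400 status here is what carries the ParseException message.
        HttpClient client = new DefaultHttpClient();
        HttpResponse response = client.execute(post);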

If I limit the query to just the first 1000 ids, it succeeds and I get the results back as I would expect. However, if I increase the query to include all 1500 that I am really interested in, I get an HTTP 400 response code with the following error:

HTTP/1.1 400 org.apache.lucene.queryParser.ParseException: Cannot parse '[my query here...]

Is there a limit to the number of ids that I can OR together in a SOLR query? Is there another reason this might be failing when I go past 1000? I have experimented, and it fails at around 1024 ids (my ids are all nearly the same length), which suggests there is a character or term limit.

Or, if someone has a good suggestion of how I can retrieve the items I'm looking for in another, smarter, way, I would love to hear it. My backup solution is just to query SOLR for all items, parse the results, and use the ones that belong to the set I am interested in. I would prefer not to do this, since the data source could have tens of thousands of items, and it would be inefficient.

Answer 1:

There is no limit on the Solr side; we regularly use Solr in a similar way with tens of thousands of IDs in the query.

You need to look at the settings for your servlet container (Tomcat, Jetty, etc.) and increase the maximum POST size. Look up maxPostSize if you are using Tomcat and maxFormContentSize if you are using Jetty.
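
Before changing container settings, it may help to measure how large the POST body actually is. The sketch below is only an illustration: the class and method names are made up, the ids are synthetic, and the two limits are the commonly documented defaults for Tomcat (maxPostSize) and Jetty (maxFormContentSize), which are assumptions you should check against your own container version. It also assumes Java 10 or later for the URLEncoder overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or HttpClient.
class PostBodySize {
    // Assumed defaults; verify them for your container version.
    static final int ASSUMED_TOMCAT_MAX_POST_SIZE = 2_097_152;       // bytes
    static final int ASSUMED_JETTY_MAX_FORM_CONTENT_SIZE = 200_000;  // bytes

    // Builds the form body "q=<url-encoded query>" in the shape used in the question.
    static String buildFormBody(List<String> ids) {
        String query = "(item_id:" + String.join(" OR item_id:", ids) + ")";
        return "q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Size of the body as it would be sent over the wire in UTF-8.
    static int bodySizeInBytes(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Synthetic ids, roughly the same length as the ones in the question.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            ids.add(String.format("XXX%02d-%04d", i % 100, i));
        }
        int size = bodySizeInBytes(buildFormBody(ids));
        System.out.println("POST body: " + size + " bytes");
        System.out.println("Over assumed Tomcat default: " + (size > ASSUMED_TOMCAT_MAX_POST_SIZE));
        System.out.println("Over assumed Jetty default: " + (size > ASSUMED_JETTY_MAX_FORM_CONTENT_SIZE));
    }
}
```

With ids of this length, 1500 clauses should come to a few tens of kilobytes, so if the measured size is well under the container's limit, the POST size is probably not what is rejecting the request.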



Answer 2:

As of Solr 6.0, Solr has a maxBooleanClauses setting in the query section of solrconfig.xml, and it defaults to 1024.

I wrote a unit test that confirmed the limit (with Solr 5.3).

See more here: https://wiki.apache.org/solr/SolrConfigXml#The_Query_Section

FWIW, there is an open Solr JIRA issue to remove the limit, so it may go away in the future: https://issues.apache.org/jira/browse/SOLR-4586
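
If raising maxBooleanClauses is not an option, one workaround is to split the ids into batches that stay under the limit and send one query per batch. This is a minimal sketch, not a tested Solr client: IdQueryBatcher and buildQueries are hypothetical names, the item_id field comes from the question, and the ids are assumed not to need query-syntax escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class IdQueryBatcher {
    // The default maxBooleanClauses value mentioned above; adjust to your config.
    static final int MAX_CLAUSES = 1024;

    // Returns one "(item_id:a OR item_id:b ...)" query per batch of at most maxClauses ids.
    static List<String> buildQueries(List<String> ids, int maxClauses) {
        if (maxClauses < 1) {
            throw new IllegalArgumentException("maxClauses must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, ids.size());
            List<String> batch = ids.subList(start, end);
            queries.add("(item_id:" + String.join(" OR item_id:", batch) + ")");
        }
        return queries;
    }

    public static void main(String[] args) {
        // Synthetic ids standing in for the 1500 real ones.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            ids.add("ID-" + i);
        }
        List<String> queries = buildQueries(ids, MAX_CLAUSES);
        System.out.println(queries.size() + " queries to send");
    }
}
```

Each returned string would be sent as its own q parameter, and the caller would merge the result sets afterwards.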