I'm using Java to query a SOLR server for results that have IDs within a set of known IDs that I am interested in.
The best way I could think to get just these results that I am interested in was to create a long query string that looks something like this:
q=(item_id:XXX33-3333 OR item_id:YYY42-3445 OR item_id:JFDE-3838)
I generate this String, queryString
, before making my request, and there are over 1500 such ids included in the request I would eventually like to make. I am using an HTTP POST to make the query as such:
HttpPost post = new HttpPost(url);
post.setHeader("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
StringEntity entity = new StringEntity(queryString, "UTF-8");
entity.setContentType("application/x-www-form-urlencoded; charset=utf-8");
post.setEntity(entity);
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(post);
If I limit the query to just the first 1000 ids, it succeeds and I get the results back as I would expect. However, if I increase the query to include all 1500 that I am really interested in, I get an HTTP 400 response code with the following error:
HTTP/1.1 400 org.apache.lucene.queryParser.ParseException: Cannot parse '[my query here...]
Is there a limit to the number of ids that I can OR together in a SOLR query? Is there another reason this might be failing when I go past 1000? I have experimented and it fails at around 1024 (my ids are all almost the same length) so it seems to suggest there is a character or term limit.
Or, if someone has a good suggestion of how I can retrieve the items I'm looking for in another, smarter, way, I would love to hear it. My backup solution is just to query SOLR for all items, parse the results, and use the ones that belong to the set I am interested in. I would prefer not to do this, since the data source could have tens of thousands of items, and it would be inefficient.