I have a problem with Solr 5.3.1. My schema is rather simple: a single uniqueKey, "id", of type string. It is indexed, stored, required and not multivalued.
I first add documents with content_type:document_unfinished and later overwrite each one, using the same id but content_type:document. The document then appears twice in the index. Again, the only uniqueKey is "id", a string. The id originally comes from an integer primary key in MySQL.
It also looks like I am not the only one this has happened to:
http://lucene.472066.n3.nabble.com/uniqueKey-not-enforced-td4015086.html
http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-td4129651.html
In my case not all the documents in the index are duplicated, only some. I initially assumed that a document is overwritten on commit when the same uniqueKey already exists in the index, but that does not seem to work the way I expected. I do not want to update just some fields of the document; I want to replace it completely, including all its children.
Some stats: around 350k documents in the index, most of them with childDocuments. The documents are distinguished by a "content_type" field. I import them with SolrJ like this:
HttpSolrServer server = new HttpSolrServer(url);
server.add(docs); // docs is a Collection<SolrInputDocument> built beforehand
server.commit();
I always add the whole document again, with all its children. It's nothing overly fancy. I end up with duplicate documents for the same uniqueKey. Nothing else writes to the index: I run only Solr with the integrated Jetty, and I do not open the Lucene index "manually" from Java.
What I did then was to delete and re-insert. That seemed to work for a while, but under some conditions it started to give this error message:
Parent query yields document which is not matched by parents filter
The document where that happens seems to be completely random; the only pattern that emerges is that it is a childDocument. I do not run anything special: I basically downloaded the Solr package from the website and run it with bin/solr start.
Does anyone have any ideas?
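For illustration, a minimal sketch of how the delete part of such a delete+insert could be phrased so that it covers a parent together with its children. It assumes the schema defines the `_root_` field that Solr fills in for block-indexed documents; the class and method names are made up for this example, and this is not necessarily the exact delete that was run.

```java
public class BlockDeleteQuery {

    // Wraps a value in double quotes, escaping backslashes and quotes,
    // so it can be used as a quoted phrase in a Solr query.
    public static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a query matching the parent document itself and every document
    // whose _root_ field points at it (assumption: the schema has _root_).
    public static String forParent(String parentId) {
        String quoted = quote(parentId);
        return "id:" + quoted + " OR _root_:" + quoted;
    }

    public static void main(String[] args) {
        System.out.println(forParent("1")); // prints: id:"1" OR _root_:"1"
    }
}
```

The resulting string could then be passed to SolrJ's `deleteByQuery(String)` before the document is added again.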
EDIT 1
I think I found the problem, which looks like a bug. To reproduce the issue:
I downloaded Solr 5.3.1 onto a Debian VM in VirtualBox and started it with bin/solr start. I added a new core with the basic config set. Nothing was changed in the config set; I just copied it over and added the core.
The following code leads to two documents with the same id in the index:
// Step 1: add a plain document with id "1" and no children
SolrClient solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
SolrInputDocument inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc_unfinished");
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();
// Step 2: add a document with the same id "1" again, this time with one child
solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc");
SolrInputDocument childDocument = new SolrInputDocument();
childDocument.setField("id", "1-1");
childDocument.setField("content_type_s", "subdoc");
inputDocument.addChildDocument(childDocument);
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();
Searching with:
http://192.168.56.102:8983/solr/test1/select?q=*%3A*&wt=json&indent=true
leads to the following output:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "indent": "true",
      "wt": "json",
      "_": "1450078098465"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "content_type_s": "doc_unfinished",
        "_version_": 1520517084715417600
      },
      {
        "id": "1-1",
        "content_type_s": "subdoc"
      },
      {
        "id": "1",
        "content_type_s": "doc",
        "_version_": 1520517084838101000
      }
    ]
  }
}
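A quick way to confirm that a response like this contains duplicate ids is to count them on the client. The following is only a sketch: it assumes the id values have already been pulled out of the response into a list, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateIds {

    // Returns the ids that occur more than once, in first-seen order.
    public static List<String> findDuplicates(List<String> ids) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // The three ids from the response above
        List<String> ids = Arrays.asList("1", "1-1", "1");
        System.out.println(findDuplicates(ids)); // prints: [1]
    }
}
```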
What am I doing wrong?