Is it somehow possible to create a Solr document that contains sub-elements?
For example, how would I represent something like this:
<person first="Bob" last="Smith">
  <children>
    <child first="Little" last="Smith" />
    <child first="Junior" last="Smith" />
  </children>
</person>
What is the usual way to solve this problem?
You can model this in different ways, depending on your searching/faceting needs. Usually you'll use multivalued or dynamic fields. In the following examples I'll omit the field type and the indexed and stored flags:
<field name="first"/>
<field name="last"/>
<field name="child_first" multiValued="true"/>
<field name="child_last" multiValued="true"/>
It's up to you to correlate the children's first names with their last names. Or you could just put both in a single field:
<field name="first"/>
<field name="last"/>
<field name="child_first_and_last" multiValued="true"/>
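As a rough sketch of these two layouts, the following Python builds the flat document for the person in the question. The function name `flatten_person` and the dict-shaped input are assumptions made for illustration; only the field names come from the schema fragments above.

```python
def flatten_person(person):
    """Flatten a person with children into one Solr-style document (a plain dict)."""
    children = person.get("children", [])
    return {
        "first": person["first"],
        "last": person["last"],
        # Parallel multivalued fields: position i in both lists is the same child.
        "child_first": [c["first"] for c in children],
        "child_last": [c["last"] for c in children],
        # Alternative layout: one multivalued field holding the full name.
        "child_first_and_last": [f'{c["first"]} {c["last"]}' for c in children],
    }


bob = {
    "first": "Bob",
    "last": "Smith",
    "children": [
        {"first": "Little", "last": "Smith"},
        {"first": "Junior", "last": "Smith"},
    ],
}
doc = flatten_person(bob)
```

Correlating by position only works as long as both lists are always written in the same order, which is why the single combined field is often the simpler choice.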
Another one:
<field name="first"/>
<field name="last"/>
<dynamicField name="child_first_*"/>
<dynamicField name="child_last_*"/>
Here you would store fields 'child_first_1', 'child_last_1', 'child_first_2', 'child_last_2', etc. Again it's up to you to correlate values, but at least each child now has an explicit index (the numeric suffix). With some code you could make this transparent.
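One way that "some code" could look is sketched below: a pair of helpers that map a person to the numbered dynamic fields and back. The helper names and the dict-shaped input are hypothetical; only the `child_first_N` / `child_last_N` naming comes from the text above.

```python
import re


def to_dynamic_fields(person):
    """Spread each child over numbered fields: child_first_1, child_last_1, ..."""
    doc = {"first": person["first"], "last": person["last"]}
    for i, child in enumerate(person.get("children", []), start=1):
        doc[f"child_first_{i}"] = child["first"]
        doc[f"child_last_{i}"] = child["last"]
    return doc


# Matches the numbered child fields and captures the attribute and the index.
_CHILD_FIELD = re.compile(r"^child_(first|last)_(\d+)$")


def from_dynamic_fields(doc):
    """Rebuild the nested person from a flat document with numbered child fields."""
    children = {}
    for name, value in doc.items():
        match = _CHILD_FIELD.match(name)
        if match:
            attr, index = match.group(1), int(match.group(2))
            children.setdefault(index, {})[attr] = value
    return {
        "first": doc["first"],
        "last": doc["last"],
        # Sort by the numeric suffix so the children come back in order.
        "children": [children[i] for i in sorted(children)],
    }


bob = {
    "first": "Bob",
    "last": "Smith",
    "children": [
        {"first": "Little", "last": "Smith"},
        {"first": "Junior", "last": "Smith"},
    ],
}
flat = to_dynamic_fields(bob)
restored = from_dynamic_fields(flat)
```

With helpers like these, the rest of the application can work with the nested shape and never see the numbered field names.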
Bottom line, as the Solr wiki says: "Solr provides one table. Storing a set [of] database tables in an index generally requires denormalizing some of the tables. Attempts to avoid denormalizing usually fail." It's up to you to denormalize your data according to your search needs.
UPDATE: Since version 4.5 or so Solr supports nested documents directly: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
As of Solr 4.7 and 4.8, Solr supports nested documents:
{
  "id": "chapter1",
  "title": "Indexing Child Documents in JSON",
  "content_type": "chapter",
  "_childDocuments_": [
    {
      "id": "1-1",
      "content_type": "page",
      "text": "ho hum... this is page 1 of chapter 1"
    },
    {
      "id": "1-2",
      "content_type": "page",
      "text": "more text... this is page 2 of chapter 1"
    }
  ]
}
See the Solr release notes for more.
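To query such a block, the block join parent query parser can return the parent document whose children match. The sketch below only assembles the request parameters; the function name `parent_query_params` is hypothetical, and it assumes the `content_type` values from the JSON above. Note that the `which` filter is expected to match all parent documents and nothing else.

```python
from urllib.parse import urlencode


def parent_query_params(parent_filter, child_query):
    """Build a 'q' parameter using the {!parent} block join query parser.

    parent_filter: a query matching every parent document (and only parents).
    child_query:   the query to run against the child documents.
    """
    return {"q": f'{{!parent which="{parent_filter}"}}{child_query}'}


# Find chapters that have at least one page whose text mentions "page".
params = parent_query_params("content_type:chapter", "text:page")
query_string = urlencode(params)  # URL-encoded form, ready to append to a select URL
```

The encoded string would be appended to the select handler URL of whatever core or collection holds the documents.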
Having separate fields for the children leads to false-positive matches. Concatenated fields work to some extent, but the approach is quite limited. We have a lot of experience with similar tasks, described at http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
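The false-positive problem can be illustrated without Solr at all. The sketch below is a plain-Python imitation of how two clauses on separate multivalued fields are each checked against the whole field rather than against a single child. The sample children (including the surname "Jones") and the function names are made up for the illustration.

```python
def matches_separate_fields(doc, first, last):
    """Imitates child_first:<first> AND child_last:<last> on multivalued fields.

    Each clause looks at the whole list, so values from different
    children can satisfy the query together.
    """
    return first in doc["child_first"] and last in doc["child_last"]


def matches_per_child(children, first, last):
    """The intended semantics: both values must belong to the same child."""
    return any(c["first"] == first and c["last"] == last for c in children)


# Hypothetical data: two children with different last names.
children = [
    {"first": "Little", "last": "Smith"},
    {"first": "Junior", "last": "Jones"},
]
doc = {
    "child_first": [c["first"] for c in children],
    "child_last": [c["last"] for c in children],
}

# No child is called "Little Jones", yet the flat document matches.
flat_match = matches_separate_fields(doc, "Little", "Jones")
true_match = matches_per_child(children, "Little", "Jones")
```

Here `flat_match` is `True` while `true_match` is `False`, which is exactly the cross-child match that nested documents or a combined field avoid.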