We are considering a schema with two multi-valued fields. Search is performed on the first field, but sorting should be done on the second field, using the corresponding value. E.g. if documents match because of the n-th value in the first field (where n may be different for each match), then they should be returned sorted by the n-th value in the second field.
Is that possible?
Background: each document has a list of similar documents (IDs) and a corresponding list of similarity scores (values between 0 and 1). Given ID 42, we need to return all similar documents (i.e. documents with 42 in the first field), sorted by their similarity to document 42.
Other schemas we are considering are:
- Dynamic fields for each ID, so we can sort by the field Similarity_ID42 when searching for documents similar to 42. This does not seem to scale: at 800K+ documents, CPU goes to 100% during indexing.
- A single multi-valued field storing "ID.score" as a decimal (e.g. 42.563) and then searching for all documents that have a value that is > 42 AND < 43, and sorting by that value (I'm not even sure this is possible).
The approach will not succeed: you can search on a multivalued field, but you cannot sort by one. This is pointed out in "Sorting with Multivalued Field in Solr" and documented in Solr's Wiki.
Update
About the alternatives: since you need to find the similar documents for one given ID, why not create a second core with a schema like
Then you could do a second query against that core, after performing the actual search.
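To make the second-core idea concrete, here is a minimal sketch in Python. The schema from the answer is not shown above, so everything here is an assumption: the field names `doc_id`, `similar_to` and `similarity` are hypothetical, and the idea is simply that each (document, similar document) pair becomes its own single-valued record, which makes sorting by the similarity value possible. Only the standard Solr request parameters `q`, `sort`, `fl` and `rows` are used.

```python
from urllib.parse import urlencode


def flatten_similarities(doc_id, similar_ids, scores):
    """Turn one document's two parallel lists into one record per pair.

    The field names (doc_id, similar_to, similarity) are hypothetical;
    they stand in for whatever the second core's schema would define.
    """
    if len(similar_ids) != len(scores):
        raise ValueError("similar_ids and scores must have the same length")
    return [
        {
            "id": f"{doc_id}_{similar_id}",  # unique key per pair
            "doc_id": doc_id,                # document that owns the lists
            "similar_to": similar_id,        # n-th value of the first field
            "similarity": score,             # n-th value of the second field
        }
        for similar_id, score in zip(similar_ids, scores)
    ]


def similar_docs_query(target_id, rows=10):
    """Build the query string for the second query against the pair core."""
    return urlencode(
        {
            "q": f"similar_to:{target_id}",  # search on the ID field
            "sort": "similarity desc",       # sort on a single-valued field
            "fl": "doc_id,similarity",
            "rows": rows,
        }
    )


if __name__ == "__main__":
    # Document 7 lists documents 42 and 13 as similar.
    for record in flatten_similarities(7, [42, 13], [0.563, 0.9]):
        print(record)
    print(similar_docs_query(42))
```

The `doc_id` values returned by that query could then be used to fetch the full documents from the main core, already in similarity order.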