Which words appear the most common in an indexed f

Posted 2019-02-17 23:29

Question:

How can I query Solr for the most common indexed words? For example, given the following field value in each of three documents:

  • There's a lady who's sure all the glitters is gold.
  • Gold is worth more than silver.
  • The lady is wearing a gold bracelet.

I would like Solr to return to me, in any format, the following output:

  • gold (3)
  • lady (2)
  • the (2) // Being a stop word, this one isn't really necessary
  • ...

Thanks.

Answer 1:

Use the Luke request handler:

http://wiki.apache.org/solr/LukeRequestHandler

Example:

http://localhost:8983/solr/admin/luke?fl=Your_Indexed_Field&numTerms=500
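
For anyone scripting this, here is a minimal Python sketch of building that request and reading the result. The `wt=json` parameter is my addition, the function names are made up for illustration, and the parser assumes `topTerms` comes back as a flat list alternating term and count, so check it against what your Solr version actually returns.

```python
from urllib.parse import urlencode


def luke_url(base_url, field, num_terms=500):
    """Build a Luke request URL asking for the top terms of one field."""
    # wt=json is an addition here so the response is easy to parse.
    params = {"fl": field, "numTerms": num_terms, "wt": "json"}
    return f"{base_url}/admin/luke?{urlencode(params)}"


def top_terms(luke_response, field):
    """Return [(term, count), ...] from a parsed Luke JSON response.

    Assumption: topTerms is a flat list of the form [term, count, term, count, ...].
    """
    flat = luke_response["fields"][field].get("topTerms", [])
    return list(zip(flat[0::2], flat[1::2]))


url = luke_url("http://localhost:8983/solr", "Your_Indexed_Field")

# Hand-written sample mirroring the assumed response shape, not real Solr output.
sample = {"fields": {"Your_Indexed_Field": {"topTerms": ["gold", 3, "lady", 2]}}}
pairs = top_terms(sample, "Your_Indexed_Field")

print(url)
print(pairs)
```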



Answer 2:

The Terms Component seems well suited to the task. The article "Self Updating Solr Stopwords" uses the Terms Component to find the 1000 most common indexed words and add them to the stopwords file.

Finding the 1000 most common indexed terms (sorted by frequency, descending):

http://url.to.solr/solr/terms?terms.fl=MY_FIELD&terms.limit=1000
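
A rough Python sketch of calling that endpoint and turning the response into (term, count) pairs. The explicit `terms.sort=count` and `wt=json` parameters and the helper names are my own additions, and the parser assumes the JSON response lists each field's terms as a flat list alternating term and count.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def terms_url(base_url, field, limit=1000):
    """Build a Terms Component URL for the most frequent terms of one field."""
    params = {
        "terms.fl": field,
        "terms.limit": limit,
        "terms.sort": "count",  # addition: state the frequency ordering explicitly
        "wt": "json",           # addition: ask for JSON output
    }
    return f"{base_url}/terms?{urlencode(params)}"


def parse_terms(response, field):
    """Pair up an assumed flat [term, count, term, count, ...] list."""
    flat = response["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))


def fetch_top_terms(base_url, field, limit=1000):
    """Fetch and parse the top terms; needs a reachable Solr instance."""
    with urlopen(terms_url(base_url, field, limit)) as resp:
        return parse_terms(json.load(resp), field)


request_url = terms_url("http://url.to.solr/solr", "MY_FIELD")

# Hand-written sample in the assumed shape, not real Solr output.
sample = {"terms": {"MY_FIELD": ["gold", 3, "lady", 2, "the", 2]}}
parsed = parse_terms(sample, "MY_FIELD")
```

With a live server, `fetch_top_terms("http://url.to.solr/solr", "MY_FIELD")` would return the same kind of list.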


Answer 3:

As far as I know, this isn't exactly what Solr is meant for, but it can be done with faceting. No guarantees about performance, though. Make sure your field is tokenized properly, then run a query as usual with the following additional parameters appended:

&facet=true&facet.field=yourfield

Replace yourfield with the name of the field that holds your data.
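
If you want to build those parameters and read the counts from code, something like the Python sketch below should work. `facet.limit` and `rows=0` are extras I added (to cap the number of terms and skip the document results), the function names are hypothetical, and the parser assumes the facet counts arrive as a flat list alternating term and count under `facet_counts.facet_fields`.

```python
from urllib.parse import urlencode


def facet_params(field, limit=100):
    """Query-string parameters that turn on faceting for one field."""
    return urlencode({
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,  # addition: cap on the number of terms returned
        "rows": 0,             # addition: return only the counts, no documents
    })


def facet_counts(response, field):
    """Return {term: count} from an assumed flat [term, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


query_string = facet_params("yourfield")

# Hand-written sample in the assumed shape, not real Solr output.
sample = {"facet_counts": {"facet_fields": {"yourfield": ["gold", 3, "lady", 2, "the", 2]}}}
counts = facet_counts(sample, "yourfield")
```

The query string is meant to be appended to an ordinary select query after the `q` parameter.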



Tags: solr