I have a sphinx search index and would like to find what are the most common words in my index.
Ideally have a list of words ordered by frequency.
If it is not possible to do it using Sphinx, is there a way to query a mysql table's text fields to get the same stat?
Yes. This is quite simple. Build them with indexer using the --buildstops and --buildfreqs
flags.
indexer --config /path/to/sphinx.conf indexName --buildfreqs --buildstops freq_wordlist.txt 100000
This example gives you the first 100000 word in your sphinx index ordered by its frequency
Build them with indexer using the --buildstops and --buildfreqs flags.
Just be aware this does not build from the existing index, but runs against the data source as if indexing, and builds the word frequencies. It does not affect the index itself.
If you use delta indexes where you save the id of the last indexed document, this will read the last saved id and work from there.