Why isn't Stanford Topic Modeling Toolbox prod

Posted 2019-04-16 10:51

Question:

I tried to run this code from GitHub (following the 1-2-3 steps), which identifies 30 topics in Sarah Palin's 14,500 emails. The topics discovered by the author are here. However, the Stanford Topic Modeling Toolbox is not producing the lda-output directory for me. It produced a directory named lda-86a58136-30-2b1a90a6, but the summary.txt in that folder only shows the initial assignment of topics, not the final one. Any idea how to produce the lda-output directory with the final summary of the topics discovered? Thanks in advance!

Answer 1:

Have you tried the instructions posted here?
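While checking those instructions, it may be worth looking inside the lda-86a58136-30-2b1a90a6 folder itself. The sketch below assumes (this is not confirmed anywhere in this thread) that the toolbox writes numbered snapshot subfolders such as 00000 or 01000 during training, each with its own summary.txt, so that the highest-numbered one holds the latest topics. The helper name latest_summary is made up for illustration.

```python
from pathlib import Path
from typing import Optional


def latest_summary(model_dir: str) -> Optional[Path]:
    """Return summary.txt from the highest-numbered snapshot subfolder, if any."""
    root = Path(model_dir)
    if not root.is_dir():
        return None
    # Assumption: snapshot folders have all-digit names such as 00000 or 01000.
    snapshots = sorted(
        (p for p in root.iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )
    # Walk from the newest snapshot backwards until one contains a summary.
    for snap in reversed(snapshots):
        candidate = snap / "summary.txt"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Folder name taken from the question; prints None if nothing is found.
    print(latest_summary("lda-86a58136-30-2b1a90a6"))
```

If this prints None, the folder layout differs from what is assumed here, and the log files in the model folder would be the next thing to check.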

Note that the original investigator appears to have trained the model on Sarah Palin's emails and then used that same trained model to analyze those emails. I am not an LDA expert, but this smacks of "finding what you have": the topics are judged against the very data they were derived from.

In most disciplines, training is done on a known set of items that experts have already classified. Here, that would mean feeding the model documents from other sources with known, likely topics, and then using the LDA library to measure how far new documents are from the topics in the "learned" database.
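A rough way to keep the data used for fitting separate from the data being analyzed is a held-out split. Note that this is a simpler stand-in for the expert-labelled training set described above, not the same thing. The function name split_corpus, the 80/20 ratio, and the seed are all assumptions for illustration, and nothing here is part of the toolbox.

```python
import random
from typing import List, Sequence, Tuple


def split_corpus(
    docs: Sequence[str], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle documents reproducibly and split them into (train, held_out)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(docs)
    # A private Random instance keeps the split repeatable for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    emails = [f"email {i}" for i in range(10)]  # placeholder documents
    train, held_out = split_corpus(emails)
    print(len(train), len(held_out))
```

The model would then be fitted on the first list only and applied to the second.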

In any event, good luck.

If you encounter a specific issue, please post the error and the steps you took to arrive at it. Few people will invest the time to reproduce an issue (usually a prerequisite for fixing it) without that direction, and without it they cannot even tell whether an issue they ran into is the same as yours.