Why isn't Stanford Topic Modeling Toolbox prod

Posted 2019-04-16 10:21

I tried to run this code from GitHub (following the 1-2-3 steps), which identifies 30 topics in Sarah Palin's 14,500 emails. The topics discovered by the author are here. However, Stanford Topic Modeling Toolbox is not producing the lda-output directory for me. It produced the lda-86a58136-30-2b1a90a6 directory, but the summary.txt in that folder only shows the initial assignment of topics, not the final one. Any idea how to produce the lda-output directory with the final summary of the topics discovered? Thanks in advance!

Tags: nlp machine-learning stanford-nlp text-analysis lda

1 answer

劳资没心，怎么记你

#2 · 2019-04-16 11:02

Have you tried the instructions posted here?

Note that the original investigator appears to have trained the model on Sarah Palin's emails and then used that same trained model to analyze those emails. While I am not an LDA expert, this smacks of "finding what you have".

In most disciplines, training is done on a known set of items that experts have already classified. Here, that would mean feeding the trainer data from other sources whose likely topics are known, and then using the LDA library to measure how far each email lies from the topics in the "learned" database.

In any event, good luck.

If you encounter a specific issue, please post the error and the steps you took to arrive at it. Few people will invest the time to reproduce an issue (a typical prerequisite for correcting it) without direction, or without some way to tell whether the issue they hit is the same as yours.
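On the original question of where the final summary lives, one possibility (which I have not confirmed for your version of the toolbox) is that the lda-86a58136-30-2b1a90a6 folder holds numbered snapshot subfolders, each with its own summary.txt, and that the highest-numbered one holds the final topics. The sketch below is a small Python helper, not part of the toolbox, that looks for the latest such snapshot. The all-digit subfolder naming and the helper name `latest_summary` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def latest_summary(model_dir: str) -> Optional[Path]:
    """Return summary.txt from the highest-numbered snapshot folder, if any."""
    root = Path(model_dir)
    # Assumption: snapshots are subfolders with all-digit names (e.g. 00000, 01000).
    snapshots = sorted(
        (p for p in root.iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )
    # Walk from the latest snapshot backwards until one holds a summary.txt.
    for snap in reversed(snapshots):
        candidate = snap / "summary.txt"
        if candidate.is_file():
            return candidate
    return None
```

Calling `latest_summary("lda-86a58136-30-2b1a90a6")` from the directory where you ran the toolbox would return the path to inspect, or `None` if no numbered subfolder contains a summary.txt, in which case the layout assumption does not hold for your run.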
