How to get word-topic probabilities using MALLET

Posted 2020-03-24 07:26

I've built a topic model using MALLET's ParallelTopicModel.

I want to get the top words for each document.

To do that, I'm trying to get a word-topic probability matrix.

How would I achieve this?

Tags: java mallet

2 answers

(This account has been banned)

#2 · 2020-03-24 07:54

Just one point to add regarding Praveen's answer.

With the --word-topic-counts-file option, MALLET creates a file whose first few rows look something like this:

0 elizabeth 19:1
1 needham 19:2 17:1
2 died 19:2
3 mother 17:1 19:1 14:1

where the first line means that the word elizabeth was assigned to topic 19 once; the second line means that the word needham was assigned to topic 19 twice and to topic 17 once; and so on.
Although this file doesn't give you explicit probabilities, you can use it to calculate them.
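One common way to turn these counts into probabilities is to normalize each topic's counts over the whole vocabulary, optionally adding a smoothing pseudo-count β to every cell (β = 0 gives plain relative frequencies). Here n_{t,w} is the number of times word w was assigned to topic t, and V is the vocabulary size (the number of rows in the file). This is a sketch of the usual estimate, not necessarily the exact computation MALLET performs internally. Treating the four rows above as if they were the whole file, with β = 0:

```latex
\hat{p}(w \mid t) = \frac{n_{t,w} + \beta}{\sum_{w'} n_{t,w'} + V\beta}

\begin{aligned}
\sum_{w'} n_{19,w'} &= 1 + 2 + 2 + 1 = 6
  && \text{(elizabeth, needham, died, mother)} \\
\hat{p}(\text{needham} \mid 19) &= \tfrac{2}{6} = \tfrac{1}{3} \approx 0.333 \\
\hat{p}(\text{needham} \mid 17) &= \tfrac{1}{1 + 1} = \tfrac{1}{2} \\
\hat{p}(\text{mother} \mid 14) &= \tfrac{1}{1} = 1
\end{aligned}
```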


Evening l夕情丶

#3 · 2020-03-24 08:03

When you train topics with MALLET (train-topics), there is an option called --word-topic-counts-file. If you pass it a file name, MALLET writes one line per word, listing how many times that word was assigned to each topic (counts, not probabilities). You can later read this file in C, Java, R or any other language to build the matrix you want.
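A minimal Java sketch of reading that file into a topic-by-word probability matrix, assuming the format shown in the other answer (`<index> <word> <topic>:<count> ...`, one word per line, listed in index order). The class `WordTopicMatrix`, its `topWords` helper, the `beta` smoothing parameter and the choice of 20 topics are hypothetical illustrations, not part of MALLET's API. The code only parses the text file, so it does not need the MALLET jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class WordTopicMatrix {
    final List<String> vocab = new ArrayList<>(); // word index -> word
    final double[][] probs;                       // probs[topic][word] = p(word | topic)

    /**
     * Builds p(word | topic) from the lines of a --word-topic-counts-file (assumed format:
     * "<wordIndex> <word> <topic>:<count> ..." in index order). numTopics must be larger than
     * every topic id in the file (e.g. the --num-topics used for training, assumed here).
     * beta = 0 gives plain relative frequencies; beta > 0 adds a pseudo-count to every cell.
     */
    WordTopicMatrix(List<String> lines, int numTopics, double beta) {
        List<int[]> cells = new ArrayList<>(); // {topic, wordIndex, count}
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) continue; // skip blank lines
            int w = vocab.size();
            vocab.add(fields[1]);
            for (int i = 2; i < fields.length; i++) {
                String[] tc = fields[i].split(":"); // "topic:count"
                cells.add(new int[] {Integer.parseInt(tc[0]), w, Integer.parseInt(tc[1])});
            }
        }
        int numWords = vocab.size();
        probs = new double[numTopics][numWords];
        for (int[] c : cells) probs[c[0]][c[1]] += c[2];
        // Normalize each topic row: (n[t][w] + beta) / (sum over w' of n[t][w'] + V * beta)
        for (double[] row : probs) {
            double total = 0;
            for (int w = 0; w < numWords; w++) {
                row[w] += beta;
                total += row[w];
            }
            if (total == 0) continue; // topic got no tokens and beta == 0: leave zeros
            for (int w = 0; w < numWords; w++) row[w] /= total;
        }
    }

    /** The n words with the highest p(word | topic); ties keep file order (stable sort). */
    List<String> topWords(int topic, int n) {
        return IntStream.range(0, vocab.size()).boxed()
                .sorted(Comparator.comparingDouble((Integer w) -> probs[topic][w]).reversed())
                .limit(n)
                .map(vocab::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read a real file; otherwise use the four sample rows.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("0 elizabeth 19:1", "1 needham 19:2 17:1",
                                "2 died 19:2", "3 mother 17:1 19:1 14:1");
        WordTopicMatrix m = new WordTopicMatrix(lines, 20, 0.0); // 20 topics: an assumption
        System.out.println(m.topWords(19, 2)); // sample rows print [needham, died]
    }
}
```

Run it as `java WordTopicMatrix word-topic-counts.txt` on a real file. A dense `double[numTopics][numWords]` array is the simplest representation of the matrix, but for a large vocabulary a sparse map per topic may use far less memory.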

