How to use R to create a word co-occurrence matrix

Posted 2020-03-26 04:36

Question:

I am new to R. I have a data set of online videos and their tags. The data look like this:

film  tag1 tag2 tag3 tag4....
1      A    B    C    D
2      A    C    F    G 
3      B    D    C    X 

I want to create a matrix which tells me the co-occurrence of the tags, such as:

       A    B   C    D .....
A     10    13
B     15    2
C      3    16
D     9     20

How should I do it?

Answer 1:

If I understand what you want, here is one way:

dat <- read.table(text='film  tag1 tag2 tag3 tag4
1      A    B    C    D
2      A    C    F    G
3      B    D    C    X', header=TRUE)

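library(qdapTools)
# mtabulate() builds a film-by-tag count matrix; crossprod() (t(m) %*% m) turns it into tag-by-tag counts
crossprod(as.matrix(mtabulate(as.data.frame(t(dat[, -1])))))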

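This gives a symmetric tag-by-tag matrix. Each off-diagonal entry counts the films in which both tags appear, and each diagonal entry counts the films that carry that tag: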

  A B C D F G X
A 2 1 2 1 1 1 0
B 1 2 2 2 0 0 1
C 2 2 3 2 1 1 1
D 1 2 2 2 0 0 1
F 1 0 1 0 1 1 0
G 1 0 1 0 1 1 0
X 0 1 1 1 0 0 1
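
If you would rather not depend on qdapTools, the same idea can be sketched in base R. This is an alternative, not the method used above: the names long, incidence and cooc are only illustrative, and the sketch assumes each tag appears at most once per film.

```r
# Same sample data as above, read as character columns
dat <- read.table(text = "film tag1 tag2 tag3 tag4
1 A B C D
2 A C F G
3 B D C X", header = TRUE, stringsAsFactors = FALSE)

# Long format: one (film, tag) pair per row.
# unlist() stacks the tag columns one after another, so the film ids
# are repeated once per tag column to stay aligned.
long <- data.frame(
  film = rep(dat$film, times = ncol(dat) - 1),
  tag  = unlist(dat[, -1], use.names = FALSE)
)

# Film-by-tag incidence matrix: 1 if the film carries the tag, else 0
incidence <- unclass(table(long$film, long$tag))

# t(incidence) %*% incidence: tag-by-tag co-occurrence counts
cooc <- crossprod(incidence)
cooc
```

If only the counts between different tags matter, the diagonal can be blanked afterwards with diag(cooc) <- 0.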


Tags: r text analysis