count shared occurrences and remove duplicates

Posted 2019-07-25 17:27

I have this data.frame:

df <- read.table(text= "   section to from    time
                             a     1  5        9       
                             a     2  5        9        
                             a     1  5        10       
                             a     2  6        10       
                             a     2  7        11       
                             a     2  7        12       
                             a     3  7        12       
                             a     4  7        12
                             a     4  6        13  ", header = TRUE)   

Each row records the simultaneous occurrence of an id in to and an id in from at time point time. Basically, it is a time-explicit network between the ids in to and from.

I want to know which ids in to shared a from id within a particular time window, here 2. In other words, I want to know whether ids 1 and 2 in to both went to coffee shop 5 within two days of each other, i.e.

ids 1 and 2 in to shared id 5 in from at times 9 and 10 respectively, and so would have 1 shared event within the time window of 2. If they also shared a from id at time point 13, e.g.

                             a     1  5        9       
                             a     2  5        9        
                             a     1  7        13       
                             a     2  7        13       

then 1 and 2 would get a count of 2.

So the final output I would like for df is:

                           section to.a to.b    noShared
                             a     1    2        1       
                             a     2    3        1        
                             a     2    4        1       
                             a     3    4        1       
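A minimal base-R sketch (no plyr or tnet) that produces this desired output, assuming the time window is ignored for now and noShared simply counts the distinct time points at which a pair shared at least one from id. The column names to.a, to.b and noShared follow the desired output; the intermediate names pairs, events and res are only illustrative.

```r
# The question's data frame
df <- read.table(text = "section to from time
a 1 5 9
a 2 5 9
a 1 5 10
a 2 6 10
a 2 7 11
a 2 7 12
a 3 7 12
a 4 7 12
a 4 6 13", header = TRUE)

# Self-join: all pairs of rows with the same section, time point and from id
pairs <- merge(df, df, by = c("section", "time", "from"),
               suffixes = c(".a", ".b"))

# to.a < to.b drops self-pairs and keeps each unordered pair once
pairs <- pairs[pairs$to.a < pairs$to.b, ]

# One row per pair and time point, even if they shared several from ids then
events <- unique(pairs[, c("section", "to.a", "to.b", "time")])

# noShared = number of distinct time points at which the pair shared a from id
res <- aggregate(time ~ section + to.a + to.b, data = events, FUN = length)
names(res)[names(res) == "time"] <- "noShared"
res <- res[order(res$section, res$to.a, res$to.b), ]
rownames(res) <- NULL
res
```

Adding the window of 2 on top of this needs a rule for what counts as one shared event (for example, whether 2 at time 9 and 1 at time 10, both at shop 5, add a second event for the pair), which this sketch deliberately leaves out.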

I can get some of the way there with:

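library(plyr)
library(tnet)

# One row per (section, to, time) with each distinct from id visited
a <- ddply(df, .(section,to,time), function(x)  
          data.frame(from = unique(x$from)) )

# Within each section and time point, project the binary two-mode
# to/from network onto the to ids (weights summed with method = "sum")
b <- ddply(a, .(section,time), function(x) {

            b <- as.tnet(x[, c("to","from")], type="binary two-mode tnet")
            b <- projecting_tm(b, method="sum")
            return(b)

       })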

This gives me, for each time point, which ids in to shared an id in from.
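For intuition about what the per-time projection computes, here is a base-R sketch for a single time point (time 12 of df) that does not use tnet: build a binary to × from incidence matrix and multiply it by its transpose, so entry (i, j) counts the from ids shared by i and j. This is the standard sum-weighted one-mode projection; treating it as equivalent to projecting_tm(method = "sum") is an assumption. Reading only the upper triangle also gives each pair once.

```r
# Rows of the question's df at time 12: to ids 2, 3 and 4, all with from id 7
x <- data.frame(to = c(2, 3, 4), from = c(7, 7, 7))

# Binary incidence matrix: rows are to ids, columns are from ids
inc <- unclass(table(x$to, x$from))
inc[inc > 1] <- 1

# Sum-weighted one-mode projection: entry (i, j) = number of from ids shared by i and j
proj <- inc %*% t(inc)

# Read only the upper triangle: no self-ties, and each unordered pair appears once
idx <- which(upper.tri(proj) & proj > 0, arr.ind = TRUE)
edges <- data.frame(i = as.numeric(rownames(proj)[idx[, "row"]]),
                    j = as.numeric(colnames(proj)[idx[, "col"]]),
                    w = proj[idx])
edges
```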

However there are two main problems with b.

First, within each time point, each pair of ids appears twice, once in each direction, i.e.:

 1  2  5  9 # id 1 and 2 went to coffee shop 5 at time 9
 2  1  5  9 # id 2 and 1 went to coffee shop 5 at time 9

I only want each combination to appear once:

  1  2  5  # id 1 and 2 went to coffee shop 5 at time 9

Second, I need to bin the results within the time window, so that my final result doesn't contain time, just the number of shared events (the noShared column in the desired output above).


EDIT

The time binning turned out to be more complicated than expected, so the first problem is enough for this question.

Tags: r plyr
1 answer
一夜七次
#2 · 2019-07-25 18:27

For the generation of b (first part of the question):

I adapted the code of projecting_tm, which projects the two-mode network onto a one-mode network of to ids.

library(plyr)

b <- ddply(a, .(section,time), function(x) {
  ## first I create the original two-mode network: i = to id, p = from id
  net2 <- x[, c("to","from")]
  colnames(net2) <- c('i','p')
  net2 <- net2[order(net2[, "i"], net2[, "p"]), ]
  ## np = number of to ids attached to each from id
  np <- table(net2[, "p"])
  net2 <- merge(net2, cbind(p = as.numeric(rownames(np)),np = np))
  ## transformed network: joining on p pairs every i with every j sharing it
  net1 <- merge(net2, cbind(j = net2[, "i"], p = net2[, "p"]))
  ## drop self-ties and keep the columns of interest
  net1 <- net1[net1[, "i"] != net1[, "j"], c("i", "j","np")]
  net1 <- net1[order(net1[, "i"], net1[, "j"]), ]
  ## keep one row per (i, j) even if they share several from ids
  index <- !duplicated(net1[, c("i", "j")])
  net1[index, c("i", "j")]
})

This gives you b without any warnings:

> b
  section time i j
1       a    9 1 2
2       a    9 2 1
3       a   12 2 3
4       a   12 2 4
5       a   12 3 2
6       a   12 3 4
7       a   12 4 2
8       a   12 4 3

For the second part of the question, do you want to remove the duplicated pairs from b? If so:

# sort each (i, j) pair so that (2, 1) and (1, 2) match, then drop the repeats
b[!duplicated(t(apply(b[3:4], 1, sort))), ]
  section time i j
1       a    9 1 2
3       a   12 2 3
4       a   12 2 4
6       a   12 3 4
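Note that the duplicated() call above only looks at columns i and j, so a pair seen at two different time points would be kept only once. If each pair should instead appear once per time point, a vectorised sketch with pmin()/pmax() could look like this. The data frame is typed in by hand from the b printed above, and the column names to.a/to.b are borrowed from the desired output.

```r
# b as printed above (typed in by hand so the example is self-contained)
b <- data.frame(section = "a",
                time = c(9, 9, 12, 12, 12, 12, 12, 12),
                i    = c(1, 2, 2, 2, 3, 3, 4, 4),
                j    = c(2, 1, 3, 4, 2, 4, 2, 3))

# Order each pair so that (2, 1) and (1, 2) become the same key
b$to.a <- pmin(b$i, b$j)
b$to.b <- pmax(b$i, b$j)

# Keep the first row of each unordered pair within the same section and time point
key <- c("section", "time", "to.a", "to.b")
b_once <- b[!duplicated(b[, key]), key]
rownames(b_once) <- NULL
b_once
```

On this data the result has the same four pairs as above; the difference only shows up when a pair recurs at another time point.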

For this part, I used an answer to this question.
