Transition probabilities representation

Posted 2019-06-10 06:19

Question:

I would like to identify activity changes across time. Below is an example (from act1_1 to act1_16) of the matrix that I am using to calculate transition probabilities between activities.

head(Activities) returns a tibble: 6 x 145

   serial act1_1 act1_2 act1_3 act1_4 act1_5 act1_6 act1_7 act1_8 act1_9 act1_10
1  1.22e7    110    110    110    110    110    110    110    110    110     110
2  1.43e7    110    110    110    110    110    110    110    110    110     110
3  2.00e7    110    110    110    110    110    110    110    110    110     110
4  2.71e7    110    110    110    110    110    110    110    110    110     110
5  1.61e7    110    110    110    110    110    110    110    110    110     110
6  1.60e7    110    110    110    110    110    110    110    110    110     110

# ... with 134 more variables: act1_11 <dbl+lbl>, act1_12 <dbl+lbl>,

The "Activities" data has 16533 rows and 144 time-step columns, act1_1 ... act1_144 (plus the serial column). Time is recorded in 10-minute intervals: the day starts at 4am, so act1_1 = 4.10am, act1_2 = 4.20am, and so on until act1_144 = 4am the next day. The cells contain activity codes, such as 110 = sleep, 111 = watching TV, 123 = eating, etc.
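Since 144 steps × 10 minutes = 24 hours, each column index maps directly to a clock time. The helper below is a hypothetical sketch (step_to_time is not part of the original post), assuming act1_k corresponds to 4am + 10·k minutes as described above; it is handy for labelling a time axis.

```r
# Hypothetical helper: map time-step index k (column act1_k) to a clock time,
# assuming act1_k = 4:00am + 10 * k minutes, so act1_1 = 04:10 and
# act1_144 = 04:00 the following day (144 * 10 min = 24 h).
step_to_time <- function(k) {
  minutes <- (4 * 60 + 10 * k) %% (24 * 60)   # minutes after midnight, wrapped to one day
  sprintf("%02d:%02d", minutes %/% 60, minutes %% 60)
}

step_to_time(c(1, 2, 144))  # "04:10" "04:20" "04:00"
```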

Below is the function that I am using to calculate the transition probabilities:

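# Count transitions between consecutive time steps (column j -> column j + 1)
# and, if prob = TRUE, convert each row of counts into probabilities.
transition.matrix <- function(X, prob = TRUE)
{
    tt <- table(c(X[, -ncol(X)]), c(X[, -1]))
    if (prob) tt <- tt / rowSums(tt)  # each row now sums to 1
    tt
}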
I call the function as:

transitionfunction <- transition.matrix(as.matrix(Activities[, -1]))  # drop the serial column first
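A tiny hand-made matrix is a quick way to check what the function returns. The data below are hypothetical; the function is repeated so the snippet runs on its own, and it assigns the normalised table back to tt so that prob = TRUE really returns row probabilities.

```r
# Transition counts between consecutive columns; row-normalised if prob = TRUE
transition.matrix <- function(X, prob = TRUE) {
  tt <- table(c(X[, -ncol(X)]), c(X[, -1]))
  if (prob) tt <- tt / rowSums(tt)
  tt
}

# Hypothetical toy data: 3 rows x 4 time steps (110 = sleep, 123 = eating)
toy <- matrix(c(110, 110, 123, 123,
                110, 123, 123, 110,
                110, 110, 110, 123),
              nrow = 3, byrow = TRUE)

transition.matrix(toy, prob = FALSE)  # transition counts
transition.matrix(toy)                # each row sums to 1
```

Counting by hand, the 9 consecutive pairs give 110 to 110 three times, 110 to 123 three times, 123 to 123 twice and 123 to 110 once, so the 110 row becomes 0.5 / 0.5.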

With this function I was able to calculate the transition probabilities between the activities in the Activities matrix. Below is an example of this kind of matrix:

Using transitionfunction, I would like to plot time (in 10-minute intervals) on the x axis and the transition probabilities on the y axis.

How can I do this? And how can I identify the most frequent transitions between activities?

This is the plot that I am aiming for:

Answer 1:

Given a single transition matrix m, you can find the n most frequent transitions as follows:

n <- 3 # or whatever
sorted <- sort(m, decreasing = TRUE)
which(m >= sorted[n], arr.ind = TRUE)

Ties may mean you'll get more than n results.
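To see the idea on something concrete, here is a hypothetical 3 × 3 probability matrix (the values are made up). The sketch also turns the positions returned by which() into readable from/to pairs; that last step is an addition, not part of the answer.

```r
# Hypothetical transition-probability matrix: rows = from, columns = to
codes <- c("110", "111", "123")
m <- matrix(c(0.90, 0.06, 0.04,
              0.20, 0.70, 0.10,
              0.30, 0.05, 0.65),
            nrow = 3, byrow = TRUE,
            dimnames = list(from = codes, to = codes))

n <- 3
sorted <- sort(m, decreasing = TRUE)
idx <- which(m >= sorted[n], arr.ind = TRUE)  # row/column positions of the top-n cells

# Readable from -> to pairs with their probabilities, largest first
top <- data.frame(from = rownames(m)[idx[, 1]],
                  to   = colnames(m)[idx[, 2]],
                  prob = m[idx],
                  stringsAsFactors = FALSE)
top <- top[order(-top$prob), ]
top
```

Here all three hits lie on the diagonal (staying in the same activity), which is what you would expect from data dominated by long runs such as sleep.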

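Given your data (long runs of the same code, such as 110 for sleep), the largest values will probably sit on the diagonal, i.e. staying in the same activity, so you might want to ignore the diagonal. You can do that using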

diag(m) <- 0

and then using the code above.
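One caveat, assuming m comes from table() as in the question: diag() pairs row i with column i, so it only hits the "same activity" cells if rows and columns list the same codes in the same order. If some code never appears as a source (or never as a destination), table() drops it on that side and the diagonal no longer lines up. Building the table from factors with one shared set of levels avoids that. A sketch on hypothetical data:

```r
# Hypothetical sequences: 3 rows x 4 time steps.
# Code 111 only ever appears as a destination, never as a source.
X <- matrix(c(110, 110, 123, 111,
              110, 123, 123, 123,
              110, 110, 110, 123),
            nrow = 3, byrow = TRUE)

# Shared levels so that row i and column i are the same activity
states <- sort(unique(c(X)))
from <- factor(X[, -ncol(X)], levels = states)
to   <- factor(X[, -1],       levels = states)
m <- table(from, to)   # transition counts, 3 x 3

diag(m) <- 0           # ignore "stayed in the same activity"

n <- 2
sorted <- sort(m, decreasing = TRUE)
which(m >= sorted[n], arr.ind = TRUE)  # expected: 110 -> 123 and 123 -> 111
```

Without the shared levels, table() here would return a 2 × 3 table and diag() would zero the 123 to 111 cell instead of 123 to 123.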

An issue is that you don't have separate transition matrices for each time step. If you post some data in a usable form, you're likely to get help with that. (Not all 16533 rows, just enough to make it interesting.)
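One possible way to get a value per time step, sketched here as an illustration rather than the answer's own method: for each step t, take the rows doing activity `from` at t and compute the share doing `to` at t + 1. The function name transition_prob_by_step and the toy data are hypothetical; on the real data you would pass the 144 activity columns, e.g. as.matrix(Activities[, -1]), assuming serial is the first column.

```r
# Hypothetical helper: probability of moving from activity `from` to activity
# `to` between time step t and t + 1, computed separately for every t.
# Denominator = rows in state `from` at step t (NA if there are none).
transition_prob_by_step <- function(X, from, to) {
  sapply(seq_len(ncol(X) - 1), function(t) {
    at_from <- which(X[, t] == from)       # rows in state `from` at step t
    if (length(at_from) == 0) return(NA_real_)
    mean(X[at_from, t + 1] == to, na.rm = TRUE)
  })
}

# Hypothetical toy data: 4 rows x 5 time steps (110 = sleep, 123 = eating, 111 = TV)
X <- matrix(c(110, 110, 110, 123, 111,
              110, 110, 123, 123, 111,
              110, 123, 123, 110, 110,
              110, 110, 110, 110, 123),
            nrow = 4, byrow = TRUE)

p <- transition_prob_by_step(X, from = 110, to = 123)
p  # 0.25, 1/3, 0.5, 0.5

# x = index of the transition (step t -> t + 1), y = probability
plot(seq_along(p), p, type = "b", ylim = c(0, 1),
     xlab = "Transition t -> t + 1 (10-minute steps)", ylab = "P(110 -> 123)")
```

For several transitions at once, you could bind one such vector per pair into a matrix and draw them together with matplot().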