I am trying to fit a VLMC to a dataset where the longest sequence is 296 states. I do it as shown below:
# Load libraries
library(PST)
library(RCurl)
library(TraMineR)
# Load and transform data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/241ef39125ecb55a85b43d7f4cd3d58f617b2ecf/challenge_level.csv")
data <- read.csv(text = x)
data.seq <- seqdef(data[,2:ncol(data)], missing = NA, right = NA, nr = "*")
S1 <- pstree(data.seq, ymin = 0.01, lik = TRUE, with.missing = TRUE, nmin = 2)
This, however, yields the following error:
Error in res[i, , drop = FALSE] : subscript out of bounds
How can I fit the model to data with sequences this long? Are there any good justifications for limiting the length within the model?
The problem comes from your data. By not setting L in pstree(), you are asking for a model of the maximum possible order. The fitting process produces an error at L=8, because you set nmin=2 but at this order only one context reaches that minimum count.
Fitting a model with L=8 set explicitly works fine.
Again, you don't need any of the 'missing', 'right' or 'nr' options in seqdef(), nor 'with.missing' in pstree().
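For reference, here is a minimal sketch of the call with both changes applied: L is capped at 8 and the extra options are dropped. It assumes the CSV from the question is still reachable at the same URL and that the default seqdef() handling of trailing NAs suits your data. I have kept ymin, lik and nmin as in the question.

```r
# Load libraries
library(PST)
library(RCurl)
library(TraMineR)

# Load the data exactly as in the question
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/241ef39125ecb55a85b43d7f4cd3d58f617b2ecf/challenge_level.csv")
data <- read.csv(text = x)

# Build the sequence object with seqdef() defaults
# (no 'missing', 'right' or 'nr' arguments)
data.seq <- seqdef(data[, 2:ncol(data)])

# Cap the maximum order explicitly with L = 8 instead of leaving it unset
S1 <- pstree(data.seq, L = 8, ymin = 0.01, lik = TRUE, nmin = 2)
```

If you want to try other orders, change only the L argument and keep the rest of the call fixed, so that any new error can be traced to the order alone.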
Best, Alexis