Error in levels for seqdef in R

Posted 2019-06-03 14:22

I get this error every time I try to run seqdef on data that has already been converted to STS format using seqformat. A sample of my data frame looks like this:

head(df.new, 10)
   user_id orderdate         cart to
1        8         1      produce 30
2        8        31      produce 60
3        8        61      produce 70
4        8        71      produce 92
5       10         1      produce 30
6       10        31      produce 42
7       10        43 meat seafood 56
8       10        57         deli 77
9       17         1    beverages  3
10      17         4    beverages  8

It has a total of 14000 rows of orders, and some orders occur on the same day for a given user (i.e. orderdate == to). Below is the code I used to create the STS data that is passed to seqdef.

df.form <- seqformat(df.new, id='user_id', begin='orderdate', end='to', status='cart', from='SPELL', to='STS', process=FALSE)
df.seq <- seqdef(df.form, left='DEL', right = 'unknown', xtstep=10, void = 'unknown')

The output and error message I get when running seqdef are

 [>] found missing values ('NA') in sequence data
 [>] preparing 35000 sequences
 [>] coding void elements with 'unknown' and missing values with '*'
 [>] 21 distinct states appear in the data: 
     1 = alcohol
     2 = babies
     3 = bakery
     4 = beverages
     5 = breakfast
     6 = bulk
     7 = canned goods
     8 = dairy eggs
     9 = deli
     10 = dry goods pasta
     11 = frozen
     12 = household
      ...
 [>] adding special state(s) to the alphabet: unknown
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : 
  factor level [24] is duplicated

I tried removing the orders where orderdate == to, but the same error still occurs. I would appreciate any help in fixing this problem. Thanks.

Tags: r traminer
1 answer
我命由我不由天
#2 · 2019-06-03 15:05

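The error occurs because you are using the same code ('unknown') for both right missings (right = 'unknown') and voids (void = 'unknown'), so that code appears twice among the state levels. Use a different code for each.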

When the sequences contain missings, these are treated as a separate state when you set with.missing = TRUE in functions such as seqdist or seqdplot. Voids, by contrast, only serve to pad the rows to equal length and are simply ignored when plotting the sequences (seqplot) or computing dissimilarities (seqdist).
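As a minimal sketch of the fix, the calls below give the right missings and the voids different codes. The toy data frame sts is hypothetical and only stands in for df.form; '%' is assumed to be the default void code of seqdef, and the object names seq.a and seq.b are illustrative.

```r
library(TraMineR)

# Hypothetical toy data in STS form, standing in for df.form;
# the trailing NA values play the role of right missings.
sts <- data.frame(
  t1 = c("produce", "produce", "beverages"),
  t2 = c("produce", "deli", "beverages"),
  t3 = c("produce", NA, NA),
  stringsAsFactors = FALSE
)

# Option 1: recode right missings as the state 'unknown' and
# keep a different code ('%') for void elements.
seq.a <- seqdef(sts, left = "DEL", right = "unknown", void = "%")

# Option 2: drop right missings instead (right = "DEL"), so the
# shortened sequences are padded with the void code only.
seq.b <- seqdef(sts, left = "DEL", right = "DEL")
```

Which option fits depends on whether the trailing gaps should count as a state of their own (option 1) or only reflect sequences of unequal length (option 2).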
