Hopefully a quick one ....
Regarding the output from seqefsub() operations, please point me to a definition of the output notation.
To be more specific: what do the parentheses mean in e.g. (A), what does the greater-than sign mean in (A>B), and what does the hyphen mean in (A)-(A>B)?
Section 10 of the excellent User Guide has examples, but I may have missed an unambiguous definition somewhere.
To quote the example in Section 10.2 of the guide, what is the conceptual difference between (Parent)-(Parent>Left) and just (Parent>Left)?
Thanks,
Dave
Update after Gilbert's comment....
In attempting to clarify what I perhaps missed on page 106 of the user guide, I think the explanation, or at least the confirmation, that I was looking for was something along the lines of the following framework. Apologies for the possibly clumsy wording.
The context here is when seqefsub() results appear in the console....
(A)
this is the number of times state A appears as the first state, and not as any subsequent state. That is, it counts the number of times A appears in the first column. I assume here that I haven't missed another configuration option that counts first and all subsequent states of this type. If there is, please let me know.
(A>B)
this is the number of occurrences of an event (i.e. a change of state) from A to B. This count refers to events anywhere in the sequence. I am suggesting this therefore differs slightly from the state count above, assuming I haven't inadvertently misrepresented things. I note that constraints can be set to output single or multiple occurrences.
(A)-(A>B)
this counts the number of times state A occurs as a first state, and where the A to B event occurs anywhere in the sequence. This includes A to B events immediately after the first state, and can include intervening other states between the first state A and the event A to B.
I hope this helps, and I hope this is a correct set of statements (based on investigations later than my original question).
2nd Update after Gilbert's comment requesting an example....
For the real data set ... (where J and I take the place of A and B)
> data
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 I J J I J J I K J D J
2 G K R I J D J R I J N
3 K K I R M M K R J K I
4 R R B R I G R G R G G
5 J J J J J J J T Z J Z
6 R K R K M R R J J J R
7 J I I I I I I I I I I
8 J J J J J J J J J J R
9 J R J R J R J J I S R
10 J J J J J I J J J J J
11 G J J J J I I I R J J
12 I I D M D I I D I I D
13 R M R R J J J J J J J
then
> dataseq <- seqdef(data)
> dataseqe <- seqecreate(dataseq)
> datasubseq <- seqefsub(dataseqe, pMinSupport = 0.05)
> datasubseq[1:10]
gives
Subsequence Support Count
1 (J) 0.3846154 5
2 (J>I) 0.3846154 5
3 (R>J) 0.3846154 5
4 (J>R) 0.3076923 4
5 (I>J) 0.2307692 3
6 (J)-(J>I) 0.2307692 3
7 (K>R) 0.2307692 3
8 (R) 0.2307692 3
9 (D>J) 0.1538462 2
10 (G) 0.1538462 2
So ....
1) the count of 5 J-states (J) applies only to the first column/occurrence, and not to any subsequent J-states. There is a total of 57 J-states.
2) the count of 5 J-state to I-state change events (J>I) is a total count (for this constraint option), whenever they occur.
3) the count of 3 for the subsequence (J)-(J>I), i.e. a J-state followed by a J-state-to-I-state change, covers row 7 (cols 1 & 2), row 9 (col 1, and cols 8 & 9) and lastly row 10 (col 1, and cols 5 & 6); the last two cases have intervening states and/or events between the (J) and the (J>I).
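As a cross-check, the three counts above can be reproduced by hand with a short base-R sketch. It does not call TraMineR at all; has_transition is a hypothetical helper name, and the working assumption is that each reported count is the number of sequences (rows) that contain the pattern at least once.

```r
# The 13 state sequences from the data set above, one string per row.
rows <- c(
  "I J J I J J I K J D J",
  "G K R I J D J R I J N",
  "K K I R M M K R J K I",
  "R R B R I G R G R G G",
  "J J J J J J J T Z J Z",
  "R K R K M R R J J J R",
  "J I I I I I I I I I I",
  "J J J J J J J J J J R",
  "J R J R J R J J I S R",
  "J J J J J I J J J J J",
  "G J J J J I I I R J J",
  "I I D M D I I D I I D",
  "R M R R J J J J J J J"
)
states <- strsplit(rows, " ")

# Hypothetical helper: number of places where `from` is immediately
# followed by `to` in one state sequence.
n_transitions <- function(s, from, to) {
  sum(head(s, -1) == from & tail(s, -1) == to)
}
# Hypothetical helper: does the sequence contain the transition at all?
has_transition <- function(s, from, to) n_transitions(s, from, to) > 0

starts_J <- vapply(states, function(s) s[1] == "J", logical(1))
has_JI   <- vapply(states, function(s) has_transition(s, "J", "I"), logical(1))

count_J    <- sum(starts_J)            # sequences starting in J:      (J)
count_JI   <- sum(has_JI)              # sequences with a J>I change:  (J>I)
count_J_JI <- sum(starts_J & has_JI)   # both, start first:            (J)-(J>I)

# Total number of J>I changes, counting repeats within a row.
total_JI <- sum(vapply(states, function(s) n_transitions(s, "J", "I"), numeric(1)))

print(c(count_J, count_JI, count_J_JI, total_JI))
```

With this data the per-row counts come out as 5, 5 and 3, matching the Count column, while the total number of J>I changes is 6 because row 1 contains two of them. That suggests the reported figure counts rows containing the subsequence rather than every occurrence.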
Back then to the question: is this correct and expected behaviour, and a correct interpretation? If so, why are state counts done on a different basis to event/state change counts?
In your example the event sequences are derived from the state sequence object dataseq with seqecreate(dataseq). Since you don't provide the tevent argument, the default tevent = "transition" is used (see help(seqecreate)). With this value, the events are defined as the transitions from a state A to a state B and are labeled A>B. In addition, a specific event labeled A is associated to the sequence start to indicate the state at the beginning of the sequence. So, although the same symbol is used, A in event sequences is an event (the start event) and should not be confused with the A in state sequences, where it is a state.
The above is specific to the tevent="transition" option. For instance, with tevent="state", the events would be the start of the spells and labeled as A to indicate the start of a spell in state A. In that case the event A could occur anywhere in the sequence, not only at the start.
Now about the parentheses. They indicate the transitions (or transactions), a transition being defined as the set of simultaneous events that provoke the state change. For example: (a,b) indicates that two events a and b occur at the same time point, (A>C) means that we have the single event A>C at the time point. (a)-(b) denotes a sequence of length 2 where event a precedes event b.
Update in response to Stephan's comment
Let's consider the following example
The state sequence has 5 spells, 2 in state A and 1 in each of the states H, B, and G. Now there are different possibilities to convert this state sequence into an event sequence. The tevent='state' and tevent='transition' are just two possibilities out of many.
Using tevent='state' we get an event sequence where the event (A) occurs twice because we have two spells in state A. Each of these two spells is initiated by the same event (A) that does not account for the preceding state.
Looking at the event sequence obtained with the tevent='transition' option, we observe that the spells in A are here initiated by two different events (H>A) and (B>A) that account for the preceding state.
The first event sequence has two subsequences (H)-(A), which correspond to the subsequences (H)-(H>A) and (H)-(B>A) in the second event sequence.
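To make the two labelings concrete, here is a minimal base-R sketch. The state sequence s is a hypothetical one, chosen only to match the description (spells H, A, B, A, G); it is not the original example. The code does not call seqecreate(), it merely mimics the two event definitions as described in this answer.

```r
# Hypothetical state sequence with five spells: H, A, B, A, G.
s <- c("H", "H", "A", "A", "B", "B", "A", "G")

# Collapse runs of identical states into spells.
spells <- rle(s)$values

# "state"-style events: one event per spell start, labeled by the state,
# regardless of which state came before.
ev_state <- spells

# "transition"-style events: a start event for the first state, then one
# "X>Y" event per state change, which keeps track of the preceding state.
ev_transition <- c(spells[1],
                   paste0(head(spells, -1), ">", tail(spells, -1)))

# Print both event sequences in the (event)-(event) notation.
cat(paste0("(", ev_state, ")", collapse = "-"), "\n")
cat(paste0("(", ev_transition, ")", collapse = "-"), "\n")
```

Under these assumptions the first line prints (H)-(A)-(B)-(A)-(G) and the second (H)-(H>A)-(A>B)-(B>A)-(A>G), so the two (A) events of the first labeling become the distinct events (H>A) and (B>A) in the second.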