Hopefully a quick one ....
Regarding the output from seqefsub() operations, please point me to a definition of the output notation.
To be more specific: what do the parentheses mean in e.g. (A), what does the greater-than sign mean in (A>B), and what does the hyphen mean in (A)-(A>B)?
Section 10 of the excellent User Guide has examples, but I may have missed an unambiguous definition statement somewhere.
To quote the example in Section 10.2 of the guide, what is the conceptual difference between (Parent)-(Parent>Left) and just (Parent>Left)?
Thanks,
Dave
Update after Gilbert's comment....
In attempting to clarify what I perhaps missed on page 106 of the user guide, I think the explanation (or at least confirmation) that I was looking for is something along the lines of the following framework. Apologies for the possibly clumsy wording.
The context here is when seqefsub() results appear in the console....
(A)
this is the number of times state A appears as the first state, and not as any subsequent state. That is, it counts the number of times A appears in the first column. I assume here that I haven't missed a configuration option that counts the first and all subsequent states of this type. If there is one, please let me know.
(A>B)
this is the number of occurrences of an event (i.e. a change of state) from A to B. This count refers to events anywhere in the sequence, so I am suggesting it is done on a slightly different basis from the state count above, assuming I haven't inadvertently misrepresented things. I note that constraints can be set to output single or multiple occurrences.
(A)-(A>B)
this counts the number of times state A occurs as a first state, and where the A to B event occurs anywhere in the sequence. This includes A to B events immediately after the first state, and can include intervening other states between the first state A and the event A to B.
I hope this helps, and I hope this is a correct set of statements (based on investigations later than my original question).
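To make the reading above concrete, here is a minimal base-R sketch that writes a plain state sequence in the same (start)-(from>to) form. The helper to_event_string() is hypothetical and is not part of TraMineR. It only encodes the interpretation described in this update, and it ignores any timing information that the real printout may carry.

```r
# Hypothetical helper (NOT a TraMineR function): write a state sequence as
# "(start)-(from>to)-...", following the interpretation described above.
to_event_string <- function(x) {
  n <- length(x)
  # Positions i where the state at i differs from the state at i + 1.
  changed <- which(x[-n] != x[-1])
  # The starting state is written on its own, in parentheses.
  events <- paste0("(", x[1], ")")
  # Each change of state is written as (from>to); skipped if nothing changes.
  if (length(changed) > 0) {
    events <- c(events, paste0("(", x[changed], ">", x[changed + 1], ")"))
  }
  paste(events, collapse = "-")
}

# A then B then C: the start state, followed by two change events.
to_event_string(c("A", "A", "B", "B", "C"))

# Other events sit between the start state (A) and the later (A>B) event.
to_event_string(c("A", "C", "A", "B"))
```

Under this reading, the second call shows why (A)-(A>B) can match a sequence where the A to B change does not come immediately after the first state.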
2nd Update after Gilbert's comment requesting an example....
For the real data set ... (where J and I take the place of A and B)
> data
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1   I  J  J  I  J  J  I  K  J   D   J
2   G  K  R  I  J  D  J  R  I   J   N
3   K  K  I  R  M  M  K  R  J   K   I
4   R  R  B  R  I  G  R  G  R   G   G
5   J  J  J  J  J  J  J  T  Z   J   Z
6   R  K  R  K  M  R  R  J  J   J   R
7   J  I  I  I  I  I  I  I  I   I   I
8   J  J  J  J  J  J  J  J  J   J   R
9   J  R  J  R  J  R  J  J  I   S   R
10  J  J  J  J  J  I  J  J  J   J   J
11  G  J  J  J  J  I  I  I  R   J   J
12  I  I  D  M  D  I  I  D  I   I   D
13  R  M  R  R  J  J  J  J  J   J   J
then
> dataseq <- seqdef(data)
> dataseqe <- seqecreate(dataseq)
> datasubseq <- seqefsub(dataseqe, pMinSupport = 0.05)
> datasubseq[1:10]
gives
   Subsequence   Support Count
1          (J) 0.3846154     5
2        (J>I) 0.3846154     5
3        (R>J) 0.3846154     5
4        (J>R) 0.3076923     4
5        (I>J) 0.2307692     3
6    (J)-(J>I) 0.2307692     3
7        (K>R) 0.2307692     3
8          (R) 0.2307692     3
9        (D>J) 0.1538462     2
10         (G) 0.1538462     2
So ....
1) the count of 5 J-states (J) applies only to the first column/occurrence, and not to any subsequent J-states. There is a total of 60 J-states in the table above.
2) the count of 5 J-state to I-state change events (J>I) is a total count (for this constraint option), whenever they occur.
3) the count of 3 for the subsequence (J)-(J>I), i.e. a J-state followed by a J-state-to-I-state change, comes from row 7 (cols 1 & 2), row 9 (col 1, and cols 8 & 9) and lastly row 10 (col 1, and cols 5 & 6); the last two cases have intervening states and/or events between the (J) and the (J>I).
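The three counts can be re-derived from the table by hand. The base-R sketch below does not call TraMineR. It assumes that Count is the number of sequences (rows) containing the pattern at least once, which is consistent with the output shown but is an assumption rather than a documented definition. The names states, count_transitions and the result variables are illustrative only.

```r
# Rebuild the 13 x 11 state table from the post, one string per row.
rows <- c(
  "I J J I J J I K J D J",
  "G K R I J D J R I J N",
  "K K I R M M K R J K I",
  "R R B R I G R G R G G",
  "J J J J J J J T Z J Z",
  "R K R K M R R J J J R",
  "J I I I I I I I I I I",
  "J J J J J J J J J J R",
  "J R J R J R J J I S R",
  "J J J J J I J J J J J",
  "G J J J J I I I R J J",
  "I I D M D I I D I I D",
  "R M R R J J J J J J J"
)
states <- do.call(rbind, strsplit(rows, " "))

# Number of direct changes from state `from` to state `to` within one row.
count_transitions <- function(x, from, to) {
  n <- length(x)
  sum(x[-n] == from & x[-1] == to)
}

starts_J <- states[, 1] == "J"   # rows whose first state is J
n_JI <- apply(states, 1, count_transitions, from = "J", to = "I")

count_J         <- sum(starts_J)             # rows matching (J)
rows_with_JI    <- sum(n_JI > 0)             # rows matching (J>I)
count_J_then_JI <- sum(starts_J & n_JI > 0)  # rows matching (J)-(J>I)
occurrences_JI  <- sum(n_JI)                 # every J to I change, all rows

c(count_J, rows_with_JI, count_J_then_JI, occurrences_JI)
```

On this table the first three values are 5, 5 and 3, matching rows 1, 2 and 6 of the output. The last value is 6, because row 1 contains two J to I changes, so under the assumption above (J>I) is counted once per sequence rather than once per occurrence.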
Back then to the question: is this correct and expected behaviour, and is my interpretation correct? If so, why are state counts done on a different basis from event/state-change counts?