I am trying to return from a graph database the students with the most consecutive passes to a series of exams.
Below is my current code but not sure where I can take it from its current state to solve my issue.
MATCH (s:Student)-[r:TAKEN]->(e:Exam)
RETURN s.name, e.date,
CASE
WHEN r.score >= e.pass_mark THEN 1
END as pass
ORDER BY e.date
I would like a basic table which would show the student along with there most consecutive passes in a row.
For example:
| student | pass/fail |
| joe | pass |
| matt | pass |
| joe | fail |
| matt | pass |
| joe | pass |
| matt | pass |
| joe | pass |
| matt | fail |
And I would like the result from my query to display each student and their highest pass streak in a row.
| student | passes in a row |
| joe | 2 |
| matt | 3 |
I have been playing around with CASE
but haven't been able to come up with a good solution, currently, it will just list all the students for each exam and have a 1 if they have passed it.
You can do it with plain Cypher, but I don't think it's very practical - you essentially need to write a program with reduce
.
Basically, the "split" works as follows: initialize an empty accumulator list and calculate streaks by iterating through the list of passes/fails, check whether the current element is the same as the previous one. For example ['pass', 'pass']
keeps the streak, ['pass', 'fail']
breaks it. If it breaks (like at the start of the list), append a new element to the accumulator. If it keeps, append a new element to the last element of the accumulator, e.g. with a new 'fail'
, [['pass', 'pass'], ['fail']]
becomes [['pass', 'pass'], ['fail', 'fail]]
.
UNWIND
[
['joe', 'pass'],
['matt', 'pass'],
['joe', 'fail'],
['matt', 'pass'],
['joe', 'pass'],
['matt', 'pass'],
['joe', 'pass'],
['matt', 'fail']
] AS row
WITH row[0] AS s, row[1] AS passed
WITH s, collect(passed) AS p
WITH s, reduce(acc = [], i IN range(0, size(p) - 1) |
CASE p[i] = p[i-1]
WHEN true THEN [j IN range(0, size(acc) - 1) |
CASE j = size(acc) - 1
WHEN true THEN acc[j] + [p[i]]
ELSE acc[j]
END
]
ELSE acc + [[p[i]]]
END
) AS streaks // (1)
UNWIND streaks AS streak
WITH s, streak
WHERE streak[0] <> 'fail'
RETURN s, max(size(streak)) AS consecutivePasses // (2)
In step (1), this calculates streaks such as:
╒══════╤═════════════════════════════════╕
│"s" │"streaks" │
╞══════╪═════════════════════════════════╡
│"matt"│[["pass","pass","pass"],["fail"]]│
├──────┼─────────────────────────────────┤
│"joe" │[["fail"],["pass","pass"]] │
└──────┴─────────────────────────────────┘
And in (2), it gives:
╒══════╤═══════════════════╕
│"s" │"consecutivePasses"│
╞══════╪═══════════════════╡
│"matt"│3 │
├──────┼───────────────────┤
│"joe" │2 │
└──────┴───────────────────┘
Of course, in this particular case it's not necessary to do the splitting: simply counting would be enough. But in 99% of practical situations, APOC is the way to go, so I did not bother optimising this solution.
This is a tricky one, and as far as I know can't be done with just Cypher, but there is a procedure in APOC Procedures that can help.
apoc.coll.split()
takes a collection and a value to split around, and will yield records for each resulting sub-collection. Basically, we collect the ordered results per student, split around failures to get collections of consecutive passes, then get the max consecutive passes from the sizes of those collections:
MATCH (s:Student)-[r:TAKEN]->(e:Exam)
WITH s, r.score >= e.pass_mark as passed
ORDER BY e.date
WITH s, collect(passed) as resultsColl
CALL apoc.coll.split(resultsColl, false) YIELD value
WITH s, max(size(value)) as consecutivePasses
RETURN s.name as student, consecutivePasses