I am trying to return from a graph database the students with the most consecutive passes to a series of exams.
Below is my current code but not sure where I can take it from its current state to solve my issue.
MATCH (s:Student)-[r:TAKEN]->(e:Exam)
RETURN s.name, e.date,
CASE
WHEN r.score >= e.pass_mark THEN 1
END as pass
ORDER BY e.date
I would like a basic table which would show the student along with there most consecutive passes in a row.
For example:
| student | pass/fail |
| joe | pass |
| matt | pass |
| joe | fail |
| matt | pass |
| joe | pass |
| matt | pass |
| joe | pass |
| matt | fail |
And I would like the result from my query to display each student and their highest pass streak in a row.
| student | passes in a row |
| joe | 2 |
| matt | 3 |
I have been playing around with CASE
but haven't been able to come up with a good solution, currently, it will just list all the students for each exam and have a 1 if they have passed it.
This is a tricky one, and as far as I know can't be done with just Cypher, but there is a procedure in APOC Procedures that can help.
apoc.coll.split()
takes a collection and a value to split around, and will yield records for each resulting sub-collection. Basically, we collect the ordered results per student, split around failures to get collections of consecutive passes, then get the max consecutive passes from the sizes of those collections:You can do it with plain Cypher, but I don't think it's very practical - you essentially need to write a program with
reduce
.Basically, the "split" works as follows: initialize an empty accumulator list and calculate streaks by iterating through the list of passes/fails, check whether the current element is the same as the previous one. For example
['pass', 'pass']
keeps the streak,['pass', 'fail']
breaks it. If it breaks (like at the start of the list), append a new element to the accumulator. If it keeps, append a new element to the last element of the accumulator, e.g. with a new'fail'
,[['pass', 'pass'], ['fail']]
becomes[['pass', 'pass'], ['fail', 'fail]]
.In step (1), this calculates streaks such as:
And in (2), it gives:
Of course, in this particular case it's not necessary to do the splitting: simply counting would be enough. But in 99% of practical situations, APOC is the way to go, so I did not bother optimising this solution.