Find all events between 2 dates

2020-02-26 12:45发布

问题:

Trying to figure out the following scenario.

Setup:

  • Used the suggestions in this post to create a Time Tree http://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-a-time-tree-down-to-the-day/
  • Extended it to go to the Hour and Minutes (for the sake of simplicity, using 0,15,30,45
  • Connected the Minutes with :NEXT (similar to what's done on days, but on minutes)
  • Created several events attached to the minutes with "STARTS_AT" and "ENDS_AT"

Goals

  • Find all events occurring between two dates
  • Find conflicting events (events that overlap)

Attempted the following query:

// Get all events between 2 times
MATCH (m1:Minute { minute: 15 })<--(h:Hour { hour: 8 })<--(d:Day { day: 24 })<--(month:Month { month: 10 })
MATCH (m2:Minute { minute: 45 })<--(h2:Hour { hour: 10 })<--(d2:Day { day: 24 })<--(month2:Month { month: 10 })
MATCH p=((m1)-[:NEXT*]->(m2))
WITH NODES(p) as pnodes UNWIND pnodes AS pwind
MATCH (e:Event)-[:STARTS_AT]->(pwind)
RETURN pwind,e

The results are indeed being retrieved, but noticed that:

  1. If I try to use "shortestpath" instead of regular path, the query only works for a certain threshold (bizarre, can't explain this). For example, if the timespan is more than 2-3 hours it doesn't work; same applies to multiple days, etc.
  2. Without using "shortestpath", the performance is TERRIBLE. 25 seconds to return only 2-3 items.

Another variation using where (tried it only for future dates):

// Get all events between 2 times
MATCH (e:Event)
WHERE (:Month { month: 10 })-->(:Day { day: 24 })-->(:Hour { hour: 9 })-->(:Minute { minute: 00})-[:NEXT*]->(:Minute)<--(e)
RETURN e

Results: the performance is even WORSE. 100 seconds to retrieve 1 item.

The way I understand and would like to do this, is by using some sort of function that allows the path to return related nodes. This is: path function returns only the specific node being queried (in this case Minutes), but I would like to bring for ALL THOSE MINUTES, the Events associated by ":STARTS_AT".

Finally, the questions:

  • What's the recommended way to perform this query?
  • Is this a scenario that is supported by Cypher and neo4j?
  • Would it be preferrable to "fall back" to property based time querying instead of trying to attempt graph based time queries?

Thanks in advance.

回答1:

So there's this weird thing with shortestPath where if you don't specify a maximum length, it arbitrarily sets the max to 15. See here:

ShortestPath doesn't find any path without max hops limit

I would actually call this a bug, because it's not in the documentation and it leads to unexpected behavior, as you've found.

So the solution to your problem is to use shortestPath but pick a maximum length. I'd choose something really high; let's do a billion and call it a day:

MATCH (:Year {year:2015})-[:HAS_MONTH]->(:Month {month:10})-[:HAS_DAY]->(:Day {day:23})-[:HAS_HOUR]->(:Hour {hour:8})-[:HAS_MINUTE]->(startMinute:Minute {minute:15})
MATCH (:Year {year:2015})-[:HAS_MONTH]->(:Month {month:10})-[:HAS_DAY]->(:Day {day:24})-[:HAS_HOUR]->(:Hour {hour:10})-[:HAS_MINUTE]->(endMinute:Minute {minute:45})
MATCH p = shortestPath((startMinute)-[:NEXT*..1000000000]->(endMinute))
UNWIND NODES(p) AS minute
MATCH (event:Event)-[:STARTS_AT]->(minute)
RETURN event, minute;

You should always use shortestPath for finding the span of minute nodes; matching on (startMinute)-[:NEXT*]->(endMinute) without wrapping it in shortestPath is trying to find all paths of any length between the two nodes, so it's exhaustive and takes much longer, whereas shortestPath can stop as soon as it's found the path.

As far as finding if any other events overlap with a certain event:

MATCH (startMinute:Minute)<-[:STARTS_AT]-(event:Event)-[:ENDS_AT]->(endMinute:Minute)
WHERE event.id = {event_id}
MATCH p = shortestPath((startMinute)-[:NEXT*..1000000000]->(endMinute))
UNWIND NODES(p) AS span
MATCH (overlap:Event)-[:STARTS_AT|ENDS_AT]->(span)
WHERE overlap <> event
RETURN overlap;

Below is an appendix of how the data was created for proof-of-concept purposes. Assume all months have 31 days.

Constraints and indexes.

CREATE CONSTRAINT ON (year:Year) ASSERT year.year IS UNIQUE;
CREATE INDEX ON :Month(month);
CREATE INDEX ON :Day(day);
CREATE INDEX ON :Hour(hour);
CREATE INDEX ON :Minute(minute);

Create the time tree.

WITH RANGE(2014, 2015) AS years, RANGE(1, 12) AS months, RANGE(1, 31) AS days, RANGE(0,23) AS hours, RANGE(0, 45, 15) AS minutes
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN days |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d)
      FOREACH(hour IN hours |
        CREATE (h:Hour {hour: hour})
        MERGE (d)-[:HAS_HOUR]->(h)
        FOREACH(minute IN minutes |
          CREATE (min:Minute {minute: minute})
          MERGE (h)-[:HAS_MINUTE]->(min)
        )
      )
    )
  )
);

Create [:NEXT] relationships between all the Minute nodes.

MATCH (year:Year)-[:HAS_MONTH]->(month:Month)-[:HAS_DAY]->(day:Day)-[:HAS_HOUR]->(hour:Hour)-[:HAS_MINUTE]->(minute:Minute)
WITH year, month, day, hour, minute
ORDER BY year.year, month.month, day.day, hour.hour, minute.minute
WITH COLLECT(minute) AS minutes
FOREACH(i IN RANGE(0, LENGTH(minutes) - 2) | 
    FOREACH(min1 IN [minutes[i]] | 
        FOREACH(min2 IN [minutes[i + 1]] | 
            CREATE UNIQUE (min1)-[:NEXT]->(min2)
        )
    )
);

Randomly simulate events and their start times.

MATCH (minute:Minute)
WHERE RAND() < 0.3
CREATE (event:Event)-[:STARTS_AT]->(minute);

Make all events 5 minute blocks long.

MATCH (event:Event)-[:STARTS_AT]->(startMinute:Minute)-[:NEXT*5]->(endMinute:Minute)
CREATE (event)-[:ENDS_AT]->(endMinute);


标签: neo4j cypher