I have two data sets in google app engine datastore.
class First_Set(db.Model):
start_time = db.DateTimeProperty()
end_time = db.DateTimeProperty()
data1 = db.FloatProperty()
...
class Second_Set(db.Model):
start_time = db.DateTimeProperty()
end_time = db.DateTimeProperty()
data2 = db.FloatProperty()
...
(They have other different data that's why they're in different datasets.)
I'd like to find the datastore IDs all the overlapping start_time and end_time across two datasets, ideally without pulling results from one and iterating the first results over the other.
A great visualization of the initial dataset is from here (it also has the problem solved in SQL):
1 |-----|
2 |-----|
3 |--|
4 |-----|
5 |-----|
6 |---|
7 |---|
8 |---|
9 |-----|
End result I need is something in the tune of (from the same example):
+----+---------------------+----+---------------------+
| id | start | id | end |
+----+---------------------+----+---------------------+
| 2 | 2008-09-01 15:02:00 | 1 | 2008-09-01 15:04:00 |
| 5 | 2008-09-01 16:19:00 | 4 | 2008-09-01 16:23:00 |
| 8 | 2008-09-01 16:20:00 | 4 | 2008-09-01 16:22:00 |
| 8 | 2008-09-01 16:20:00 | 5 | 2008-09-01 16:22:00 |
| 7 | 2008-09-01 18:18:00 | 9 | 2008-09-01 18:22:00 |
+----+---------------------+----+---------------------+
SQL solution is described in the example as below but I couldn't do this in datastore because of lack of JOIN:
SELECT v1.id, v1.start, v2.id, LEAST(v1.end,v2.end) AS end
FROM visits v1
JOIN visits v2 ON v1.id <> v2.id and v1.start >= v2.start and v1.start < v2.end
ORDER BY v1.start;
I understand that one-to-many version of this is rather straightforward using a ListProperty() (from this question).
Can anyone think of a solution to find the overlapping times (ideally in Python)?
Posting my solution with no JOINs, thanks to Shay's direction. Should be able to find overlaps over any number of datasets with minor edits (at least that's the theory).
My Python isn't that great but below should give the idea:
Look into Marzullo's algorithm its time efficiency is O(n log n).
There are also many question on Stackoverflow that cover overlapping intervals which can be used to solve your problem on AppEngine.