(New SQLAlchemy user alert) I have three tables: a person, the persons hourly rate starting at a specific date, and daily time reporting. I am looking for the correct way to have the cost for a Time base off of the persons hourly rate on that day.
Yes, I could compute the value at the time of creation and have that as part of the model, but think of this as an example of summarizing more complex data behind the curtain. How do I compute the Time.cost? Is it a hybrid_propery, a column_property or something completely different?
class Person(Base):
__tablename__ = 'person'
personID = Column(Integer, primary_key=True)
name = Column(String(30), unique=True)
class Payrate(Base):
__tablename__ = 'payrate'
payrateID = Column(Integer, primary_key=True)
personID = Column(Integer, ForeignKey('person.personID'))
hourly = Column(Integer)
starting = Column(Date)
__tableargs__ =(UniqueConstraint('personID', 'starting',
name='uc_peron_starting'))
class Time(Base):
__tablename__ = 'entry'
entryID = Column(Integer, primary_key=True)
personID = Column(Integer, ForeignKey('person.personID'))
workedon = Column(Date)
hours = Column(Integer)
person = relationship("Person")
def __repr__(self):
return "<{date} {hours}hrs ${0.cost:.02f}>".format(self,
date=self.workedon.isoformat(), hours=to_hours(self.hours))
@property
def cost(self):
'''Cost of entry
'''
## This is where I am stuck in propery query creation
return self.hours * query(Payrate).filter(
and_(Payrate.personID==personID,
Payrate.starting<=workedon
).order_by(
Payrate.starting.desc())
The problem you have here, to solve as elegantly as possible, uses very advanced SQLAlchemy techniques, so I know you're a beginner, but this answer is going to show you all the way out to the end. However, solving a problem like this requires walking through one step at a time, and you can get the answer you want in different ways as we go through.
Before you get into how to hybrid this or whatever, you need to think about the SQL. How can we query for Time.cost over an arbitrary series of rows? We can link Time to Person cleanly because we have a simple foreign key. But to link Time to Payrate, with this particular schema is tricky, because Time links to Payrate not just via person_id but also via workedon - in SQL we'd join to this most easily using "time.person_id = person.id AND time.workedon BETWEEN payrate.start_date AND payrate.end_date". But you don't have an "end_date" here, which means we have to derive that also. That derivation is the trickiest part, so what I came up with starts like this (I've lowercased your column names):
There might be other ways to get this, but that's what I came up with - other ways would almost certainly have some similar kind of thing going on (i.e. subqueries, joins).
So with a payrate starting/ending, we can figure out what a query would look like. We want to use BETWEEN to match a time entry to the date range, but the latest payrate entry will have NULL for the "end" date, so one way to work around that is to use COALESCE against a very high date (the other is to use conditionals):
Now what @hybrid can do for you in SQLAlchemy, when run at the SQL expression level, is exactly just the "entry.hours * payrate_derived.hourly" part, that's it. All the JOIN and such there, you'd need to provide externally to the hybrid.
So we need to stick that big subquery into this:
So let's figure out what
<SOMETHING>
is. Build up that SELECT as an object:The
cost()
hybrid, on the expression side, would need to refer to payrate_derived (we'll do the python side in a minute):Then in order to use our
cost()
hybrid, it would have to be in the context of a query that has that join. Note here we use Python'sdatetime.date.max
to get that max date (handy!):So that join is big, and klunky, and we will need to do it often, not to mention we're going to need to load that same collection in Python when we do our in-Python hybrid. We can map to it using
relationship()
, which means we have to set up custom join conditions, but also we need to actually map to that subquery, using a lesser-known technique called a non-primary mapper. A non-primary mapper gives you a way to map a class to some arbitrary table or SELECT construct just for the purposes of selecting rows. We normally never need to use this because Query already lets us query for arbitrary columns and subqueries, but to get it out of arelationship()
it needs a mapping. The mapping needs a primary key to be defined, and the relationship also needs to know which side of the relationship is "foreign". This is the most advanced part here and in this case it works out like this:So that's the last we'd have to see of that join. We can now do our query earlier as:
and finally we can wire up our new
payrate
relationship into the Python-level hybrid as well:The solution we have here took a lot of effort, but at least the most complex part, that payrate mapping, is entirely in just one place and we never need to look at it again.
Here's a full working example:
Output (first line is the aggregate version, remainder is the per-object):
Many times the best advice I can give is to just do it different. A multi-table calculated column like this is what database views are for. Build a view based on the Time table (or whatever else you want) with your calculated column in it, build a model based on the view, and you're set. This will likely be less stressful on the database as well. This is also a good example of why limiting design to what can be accomplished through automated migrations is dangerous.