I'm interested in some of the design behind Rails ActiveRecord, Doctrine for PHP (and similar ORMs).
- How does an ORM manage to accomplish features like chained accessors and how deep are they typically expected to work?
- How does an ORM construct queries internally?
- How does an ORM manage the queries while sustaining the arbitrary nature of all that is expected of it?
Obviously this is an academic question, but all kinds of answers are welcome!
(My language of choice is OO PHP5.3!)
Chained method calls are orthogonal to the ORM question; they're used all over the place in OOP. A chainable method simply returns a reference to the current object, so another method can be called directly on the return value. In PHP:
class A {
    public function b() {
        // ...
        return $this;
    }
    public function c($param) {
        // ...
        return $this;
    }
}
$foo = new A();
$foo->b()->c('one');
// chaining is equivalent to
// $foo = $foo->b();
// $foo = $foo->c('one');
As for how queries are constructed, the ORM first has to learn the structure of your tables, and there are a few approaches. In ActiveRecord-like ORMs, there's code that examines the database's metadata. Most databases have some kind of SQL or SQL-like command to view this metadata (MySQL's DESCRIBE statement, Oracle's USER_TAB_COLUMNS view, etc.).
Some ORMs have you describe your database tables in a neutral language such as YAML. Others might infer a database structure from the way you've created your object models (I want to say Django does this, but it's been a while since I looked at it). Finally, there's a hybrid approach, where either of the previous two techniques is used, but a separate tool is provided to automatically generate the YAML/etc. or class files.
Once the names and data types of a table's columns are known, it's pretty easy to programmatically write a SQL query that returns all the rows, or a specific set of rows that meet certain criteria.
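To make that concrete, here is a minimal sketch of generating a SELECT from known column names. The function name buildSelect, the cars table, and its columns are all made up for illustration, and the backtick quoting assumes MySQL; a real ORM does considerably more (type handling, joins, dialects).

```php
<?php
// Hypothetical helper: builds a parameterized SELECT from known table metadata.
function buildSelect($table, array $columns, array $criteria = array())
{
    // Quote identifiers (MySQL style) so names cannot break the statement.
    $quote = function ($name) {
        return '`' . str_replace('`', '``', $name) . '`';
    };

    $sql = 'SELECT ' . implode(', ', array_map($quote, $columns))
         . ' FROM ' . $quote($table);

    $where  = array();
    $params = array();
    foreach ($criteria as $column => $value) {
        // Only allow criteria on columns the metadata says exist.
        if (!in_array($column, $columns, true)) {
            throw new InvalidArgumentException("Unknown column: $column");
        }
        $where[]  = $quote($column) . ' = ?';
        $params[] = $value; // bound later, never concatenated into the SQL
    }
    if ($where) {
        $sql .= ' WHERE ' . implode(' AND ', $where);
    }

    return array($sql, $params);
}

list($sql, $params) = buildSelect('cars', array('id', 'color'), array('color' => 'red'));
// $sql    is: SELECT `id`, `color` FROM `cars` WHERE `color` = ?
// $params is: array('red')
```

The SQL string and the parameter list are returned separately so they can be handed to a prepared statement, which keeps the values out of the query text.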
As for your last question,
How does an ORM manage the queries while
sustaining the arbitrary nature of all
that is expected of it?
I'd argue the answer is "not very well". Once you move beyond the one-table, one-object metaphor, each ORM has a different approach and philosophy as to how SQL queries should be used to model objects. In the abstract, though, it's just as simple as adding new methods that construct queries based on the assumptions of the ORM (e.g. Zend_Db_Table's "findManyToManyRowset" method).
How does an ORM manage to accomplish features like chained accessors and how deep are they typically expected to work?
Nobody seems to have answered this. I can quickly describe how Doctrine does this in PHP.
In Doctrine, none of the fields which you see on an object model are actually defined for that class. So in your example, $car->owners, there is no actual field called 'owners' defined in $car's class.
Instead, the ORM uses magic methods like __get and __set. So when you use an expression like $car->color, internally PHP calls Doctrine_Record#__get('color').
At this point the ORM is free to satisfy this in any way necessary. There are a lot of possible designs here. It can store these values in an array called $_values, for example, and then return $this->_values['color']. Doctrine in particular tracks not only the values for each record, but also its status relative to what is persisted in the database.
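A rough sketch of that idea follows. The class name SketchRecord, its state constants, and the state() method are invented for illustration; this is not Doctrine's actual implementation, only the general pattern of routing property access through __get and __set.

```php
<?php
// Hypothetical record class; not Doctrine's real code.
class SketchRecord
{
    const STATE_CLEAN = 'clean';
    const STATE_DIRTY = 'dirty';

    protected $_values = array();
    protected $_state  = self::STATE_CLEAN;

    public function __construct(array $values = array())
    {
        $this->_values = $values;
    }

    // PHP calls this when an undefined or inaccessible property is read.
    public function __get($name)
    {
        if (!array_key_exists($name, $this->_values)) {
            throw new OutOfBoundsException("Unknown field: $name");
        }
        return $this->_values[$name];
    }

    // PHP calls this when an undefined or inaccessible property is written.
    public function __set($name, $value)
    {
        if (!array_key_exists($name, $this->_values) || $this->_values[$name] !== $value) {
            $this->_values[$name] = $value;
            $this->_state = self::STATE_DIRTY; // differs from what was loaded
        }
    }

    public function state()
    {
        return $this->_state;
    }
}

$car = new SketchRecord(array('color' => 'red'));
$before = $car->color;   // routed through __get('color')
$car->color = 'blue';    // routed through __set('color', 'blue')
$status = $car->state(); // the record now knows it has unsaved changes
```

Because every read and write passes through one place, the record can decide later which fields actually need to be written back.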
One example of this that is not intuitive is with Doctrine's relations. When you get a reference to $car, it has a relationship to the People table that is called 'owners'. So the data for $car->owners is actually stored in a separate table from the data for $car itself. So the ORM has two choices:
- Each time you load a $car, the ORM automatically joins all related tables and populates that information into the object. Now when you do $car->owners, that data is already there. This method is slow, however, because objects may have many relationships, and those relationships may have relationships themselves. So you'd be adding a lot of joins and not necessarily even using that information.
- Each time you load a $car, the ORM populates the fields that come from the Car table itself, but does not load any fields that come from related tables. Instead, some metadata is attached to those fields to mark them as being 'not loaded, but available'. Now when you write the expression $car->owners, the ORM sees that the 'owners' relationship has not been loaded, so it issues a separate query to get that information, adds it to the object, and then returns that data. This all happens transparently, without you needing to be aware of it.
Of course, Doctrine uses the second approach (lazy loading), since the first becomes unwieldy for any real production site with moderate complexity. But it also has side effects. If you are using several relations on $car, then Doctrine will load each one separately, as you access it. So you end up running 5-6 queries when maybe only 1 was required.
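The lazy-loading behaviour described above can be sketched roughly like this. FakeDb, LazyCar, and findOwnersByCarId are hypothetical names, and the "database" is just an array that counts how often it is asked; the point is only to show when the extra query fires.

```php
<?php
// Hypothetical stand-in for a database layer that counts its queries.
class FakeDb
{
    public $queryCount = 0;
    private $owners = array(1 => array('Alice', 'Bob'));

    public function findOwnersByCarId($carId)
    {
        $this->queryCount++;
        return isset($this->owners[$carId]) ? $this->owners[$carId] : array();
    }
}

class LazyCar
{
    private $db;
    private $id;
    private $loaded = array(); // relation name => data, filled on first access

    public function __construct(FakeDb $db, $id)
    {
        $this->db = $db;
        $this->id = $id;
    }

    public function __get($name)
    {
        if ($name !== 'owners') {
            throw new OutOfBoundsException("Unknown relation: $name");
        }
        // "Not loaded, but available": query only the first time it is needed.
        if (!array_key_exists($name, $this->loaded)) {
            $this->loaded[$name] = $this->db->findOwnersByCarId($this->id);
        }
        return $this->loaded[$name];
    }
}

$db  = new FakeDb();
$car = new LazyCar($db, 1);
// No query has run yet: $db->queryCount is 0.
$owners = $car->owners; // first access triggers one query
$again  = $car->owners; // served from the already-loaded relation
// $db->queryCount is now 1.
```

With several relations on the same object, each one would trigger its own query on first access, which is where the 5-6 queries come from.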
Doctrine allows you to optimize this situation by using Doctrine Query Language. You tell DQL that you want to load a car object, but also join it to its owners, manufacturer, titles, liens, etc. and it will load all of that data into objects.
Whew! Long response. Basically, though, you've gotten at the heart of "What is the purpose of an ORM?" and "Why should we use one?" The ORM allows us to continue thinking in object mode at most times, but the abstraction is not perfect and the leaks in the abstraction tend to come out as performance penalties.
I created a presentation on the topic of building a PHP DataMapper that might be interesting to you. It was recorded on video at the Oklahoma City Coworking Collaborative when I presented it there for the PHP user group:
Video:
http://blip.tv/file/2249586/
Presentation Slides:
http://www.slideshare.net/vlucas/building-data-mapper-php5-presentation
The presentation was basically the early concept of phpDataMapper, though a lot has changed since.
Hope they help you understand the inner workings of ORMs a bit better.
Chained accessors aren't really a big deal: you return $this from the setter method. Boom, done, works at as many levels as you like.