How to create a fluent query interface?

2019-01-11 07:36发布

问题:

I know how to chain class methods (with the "return $this" and all), but what i am trying to do is to chain them in a smart way, have a look at this:

$albums = $db->select('albums')->where('x', '>', '20')->limit(2)->order('desc');

What i could understand from this code sample is that the first 3 methods (select, where, limit) build the query statement that will be executed, and the last one (order) comes to finish the statement and then executes it and throws back the result, right ?

But, this isn't the case, because i can easily drop any of these methods (except "select" of course) or - more importantly - change their order and nothing will go wrong !! That means that the method "select" handles the work, right ? Then how the other 3 methods add to/affect the query statement after the method "select" was already called !??

回答1:

How to implement composable queries: the 10k feet view

It's not difficult to realize that in order to achieve this the methods being chained must incrementally set up some data structure which is finally being interpreted by some method that executes the final query. But there are some degrees of freedom regarding exactly how this can be orchestrated.

The example code is

$albums = $db->select('albums')->where('x', '>', '20')->limit(2)->order('desc');

What do we see here?

  1. There is some type, which $db is an instance of, that exposes at least a select method. Note that if you want to be able to fully reorder the calls this type needs to expose methods with all of the possible signatures that can take part in the call chain.
  2. Each of the chained methods returns an instance of something that exposes methods will all relevant signatures; this may or may not be the same type as $db.
  3. After the "query plan" has been collected, we need to call some method to actually execute it and return the results (a process which I am going to call materializing the query). This method can only be the last one in the call chain for obvious reasons, but in this case the last method is order, which does not seem right: we want to be able to move it earlier in the chain after all. Let's keep this in mind.

Therefore we can break down what happens in three distinct steps.

Step 1: Kicking off

We established that there needs to be at least one type that collects information about the query plan. Let's assume the type looks like this:

interface QueryPlanInterface
{
    public function select(...);
    public function limit(...);
    // etc
}

class QueryPlan implements QueryPlanInterface
{
    private $variable_that_points_to_data_store;
    private $variables_to_hold_query_description;

    public function select(...)
    {
        $this->encodeSelectInformation(...);
        return $this;
    }

    // and so on for the rest of the methods; all of them return $this
}

QueryPlan needs appropriate properties to remember not only what query it should produce, but also where to direct that query because it is an instance of this type you will have on hand at the end of the call chain; both pieces of information are necessary in order for the query to be materialized. I have also provided a QueryPlanInterface type; its significance will be made clear later on.

Does this mean that $db is of type QueryPlan? At first sight you might say yes, but upon closer inspection issues start to arise from such an arrangement. The biggest problem is stale state:

// What would this code do?
$db->limit(2);

// ...a little later...
$albums = $db->select('albums');

How many albums is this going to retrieve? Since we did not "reset" the query plan it should be 2. But that's not obvious at all from the last line, which reads very differently. This is a bad arrangement that can lead to unnecessary bugs.

So how to solve this problem? One option would be for select to reset the query plan, but this runs into the opposite issue: $db->limit(1)->select('albums') now selects all albums. This doesn't look nice.

The option would be to "kick off" the chain by arranging for the first call to return a new QueryPlan instance. This way each chain operates on a separate query plan, and while you can compose a query plan bit by bit you can no longer do it by accident. So you could have:

class DatabaseTable
{
    public function query()
    {
        return new QueryPlan(...); // pass in data store-related information
    }
}

which solves all these problem but requires you to always write ->query() in front:

$db->query()->limit(1)->select('albums');

What if you don't want to have this extra call? In that case class DatabaseTable has to implement QueryPlanInterface as well, with the difference that the implementation will create a new QueryPlan each time:

class DatabaseTable implements QueryPlanInterface
{

    public function select(...)
    {
        $q = new QueryPlan();
        return $q->select(...);
    }

    public function limit(...)
    {
        $q = new QueryPlan();
        return $q->limit(...);
    }

    // and so on for the rest of the methods
}

You can now write $db->limit(1)->select('albums') without any problem; the arrangement can be described as "each time you write $db->something(...) you start composing a new query that is independent of all previous and future ones".

Step 2: Chaining

This is actually the easiest part; we already saw how the methods in QueryPlan always return $this to enable chaining.

Step 3: Materializing

We still need some way to say "OK, I 'm done composing; get me the results". It is perfectly possible to use a dedicated method for this purpose:

interface QueryPlanInterface
{
    // ...other methods as above...
    public function get(); // this executes the query and returns the results
}

This enables you to write

$anAlbum = $db->limit(1)->select('albums')->get();

There is nothing wrong and lots of right with this solution: it's obvious at which point the actual query is executed. But the question uses an example that does not appear to work like that. Is it possible to achieve such syntax?

The answer is yes and no. Yes in that it is indeed possible, but no in the sense that the semantics of what happens will have to change.

PHP has no facility that enables a method to be "automatically" called, so there has to be something that triggers the materialization, even if that something does not look like a method call at first sight. But what? Well, think about what is perhaps the most common use case:

$albums = $db->select('albums'); // no materialization yet
foreach ($albums as $album) {
    // ...
}

Can this be made to work? Sure, as long as QueryPlanInterface extends IteratorAggregate:

interface QueryPlanInterface extends IteratorAggregate
{
    // ...other methods as above...
    public function getIterator();
}

The idea here is that the foreach triggers a call to getIterator, which in turn will create an instance of yet another class injected with all the information that the implementation of QueryPlanInterface has compiled. This class will execute the actual query on the spot and materialize the results on demand during the iteration.

I have chosen to implement IteratorAggregate and not Iterator specifically so that the iteration state can go into a new instance, which allows multiple iterations over the same query plan to go on in parallel without problems.

Finally, this foreach trick looks neat but what about the other common use case (getting the query results into an array)? Have we made that unwieldy?

Not really, thanks to iterator_to_array:

$albums = iterator_to_array($db->select('albums'));

Conclusion

Does this require lots of code to be written? For sure. We have DatabaseTable, QueryPlanInterface, QueryPlan itself and also the QueryPlanIterator we have described but not shown. In addition, all the encoded state that these classes aggregate will probably need to be kept in instances of yet more classes.

Is it worth it? Quite likely. That's because this kind of solution offers:

  • an attractive fluent interface (chainable calls) with clear semantics (each time you kick off you start describing a new query independent of any other)
  • decoupling of the query interface from the data store (each instance of QueryPlan keeps a handle on an abstract data store, so you can theoretically query anything from relational databases to flat text files using the same syntax)
  • composability (you can start composing a QueryPlan now and continue doing so in the future, even in another method)
  • reusability (you can materialize each QueryPlan more than once)

Not a bad package at all.



回答2:

This takes a really elegant solution.

Instead of reinventing the wheel, look into an existing framework(s).

I suggest Laravel using the Eloquent ORM. You'll be able to do this and MUCH more.



回答3:

You will probably need a method that kicks of the actual query while the methods like select and order_by just store the information up to that point.

You can make this implicit though if you implement the Iterator interface, and run the query the first time rewind or current got hit (think foreach), or the Countable, so result counts could be generated by calling count() with the object. I'm personally would not like to use a library built like this, i would much more likely appreciate an explicit call so i can see where the queries triggered to run.