Preface: I'm attempting to use the repository pattern in an MVC architecture with relational databases.
I've recently started learning TDD in PHP, and I'm realizing that my database is coupled much too closely with the rest of my application. I've read about repositories and using an IoC container to "inject" it into my controllers. Very cool stuff. But now have some practical questions about repository design. Consider the follow example.
<?php
class DbUserRepository implements UserRepositoryInterface
{
protected $db;
public function __construct($db)
{
$this->db = $db;
}
public function findAll()
{
}
public function findById($id)
{
}
public function findByName($name)
{
}
public function create($user)
{
}
public function remove($user)
{
}
public function update($user)
{
}
}
Issue #1: Too many fields
All of these find methods use a select all fields (SELECT *
) approach. However, in my apps, I'm always trying to limit the number of fields I get, as this often adds overhead and slows things down. For those using this pattern, how do you deal with this?
Issue #2: Too many methods
While this class looks nice right now, I know that in a real-world app I need a lot more methods. For example:
- findAllByNameAndStatus
- findAllInCountry
- findAllWithEmailAddressSet
- findAllByAgeAndGender
- findAllByAgeAndGenderOrderByAge
- Etc.
As you can see, there could be a very, very long list of possible methods. And then if you add in the field selection issue above, the problem worsens. In the past I'd normally just put all this logic right in my controller:
<?php
class MyController
{
public function users()
{
$users = User::select('name, email, status')
->byCountry('Canada')->orderBy('name')->rows();
return View::make('users', array('users' => $users));
}
}
With my repository approach, I don't want to end up with this:
<?php
class MyController
{
public function users()
{
$users = $this->repo->get_first_name_last_name_email_username_status_by_country_order_by_name('Canada');
return View::make('users', array('users' => $users))
}
}
Issue #3: Impossible to match an interface
I see the benefit in using interfaces for repositories, so I can swap out my implementation (for testing purposes or other). My understanding of interfaces is that they define a contract that an implementation must follow. This is great until you start adding additional methods to your repositories like findAllInCountry()
. Now I need to update my interface to also have this method, otherwise, other implementations may not have it, and that could break my application. By this feels insane...a case of the tail wagging the dog.
Specification Pattern?
This leads me to believe that repository should only have a fixed number of methods (like save()
, remove()
, find()
, findAll()
, etc). But then how do I run specific lookups? I've heard of the Specification Pattern, but it seems to me that this only reduces an entire set of records (via IsSatisfiedBy()
), which clearly has major performance issues if you're pulling from a database.
Help?
Clearly, I need to rethink things a little when working with repositories. Can anyone enlighten on how this is best handled?
I suggest https://packagist.org/packages/prettus/l5-repository as vendor to implement Repositories/Criterias etc ... in Laravel5 :D
I agree with @ryan1234 that you should pass around complete objects within the code and should use generic query methods to get those objects.
For external/endpoint usage I really like the GraphQL method.
These are some different solutions I've seen. There are pros and cons to each of them, but it is for you to decide.
Issue #1: Too many fields
This is an important aspect especially when you take in to account Index-Only Scans. I see two solutions to dealing with this problem. You can update your functions to take in an optional array parameter that would contain a list of a columns to return. If this parameter is empty you'd return all of the columns in the query. This can be a little weird; based off the parameter you could retrieve an object or an array. You could also duplicate all of your functions so that you have two distinct functions that run the same query, but one returns an array of columns and the other returns an object.
Issue #2: Too many methods
I briefly worked with Propel ORM a year ago and this is based off what I can remember from that experience. Propel has the option to generate its class structure based off the existing database schema. It creates two objects for each table. The first object is a long list of access function similar to what you have currently listed;
findByAttribute($attribute_value)
. The next object inherits from this first object. You can update this child object to build in your more complex getter functions.Another solution would be using
__call()
to map non defined functions to something actionable. Your__call
method would be would be able to parse the findById and findByName into different queries.I hope this helps at least some what.
I use the following interfaces:
Repository
- loads, inserts, updates and deletes entitiesSelector
- finds entities based on filters, in a repositoryFilter
- encapsulates the filtering logicMy
Repository
is database agnostic; in fact it doesn't specify any persistence; it could be anything: SQL database, xml file, remote service, an alien from outer space etc. For searching capabilities, theRepository
constructs anSelector
which can be filtered,LIMIT
-ed, sorted and counted. In the end, the selector fetches one or moreEntities
from the persistence.Here is some sample code:
Then, one implementation:
The ideea is that the generic
Selector
usesFilter
but the implementationSqlSelector
usesSqlFilter
; theSqlSelectorFilterAdapter
adapts a genericFilter
to a concreteSqlFilter
.The client code creates
Filter
objects (that are generic filters) but in the concrete implementation of the selector those filters are transformed in SQL filters.Other selector implementations, like
InMemorySelector
, transform fromFilter
toInMemoryFilter
using their specificInMemorySelectorFilterAdapter
; so, every selector implementation comes with its own filter adapter.Using this strategy my client code (in the bussines layer) doesn't care about a specific repository or selector implementation.
P.S. This is a simplification of my real code
I thought I'd take a crack at answering my own question. What follows is just one way of solving the issues 1-3 in my original question.
Disclaimer: I may not always use the right terms when describing patterns or techniques. Sorry for that.
The Goals:
Users
.The Solution
I'm splitting my persistent storage (database) interaction into two categories: R (Read) and CUD (Create, Update, Delete). My experience has been that reads are really what causes an application to slow down. And while data manipulation (CUD) is actually slower, it happens much less frequently, and is therefore much less of a concern.
CUD (Create, Update, Delete) is easy. This will involve working with actual models, which are then passed to my
Repositories
for persistence. Note, my repositories will still provide a Read method, but simply for object creation, not display. More on that later.R (Read) is not so easy. No models here, just value objects. Use arrays if you prefer. These objects may represent a single model or a blend of many models, anything really. These are not very interesting on their own, but how they are generated is. I'm using what I'm calling
Query Objects
.The Code:
User Model
Let's start simple with our basic user model. Note that there is no ORM extending or database stuff at all. Just pure model glory. Add your getters, setters, validation, whatever.
Repository Interface
Before I create my user repository, I want to create my repository interface. This will define the "contract" that repositories must follow in order to be used by my controller. Remember, my controller will not know where the data is actually stored.
Note that my repositories will only every contain these three methods. The
save()
method is responsible for both creating and updating users, simply depending on whether or not the user object has an id set.SQL Repository Implementation
Now to create my implementation of the interface. As mentioned, my example was going to be with an SQL database. Note the use of a data mapper to prevent having to write repetitive SQL queries.
Query Object Interface
Now with CUD (Create, Update, Delete) taken care of by our repository, we can focus on the R (Read). Query objects are simply an encapsulation of some type of data lookup logic. They are not query builders. By abstracting it like our repository we can change it's implementation and test it easier. An example of a Query Object might be an
AllUsersQuery
orAllActiveUsersQuery
, or evenMostCommonUserFirstNames
.You may be thinking "can't I just create methods in my repositories for those queries?" Yes, but here is why I'm not doing this:
password
field if I'm looking to list all my users?For my example I'll create a query object to lookup "AllUsers". Here is the interface:
Query Object Implementation
This is where we can use a data mapper again to help speed up development. Notice that I am allowing one tweak to the returned dataset—the fields. This is about as far as I want to go with manipulating the performed query. Remember, my query objects are not query builders. They simply perform a specific query. However, since I know that I'll probably be using this one a lot, in a number of different situations, I'm giving myself the ability to specify the fields. I never want to return fields I don't need!
Before moving on to the controller, I want to show another example to illustrate how powerful this is. Maybe I have a reporting engine and need to create a report for
AllOverdueAccounts
. This could be tricky with my data mapper, and I may want to write some actualSQL
in this situation. No problem, here is what this query object could look like:This nicely keeps all my logic for this report in one class, and it's easy to test. I can mock it to my hearts content, or even use a different implementation entirely.
The Controller
Now the fun part—bringing all the pieces together. Note that I am using dependency injection. Typically dependencies are injected into the constructor, but I actually prefer to inject them right into my controller methods (routes). This minimizes the controller's object graph, and I actually find it more legible. Note, if you don't like this approach, just use the traditional constructor method.
Final Thoughts:
The important things to note here are that when I'm modifying (creating, updating or deleting) entities, I'm working with real model objects, and performing the persistance through my repositories.
However, when I'm displaying (selecting data and sending it to the views) I'm not working with model objects, but rather plain old value objects. I only select the fields I need, and it's designed so I can maximum my data lookup performance.
My repositories stay very clean, and instead this "mess" is organized into my model queries.
I use a data mapper to help with development, as it's just ridiculous to write repetitive SQL for common tasks. However, you absolutely can write SQL where needed (complicated queries, reporting, etc.). And when you do, it's nicely tucked away into a properly named class.
I'd love to hear your take on my approach!
July 2015 Update:
I've been asked in the comments where I ended up with all this. Well, not that far off actually. Truthfully, I still don't really like repositories. I find them overkill for basic lookups (especially if you're already using an ORM), and messy when working with more complicated queries.
I generally work with an ActiveRecord style ORM, so most often I'll just reference those models directly throughout my application. However, in situations where I have more complex queries, I'll use query objects to make these more reusable. I should also note that I always inject my models into my methods, making them easier to mock in my tests.
Based on my experience, here are some answers to your questions:
Q: How do we deal with bringing back fields we don't need?
A: From my experience this really boils down to dealing with complete entities versus ad-hoc queries.
A complete entity is something like a
User
object. It has properties and methods, etc. It's a first class citizen in your codebase.An ad-hoc query returns some data, but we don't know anything beyond that. As the data gets passed around the application, it is done so without context. Is it a
User
? AUser
with someOrder
information attached? We don't really know.I prefer working with full entities.
You are right that you will often bring back data you won't use, but you can address this in various ways:
User
for the back end and maybe aUserSmall
for AJAX calls. One might have 10 properties and one has 3 properties.The downsides of working with ad-hoc queries:
User
, you'll end up writing essentially the sameselect *
for many calls. One call will get 8 of 10 fields, one will get 5 of 10, one will get 7 of 10. Why not replace all with one call that gets 10 out of 10? The reason this is bad is that it is murder to re-factor/test/mock.User
so slow?" you end up tracking down one-off queries and so bug fixes tend to be small and localized.Q: I will have too many methods in my repository.
A: I haven't really seen any way around this other than consolidating calls. The method calls in your repository really map to features in your application. The more features, the more data specific calls. You can push back on features and try to merge similar calls into one.
The complexity at the end of the day has to exist somewhere. With a repository pattern we've pushed it into the repository interface instead of maybe making a bunch of stored procedures.
Sometimes I have to tell myself, "Well it had to give somewhere! There are no silver bullets."