I have a database table like this:
    Entity
    ---------------------
    ID        int      PK
    ParentID  int      FK
    Code      varchar
    Text      text
The ParentID field is a foreign key to another record in the same table (recursive), so the structure represents a tree.
I'm trying to write a method that queries this table and returns one specific Entity based on a path. A path is a string made up of the Code properties of the Entity and its parent Entities. For example, the path "foo/bar/baz" means the one Entity whose Code == "baz", whose parent's Code == "bar", and whose parent's parent's Code == "foo".
My attempt:
    public Entity Single(string path)
    {
        string[] pathParts = path.Split('/');
        string code = pathParts[pathParts.Length - 1];

        if (pathParts.Length == 1)
            return dataContext.Entities.Single(e => e.Code == code && e.ParentID == 0);

        IQueryable<Entity> entities = dataContext.Entities.Where(e => e.Code == code);
        for (int i = pathParts.Length - 2; i >= 0; i--)
        {
            string parentCode = pathParts[i];
            entities = entities.Where(e => e.Entity1.Code == parentCode); // incorrect
        }

        return entities.Single();
    }
I know this isn't correct, because the Where inside the for loop just adds more conditions to the current Entity instead of the parent Entity. How do I correct this? In words, I would like the for loop to say: "and the parent's code must be x, and the parent of that parent's code must be y, and the parent of that parent of that parent's code must be z", and so on. Besides that, for performance reasons I'd like it to be a single IQueryable, so that just one query goes to the database.
I don't think traversing a hierarchical table using a single translated query is currently possible with Entity Framework. The reason is that you'd need to implement either a loop or recursion, and to the best of my knowledge neither can be translated into an EF object store query.
UPDATE
@Bazzz and @Steven got me thinking, and I have to admit I was completely wrong: it is possible, and quite easy, to construct an IQueryable for these requirements dynamically.
The following function can be called recursively to build up the query:
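The answer's original listing is not reproduced here, so what follows is only a minimal sketch of how such a recursive query builder could look, not the author's code. It assumes a plain `Entity` class, the question's convention that `ParentID == 0` marks a root, and hypothetical names (`TreeQueries`, `ByPath`, and the `codes`/`index` parameters). Each recursion step adds one self-join through `SelectMany`, and nothing is enumerated until the caller asks for results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Entity
{
    public int ID { get; set; }
    public int ParentID { get; set; }   // assumption: 0 marks a root node, as in the question
    public string Code { get; set; }
}

public static class TreeQueries
{
    // Adds one self-join per remaining path part: keeps the children of
    // `source` whose Code equals codes[index], then recurses on the rest.
    public static IQueryable<Entity> Traverse(
        this IQueryable<Entity> source,
        IQueryable<Entity> table,
        string[] codes,
        int index)
    {
        if (index >= codes.Length)
            return source;

        string code = codes[index];
        IQueryable<Entity> children = source.SelectMany(
            parent => table.Where(
                child => child.ParentID == parent.ID && child.Code == code));

        return children.Traverse(table, codes, index + 1);
    }

    // The root is the special case: it is matched on ParentID == 0 instead
    // of being joined to a parent row.
    public static IQueryable<Entity> ByPath(IQueryable<Entity> table, string path)
    {
        string[] codes = path.Split('/');
        string rootCode = codes[0];

        IQueryable<Entity> root =
            table.Where(e => e.ParentID == 0 && e.Code == rootCode);

        return root.Traverse(table, codes, 1);
    }
}
```

Calling `TreeQueries.ByPath(table, "foo/bar/baz").Single()` composes the whole chain before anything is executed. Against an in-memory list wrapped with `AsQueryable()` this runs as LINQ to Objects; whether a given EF version translates the nested `SelectMany` into a single statement should be checked against the generated SQL.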
The root query is a special case; here's a working example of calling Traverse:
The DB is queried only once with this generated code:
And while I like the execution plan of the raw query (see below) a bit better, the approach is valid and perhaps useful.
End of UPDATE
Using IEnumerable
The idea is to grab the relevant data from the table in one go and then do the traversing in the application using LINQ to Objects.
Here's a recursive function that will get a node from a sequence:
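The original listing is missing at this point, so here is a hedged sketch of what a recursive lookup over an in-memory sequence might look like. The names `TreeWalker` and `FindByPath` and the `index`/`parentId` parameters are illustrative, and roots are assumed to have `ParentID == 0` as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Entity
{
    public int ID { get; set; }
    public int ParentID { get; set; }   // assumption: 0 marks a root node
    public string Code { get; set; }
}

public static class TreeWalker
{
    // Follows codes[index..] downwards, starting below parentId.
    // Returns the node at the end of the path, or null if a part is missing.
    public static Entity FindByPath(
        IEnumerable<Entity> nodes, string[] codes, int index = 0, int parentId = 0)
    {
        Entity match = nodes.FirstOrDefault(
            n => n.ParentID == parentId && n.Code == codes[index]);

        // Stop when the path is broken or the last part has been matched.
        if (match == null || index == codes.Length - 1)
            return match;

        return FindByPath(nodes, codes, index + 1, match.ID);
    }
}
```

Because `nodes` is enumerated once per path part, passing a materialized list (for example the result of `ToList()`) keeps the traversal in memory, whereas passing a lazily evaluated source would re-run it at every level.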
You can use it like this:
This will execute one DB query for each path part, so if you want the DB to only be queried once, use this instead:
An obvious optimization is to exclude the codes not present in our path before traversing:
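As the original snippet is not available, the following is a sketch under assumptions rather than the author's code: it filters the table down to rows whose `Code` occurs somewhere in the path, materializes them with a single `ToList()`, and then walks the path in memory. `FilteredLookup` and `FindByPath` are hypothetical names, and roots are again assumed to have `ParentID == 0`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Entity
{
    public int ID { get; set; }
    public int ParentID { get; set; }   // assumption: 0 marks a root node
    public string Code { get; set; }
}

public static class FilteredLookup
{
    public static Entity FindByPath(IQueryable<Entity> table, string path)
    {
        List<string> codes = path.Split('/').ToList();

        // One round trip: only rows whose Code appears in the path are
        // fetched. LINQ providers typically translate List.Contains to IN.
        List<Entity> candidates =
            table.Where(e => codes.Contains(e.Code)).ToList();

        // Walk the path top-down over the candidates, in memory.
        Entity current = null;
        int parentId = 0;
        foreach (string code in codes)
        {
            current = candidates.FirstOrDefault(
                e => e.ParentID == parentId && e.Code == code);
            if (current == null)
                return null;            // path is broken at this part
            parentId = current.ID;
        }
        return current;
    }
}
```

The filter only reduces the number of rows transferred; rows that share a code with the path but live in another branch are still fetched and then ignored during the walk.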
This query should be fast enough unless most of your entities have similar codes. However, if you absolutely need top performance, you could use raw queries.
SQL Server Raw Query
For SQL Server a CTE-based query would probably be best:
Limiting data by the root node is easy and might be quite useful performance-wise:
Footnotes
All of this was tested with .NET 4.5, EF 5, SQL Server 2012. Data setup script:
All examples in my test returned the 'baz' entity with ID 3. It's assumed that the entity actually exists. Error handling is out of scope of this post.
UPDATE
To address @Bazzz's comment, the data with paths is shown below. Code is unique by level, not globally.
You need a recursive function instead of your loop. Something like this should do the job:
As you can see, I am just returning the ultimate parent node. If you wanted a list of all EntityTable objects, I would have the recursive method return a List of the IDs of the nodes it found and then, at the end, in the Single(...) method, run a simple LINQ query that uses this list of IDs to get your IQueryable object.
Edit: I tried to do your task, but I think there is a fundamental problem: there are cases where you cannot identify a single path. For example, suppose you have two paths, "foo/bar/baz" and "foo/bar/baz/bak", where the "baz" entities are different. If you look up the path "foo/bar/baz", you will always find two matching paths (one of them being a prefix of the four-entity path). You can still get your "baz" entity correctly, but this is too confusing and I would simply redesign it: either add a unique constraint so that each entity can only be used once, or store the full path in the "Code" column.
The trick is to do it the other way around, and build up the following query:
A somewhat naive (hard-coded) solution would look like this:
Doing this dynamically by building expression trees isn't trivial, but can be done by looking closely at what the C# compiler generates (using ILDasm or Reflector for instance). Here is an example:
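The answer's own listing is missing, so the block below is only a sketch of how such a predicate could be assembled with `System.Linq.Expressions`, not the original code. It assumes a navigation property named `Parent` (the question's generated name is `Entity1`), the question's convention that `ParentID == 0` marks a root, and a hypothetical helper `PathQuery.BuildPathPredicate`. The null guards are an addition so the predicate can also run against in-memory data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Entity
{
    public int ID { get; set; }
    public int ParentID { get; set; }    // assumption: 0 marks a root node
    public string Code { get; set; }
    public Entity Parent { get; set; }   // hypothetical name for the question's Entity1
}

public static class PathQuery
{
    // For "foo/bar/baz" this builds, in effect:
    //   e => e.Code == "baz"
    //     && e.Parent != null && e.Parent.Code == "bar"
    //     && e.Parent.Parent != null && e.Parent.Parent.Code == "foo"
    //     && e.Parent.Parent.ParentID == 0
    public static Expression<Func<Entity, bool>> BuildPathPredicate(string path)
    {
        string[] parts = path.Split('/');
        ParameterExpression e = Expression.Parameter(typeof(Entity), "e");

        Expression node = e;     // e, then e.Parent, then e.Parent.Parent, ...
        Expression body = null;

        for (int i = parts.Length - 1; i >= 0; i--)
        {
            Expression test = Expression.Equal(
                Expression.Property(node, "Code"),
                Expression.Constant(parts[i], typeof(string)));

            // Guard ancestors against null (needed for LINQ to Objects).
            if (i < parts.Length - 1)
            {
                test = Expression.AndAlso(
                    Expression.NotEqual(node, Expression.Constant(null, typeof(Entity))),
                    test);
            }

            body = body == null ? test : Expression.AndAlso(body, test);

            if (i > 0)
                node = Expression.Property(node, "Parent");
        }

        // The topmost matched node must itself be a root.
        body = Expression.AndAlso(body, Expression.Equal(
            Expression.Property(node, "ParentID"), Expression.Constant(0)));

        return Expression.Lambda<Func<Entity, bool>>(body, e);
    }
}
```

The result can be passed straight to `Where` or `Single` on an `IQueryable<Entity>`, for example `entities.Single(PathQuery.BuildPathPredicate("foo/bar/baz"))`, so the whole path check stays inside one query expression.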