EF Code First improving performance for self refer

2019-04-09 03:55发布

问题:

I have an AccountGroup which is a self-referencing entity. A leaf AccountGroup can contain 1 or more Accounts. Both entities have Balance property. Each AccountGroup has a Balance which is either a sum of Balances in sub-groups or sum of Balances of all Accounts (in case of leaf group).

In order to build a tree listing of all AccountGroups and Accounts I have to traverse this object graph recursively, which causes a lot (I mean a lot!!!) of calls to DB...

Is there any way to improve upon this in such way that # of DB calls is reduced?

Thanks

Here is the trimmed down code

Account (belongs to only 1 AccountGroup)

public class Account
{
    public int Id { get; set; }
    public int GroupId { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public string AccountType { get; set; }

    public virtual AccountGroup Group { get; set; }
}

AccountGroup (has 0 or many AccountGroups, has 1 or more Accounts if it is a leaf)

public class AccountGroup
{
    public AccountGroup()
    {
        Accounts = new HashSet<Account>();
        Groups = new HashSet<AccountGroup>();
    }

    public int Id { get; set; }
    public bool IsRoot { get { return Parent == null; } }
    public bool IsLeaf { get { return !Groups.Any(); } }
    public decimal Balance { get { return IsLeaf ? Accounts.Sum(a => a.Balance) : Groups.Sum(g => g.Balance); } } // if leaf group, get sum of all account balances, otherwise get sum of all subgroups
    public int? ParentId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public virtual ISet<Account> Accounts { get; private set; }
    public virtual ISet<AccountGroup> Groups { get; private set; }
    public virtual AccountGroup Parent { get; set; }
}

Calling Code

// start processing root groups (ones without parent)
foreach (var rootGroup in db.AccountGroups.Include(g=>g.Groups).Where(g => g.ParentId == null))
{
    TraverseAccountGroup(rootGroup, 0);
}

// recursive method
private static void TraverseAccountGroup(AccountGroup accountGroup, int level)
{
    //
    // process account group
    //
    Console.WriteLine("{0}{1} ({2})", String.Empty.PadRight(level * 2, '.'), accountGroup.Name, level);
    //
    // if subgroups exist, process recursivelly
    //
    if (accountGroup.Groups.Any())
    {
        foreach (var subGroup in accountGroup.Groups)
        {
            TraverseAccountGroup(subGroup, level + 1);
        }
    }
    //
    // otherwise, process accounts belonging to leaf subgroup
    //
    else
    {
        foreach (var account in accountGroup.Accounts)
        {
            Console.WriteLine("ACCOUNT [{0}]", account.Name);
        }
    }
}

回答1:

CTE Approach

There are two ways to increase speed of queries against tree data types. The first (and likely easiest) is using a Stored Procedure and the execute sql functionality of EF to load the tree. The SProc will cache and the result set execution speed will be increased. My recommendation for the query in the sproc would be a recursive CTE.

http://msdn.microsoft.com/en-us/library/ms186243(v=sql.105).aspx

with <CTEName> as
(
     SELECT
         <Root Query>
     FROM <TABLE>

     UNION ALL

     SELECT
         <Child Query>
     FROM <TABLE>
     INNER JOIN <CTEName>
         ON <CTEJoinCondition>
     WHERE 
          <TERMINATION CONDITION>

)

Edit

Execute your sproc or CTE inline with:

DbContext ctx = new SampleContext();
ctx.Database.SqlQuery<YourEntityType>(@"SQL OR SPROC COMMAND HERE", new[] { "Param1", "Param2", "Etc" });

Flatten Your Tree Structure

The second approach is to build a flat representation of your tree. You can flatten a tree into a flat structure for quick querying and then use a linkage between the flat structure and the actual tree node to cut out the self referencing entity. You can build the flat structure using the above recursive CTE query.

This is just one approach but there are many papers on the subject:

http://www.governor.co.uk/news-plus-views/2010/5/17/depth-first-tree-flattening-with-the-yield-keyword-in-c-sharp/

EDIT: Adding additional clarification Just a note, the Recursive CTE cache's the symbols for the query before iterating over the structure. This is the fastest and simplest way to write a query to solve your problem. However, this HAS to be a SQL query. You can use execute sql directly or you can execute a SProc. Sprocs cache the execution graph after being ran so they perform better than native queries that have to build an execution plan prior to running. This is entirely up to you.

The issue with a flat representation of your tree is you have to routinely rebuild or constantly upkeep the flat structure. Depending on your query path would determine what flattening algorithm you should use, but the end result remains the same. The flat structure is the only way to "accomplish" what you want to do inside EF without having to cheat and execute raw SQL through the DBConnection.