Filter all navigation properties before they are l

2019-03-07 22:12发布

问题:

For future visitors: for EF6 you are probably better off using filters, for example via this project: https://github.com/jbogard/EntityFramework.Filters

In the application we're building we apply the "soft delete" pattern where every class has a 'Deleted' bool. In practice, every class simply inherits from this base class:

public abstract class Entity
{
    public virtual int Id { get; set; }

    public virtual bool Deleted { get; set; }
}

To give a brief example, suppose I have the classes GymMember and Workout:

public class GymMember: Entity
{
    public string Name { get; set; }

    public virtual ICollection<Workout> Workouts { get; set; }
}

public class Workout: Entity
{
    public virtual DateTime Date { get; set; }
}

When I fetch the list of gym members from the database, I can make sure that none of the 'deleted' gym members are fetched, like this:

var gymMembers = context.GymMembers.Where(g => !g.Deleted);

However, when I iterate through these gym members, their Workouts are loaded from the database without any regard for their Deleted flag. While I cannot blame Entity Framework for not picking up on this, I would like to configure or intercept lazy property loading somehow so that deleted navigational properties are never loaded.

I've been going through my options, but they seem scarce:

  • Going to Database First and use conditional mapping for every object for every one-to-many property.

This is simply not an option, since it would be too much manual work. (Our application is huge and getting huger every day). We also do not want to give up the advantages of using Code First (of which there are many)

  • Always eagerly loading navigation properties.

Again, not an option. This configuration is only available per entity. Always eagerly loading entities would also impose a serious performance penalty.

  • Applying the Expression Visitor pattern that automatically injects .Where(e => !e.Deleted) anywhere it finds an IQueryable<Entity>, as described here and here.

I actually tested this in a proof of concept application, and it worked wonderfully. This was a very interesting option, but alas, it fails to apply filtering to lazily loaded navigation properties. This is obvious, as those lazy properties would not appear in the expression/query and as such cannot be replaced. I wonder if Entity Framework would allow for an injection point somewhere in their DynamicProxy class that loads the lazy properties. I also fear for for other consequences, such as the possibility of breaking the Include mechanism in EF.

  • Writing a custom class that implements ICollection but filters the Deleted entities automatically.

This was actually my first approach. The idea would be to use a backing property for every collection property that internally uses a custom Collection class:

public class GymMember: Entity
{
    public string Name { get; set; }

    private ICollection<Workout> _workouts;
    public virtual ICollection<Workout> Workouts 
    { 
        get { return _workouts ?? (_workouts = new CustomCollection()); }
        set { _workouts = new CustomCollection(value); }
     }

}

While this approach is actually not bad, I still have some issues with it:

  • It still loads all the Workouts into memory and filters the Deleted ones when the property setter is hit. In my humble opinion, this is much too late.

  • There is a logical mismatch between executed queries and the data that is loaded.

Image a scenario where I want a list of the gym members that did a workout since last week:

var gymMembers = context.GymMembers.Where(g => g.Workouts.Any(w => w.Date >= DateTime.Now.AddDays(-7).Date));

This query might return a gym member that only has workouts that are deleted but also satisfy the predicate. Once they are loaded into memory, it appears as if this gym member has no workouts at all! You could say that the developer should be aware of the Deleted and always include it in his queries, but that's something I would really like to avoid. Maybe the ExpressionVisitor could offer the answer here again.

  • It's actually impossible to mark a navigation property as Deleted when using the CustomCollection.

Imagine this scenario:

var gymMember = context.GymMembers.First();
gymMember.Workouts.First().Deleted = true;
context.SaveChanges();`

You would expect that the appropriate Workout record is updated in the database, and you would be wrong! Since the gymMember is being inspected by the ChangeTracker for any changes, the property gymMember.Workouts will suddenly return 1 fewer workout. That's because CustomCollection automatically filters deleted instances, remember? So now Entity Framework thinks the workout needs to be deleted, and EF will try to set the FK to null, or actually delete the record. (depending on how your DB is configured). This is what we were trying to avoid with the soft delete pattern to begin with!!!

I stumbled upon an interesting blog post that overrides the default SaveChanges method of the DbContext so that any entries with an EntityState.Deleted are changed back to EntityState.Modified but this again feels 'hacky' and rather unsafe. However, I'm willing to try it out if it solves problems without any unintended side effects.


So here I am StackOverflow. I've researched my options quite extensively, if I may say so myself, and I'm at my wits end. So now I turn to you. How have you implemented soft deletes in your enterprise application?

To reiterate, these are the requirements I'm looking for:

  • Queries should automatically exclude the Deleted entities on the DB level
  • Deleting an entity and calling 'SaveChanges' should simply update the appropriate record and have no other side effects.
  • When navigational properties are loaded, whether lazy or eager, the Deleted ones should be automatically excluded.

I am looking forward to any and all suggestions, thank you in advance.

回答1:

After much research, I've finally found a way to achieve what I wanted. The gist of it is that I intercept materialized entities with an event handler on the object context, and then inject my custom collection class in every collection property that I can find (with reflection).

The most important part is intercepting the "DbCollectionEntry", the class responsible for loading related collection properties. By wiggling myself in between the entity and the DbCollectionEntry, I gain full control over what's loaded when and how. The only downside is that this DbCollectionEntry class has little to no public members, which requires me to use reflection to manipulate it.

Here is my custom collection class that implements ICollection and contains a reference to the appropriate DbCollectionEntry:

public class FilteredCollection <TEntity> : ICollection<TEntity> where TEntity : Entity
{
    private readonly DbCollectionEntry _dbCollectionEntry;
    private readonly Func<TEntity, Boolean> _compiledFilter;
    private readonly Expression<Func<TEntity, Boolean>> _filter;
    private ICollection<TEntity> _collection;
    private int? _cachedCount;

    public FilteredCollection(ICollection<TEntity> collection, DbCollectionEntry dbCollectionEntry)
    {
        _filter = entity => !entity.Deleted;
        _dbCollectionEntry = dbCollectionEntry;
        _compiledFilter = _filter.Compile();
        _collection = collection != null ? collection.Where(_compiledFilter).ToList() : null;
    }

    private ICollection<TEntity> Entities
    {
        get
        {
            if (_dbCollectionEntry.IsLoaded == false && _collection == null)
            {
                IQueryable<TEntity> query = _dbCollectionEntry.Query().Cast<TEntity>().Where(_filter);
                _dbCollectionEntry.CurrentValue = this;
                _collection = query.ToList();

                object internalCollectionEntry =
                    _dbCollectionEntry.GetType()
                        .GetField("_internalCollectionEntry", BindingFlags.NonPublic | BindingFlags.Instance)
                        .GetValue(_dbCollectionEntry);
                object relatedEnd =
                    internalCollectionEntry.GetType()
                        .BaseType.GetField("_relatedEnd", BindingFlags.NonPublic | BindingFlags.Instance)
                        .GetValue(internalCollectionEntry);
                relatedEnd.GetType()
                    .GetField("_isLoaded", BindingFlags.NonPublic | BindingFlags.Instance)
                    .SetValue(relatedEnd, true);
            }
            return _collection;
        }
    }

    #region ICollection<T> Members

    void ICollection<TEntity>.Add(TEntity item)
    {
        if(_compiledFilter(item))
            Entities.Add(item);
    }

    void ICollection<TEntity>.Clear()
    {
        Entities.Clear();
    }

    Boolean ICollection<TEntity>.Contains(TEntity item)
    {
        return Entities.Contains(item);
    }

    void ICollection<TEntity>.CopyTo(TEntity[] array, Int32 arrayIndex)
    {
        Entities.CopyTo(array, arrayIndex);
    }

    Int32 ICollection<TEntity>.Count
    {
        get
        {
            if (_dbCollectionEntry.IsLoaded)
                return _collection.Count;
            return _dbCollectionEntry.Query().Cast<TEntity>().Count(_filter);
        }
    }

    Boolean ICollection<TEntity>.IsReadOnly
    {
        get
        {
            return Entities.IsReadOnly;
        }
    }

    Boolean ICollection<TEntity>.Remove(TEntity item)
    {
        return Entities.Remove(item);
    }

    #endregion

    #region IEnumerable<T> Members

    IEnumerator<TEntity> IEnumerable<TEntity>.GetEnumerator()
    {
        return Entities.GetEnumerator();
    }

    #endregion

    #region IEnumerable Members

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ( ( this as IEnumerable<TEntity> ).GetEnumerator() );
    }

    #endregion
}

If you skim through it, you'll find that the most important part is the "Entities" property, which will lazy load the actual values. In the constructor of the FilteredCollection I pass an optional ICollection for scenario's where the collection is already eagerly loaded.

Of course, we still need to configure Entity Framework so that our FilteredCollection is used everywhere where there are collection properties. This can be achieved by hooking into the ObjectMaterialized event of the underlying ObjectContext of Entity Framework:

(this as IObjectContextAdapter).ObjectContext.ObjectMaterialized +=
    delegate(Object sender, ObjectMaterializedEventArgs e)
    {
        if (e.Entity is Entity)
        {
            var entityType = e.Entity.GetType();
            IEnumerable<PropertyInfo> collectionProperties;
            if (!CollectionPropertiesPerType.TryGetValue(entityType, out collectionProperties))
            {
                CollectionPropertiesPerType[entityType] = (collectionProperties = entityType.GetProperties()
                    .Where(p => p.PropertyType.IsGenericType && typeof(ICollection<>) == p.PropertyType.GetGenericTypeDefinition()));
            }
            foreach (var collectionProperty in collectionProperties)
            {
                var collectionType = typeof(FilteredCollection<>).MakeGenericType(collectionProperty.PropertyType.GetGenericArguments());
                DbCollectionEntry dbCollectionEntry = Entry(e.Entity).Collection(collectionProperty.Name);
                dbCollectionEntry.CurrentValue = Activator.CreateInstance(collectionType, new[] { dbCollectionEntry.CurrentValue, dbCollectionEntry });
            }
        }
    };

It all looks rather complicated, but what it does essentially is scan the materialized type for collection properties and change the value to a filtered collection. It also passes the DbCollectionEntry to the filtered collection so it can work its magic.

This covers the whole 'loading entities' part. The only downside so far is that eagerly loaded collection properties will still include the deleted entities, but they are filtered out in the 'Add' method of the FilterCollection class. This is an acceptable downside, although I have yet to do some testing on how this affects the SaveChanges() method.

Of course, this still leaves one issue: there is no automatic filtering on queries. If you want to fetch the gym members who did a workout in the past week, you want to exclude the deleted workouts automatically.

This is achieved through an ExpressionVisitor that automatically applies a '.Where(e => !e.Deleted)' filter to every IQueryable it can find in a given expression.

Here is the code:

public class DeletedFilterInterceptor: ExpressionVisitor
{
    public Expression<Func<Entity, bool>> Filter { get; set; }

    public DeletedFilterInterceptor()
    {
        Filter = entity => !entity.Deleted;
    }

    protected override Expression VisitMember(MemberExpression ex)
    {
        return !ex.Type.IsGenericType ? base.VisitMember(ex) : CreateWhereExpression(Filter, ex) ?? base.VisitMember(ex);
    }

    private Expression CreateWhereExpression(Expression<Func<Entity, bool>> filter, Expression ex)
    {
        var type = ex.Type;//.GetGenericArguments().First();
        var test = CreateExpression(filter, type);
        if (test == null)
            return null;
        var listType = typeof(IQueryable<>).MakeGenericType(type);
        return Expression.Convert(Expression.Call(typeof(Enumerable), "Where", new Type[] { type }, (Expression)ex, test), listType);
    }

    private LambdaExpression CreateExpression(Expression<Func<Entity, bool>> condition, Type type)
    {
        var lambda = (LambdaExpression) condition;
        if (!typeof(Entity).IsAssignableFrom(type))
            return null;

        var newParams = new[] { Expression.Parameter(type, "entity") };
        var paramMap = lambda.Parameters.Select((original, i) => new { original, replacement = newParams[i] }).ToDictionary(p => p.original, p => p.replacement);
        var fixedBody = ParameterRebinder.ReplaceParameters(paramMap, lambda.Body);
        lambda = Expression.Lambda(fixedBody, newParams);

        return lambda;
    }
}

public class ParameterRebinder : ExpressionVisitor
{
    private readonly Dictionary<ParameterExpression, ParameterExpression> _map;

    public ParameterRebinder(Dictionary<ParameterExpression, ParameterExpression> map)
    {
        _map = map ?? new Dictionary<ParameterExpression, ParameterExpression>();
    }

    public static Expression ReplaceParameters(Dictionary<ParameterExpression, ParameterExpression> map, Expression exp)
    {
        return new ParameterRebinder(map).Visit(exp);
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        ParameterExpression replacement;

        if (_map.TryGetValue(node, out replacement))
            node = replacement;

        return base.VisitParameter(node);
    }
}

I am running a bit short on time, so I'll get back to this post later with more details, but the gist of it is written down and for those of you eager to try everything out; I've posted the full test application here: https://github.com/amoerie/TestingGround

However, there might still be some errors, as this is very much a work in progress. The conceptual idea is sound though, and I expect it to fully function soon once I've refactored everything neatly and find the time to write some tests for this.



回答2:

One possibly way might be using specifications with a base specification that checks the soft deleted flag for all queries together with an include strategy.

I’ll illustrate an adjusted version of the specification pattern that I've used in a project (which had its origin in this blog post)

public abstract class SpecificationBase<T> : ISpecification<T>
    where T : Entity
{
    private readonly IPredicateBuilderFactory _builderFactory;
    private IPredicateBuilder<T> _predicateBuilder;

    protected SpecificationBase(IPredicateBuilderFactory builderFactory)
    {
        _builderFactory = builderFactory;            
    }

    public IPredicateBuilder<T> PredicateBuilder
    {
        get
        {
            return _predicateBuilder ?? (_predicateBuilder = BuildPredicate());
        }
    }

    protected abstract void AddSatisfactionCriterion(IPredicateBuilder<T> predicateBuilder);        

    private IPredicateBuilder<T> BuildPredicate()
    {
        var predicateBuilder = _builderFactory.Make<T>();

        predicateBuilder.Check(candidate => !candidate.IsDeleted)

        AddSatisfactionCriterion(predicateBuilder);

        return predicateBuilder;
    }
}

The IPredicateBuilder is a wrapper to the predicate builder included in the LINQKit.dll.

The specification base class is responsible to create the predicate builder. Once created the criteria that should be applied to all query can be added. The predicate builder can then be passed to the inherited specifications for adding further criteria. For example:

public class IdSpecification<T> : SpecificationBase<T> 
    where T : Entity
{
    private readonly int _id;

    public IdSpecification(int id, IPredicateBuilderFactory builderFactory)
        : base(builderFactory)
    {
        _id = id;            
    }

    protected override void AddSatisfactionCriterion(IPredicateBuilder<T> predicateBuilder)
    {
        predicateBuilder.And(entity => entity.Id == _id);
    }
}

The IdSpecification's full predicate would then be:

entity => !entity.IsDeleted && entity.Id == _id

The specification can then be passed to the repository which uses the PredicateBuilder property to build up the where clause:

    public IQueryable<T> FindAll(ISpecification<T> spec)
    {
        return context.AsExpandable().Where(spec.PredicateBuilder.Complete()).AsQueryable();
    }

AsExpandable() is part of the LINQKit.dll.

In regards to including/lazy loading properties one can extend the specification with a further property about includes. The specification base can add the base includes and then child specifications add their includes. The repository can then before fetching from the db apply the includes from the specification.

    public IQueryable<T> Apply<T>(IDbSet<T> context, ISpecification<T> specification) 
    {
        if (specification.IncludePaths == null)
            return context;

        return specification.IncludePaths.Aggregate<string, IQueryable<T>>(context, (current, path) => current.Include(path));
    } 

Let me know if something is unclear. I tried not to make this a monster post so some details might be left out.

Edit: I realized that I didn't fully answer your question(s); navigation properties. What if you make the navigation property internal (using this post to configure it and creating non-mapped public properties that are IQueryable. The non mapped properties can have a custom attribute and the repository adds the base specification's predicate to the where, without eagerly loading it. When someone do apply an eager operation the filter will apply. Something like:

    public T Find(int id)
    {
        var entity = Context.SingleOrDefault(x => x.Id == id);
        if (entity != null)
        {
            foreach(var property in entity.GetType()
                .GetProperties()
                .Where(info => info.CustomAttributes.OfType<FilteredNavigationProperty>().Any()))
            {
                var collection = (property.GetValue(property) as IQueryable<IEntity>);
                collection = collection.Where(spec.PredicateBuilder.Complete());
            }
        }

        return entity;
    }

I haven't tested the above code but it could work with some tweaking :)

Edit 2: Deletes.

If you're using a general/generic repository you could simply add some further functionality to the delete method:

    public void Delete(T entity)
    {
        var castedEntity = entity as Entity;
        if (castedEntity != null)
        {
            castedEntity.IsDeleted = true;
        }
        else
        {
            _context.Remove(entity);
        }            
    }


回答3:

Have you considered using views in your database to load your problem entities with the deleted items excluded?

It does mean you will need to use stored procedures to map INSERT/UPDATE/DELETE functionality, but it would definitely solve your problem if Workout maps to a View with the deleted rows omitted. Also - this may not work the same in a code first approach...