Entity Framework 6 inserting duplicate values

2019-06-18 09:42发布

问题:

I have following two entities:

public class Artist
{
    [Key]
    public string ArtistId { get; set; }
    public string Name { get; set; }

    public virtual ICollection<Genre> Genres { get; set; }
}

public class Genre
{
    [Key]
    public int GenreId { get; set; }
    public string Name { get; set; }

    public virtual ICollection<Artist> Artist { get; set; }
}

In my program I create some artists and want to save them:

using (var context = new ArtistContext())
{
    var artists = _fullArtists.Select(x => x.Artist);

    foreach (var artist in artists)
    {
        context.Artists.AddOrUpdate(artist);
    }

    context.SaveChanges();
}

Entity Framework correctly created the three tables:

Artist (ArtistId, Name)
Genre (GenreId, Name)
ArtistGenre (ArtistId, GenreId)

But unfortunately, when my artist sample data look like this:

var a1 = new Artist { Name = "a1" };
a1.Genres.Add(new Genre { Name="rock"});
var a2 = new Artist { Name = "a2" };
a2.Genres.Add(new Genre { Name="rock"});

It will create 2 records in table Genre:

IdName
rock   
rock   

instead of creating it once and then reuse it.

  • Do you know if this is a configuration problem or how to tell EF to not insert duplicates and reuse existing instead?

Thanks in advance


Edit: Unfortunately, the solution of Sergey Berezovskiy didn't work (or probably i did something wrong :D)

What I have now is the following:

using (var workUnit = WorkUnitFactory.CreateWorkUnit())
{
    var snapShot = new Snapshot { Date = DateTime.Now.Date };

     //ICollection<Genre> genres = _fullArtists.Select(x => x.ToArtist(snapShot)).SelectMany(x => x.Genres).Distinct(new GenreComparer()).ToList();
     //workUnit.Repository<IGenreRepository>().InsertEntities(genres);
     //workUnit.Commit();

     var artists = _fullArtists.Select(x => x.ToArtist(snapShot)).ToList();

     workUnit.Repository<IArtistRepository>().InsertEntities(artists);
     workUnit.Commit();
}

ArtistExtensions.cs

    public static class ArtistExtensions
    {
        public static Artist ToArtist(this FullArtistWrapper value, Snapshot snapShot)
        {
            if (value == null) { throw new ArgumentNullException(); }

            var artist = new Artist
            {
                ArtistId = value.Id,
                Name = value.Name
            };

            var genres = value.Genres.Select(x => x.ToGenre()).ToList();
            artist.Genres.AddRange(genres);

            return artist;
        }
    }

GenreExtensions.cs

public static class GenreExtensions
    {
        public static Genre ToGenre(this string value)
        {
            using (var workUnit = WorkUnitFactory.CreateWorkUnit())
            {
                return workUnit.Repository<IGenreRepository>().GetGenres().FirstOrDefault(x => x.Name == value) ??
                            new Genre {Name = value};
            }
        }
    }

Unfortunately, EF still inserts duplicate Genres in the database.

InsertEntities(...) is this:

public void InsertEntities<TPersistentEntity>(ICollection<TPersistentEntity> persistentEntitiesToBeInserted) where TPersistentEntity : PersistenceEntity
    {
        persistentEntitiesToBeInserted.ForEach(this.Add);
    }

public void Add(PersistenceEntity entity)
    {
        var dbSet = this.Context.Set(entity.GetType());
        dbSet.Add(entity);
    }
  • Did I misunderstood the answer of Sergey or what could be another reason that EF still inserts duplicates?

Thanks again

回答1:

From EF point of view two entities are same if they are pointing to same row in database. I.e. two entities should have same non-zero keys.

If you want to have only one Genre entity with name "rock", then you should add exactly same genre entity to second artist genres collection or you can have two entities, but they should have same non-zero ids. I suppose you have some Add extension method which creates new genre and adds it to artist's genres:

public static void Add(this ICollection<Genre> genres, string name)
{
    genres.Add(new Genre { Name = name });
}

This will create independent instances of genres each time you call this method. Thus ids of created entities will be equal to zero, EF will treat them as different entities. E.g.

 a1.Genres.Add(new Genre { Name = "rock" });
 a1.Genres.Add(new Genre { Name = "rock" });

During saving changes EF will find two objects in genres collection. EF will check entities ids and generate appropriate SQL queries. If id is zero, it will generate INSERT query. For non-zero id EF will generate UPDATE query. In this case you will have two inserts (a little simplified - see comment below). How to fix that? You can use exactly same genre entity for both artists:

var rock = new Genre { Name = "rock" };
var a1 = new Artist { Name = "a1" };
a1.Genres.Add(rock);
var a2 = new Artist { Name = "a2" };
a2.Genres.Add(rock);

If you don't want to insert new 'rock' row to database, then you can use existing one instead of creating new:

var rock = db.Genres.FirstOrDefault(g => g.Name == "rock") ?? new Genre { Name = "rock" };


回答2:

In the classes you linked Artist to Genre and then in the code you added 2 Genre using Name field.

If you had done, you would have remained in your artist and added 2

a1.Add("rock");
a1.Add("rock");


回答3:

I think there are two possible reason that doesnt work:

  1. You are commiting just once and at the end of the code block. Thus, EF adds all genres as a new one.

  2. Your Base class PersistenceEntity may not include Id property that is Key. And your Add method accepts PersistenceEntity main class. This can effect all object as a new one.