Best way to incrementally seed data in Entity Framework

Posted 2019-01-10 14:47

Question:

I have been using Entity Framework 4.3 on an existing database and I have a couple of scenarios that I am trying to cater for.

Firstly, if I delete my database I would like EF to recreate it from scratch - I have successfully used a CreateDatabaseIfNotExists database initialiser for this.

Secondly, if I update my model and the database already exists I would like the database to be updated automatically - I have successfully used Entity Framework 4.3 Migrations for this.

So here's my question. Say I add a new table to my model which requires some reference data: what is the best way to ensure that this data gets created both when the database initialiser runs and when the migration runs? My desire is that the data gets created when I'm creating the db from scratch and also when the database gets updated as the result of a migration running.

In some EF migrations examples I have seen people use the Sql() function in the Up() method of the migration to create seed data, but if possible I would rather use the context to create the seed data (as you see in most database initialiser examples). It seems strange to me to use pure SQL when the whole idea of EF is abstracting that away. I have tried to use the context in the Up() method, but for some reason it didn't think that a table created in the migration existed when I tried to add the seed data directly below the call to create the table.

Any wisdom greatly appreciated.

Answer 1:

If you want to use entities to seed data, you should use the Seed method in your migrations configuration. If you run Enable-Migrations in a fresh project, you will get this configuration class:

internal sealed class Configuration : DbMigrationsConfiguration<YourContext>
{
    public Configuration()
    {
        AutomaticMigrationsEnabled = false;
    }

    protected override void Seed(YourContext context)
    {
        //  This method will be called after migrating to the latest version.

        //  You can use the DbSet<T>.AddOrUpdate() helper extension method 
        //  to avoid creating duplicate seed data. E.g.
        //
        //    context.People.AddOrUpdate(
        //      p => p.FullName,
        //      new Person { FullName = "Andrew Peters" },
        //      new Person { FullName = "Brice Lambson" },
        //      new Person { FullName = "Rowan Miller" }
        //    );
        //
    }
}

The way migrations seed data is not very efficient, because it is meant for very basic seeding. Every update to a new version will go through the whole set and try to update existing data or insert new data. If you don't use the AddOrUpdate extension method, you must manually ensure that data is seeded only if it is not present yet.
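
To cover both scenarios from the question with this Seed method, one option is the MigrateDatabaseToLatestVersion initializer that ships with EF 4.3. It creates the database if it is missing, applies any pending migrations, and then runs Seed. A minimal sketch, assuming an empty placeholder YourContext and a hypothetical DatabaseSetup helper called once at application start:

```csharp
using System.Data.Entity;
using System.Data.Entity.Migrations;

// Placeholder context for this sketch; your real context declares its DbSets here.
public class YourContext : DbContext
{
}

internal sealed class Configuration : DbMigrationsConfiguration<YourContext>
{
    public Configuration()
    {
        AutomaticMigrationsEnabled = false;
    }

    protected override void Seed(YourContext context)
    {
        // Seed code goes here; it runs after the database has been
        // migrated to the latest version.
    }
}

// Hypothetical helper, not part of EF.
public static class DatabaseSetup
{
    public static void Register()
    {
        // Call once at startup, before the context is first used.
        Database.SetInitializer(
            new MigrateDatabaseToLatestVersion<YourContext, Configuration>());
    }
}
```

With this in place the same Seed code runs whether the database was just created or only upgraded, so the seed logic lives in one place.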

If you need an efficient way of seeding because you must seed a lot of data, you will get better results with plain SQL commands in the migration itself:

public partial class SomeMigration : DbMigration
{
    public override void Up()
    {
        ...
        Sql("UPDATE ...");
        Sql("INSERT ...");
    }

    public override void Down()
    {
        ...
    }
}
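
For illustration only, here is roughly what such a migration could look like when filled in. The ItemTypes table, its columns, and the inserted rows are made up for this sketch. Calls inside Up() only queue operations that run after Up() returns, in the order they were queued, which is also why querying a just-created table through the context from inside Up() fails: the table does not exist yet at that point.

```csharp
using System.Data.Entity.Migrations;

// Hypothetical migration; Add-Migration would normally also generate
// the accompanying designer file with the migration metadata.
public partial class AddItemTypes : DbMigration
{
    public override void Up()
    {
        CreateTable(
            "ItemTypes",
            c => new
                {
                    Id = c.Int(nullable: false, identity: true),
                    Name = c.String(nullable: false, maxLength: 50),
                })
            .PrimaryKey(t => t.Id);

        // Queued after CreateTable, so the table exists when these run.
        Sql("INSERT INTO ItemTypes (Name) VALUES ('Standard')");
        Sql("INSERT INTO ItemTypes (Name) VALUES ('Premium')");
    }

    public override void Down()
    {
        // Dropping the table removes the seeded rows as well.
        DropTable("ItemTypes");
    }
}
```

Because a migration is applied only once per database, these inserts do not need an existence check, unlike code in Seed.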


Answer 2:

I wouldn't recommend using Sql() calls in your Up() method because (IMO) this is really intended for actual migration code for which there is no built-in function, rather than seed code.

I like to think of seed data as something that could change in the future (even if my schema does not), so I simply write "defensive" checks around all of my inserts in the seed function to make sure that the operation did not fire previously.

Consider a scenario where you have a "Types" table that starts out with 3 entries, but then you later add a 4th. You shouldn't need a "migration" to address this.
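
As a rough sketch of what I mean by defensive checks, assuming a hypothetical ItemType entity exposed as context.Types (the entity and the type names below are invented for the example):

```csharp
using System.Data.Entity;
using System.Data.Entity.Migrations;
using System.Linq;

// Hypothetical entity and context, introduced only for this sketch.
public class ItemType
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class YourContext : DbContext
{
    public DbSet<ItemType> Types { get; set; }
}

internal sealed class Configuration : DbMigrationsConfiguration<YourContext>
{
    public Configuration()
    {
        AutomaticMigrationsEnabled = false;
    }

    protected override void Seed(YourContext context)
    {
        // "Archived" was added later; no new migration is needed for it.
        var names = new[] { "Standard", "Premium", "Trial", "Archived" };

        foreach (var name in names)
        {
            // Defensive check: insert only when the row is not there yet.
            if (!context.Types.Any(t => t.Name == name))
            {
                context.Types.Add(new ItemType { Name = name });
            }
        }

        context.SaveChanges();
    }
}
```

Adding the 4th entry is then just one more string in the array, and the next run of Seed inserts only the missing row.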

Using Seed() also gives you a full context to work with, which is a lot nicer than using the plain SQL strings in the Sql() method that Ladislav demonstrated in the answer above.

Also, keep in mind that the benefit of using built-in EF methods for both the migration code and seed code is that your database operations remain platform-neutral. This means your schema changes and queries can run on Oracle, PostgreSQL, etc. If you write actual raw SQL then you are potentially locking yourself in unnecessarily.

You might be less concerned about this since 90% of people using EF will only ever hit SQL Server, but I'm just throwing it out there to give you a different perspective on the solution.