efcore: Consider more efficient pattern for seed data that will never change

I am a bit perplexed.

My first migration contains seed data for the Countries table: 249 rows. This data appears in InitMigration.Up (OK), in ModelSnapshot.BuildModel (OK), and in InitMigration.BuildTargetModel (OK?).

I added a second migration, and I see that the countries data is again present in that migration's BuildTargetModel. Does this mean that the model is recreated from scratch for every migration? What if my initial migration contained several thousand rows of seed data? Would the source code of each subsequent migration be megabytes in size?

Here’s the code for configuring Country:

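using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

sealed class CountryConfiguration : IEntityTypeConfiguration<Country>
{
    public void Configure(EntityTypeBuilder<Country> builder)
    {
        builder.ToTable("Country");

        // Optional relationship: deleting a region nulls out Country.RegionId.
        builder.HasOne<WorldRegion>()
            .WithMany()
            .HasForeignKey(m => m.RegionId)
            .IsRequired(false)
            .OnDelete(DeleteBehavior.SetNull);

        // Each of these columns must be unique across countries.
        builder.HasAlternateKey(m => m.Name);
        builder.HasAlternateKey(m => m.NumericCode);
        builder.HasAlternateKey(m => m.Alpha2Code);
        builder.HasAlternateKey(m => m.Alpha3Code);

        // Seed all rows from the F# Geography module shown below.
        object[] countriesData = Geography.List.Countries().ToArray();
        builder.HasData(countriesData);
    }
}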

And here is the code that produces the country seed data:

namespace Geography

open System
open FSharp.Data
open System.Collections.Generic

module List =

    type Region = {Id: int; Name:string; NumericCode: string} 

    type Country = { 
        Id:int;
        RegionId: Nullable<int>;
        Name: string;
        NumericCode: string;
        Alpha2Code: string;
        Alpha3Code: string;
    }

    [<Literal>] let private ``countries csv file`` = "./countries.csv"

    type private CountriesProvider = CsvProvider<
        ``countries csv file``, 
        Schema = "name,alpha-2,alpha-3,country-code (string),iso_3166-2,region,sub-region,intermediate-region,region-code (string),sub-region-code,intermediate-region-code"
        >

    let private countriesFile() = CountriesProvider.Load(``countries csv file``)

    let Regions(): IEnumerable<Region> = 
        countriesFile().Rows 
        |> Seq.distinctBy (fun row -> row.``Region-code``)
        |> Seq.mapi (fun index row -> {Id = index+1; Name = row.Region; NumericCode = row.``Region-code``})
        |> Seq.filter (fun r -> not (String.IsNullOrWhiteSpace r.Name))

    let RegionsByCode() = 
        Regions()
        |> Seq.map (fun r -> (r.NumericCode, r))
        |> dict

    let Countries(): IEnumerable<Country> =
        let regionsByCode = RegionsByCode()

        countriesFile().Rows
        |> Seq.distinctBy (fun row -> row.``Country-code``)
        |> Seq.mapi (
            fun index row -> 
                let regionId = 
                    let exists, r = regionsByCode.TryGetValue row.``Region-code``
                    if exists 
                    then Nullable<int> r.Id 
                    else Unchecked.defaultof<Nullable<int>>

                {
                    Id = index+1;
                    RegionId = regionId;
                    Name = row.Name;
                    NumericCode = row.``Country-code``;
                    Alpha2Code = row.``Alpha-2``;
                    Alpha3Code = row.``Alpha-3``;
                })
    
    let CountriesByCode() = 
        Countries()
        |> Seq.map (fun c -> (c.NumericCode, c))
        |> dict

Country is a reference table, so no change mechanism besides migrations is planned. Am I abusing data seeding, or is there a way to avoid bloating the codebase?

About this issue

State: open
Created 5 years ago
Comments: 22 (11 by maintainers)

Most upvoted comments

@voroninp #2174

ajcvickers on Apr 18, 2019

@voroninp We discussed this again in triage, and we may consider doing something to make this more efficient when the seed data will never change. However, for now, as @bricelam said above, the appropriate workaround is:

If your seed data never changes, you may want to consider just adding it to the Up method of a migration and not using EntityTypeBuilder.HasData().
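
As a rough sketch of that workaround, the rows could be inserted with MigrationBuilder.InsertData instead of HasData. This assumes a hand-written migration named SeedCountries (a hypothetical name) and column names taken from the Country type in the question; it is an illustration, not code from the thread.

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

// Hypothetical migration; the name SeedCountries is an assumption.
public partial class SeedCountries : Migration
{
    // Column names assumed to match the properties of the Country type above.
    private static readonly string[] Columns =
        { "Id", "RegionId", "Name", "NumericCode", "Alpha2Code", "Alpha3Code" };

    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // One InsertData operation per row. The rows are not part of the model,
        // so they are not written to the snapshot or to later migrations.
        foreach (var c in Geography.List.Countries())
        {
            migrationBuilder.InsertData(
                table: "Country",
                columns: Columns,
                values: new object[]
                {
                    c.Id, c.RegionId, c.Name, c.NumericCode, c.Alpha2Code, c.Alpha3Code
                });
        }
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // Remove the same rows by primary key when the migration is reverted.
        foreach (var c in Geography.List.Countries())
        {
            migrationBuilder.DeleteData(
                table: "Country",
                keyColumn: "Id",
                keyValue: c.Id);
        }
    }
}
```

The trade-off is that the migration reads Geography.List when it runs, so this only works if that list really is stable. If it might change, writing the values out as literals in the migration would be the safer variant.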

ajcvickers on Mar 22, 2019

You’ll still need a project reference. The compiled assembly just won’t have an assembly reference.

bricelam on Mar 15, 2019

ModelSnapshot is a snapshot of the model from the last time you ran Add-Migration. It’s used to get the diff the next time you call Add-Migration.

Migration.BuildTargetModel is a copy of the model for that migration. It’s used when the steps in the Up method aren’t enough. For example, we use it to rebuild indexes on SQL Server when a column type is narrowed from int to smallint. It’s rarely used, but enables scenarios that would otherwise require the user to manually add additional steps to the Up method.
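
To make the duplication concrete, here is a trimmed sketch of roughly what a later migration's generated Designer file looks like when HasData is in use. AppDbContext, the migration id, the entity name "App.Country", and the row values are all placeholders, not taken from the project in question.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Infrastructure;
using Microsoft.EntityFrameworkCore.Migrations;

// Placeholder context so the sketch is self-contained.
public class AppDbContext : DbContext { }

// The hand-editable half of the migration (schema changes go here).
public partial class AddSomething : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder) { }
}

// The generated half: a full copy of the model as of this migration.
[DbContext(typeof(AppDbContext))]
[Migration("20190401000000_AddSomething")] // placeholder id
partial class AddSomething
{
    protected override void BuildTargetModel(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity("App.Country", b =>
        {
            b.Property<int>("Id");
            b.Property<string>("Name");
            // ...remaining properties omitted...

            b.HasKey("Id");
            b.ToTable("Country");

            // Seed rows declared with HasData are part of the model, so every
            // row is written out again here, once per migration.
            b.HasData(
                new { Id = 1, Name = "placeholder" });
        });
    }
}
```

With 249 seeded rows, the HasData call in each such file carries 249 entries, which is the growth the question describes.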

bricelam on Mar 7, 2019