rails: Migrator fails for systems with multiple, shared databases

Hello!

I have found that the root of the problem I am describing is the way the ActiveRecord migrator detects which migrations still need to be run.

The problem only surfaces in more complex setups. It is easiest to picture with Rails applications, but it is independent of Rails and can arise with gems as well.

Suppose the following scenario:

Two Rails applications share a database, while each also has its own database (3 databases, 2 applications).

In order to reduce code duplication and maintenance hell, we create a Rails engine that contains the models and migrations for the shared database:

    # In the engine class: append the engine's migrations to the host
    # application's migration search path (skipped when the host application
    # is the engine itself, e.g. its dummy app).
    initializer :shared_db_append_migrations do |app|
      unless app.root.to_s.match root.to_s
        require 'shared-db-common/migration_base'

        config.paths["db/migrate"].expanded.each do |expanded_path|
          app.config.paths["db/migrate"] << expanded_path
        end
      end
    end

Both applications include the common engine and use the models from there.

The models in the engine have a superclass that tells them to use a different connection than the default one:

    class Base < ActiveRecord::Base
      self.abstract_class = true

      # Use the shared database's named configuration when it exists,
      # otherwise fall back to the application's default connection.
      begin
        establish_connection "shared_db_#{Rails.env}"
      rescue ActiveRecord::AdapterNotSpecified
        establish_connection "#{Rails.env}"
      end
    end
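A model in the engine then simply inherits from this base class (the model name here is made up for illustration):

    # Lives in the engine and talks to the shared database, because its
    # superclass established the "shared_db_<env>" connection.
    class Customer < Base
      # ordinary ActiveRecord model code
    end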

The migrations in the engine do likewise:

    class MigrationBase < ActiveRecord::Migration
      # Run the engine's migrations against the shared database when a
      # "shared_db_<env>" configuration exists, otherwise on the default one.
      def connection
        begin
          ActiveRecord::Base.establish_connection("shared_db_#{Rails.env}").connection
        rescue ActiveRecord::AdapterNotSpecified
          super
        end
      end
    end
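A migration shipped with the engine then inherits from MigrationBase, so its DDL runs against the shared database (the migration itself is a made-up example):

    class CreateCustomers < MigrationBase
      def change
        # Executed on the connection returned by MigrationBase#connection,
        # i.e. the shared database.
        create_table :customers do |t|
          t.string :name
          t.timestamps
        end
      end
    end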

The engine tells the host application to include the engine's migrations in its migration search path (via the initializer shown above).

When running the migrations for the first application, everything works fine: the shared database migrations are run as well. GOOD!

When running the migrations for the second application, they fail: it tries to re-run the shared migrations.

I have found that the ActiveRecord migrator connects to the main database, reads its schema_migrations table, and builds the list of pending migrations BEFORE connecting to the databases the migrations will actually run on; only then does it go through the list and run them one by one. ActiveRecord also saves the migration state back to the main database, instead of the database each migration ran on.
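Roughly speaking, the current flow is something like this (a simplified sketch, not the actual Rails source; migration_files stands in for the migrator's own file discovery):

    # Simplified sketch of what happens today: both the lookup and the
    # bookkeeping go through ActiveRecord::Base's connection (the app DB).
    ran = ActiveRecord::Base.connection.select_values("SELECT version FROM schema_migrations")

    migration_files.each do |migration|
      next if ran.include?(migration.version.to_s)

      migration.migrate(:up)                    # may actually run on the shared DB...
      ActiveRecord::Base.connection.execute(    # ...but is recorded on the app's own DB
        "INSERT INTO schema_migrations (version) VALUES ('#{migration.version}')"
      )
    end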

This is a problem because it ties the migration state to the application rather than to the database itself. When the second application runs, it sees the engine's migrations as not having run, and tries to re-run them.

The correct way to implement it would be to build a list of all migrations across all databases, order them by timestamp, and then walk the FULL list, look each migration up in the schema_migrations table of the database it targets, and run it only if it is not present there. (As an aside: why are we using only second-based timestamps, rather than something finer such as micro- or nanoseconds to avoid collisions in the schema_migrations table? Or a timestamp plus a hash of the file itself, to make sure a migration has not changed since it ran?)
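A minimal sketch of what I mean, assuming a hypothetical target_connection_for helper that returns the connection a given migration is supposed to run on:

    # Proposed (sketch): look each migration up in, and run it against,
    # the database it actually targets.
    migration_files.sort_by(&:version).each do |migration|
      conn = target_connection_for(migration)   # e.g. the shared DB for engine migrations
      ran  = conn.select_values("SELECT version FROM schema_migrations")

      migration.migrate(:up) unless ran.include?(migration.version.to_s)
    end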

This is a major change in how the migrator executes, and even though it is less efficient (we need many lookups against the schema_migrations tables), that time is negligible compared to the time the actual migrations take, and it would fix the issue I am describing. The migration state would no longer end up being saved to a different database, either.

In a nutshell:

  • Rails App A with DB A
  • Rails App B with DB B
  • Rails Engine S with DB S

Both Rails apps include engine S, and have an additional connection configuration named shared_db_development, shared_db_production, etc. for those models.
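For example, each application's database configuration would contain something like this (shown here as the Ruby hash ActiveRecord loads from database.yml; the adapter and database names are made up):

    app_a_configurations = {
      "development"           => { "adapter" => "postgresql", "database" => "app_a_development" },
      "shared_db_development" => { "adapter" => "postgresql", "database" => "shared_development" }
    }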

When App A runs the migrations:

  • schema_migrations is read from DB A
  • the migrations from Rails App A and Rails Engine S are run
  • the new migration state is saved back to DB A

When App B runs the migrations:

  • schema_migrations is read from DB B
  • the migrations from Rails App B are run
  • the migrations from Engine S appear not to have run (because they are not present in DB B's schema_migrations), and are attempted again
  • an exception occurs.

Solution:

  • save and load the schema_migrations state through the database connection specified by the migration itself, not through the migrator and ActiveRecord::Base (see the sketch below).
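In code, that could look roughly like this (the method name and the connection accessor on the migration are illustrative assumptions, not the actual Rails internals):

    # Record each version on the connection the migration ran on, so every
    # database carries its own schema_migrations state.
    def record_version_state(migration, direction)
      conn = migration.connection
      if direction == :up
        conn.execute("INSERT INTO schema_migrations (version) VALUES (#{conn.quote(migration.version.to_s)})")
      else
        conn.execute("DELETE FROM schema_migrations WHERE version = #{conn.quote(migration.version.to_s)}")
      end
    end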

Please let me know what you think.

Akos Vandra

About this issue

  • Original URL
  • State: closed
  • Created 10 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I can think of a couple of ways to handle the hard-coded connection references in Migrator. The least intrusive would be to define a #connection method which returns AR::Base.connection and can be redefined in subclasses.

EDIT: Migrator already has a #connection method! There are some places where it’s not enough, though, such as in #migrate:

    ActiveRecord::Base.connection_pool.with_connection do |conn|
      time = Benchmark.measure do
        exec_migration(conn, direction)
      end
    end

(EDIT AGAIN: That’s in Migration, not Migrator. Derp. Migrator still needs #connection.) There has to be a reason it’s using a fresh connection, right?

Then again, I think it should be possible to replace AR::Base.connection_pool with connection.pool.
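Something along these lines, presumably (just a sketch of the suggested swap; it assumes the migration's #connection returns the adapter for the target database):

    # Check out a connection from the target database's pool instead of
    # going through ActiveRecord::Base's pool.
    connection.pool.with_connection do |conn|
      time = Benchmark.measure do
        exec_migration(conn, direction)
      end
    end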

@jeremy That’s a great set of code examples. This is a shameless plug, but we wrote a simple gem that kind of wraps up a few of these features. Only tested for Rails 4 now. Hopefully sharing can expose alternate ideas and implementations. https://github.com/customink/secondbase

I’ve made a few tweaks in my fork. It’s nothing fancy yet; I’ve just added an overridable #connection method to Migrator and tweaked Migration so it doesn’t have a hard-coded dependency on AR::Base’s connection pool. I think the next step would be figuring out how to handle the Rake tasks.