Several app instances trying to migrate the same d

2020-07-20 05:14发布

问题:

I just started using flyway in a pretty standard spring 2.5 web application with a mysql databse. It has been in production for years and has a lot of data.

I hooked it up to auto-migrate on startup with

<bean id="flyway" class="com.googlecode.flyway.core.Flyway" init-method="migrate">
    <property name="dataSource" ref="dataSource"/>
</bean>

and the made the Spring's persistence unit manager (using jpa) depend on it so the database would be migrated before JPA initializes.

This works fine.

In our production environment we have several nodes which we like to update concurrently. Thus each instance will try to execute the same migrations if they are long-running. Flyway's locking functionality should prevent that.

I like to test these things before doing anything in production so I did; and what happened when two app instances were starting at the same time with a long running migration (6 minutes) is that after a while the second instance failed with:

 Caused by: com.googlecode.flyway.core.exception.FlywayException: Unable to lock metadata table 'schema_version' in schema 'dbschema'
    at com.googlecode.flyway.core.metadatatable.MetaDataTable.lock(MetaDataTable.java:148)
    at com.googlecode.flyway.core.migration.DbMigrator$1.doInTransaction(DbMigrator.java:116)
    at com.googlecode.flyway.core.migration.DbMigrator$1.doInTransaction(DbMigrator.java:114)
    at com.googlecode.flyway.core.util.jdbc.TransactionTemplate.execute(TransactionTemplate.java:54)
    at com.googlecode.flyway.core.migration.DbMigrator.migrate(DbMigrator.java:113)
    ...

Deeper down in the stack trace is the SQLException:

Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    ...

Essentially it seems to timeout.

We're using mysql and afaik it's trying to lock the metadata table with

select * from dbschema.schema_version for update

I checked the db it was trying to migrate and its lock wait timeout:

mysql> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 120   |
+--------------------------+-------+

Which is the obvious culprit. However, I'm not sure I want to increase the amount since I might want it to timeout if such an occurrence were to happen for some other reason.

I might try to set it to a higher value for the duration of the migration. Except that it seems to be a read only variable:

SET innodb_lock_wait_timeout = 240;
ERROR 1238 (HY000): Variable 'innodb_lock_wait_timeout' is a read only variable

I could of course do deployment semi-manually if I know there are going to be long running migrations, by e.g. just updating one instance and wait til it passes migration and then update the other instances.

Does anyone have any other suggestions?

回答1:

There are 3 possibilities:

  • Move to MySQL 5.5 where you can set innodb_lock_wait_timeout at runtime
  • Increase the global innodb_lock_wait_timeout in the server config
  • Split the migrations so they don't exceed the timeout


标签: flyway