What are the best methods for tracking and/or automating DB schema changes? Our team uses Subversion for version control and we've been able to automate some of our tasks this way (pushing builds up to a staging server, deploying tested code to a production server) but we're still doing database updates manually. I would like to find or create a solution that allows us to work efficiently across servers with different environments while continuing to use Subversion as a backend through which code and DB updates are pushed around to various servers.
Many popular software packages include auto-update scripts which detect DB version and apply the necessary changes. Is this the best way to do this even on a larger scale (across multiple projects and sometimes multiple environments and languages)? If so, is there any existing code out there that simplifies the process or is it best just to roll our own solution? Has anyone implemented something similar before and integrated it into Subversion post-commit hooks, or is this a bad idea?
While a solution that supports multiple platforms would be preferable, we definitely need to support the Linux/Apache/MySQL/PHP stack as the majority of our work is on that platform.
Scott Ambler produces a great series of articles (and co-authored a book) on database refactoring, with the idea that you should essentially apply TDD principles and practices to maintaining your schema. You set up a series of structure and seed data unit tests for the database. Then, before you change anything, you modify/write tests to reflect that change.
We have been doing this for a while now and it seems to work. We wrote code to generate basic column name and datatype checks in a unit testing suite. We can rerun those tests anytime to verify that the database in the SVN checkout matches the live db the application is actually running.
As it turns out, developers also sometimes tweak their sandbox database and neglect to update the schema file in SVN. The code then depends on a db change that hasn't been checked in. That sort of bug can be maddeningly hard to pin down, but the test suite will pick it up right away. This is particularly nice if you have it built into a larger Continuous Integration plan.
Dump your schema into a file and add it to source control. Then a simple diff will show you what changed.
We use something similar to bcwoord to keep our database schemata synchronized across 5 different installations (production, staging and a few development installations), and backed up in version control, and it works pretty well. I'll elaborate a bit:
To synchronize the database structure, we have a single script, update.php, and a number of files numbered 1.sql, 2.sql, 3.sql, etc. The script uses one extra table to store the current version number of the database. The N.sql files are crafted by hand, to go from version (N-1) to version N of the database.
They can be used to add tables, add columns, migrate data from an old to a new column format then drop the column, insert "master" data rows such as user types, etc. Basically, it can do anything, and with proper data migration scripts you'll never lose data.
The update script works like this:
Everything goes into source control, and every installation has a script to update to the latest version with a single script execution (calling update.php with the proper database password etc.). We SVN update staging and production environments via a script that automatically calls the database update script, so a code update comes with the necessary database updates.
We can also use the same script to recreate the entire database from scratch; we just drop and recreate the database, then run the script which will completely repopulate the database. We can also use the script to populate an empty database for automated testing.
It took only a few hours to set up this system, it's conceptually simple and everyone gets the version numbering scheme, and it has been invaluable in having the ability to move forward and evolving the database design, without having to communicate or manually execute the modifications on all databases.
Beware when pasting queries from phpMyAdmin though! Those generated queries usually include the database name, which you definitely don't want since it will break your scripts! Something like CREATE TABLE
mydb
.newtable
(...) will fail if the database on the system is not called mydb. We created a pre-comment SVN hook that will disallow .sql files containing themydb
string, which is a sure sign that someone copy/pasted from phpMyAdmin without proper checking.The issue here is really making it easy for developers to script their own local changes into source control to share with the team. I've faced this problem for many years, and was inspired by the functionality of Visual Studio for Database professionals. If you want an open-source tool with the same features, try this: http://dbsourcetools.codeplex.com/ Have fun, - Nathan.
There is a command-line mysql-diff tool that compares database schemas, where schema can be a live database or SQL script on disk. It is good for the most schema migration tasks.
My team scripts out all database changes, and commits those scripts to SVN, along with each release of the application. This allows for incremental changes of the database, without losing any data.
To go from one release to the next, you just need to run the set of change scripts, and your database is up-to-date, and you've still got all your data. It may not be the easiest method, but it definitely is effective.