Currently we're using hand-rolled SQL in data-access objects, plus a lot of stored procedures and triggers, which together amount to around 20k lines of code. We're finding that simple changes are causing a couple of days' work to fix, and it's causing deadlines to slip.
Changes include modifications to tables to cope with additional data, general refactoring of the schema based on QA/user reports, etc. It's a very active system that's being built to replace something old and slow.
We looked at the PHP ORM solutions available to try to limit the effects of these changes, but they were just too slow to cope with our schema; "simple" SQL results were taking orders of magnitude longer to return than our custom queries, and page views of ~0.5s ended up taking over 20s.
What best-practices/strategies could I look into to cope with schema evolution with relational databases, in a general context?
Edit: I forgot to mention the triggers; we have a lot of data which relies on cascading changes, e.g. a price change here for this user updates a price there for that user, etc.
You might want to check out this book on refactoring databases: Refactoring Databases: Evolutionary Database Design.
I suggest using a continuous (or at least nightly) build strategy.
Rebuild the database on every checkin, or at least once per day.
Also once per day, run unit tests to exercise each bit of code, be it in a stored procedure, a trigger or a data access layer.
There is a great cost to writing stored procs, but this will identify breaks immediately.
Once you know where the break is, you can fix it.
I'd be interested to hear other people's experiences with this strategy applied to database changes.
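To make the rebuild-and-test idea concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for the real database. The table names, the `cascade_price` trigger and the cascading rule are all hypothetical, loosely modelled on the price example in the question; a real build would load the checked-in DDL scripts instead of an inline string.

```python
import sqlite3

# Hypothetical schema: table names and the cascading rule are illustrative only.
SCHEMA = """
CREATE TABLE base_price (
    product_id INTEGER PRIMARY KEY,
    price      REAL NOT NULL
);
CREATE TABLE user_price (
    user_id    INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES base_price(product_id),
    price      REAL NOT NULL,
    PRIMARY KEY (user_id, product_id)
);
CREATE TRIGGER cascade_price AFTER UPDATE OF price ON base_price
BEGIN
    UPDATE user_price SET price = NEW.price WHERE product_id = NEW.product_id;
END;
"""


def build_database():
    """Rebuild the whole schema from scratch in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def test_price_change_cascades():
    """A price change on the base table should reach the per-user price."""
    conn = build_database()
    conn.execute("INSERT INTO base_price VALUES (1, 10.0)")
    conn.execute("INSERT INTO user_price VALUES (42, 1, 10.0)")
    conn.execute("UPDATE base_price SET price = 12.5 WHERE product_id = 1")
    (price,) = conn.execute(
        "SELECT price FROM user_price WHERE user_id = 42 AND product_id = 1"
    ).fetchone()
    assert price == 12.5
    conn.close()


if __name__ == "__main__":
    test_price_change_cascades()
    print("trigger test passed")
```

Because every test starts from a freshly built schema, a change that breaks a trigger or a table definition should fail on the next build rather than surfacing days later.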
We use Enterprise Architect for our DB definitions. We include stored procedures, triggers, and all table definitions defined in UML. The three brilliant features of the program are:
- Import UML Diagrams from an ODBC Connection.
- Generate SQL Scripts (DDL) for the entire DB at once
- Generate Custom Templated Documentation of your DB.
I've never been more impressed with any other tool in my 10+ years as a developer. EA supports Oracle, MySQL, SQL Server (multiple versions), PostgreSQL, Interbase, DB2, and Access in one fell swoop. Any time I've had problems, their forums have answered my questions promptly. Highly recommended!!
When DB changes come in, we make them in EA, generate the SQL, and check it into our version control (svn). We use Hudson for building, and it auto-builds the database from scripts when it sees you've modified the checked-in SQL.
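For readers without EA or Hudson, the "build the database from checked-in scripts" step could look roughly like the sketch below. It is not how either tool works internally; it assumes a hypothetical `schema_version` bookkeeping table and an in-code `MIGRATIONS` list standing in for versioned .sql files, and it uses Python's sqlite3 module purely so the example is runnable.

```python
import sqlite3

# Hypothetical migrations; in practice each entry would be a .sql file in version control.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def migrate(conn, migrations=MIGRATIONS):
    """Apply, in version order, every migration not yet recorded as applied.

    Returns the list of version numbers that were applied by this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    ran = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already applied on an earlier build
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
        conn.commit()
        ran.append(version)
    return ran


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn))  # first run applies everything
    print(migrate(conn))  # second run has nothing left to do
```

Recording the applied versions in the database itself means the same script can bring an empty database or an older copy up to date, which is one way to keep developer, QA and production schemas in step.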
My advice would be to get rid of stored procedures and instead use inline SQL, maybe maintained in text/xml files. I find SProcs are far more annoying and time consuming to maintain. Once the query plan is generated (the first time the query is executed) you'll notice a negligible difference in performance. Plus you'll be able to version control all of your DB scripts...
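As an illustration of keeping SQL in a text file, here is a small sketch that loads named queries and runs them with bound parameters. The `-- name:` marker, the `load_queries` helper and the `product` table are assumptions made up for this example, not an existing convention or library; the text is held in a string so the snippet is self-contained, and sqlite3 stands in for the real database.

```python
import sqlite3

# Stand-in for the contents of a checked-in text file such as queries.sql (hypothetical).
QUERIES_TEXT = """
-- name: create_product
CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL NOT NULL);

-- name: insert_product
INSERT INTO product (id, price) VALUES (:id, :price);

-- name: price_by_id
SELECT price FROM product WHERE id = :id;
"""


def load_queries(text):
    """Split the text into a {name: sql} dict using the '-- name:' marker lines."""
    queries = {}
    name = None
    for line in text.splitlines():
        if line.startswith("-- name:"):
            name = line[len("-- name:"):].strip()
            queries[name] = []
        elif name is not None and line.strip():
            queries[name].append(line)
    return {n: "\n".join(lines).rstrip(";") for n, lines in queries.items()}


if __name__ == "__main__":
    queries = load_queries(QUERIES_TEXT)
    conn = sqlite3.connect(":memory:")
    conn.execute(queries["create_product"])
    conn.execute(queries["insert_product"], {"id": 1, "price": 9.5})
    print(conn.execute(queries["price_by_id"], {"id": 1}).fetchone())
```

The application code only refers to query names, so a schema change means editing one file of SQL rather than hunting through the data-access classes.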
Here are my suggestions:
- Try to get rid of the least used functionality. Question the features that are not used all the time. Each feature in an application has several levels of costs associated with it (maintaining, support, regression testing, code complexity, etc.).
- Stay away from Stored procedures, unless there is absolutely no way to do it efficiently and in a scalable manner in the code.
- Introduce an ORM solution gradually (using refactoring to move from JDBC to ORM) to reduce the amount of code and code complexity in CRUD operations
- Build functional, integration and unit tests as and when you fix a bug, and incorporate those tests into the continuous integration system. Automate your regression testing as much as possible to identify problems as soon as they are introduced by a check-in.
- In general, whenever you fix a bug, use that opportunity to refactor to decouple the implementations/code modules.
If you have questions about database migration problems, this might help: http://shashivelur.com/blog/2008/07/hibernate-db-migration/