I remember hearing Joel Spolsky mention in podcast 014 that he'd barely ever used a foreign key (if I remember correctly). However, to me they seem pretty vital to avoid duplication and subsequent data integrity problems throughout your database.
Do people have some solid reasons as to why (to avoid a discussion in lines with Stack Overflow principles)?
Foreign keys complicate automated testing
Suppose you're using foreign keys. You're writing an automated test that says "when I update a financial account, it should save a record of the transaction." In this test, you're only concerned with two tables:
accounts
andtransactions
.However,
accounts
has a foreign key tocontracts
, andcontracts
has a fk toclients
, andclients
has a fk tocities
, andcities
has a fk tostates
.Now the database will not allow you to run your test without setting up data in four tables that aren't related to your test.
There are at least two possible perspectives on this:
It may also be possible to temporarily turn off foreign key checks while running tests. MySQL, at least, supports this.
Update: I always use foreign keys now. My answer to the objection "they complicated testing" is "write your unit tests so they don't need the database at all. Any tests that use the database should use it properly, and that includes foreign keys. If the setup is painful, find a less painful way to do the setup."
"Before adding a record, check that a corresponding record exists in another table" is business logic.
Here are some reasons you don't want this in the database:
If the business rules change, you have to change the database. The database will need to recreate the index in a lot of cases and this is slow on large tables. (Changing rules include: allow guests to post messages or allow users to delete their account despite having posted comments, etc).
Changing the database is not as easy as deploying a software fix by pushing the changes to the production repository. We want to avoid changing the database structure as much as possible. The more business logic there is in the database the more you increase the chances of needing to change the databae (and triggering re-indexing).
TDD. In unit tests you can substitute the database for mocks and test the functionality. If you have any business logic in your database, you are not doing complete tests and would need to either test with the database or replicate the business logic in code for testing purposes, duplicating the logic and increasing the likelyhood of the logic not working in the same way.
Reusing your logic with different data sources. If there is no logic in the database, my application can create objects from records from the database, create them from a web service, a json file or any other source. I just need to swap out the data mapper implementation and can use all my business logic with any source. If there is logic in the database, this isn't possible and you have to implement the logic at the data mapper layer or in the business logic. Either way, you need those checks in your code. If there's no logic in the database I can deploy the application in different locations using different database or flat-file implementations.
I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.
In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.
From my experience its always better to avoid using FKs in Database Critical Applications. I would not disagree with guys here who say FKs is a good practice but its not practical where the database is huge and has huge CRUD operations/sec. I can share without naming ... one of the biggest investment bank of doesn't have a single FK in databases. These constrains are handled by programmers while creating applications involving DB. The basic reason is when ever a new CRUD is done it has to effect multiple tables and verify for each inserts/updates, though this won't be a big issue for queries affecting single rows but it does create a huge latency when you deal with batch processing which any big bank has to do as daily tasks.
Its better to avoid FKs but its risk has to be handled by programmers.
I know only Oracle databases, no other ones, and I can tell that Foreign Keys are essential for maintaining data integrity. Prior to inserting data, a data structure needs to be made, and be made correctlty. When that is done - and thus all primary AND foreign keys are created - the work is done !
Meaning : orphaned rows ? No. Never seen that in my life. Unless a bad programmer forgot the foreign key, or if he implemented that on another level. Both are - in context of Oracle - huge mistakes, which will lead to data duplication, orphan data, and thus : data corruption. I can't imagine a database without FK enforced. It looks like chaos to me. It's a bit like the Unix permission system : imagine that everybody is root. Think of the chaos.
Foreign Keys are essential, just like Primary Keys. It's like saying : what if we removing Primary Keys ? Well, total chaos is going to happen. That's what. You may not move the primary or foreign key responsibility to the programming level, it must be at the data level.
Drawbacks ? Yes, absolutely ! Because on insert, a lot more checks are going to be happening. But, if data integrity is more important than performance, it's a no-brainer. The problem with performance on Oracle is more related to indexes, which come with PK and FK's.
I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.
There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.
If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?