I was just adding some foreign keys to my database and usually all my foreign keys are non identifying as I have never bothered making them identifying as I never knew the difference and my databases always seemed to work well enough for me.
Now I have decided that I am going to make this database properly and was making the foreign keys identifying and non-identifying. I was curious is there any performance difference with them when doing Joins?
Thanks
Yes, there could be some performance benefit to joins by making a foreign key on an identifying relationship. But it depends on the query (as optimization methods always do).
For example, querying the books for a given author:
In this case, we hope the join to AuthorBooks uses an index. Which index will it use? It depends on how we define the indexes in that table.
The two entity tables are pretty straightforward.
But there are two common ways that developers design the many-to-many table. One has an auto-increment id for its primary key:
The other does not have an id. The primary key is the combination of the two foreign keys, and this makes them both have an identifying relationship with their respective referenced entity tables.
What's the difference in terms of performance?
First of all, keep in mind how MySQL implements indexes for foreign keys: If there's no index, the foreign key will implicitly create one. If there's an index already on the column, the foreign key will use it. Even an index that includes the foreign key column as the left-most column, that can be used, and there is no need to create a new index for the foreign key.
In the first AuthorBooks table design, as MySQL does the join from Authors to AuthorBooks, it looks up an entry in the index for the author_id foreign key. But to perform the second join, that index entry has to fetch the row it references, to get the book_id value, which it then uses to join to the Books table. So the joins ultimately take an extra table lookup.
In the second AuthorBooks table design, the author_id is indexed by the PRIMARY KEY of the table. So as the join does a lookup to the author_id, it comes with access to the matching book_id, without an extra lookup to the table. The book_id can then be used for the second join. This eliminates a step for each row found by the query.
This turns out to be a great benefit for performance. I have optimized some queries simply by making a many-to-many table use a covering index like this—whether by using the primary key or creating an extra two-column index on the two foreign keys—and this resulted in up to six orders of magnitude improvement for performance.
The answer by @billKarwin is really good. I would just add one observation.
Identifying and non-identifying relationships are logical constructs. They model the underlying business domain - see this question (also answered by the ubiquitous @billKarwin). The reason to use logical constructs like this is to make the database easier to understand (and therefore maintain, extend, etc.). It's not to make your database "faster".