I have a newbie question about a database I am trying to create. I have a list of publications, with this general order:
UID, Author, URL, Title, Publication, start page, end page, volume, year
Then I realized that a publication can have multiple authors, so I began trying to normalize the database. I also realized that the order of the authors is important, and that a journal article could have anywhere from one author to dozens, or possibly even more.
Should I just create a table with multiple author columns (say 12 or so, mostly null)? Or is there a way to have a variable number of columns depending on the number of authors?
You should create an author table that has a many-to-many relationship with the publication table.
An author has some information of its own, and a publication also has information of its own, so you should have tables like `author` and `publication`. Both have a primary key, like `author_id` and `publication_id`, and the two keys are in a many-to-many relationship.
Actually, your scenario is even more complicated.
A publication can have more than one author. An author can write more than one published article or book. That is a Many-to-Many relationship.
We always(*) represent a many-to-many relationship with a third table, sometimes called a bridge table. This third table, `authorship_`, is a child table with at least two columns, both foreign keys holding the primary key from each of its parent tables, the `pub_` and `author_` tables. We transform the many-to-many into a pair of one-to-many relationships.
By the way, this books-author scenario is the canonical example used when teaching relational database design.
You can have additional fields on this third table. In your case, we need a `priority_` column of an integer type to sort the list of primary vs. secondary authors.
Each author's compensation fee or royalty would be additional columns on this bridge table. If you were tracking each author needing to sign a contract for their work on that publication, the `authorship_` table would have a date, date-time, or boolean column `contract_signed_`. So you can see that the bridge table represents anything to do with one particular author's involvement in one particular publication.
(*) Not merely an opinion or suggestion. Relational database design is proven by entire books filled with mathematical proofs. This includes the need to break up a many-to-many with a third table. Relational database design is the only case of true information engineering backed by mathematical description and proofs. Search for "relation" (a concept from mathematics), and doctors E. F. Codd and Chris Date to learn more.
Database model
You basically need a many-to-many relationship between Authors and Publications, since one author can write many publications, and one publication can be written by more than one author.
This requires three tables: `Author`, `Publication`, and a linking table, `AuthorPublication`, with the columns `author_id` and `publication_id`, which are references to the tables `Author` and `Publication`.
This way you're not binding one specific author to a publication: a publication can have several authors, and an author can have several publications.
Additional notes
If you would like to distinguish each author's role in a particular publication, you could also add a column like `id_role` that references a dictionary table listing all possible roles for an author. This way you could distinguish between leading authors, co-authors, etc. You could also store information about people handling the translation of a book, but then you should perhaps rename `Author` to something less specific.
Order of appearance
You can ensure a proper ordering of your authors by adding a column in `AuthorPublication` which you would increment separately for every `Publication`. This way you would be able to preserve the ordering as you need it.
You have a many-to-many relationship between the entity Publication and the entity Author. A publication can have many authors, and an author can have many publications.
So you should create a table for this relationship, for example a table `Authors_Publications` with the columns UID, author_id, publication_id, order.
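A rough sketch of that table and of reading the authors back in order, using Python's built-in `sqlite3` module. Several things here are assumptions for illustration rather than part of the answer: the ordering column is named `author_order` because `ORDER` is a reserved word in SQL and would otherwise need quoting, the two `UNIQUE` constraints are added to keep positions unambiguous, the `Authors` and `Publications` parent tables are reduced to a single data column, and the helper functions and sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE Publications (
    publication_id INTEGER PRIMARY KEY,
    title          TEXT NOT NULL
);
CREATE TABLE Authors_Publications (
    UID            INTEGER PRIMARY KEY,
    author_id      INTEGER NOT NULL REFERENCES Authors (author_id),
    publication_id INTEGER NOT NULL REFERENCES Publications (publication_id),
    author_order   INTEGER NOT NULL,          -- 1 = first author, 2 = second, ...
    UNIQUE (publication_id, author_id),       -- an author appears once per publication
    UNIQUE (publication_id, author_order)     -- one author per position
);
""")


def add_publication(title, author_names):
    """Store a publication and its authors in the order given.

    Looking authors up by name is a simplification: real data needs a
    better identity than the name, since two people can share one.
    """
    pub_id = conn.execute(
        "INSERT INTO Publications (title) VALUES (?)", (title,)
    ).lastrowid
    # The position counter restarts at 1 for every publication.
    for position, name in enumerate(author_names, start=1):
        row = conn.execute(
            "SELECT author_id FROM Authors WHERE name = ?", (name,)
        ).fetchone()
        if row:
            author_id = row[0]
        else:
            author_id = conn.execute(
                "INSERT INTO Authors (name) VALUES (?)", (name,)
            ).lastrowid
        conn.execute(
            "INSERT INTO Authors_Publications (author_id, publication_id, author_order) "
            "VALUES (?, ?, ?)",
            (author_id, pub_id, position),
        )
    return pub_id


def authors_in_order(pub_id):
    """Author names of one publication, sorted by their stored position."""
    rows = conn.execute(
        """SELECT a.name
             FROM Authors_Publications ap
             JOIN Authors a ON a.author_id = ap.author_id
            WHERE ap.publication_id = ?
            ORDER BY ap.author_order""",
        (pub_id,),
    )
    return [name for (name,) in rows]


first = add_publication("Paper A", ["Zhang", "Abel", "Moore"])
second = add_publication("Paper B", ["Moore", "Zhang"])
print(authors_in_order(first))   # ['Zhang', 'Abel', 'Moore']
print(authors_in_order(second))  # ['Moore', 'Zhang']
```

The same author can sit at a different position on each publication, and a publication with one author or with dozens uses exactly the same three tables.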