Please have a look at the database design below:
create table Person (id int identity, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
Currently all persons have an invoiceID
i.e. the InvoiceID
is not null
.
I want to extend the database so that some person does not have an Invoice
. The original developer hated nulls and never uses them. I want to be consistent so I am wondering if there are other patterns I can use to extend the database to meet this requirement. How can this be approached without using nulls?
Please note that the two tables above are for illustration purposes. They are not the actual tables.
NULL is a very important feature in databases and programming in general. It is significantly different from being zero or any other value. It is most commonly used to signify absence of value (though it also can mean unknown value, but that's less used as the interpretation). If some people do not have an invoice, then you should truly allow NULL, as that matches your desired Schema
A common pattern would be to store that association in a separate table.
Person: Id
Invoice: Id
Assoc: person_id, assoc_id
Then if a person doesn't have an invoice, you simply don't have a row. This approach also allows a person to have more than one invoice id which might make sense.
The only way to represent the optional relationship while avoiding nulls is to use another table, as some other answers have suggested. Then the absence of a row for a given Person indicates the person has no Invoice. You can enforce a 1:1 relationship between this table and the Person table by making person_id be the primary or unique key:
CREATE TABLE PersonInvoice (
person_id INT NOT NULL PRIMARY KEY,
invoice_id INT NOT NULL,
FOREIGN KEY (person_id) REFERENCES Person(id),
FOREIGN KEY (invoice_id) REFERENCES Invoice(id)
);
If you want to permit each person to have multiple invoices, you can declare the primary key as the pair of columns instead.
But this solution is to meet your requirement to avoid NULL. This is an artificial requirement. NULL has a legitimate place in a data model.
Some relational database theorists like Chris Date eschew NULL, explaining that the existence of NULL leads to some troubling logical anomalies in relational logic. For this camp, the absence of a row as shown above is a better way to represent missing data.
But other theorists, including E. F. Codd who wrote the seminal paper on relational theory, acknowledged the importance of a placeholder that means either "not known" or "not applicable." Codd even proposed in a 1990 book that SQL needed two placeholders, one for "missing but applicable" (i.e. unknown), and the other for "missing but inapplicable."
To me, the anomalies we see when using NULL in certain ways are like the undefined results we see in arithmetic when we divide by zero. The solution is: don't do that.
But certainly we shouldn't use any non-NULL value like 0 or '' (empty string) to represent missing data. And likewise we shouldn't use a NULL as if it were an ordinary scalar value.
I wrote more about NULL in a chapter titled "Fear of the Unknown" in my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.
You need to move the invoice/person relation to another table.
You end up with
create table Person (id int person_identity)
create table PersonInvoice (id int person_id, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
You need this for some databases to allow in InvoiceId to be a foreign key as some do not allow NULLS in a foreign key.
If a person only can have one invoice then PersonInvoice can have a unique constraint on the person_id as well as the two columns together. You can also enforce having a single person for a invoice by adding a unique constraint to the invoiceID field.