Are nulls in a relational database okay?

2019-01-04 18:55发布

There's a school of thought that null values should not be allowed in a relational database. That is, a table's attribute (column) should not allow null values. Coming from a software development background, I really don't understand this. It seems that if null is valid within the context of the attribute, then it should be allowed. This is very common in Java where object references are often null. Not having an extensive database experience, I wonder if I'm missing something here.

30条回答
来,给爷笑一个
2楼-- · 2019-01-04 19:07

According to strict relational algebra, nulls are not needed. However for any practical project, they are needed.

First, much real-world data is unknown or not applicable and nulls implement that behavior well. Second, they make views and outer joins much more practical.

查看更多
手持菜刀,她持情操
3楼-- · 2019-01-04 19:07

I agree with many of the answers above and also believe that NULL can be used, where appropriate, in a normalized schema design - particularly where you may wish to avoid using some kind of "magic number" or default value which, in turn, could be misleading!

Ultimately though, I think usage of null needs to be well thought out (rather than by default) to avoid some of the assuptions listed in the answers above, particularly where NULL might be assumed to be 'nothing' or 'empty', 'unknown' or the 'value hasn't been entered yet'.

查看更多
一纸荒年 Trace。
4楼-- · 2019-01-04 19:11

Don't underestimate the complexity you create by making a field NULLable. For example, the following where clause looks like it will match all rows (bits can only be 1 or 0, right?)

where bitfield in (1,0)

But if the bitfield is NULLable, it will miss some. Or take the following query:

select * from mytable
where id not in (select id from excludetable)

Now if the excludetable contains a null and a 1, this translates to:

select * from mytable
where id <> NULL and id <> 1

But "id <> NULL" is false for any value of id, so this will never return any rows. This catches even experienced database developers by surpise.

Given that most people can be caught off-guard by NULL, I try to avoid it when I can.

查看更多
The star\"
5楼-- · 2019-01-04 19:11

This is a huge can of worms, because NULL can mean so many things:

  • No date of death because the person is still alive.
  • No cell phone number because we don't know what it is or even if it exists.
  • No social security number because that person is know to not have one.

Some of these can be avoided by normalisation, some of them can be avoided by the presence of a value in that column ("N/A"), some of them can be mitigated by having a separate column to explain the presence of the NULL ("N/K", "N/A" etc).

It's also a can of worms because the SQL syntax needed to find them is different to that of non-null values, it's difficult to join on them, and they are generally not included in index entries.

Because of the former reason you're going to find cases where a null is unavoidable.

Because of the latter reason you should still do your best to minimise the number of them.

Regardless, always use NOT NULL constraints to guard against nulls where a value is required.

查看更多
Deceive 欺骗
6楼-- · 2019-01-04 19:12

I would say that Nulls should definitely be used. There is no other right way to represent lack of data. For example, it would be wrong to use an empty string to represent a missing address line, or it would be wrong to use 0 to represent a missing age data item. Because both an empty string and 0 are data. Null is the best way to represent such a scenario.

查看更多
做个烂人
7楼-- · 2019-01-04 19:15

The main issue with nulls is that they have special semantics that can produce unexpected results with comparisons, aggregates and joins.

  • Nothing is ever equal to null, and nothing is ever not equal to, greater than or less than null, so you have to set nulls to a placeholder value if you want do any bulk comparison.

  • This is also a problem on composite keys that might be used in a join. Where the natural key includes a nullable column you might want to consider using a synthetic key.

  • Nulls can drop out of counts, which may not be the semantics you desire.

  • Nulls in a column that you can join against will eliminate rows from an inner join. In general this is probably desired behaviour, but it can lay elephant traps for people doing reporting.

There are quite a few other subtleties to nulls. Joe Celko's SQL for Smarties has a whole chapter on the subject and is a good book and worth reading anyway. Some examples of places where nulls are a good solution are:

  • Optional relationships where a joined entity may or may not be present. Null is the only way to represent an optional relationship on a foreign key column.

  • Columns that you may wish to use to null to drop out of counts.

  • Optional numeric (e.g. currency) values that may or may not be present. There is no effective placeholder value for 'not recorded' in number systems (particularly where zero is a legal value), so null is really the only good way to do this.

Some examples of places where you might want to avoid using nulls because they are likely to cause subtle bugs.

  • 'Not Recorded' values on code fields with a FK against a reference table. Use a placeholder value, so you (or some random business analyst down the track) don't inadvertently drop rows out of result sets when doing a query against the database.

  • Description fields where nothing has been entered - null string ('') works fine for this. This saves having to treat the nulls as a special case.

  • Optional columns on a reporting or data warehouse system. For this situation, make a placeholder row for 'Not Recorded' in the dimension and join against that. This simplifies querying and plays nicely with ad-hoc reporting tools.

Again, Celko's book is a good treatment of the subject.

查看更多
登录 后发表回答