Are nulls in a relational database okay?

2019-01-04 18:55发布

There's a school of thought that null values should not be allowed in a relational database. That is, a table's attribute (column) should not allow null values. Coming from a software development background, I really don't understand this. It seems that if null is valid within the context of the attribute, then it should be allowed. This is very common in Java where object references are often null. Not having an extensive database experience, I wonder if I'm missing something here.

30条回答
仙女界的扛把子
2楼-- · 2019-01-04 19:17

Nulls are negatively viewed from the perspective of database normalization. The idea being that if a value can be nothing, then you really should split that out into another sparse table such that you don't require rows for items which have no value.

It's an effort to make sure all data is valid and valued.

In some cases having a null field is useful, though, especially when you want to avoid yet another join for performance reasons (although this shouldn't be an issue if the database engine is setup properly, except in extraordinary high performance scenarios.)

-Adam

查看更多
来,给爷笑一个
3楼-- · 2019-01-04 19:18

I think the question comes down to what you interpret a value of NULL to signify. Yes, there are many interpretations for a NULL value, however some of them posted here should never be used. The true meaning of NULL is determined by the context of your application and should never mean more than one thing. For example, one suggestion was that NULL on a date of birth field would indicate the person was still alive. This is dangerous.

In all simplicity, define NULL and stick to it. I use it to mean "the value in this field is unknown at this time". It means that and ONLY that. If you need it to mean something else AS WELL, then you need to re-examine your data model.

查看更多
Anthone
4楼-- · 2019-01-04 19:19

There are several different objections to the use of NULL. Some of the objections are based on database theory. In theory, there is no difference between theory and practice. In practice, there is.

It is true that a fully normalized database can get along without NULLS at all. Any place where a data value has to be left out is a place where an entire row can be left out with no loss of information.

In practice, decomposing tables to this extent serves no great useful purpose, and the programming needed to perform simple CRUD operations on the database become more tedious and error prone, rather than less.

There are places where the use of NULLS can cause problems: essentially these revolve around the following question: what does missing data really mean? All a NULL really conveys is that there is no value stored in a given field. But the inferences application programmers draw from missing data are sometimes incorrect, and that causes a lot of problems.

Data can be missing from a location for a variety of reasons. Here are a few:

  1. The data is inapplicable in this context. e.g. spouse's first name for a single person.

  2. The user of a data entry form left a field blank, and the application does not require an entry in the field.

  3. The data is copied to the database from some other database or file, and there was missing data in the source.

  4. There is an optional relationship encoded in a foreign key.

  5. An empty string was stored in an Oracle database.

Here are some guidelines about when to avoid NULLS:

If in the course of normal expected programming, query writers have to write a lot of ISNULL, NV, COALESCE, or similar code in order to substitute a valid value for the NULL. Sometimes, it's better to make the substitution at store time, provided what's being stored is "reality".

If counts are likely to be off because rows containing a NULL were counted. Often, this can be obviated by just selecting count(MyField) instead of count(*).

Here is one place where you by golly better get used to NULLS, and program accordingly: whenever you start using outer joins, like LEFT JOIN and RIGHT JOIN. The whole point behind an outer join as distinct from an inner join is to get rows when some matching data is missing. The missing data will be given as NULLS.

My bottom line: don't dismiss theory without understanding it. But learn when to depart from theory as well as how to follow it.

查看更多
手持菜刀,她持情操
5楼-- · 2019-01-04 19:20

Best thing to know about Normal Forms is that they are guides and guides should not be doggedly adhered to. When the world of academia clashes with the actual world you seldom find many surviving warriors of acedemia.

The answer to this question is that its ok to use nulls. Just evaluate your situation and decide if you want them to show up in the table or collapse the data into another related table if you feel you ratio of null values to actual values is too high.

As a friend is fond of saying, "Don't let the perfect be the enemy of the good". Think Voltaire also said that. 8)

查看更多
\"骚年 ilove
6楼-- · 2019-01-04 19:21

One argument against nulls is that they don't have a well-defined interpretation. If a field is null, that could be interpreted as any of the following:

  • The value is "Nothing" or "Empty set"
  • There is no value that makes sense for that field.
  • The value is unknown.
  • The value hasn't been entered yet.
  • The value is an empty string (for databases that don't distinguish between nulls and empty strings).
  • Some application-specific meaning (e.g., "If the value is null, then use a default value.")
  • An error has occurred, causing the field to have a null value when it really shouldn't.

Some schema designers demand that all values and data types should have well-defined interpretations, therefore nulls are bad.

查看更多
别忘想泡老子
7楼-- · 2019-01-04 19:21

You'll find with step-by-step data acquisition systems that you can't avoid having nulls in a database because the order of asking questions / data gathering very rarely matches the logical data model.

Or you can default the values (requiring code to handle these default values). You can assume all strings are empty instead of null, for example, in your model.

Or you can have staging database tables for data acquisition that continues until all the data is obtained before you populate the actual database tables. This is a lot of extra work.

查看更多
登录 后发表回答