I appreciate the semantic meaning of a NULL value in a database table, different from both false and the empty string ''. However, I have often read about performance problems when fields are nullable and been advised to use an empty string in cases where NULL is actually semantically correct.
In what circumstances is it appropriate to use nullable fields and NULL values? What are the trade-offs? Is it sensible to avoid NULLs altogether and simply use empty strings, false or 0 to indicate the absence of a value?
UPDATE
OK - I understand the semantic difference between '' and NULL, as well as the (performance-agnostic) circumstances in which NULL is the appropriate field value. However, let me expand on the performance issue I hinted at. This is from the excellent "High Performance MySQL" by Schwartz, Zaitsev et al. (http://www.borders.co.uk/book/high-performance-mysql-optimization-backups-replication-and-more/857673/):
It's harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM.
More here: Google books preview
This is quite possibly the definitive answer - I was just looking for second opinions and experience from the front-line.
The MySQL manual actually has a nice article about the problems with NULL.
Hope it helps.
Also found this other SO post about NULL and Performance
The empty string should not be used in place of NULL. NULL represents nothing, whereas the empty string is something, with nothing inside. A comparison with NULL is never true, even against another NULL (the result is NULL, i.e. unknown), and COUNT(column) does not count NULL values. If you need to represent unknown information, there is no substitute for NULL.

We don't allow NULL values in our databases unless it's for numeric values or dates. The reason is that numeric values sometimes should not be defaulted to zero, as this is very, very bad. I'm a developer for a stockbroker, and there's a big, big difference between NULL and 0. COALESCE comes in handy if we do want to default values back to zero even though we don't store them as such.
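A quick way to see the NULL behaviour described in the answers above is a throwaway SQLite database from Python's standard library. This is a sketch, not MySQL: it assumes MySQL treats these standard cases (NULL comparisons, COUNT, AVG, COALESCE) the same way, and the quotes table and its values are hypothetical.

```python
import sqlite3


def null_semantics_demo():
    """Show how SQL treats NULL in comparisons, COUNT, AVG and COALESCE (SQLite)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: price is NULL when no quote is known, 0.0 when it really is zero
    conn.execute("CREATE TABLE quotes (symbol TEXT NOT NULL, price REAL)")
    conn.executemany(
        "INSERT INTO quotes (symbol, price) VALUES (?, ?)",
        [("AAA", 10.0), ("BBB", 0.0), ("CCC", None)],
    )
    results = {}
    # NULL = NULL yields NULL (unknown), which Python receives as None
    results["null_eq_null"] = conn.execute("SELECT NULL = NULL").fetchone()[0]
    # IS NULL is the correct way to test for NULL
    results["null_is_null"] = conn.execute("SELECT NULL IS NULL").fetchone()[0]
    # "= NULL" in a WHERE clause never matches any row
    results["where_eq_null"] = conn.execute(
        "SELECT COUNT(*) FROM quotes WHERE price = NULL"
    ).fetchone()[0]
    # COUNT(*) counts rows, COUNT(price) skips the NULL
    results["count_star"], results["count_price"] = conn.execute(
        "SELECT COUNT(*), COUNT(price) FROM quotes"
    ).fetchone()
    # AVG ignores NULL but includes the genuine 0.0
    results["avg_price"] = conn.execute("SELECT AVG(price) FROM quotes").fetchone()[0]
    # COALESCE substitutes 0 at query time without storing it
    results["coalesced"] = [
        row[0]
        for row in conn.execute("SELECT COALESCE(price, 0) FROM quotes ORDER BY symbol")
    ]
    conn.close()
    return results


print(null_semantics_demo())
```

Had the missing quote been stored as 0 instead of NULL, AVG(price) would come out as 10/3 rather than 5.0, which is the kind of NULL-versus-0 difference the stockbroker answer warns about.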
Because we bulk-insert data from flat files, we use format files to control how each field is loaded, which automagically converts empty values into blank strings anyway.
Dates default to whatever value may appear depending on the collation, I believe, but ours default to something like 1900, and again, dates are extremely important. Other plain-text values aren't so important and, if left blank, typically qualify as okay.
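To illustrate why a sentinel such as 1900 is risky for dates, here is a minimal SQLite sketch comparing MIN() over a column where the missing date is stored as 1900-01-01 versus as NULL. The trades table, its dates, and the use of ISO-8601 text for dates are assumptions made for the example.

```python
import sqlite3


def earliest_settlement(use_sentinel: bool):
    """Return MIN(settled_on) when a missing date is stored as a sentinel vs. as NULL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; ISO-8601 date strings sort correctly as text in SQLite
    conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, settled_on TEXT)")
    missing = "1900-01-01" if use_sentinel else None
    conn.executemany(
        "INSERT INTO trades (settled_on) VALUES (?)",
        [("2009-03-02",), ("2009-03-05",), (missing,)],
    )
    # MIN() skips NULLs, but a sentinel is a real value and takes part in the comparison
    earliest = conn.execute("SELECT MIN(settled_on) FROM trades").fetchone()[0]
    conn.close()
    return earliest


print(earliest_settlement(True), earliest_settlement(False))
```

With the sentinel, every aggregate or range query over that column has to remember to exclude 1900-01-01; with NULL, aggregates ignore the missing value on their own.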
Any self-respecting database engine these days should offer no penalty for properly using NULLs, unless your query is not designed correctly (which is usually not a problem you'll have very often with regard to NULLs).
You should first pay attention to using the database (including NULLs) as intended; then worry about the optimization consequences when and if they occur.
The cumulative effect of improperly NULLed column values in both SQL complexity and accuracy will almost surely outweigh the benefits of fooling with Mother DBMS. Besides, it will mess up your head, as well as that of anyone later who tries to figure out what you were trying to do.
The meaning of a NULL column is more or less "doesn't apply in this context". I generally use NULL columns in two cases:
… (closed_at and is_closed), I just create the closed_at column and set it to NULL while the inventory set can still be changed, then set the date once it's closed.

Basically, it boils down to this: I use NULL when the emptiness of a field has a distinct meaning of its own, beyond just being empty. The absence of a middle initial is just that, an absence. The absence of a closing date means that the inventory set is still open to changes.
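The single-column closed_at approach described above might look like this in SQLite. The table name inventory_sets, the set names and the closing date are hypothetical; only closed_at comes from the answer.

```python
import sqlite3
from datetime import date


def inventory_demo():
    """closed_at IS NULL means 'still open'; a date means 'closed on that day'."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: no separate is_closed flag, closed_at carries both facts
    conn.execute(
        "CREATE TABLE inventory_sets ("
        " id INTEGER PRIMARY KEY, name TEXT NOT NULL, closed_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO inventory_sets (name) VALUES (?)", [("spring",), ("summer",)]
    )
    # Closing a set is one UPDATE that also records when it happened;
    # the IS NULL guard stops an already-closed set from being re-dated
    conn.execute(
        "UPDATE inventory_sets SET closed_at = ? WHERE name = ? AND closed_at IS NULL",
        (date(2009, 6, 30).isoformat(), "spring"),
    )
    open_sets = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM inventory_sets WHERE closed_at IS NULL ORDER BY name"
        )
    ]
    closed = conn.execute(
        "SELECT name, closed_at FROM inventory_sets WHERE closed_at IS NOT NULL"
    ).fetchall()
    conn.close()
    return open_sets, closed


print(inventory_demo())
```

Because one column holds both "is it closed?" and "when?", the two can never disagree, which is the usual argument for this pattern over a flag plus a date.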
NULL values can have nasty side effects: they make it harder to add data to the table, and more often than not you end up with a mish-mash of NULL values and empty strings, for example.
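Here is a small sketch of the NULL/empty-string mish-mash, assuming a hypothetical people table where "no middle initial" was recorded sometimes as '' and sometimes as NULL (SQLite used for convenience).

```python
import sqlite3


def blank_initial_counts():
    """Count people with no middle initial, naively and robustly."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table with inconsistent encodings of "no value"
    conn.execute("CREATE TABLE people (name TEXT NOT NULL, middle_initial TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Ann", "B"), ("Carl", ""), ("Dora", None)],
    )
    # Only matches the empty string and silently misses the NULL row
    naive = conn.execute(
        "SELECT COUNT(*) FROM people WHERE middle_initial = ''"
    ).fetchone()[0]
    # Folding NULL into '' first catches both spellings of "no value"
    robust = conn.execute(
        "SELECT COUNT(*) FROM people WHERE COALESCE(middle_initial, '') = ''"
    ).fetchone()[0]
    conn.close()
    return naive, robust


print(blank_initial_counts())
```

Every query against such a column has to remember the COALESCE, which is the extra burden this answer is describing.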
Also, NULL is not equal to anything, not even another NULL, which will screw up queries all over the place if you are not very careful.
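A well-known case of NULL breaking a query is NOT IN against a subquery that returns a NULL. This SQLite sketch uses hypothetical accounts and closures tables; the three-valued logic it relies on is standard SQL.

```python
import sqlite3


def not_in_demo():
    """Compare NOT IN and NOT EXISTS when the subquery contains a NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE closures (account_id INTEGER)")  # nullable on purpose
    conn.executemany("INSERT INTO accounts (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO closures (account_id) VALUES (?)", [(1,), (None,)])
    # For id = 2, "2 NOT IN (1, NULL)" is NULL (unknown), not true, so no row qualifies
    not_in = conn.execute(
        "SELECT id FROM accounts WHERE id NOT IN (SELECT account_id FROM closures)"
    ).fetchall()
    # NOT EXISTS only asks whether a matching row exists, so the NULL does no harm
    not_exists = conn.execute(
        "SELECT id FROM accounts a WHERE NOT EXISTS "
        "(SELECT 1 FROM closures c WHERE c.account_id = a.id) ORDER BY id"
    ).fetchall()
    conn.close()
    return not_in, not_exists


print(not_in_demo())
```

The NOT IN query returns no rows at all, while NOT EXISTS returns accounts 2 and 3. Adding WHERE account_id IS NOT NULL to the subquery is another common fix.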
Personally, I use NULL columns only when one of the above two cases applies. I never use NULL to signify empty fields when the emptiness has no meaning other than the absence of a value.
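One way to encode this policy in a schema, sketched in SQLite with hypothetical table and column names: a column whose emptiness means nothing special is declared NOT NULL DEFAULT '', while closed_at, whose NULL means "still open", stays nullable.

```python
import sqlite3


def schema_policy_demo():
    """NOT NULL DEFAULT '' where emptiness is meaningless; nullable where NULL means something."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory_sets ("
        " name TEXT NOT NULL,"
        " description TEXT NOT NULL DEFAULT '',"  # empty just means empty
        " closed_at TEXT)"  # NULL means the set is still open
    )
    # Omitted columns get their defaults: '' for description, NULL for closed_at
    conn.execute("INSERT INTO inventory_sets (name) VALUES ('spring')")
    row = conn.execute("SELECT description, closed_at FROM inventory_sets").fetchone()
    # An explicit NULL in the NOT NULL column is rejected by the constraint
    try:
        conn.execute(
            "INSERT INTO inventory_sets (name, description) VALUES ('summer', NULL)"
        )
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return row, rejected


print(schema_policy_demo())
```

Declaring the intent in the schema means the database, rather than every application that writes to the table, keeps NULL reserved for the cases where it carries meaning.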