I've just come across an interesting scenario on how NULL is handled in T-SQL (and possibly other forms of SQL). The issue is pretty well described and answered by this question and I've illustrated the issue below;
-- SET ANSI_NULLS ON -- Toggle this between ON/OFF to see how it changes behaviour
DECLARE @VAR1 DATETIME
DECLARE @VAR2 DATETIME
SET @VAR1 = (SELECT CURRENT_TIMESTAMP)
SET @VAR2 = (SELECT NULL)
-- This will return 1 when ansi_nulls is off and nothing when ansi_nulls is on
SELECT 1 WHERE @VAR1 != @VAR2
DECLARE @TstTable TABLE (
COL1 DATETIME,
COL2 DATETIME)
INSERT INTO @TstTable
SELECT @VAR1, @VAR1
UNION
SELECT @VAR1, NULL
-- This won't ever return a value irrespective of the ansi_nulls setting
SELECT * FROM @TstTable WHERE COL1 != COL2
This situation led me to question my understanding of null representations specifically within SQL. I've always understood null to mean that it has no value. This seems to be an incorrect assumption given the first paragraph of this page. It states (my emphasis...I could quite easily just highlight the whole paragraph though);
A value of NULL indicates the value is unknown. A value of NULL is
different from an empty or zero value. No two null values are equal.
Comparisons between two null values, or between a NULL and any other
value, return unknown because the value of each NULL is unknown.
Does this hold true for T-SQL variable conditions also? It certainly does for my SELECT 1 WHERE @VAR1 != @VAR2
example above, but I don't understand why NULL in this instance is considered "UNKNOWN" and not empty/uninitialised/nothing etc. I know ANSI_NULLS changes how this works, but it is deprecated and will be removed from some future version.
Can someone offer a good explanation as to why NULL in T-SQL refers to an unknown value rather than an uninitialised value? If so, can you extend your answer to show why T-SQL variables with a NULL value are also considered to be unknown?
In SQL, we're interested in storing facts in tables (a.k.a relations).
What Codd asked for was:
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
What we've ended up with is three-valued logic (as @zmbq stated). Why is it this way?
We have two items that we're trying to compare for equality. Are they equal? Well, it turns out that we don't (yet) know what item 1 is, and we don't (yet) know what item 2 is (both are NULL
). They might be equal. They might be unequal. It would be equally wrong to answer the equality comparison with either TRUE
or FALSE
. So we answer UNKNOWN
.
In other languages, null is usually used with pointers (or references in languages without pointers, but notably not C++), to indicate that the pointer does not, at this time, point to anything.
Welcome to Three Valued Logic, where everything can be true, false or unknown.
The value of the null==null
is not true, and it's not false, it's unknown...
but I don't understand why NULL in this instance is considered "UNKNOWN" and not
empty/uninitialised/nothing
?? What is there not to understand. It is like that BECAUSE IT WAS DEFINED LIKE THAT. Someone had the idea it is like that. It was put into the standard.
Yes, this is a little recursive, but quite often design decisions run like that.
This has more to do with arithmetics. Sum of 20 rows with one Null is Null - how would you treat it as unknown? C# etc. react with an exception, but that gets in your way when doing statistical analysis. Uknonwn values have tto move all they come in contact with into unknown, and no unknown is ever the same.