I've just come across an interesting scenario on how NULL is handled in T-SQL (and possibly other forms of SQL). The issue is pretty well described and answered by this question and I've illustrated the issue below;
-- SET ANSI_NULLS ON -- Toggle this between ON/OFF to see how it changes behaviour
DECLARE @VAR1 DATETIME
DECLARE @VAR2 DATETIME
SET @VAR1 = (SELECT CURRENT_TIMESTAMP)
SET @VAR2 = (SELECT NULL)
-- This will return 1 when ansi_nulls is off and nothing when ansi_nulls is on
SELECT 1 WHERE @VAR1 != @VAR2
DECLARE @TstTable TABLE (
COL1 DATETIME,
COL2 DATETIME)
INSERT INTO @TstTable
SELECT @VAR1, @VAR1
UNION
SELECT @VAR1, NULL
-- This won't ever return a value irrespective of the ansi_nulls setting
SELECT * FROM @TstTable WHERE COL1 != COL2
This situation led me to question my understanding of null representations specifically within SQL. I've always understood null to mean that it has no value. This seems to be an incorrect assumption given the first paragraph of this page. It states (my emphasis...I could quite easily just highlight the whole paragraph though);
A value of NULL indicates the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.
Does this hold true for T-SQL variable conditions also? It certainly does for my SELECT 1 WHERE @VAR1 != @VAR2
example above, but I don't understand why NULL in this instance is considered "UNKNOWN" and not empty/uninitialised/nothing etc. I know ANSI_NULLS changes how this works, but it is deprecated and will be removed from some future version.
Can someone offer a good explanation as to why NULL in T-SQL refers to an unknown value rather than an uninitialised value? If so, can you extend your answer to show why T-SQL variables with a NULL value are also considered to be unknown?
Welcome to Three Valued Logic, where everything can be true, false or unknown.
The value of the
null==null
is not true, and it's not false, it's unknown...?? What is there not to understand. It is like that BECAUSE IT WAS DEFINED LIKE THAT. Someone had the idea it is like that. It was put into the standard.
Yes, this is a little recursive, but quite often design decisions run like that.
This has more to do with arithmetics. Sum of 20 rows with one Null is Null - how would you treat it as unknown? C# etc. react with an exception, but that gets in your way when doing statistical analysis. Uknonwn values have tto move all they come in contact with into unknown, and no unknown is ever the same.
In SQL, we're interested in storing facts in tables (a.k.a relations).
What Codd asked for was:
What we've ended up with is three-valued logic (as @zmbq stated). Why is it this way?
We have two items that we're trying to compare for equality. Are they equal? Well, it turns out that we don't (yet) know what item 1 is, and we don't (yet) know what item 2 is (both are
NULL
). They might be equal. They might be unequal. It would be equally wrong to answer the equality comparison with eitherTRUE
orFALSE
. So we answerUNKNOWN
.In other languages, null is usually used with pointers (or references in languages without pointers, but notably not C++), to indicate that the pointer does not, at this time, point to anything.