What does/should NULL mean along with FK relations

2020-05-19 10:19发布

问题:

I was experiencing a hard time creating FK relationships in my relational SQL database and after a brief discussion at work, we realized that we have nullable columns which were most likely contributing to the problem. I have always viewed NULL as meaning unassigned, not specified, blank, etc. and have really never seen a problem with that.

The other developers I was speaking with felt that the only way to handle a situation where if a relationship did exist between 2 entities, then you would have to create a table that joins the data from both entities...

It seems intuitive to me at least to say that for a column that contains an ID from another table, if that column is not null, then it must have an ID from the other table, but if it is NULL then that is OK and move on. It seems like this in itself is contradictory to what some say and suggest.

What is the best practice or correct way to handle situations where there could be a relationship between two tables and if a value is specified then it must be in the other table...

回答1:

It's perfectly acceptable, and it means that, if that column has any value, its value must exist in another table. (I see other answers asserting otherwise, but I beg to differ.)

Think a table of Vehicles and Engines, and the Engines aren't installed in a Vehicle yet (so VehicleID is null). Or an Employee table with a Supervisor column and the CEO of the company.

Update: Per Solberg's request, here is an example of two tables that have a foreign key relationship showing that the foreign key field value can be null.

CREATE TABLE [dbo].[EngineTable](
    [EngineID] [int] IDENTITY(1,1) NOT NULL,
    [EngineCylinders] smallint NOT NULL,
 CONSTRAINT [EngineTbl_PK] PRIMARY KEY NONCLUSTERED 
(
    [EngineID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[CarTable](
    [CarID] [int] IDENTITY(1,1) NOT NULL,
    [Model] [varchar](32) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [EngineID] [int] NULL
 CONSTRAINT [PK_UnitList] PRIMARY KEY CLUSTERED 
(
    [CarID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[CarTable]  WITH CHECK ADD CONSTRAINT [FK_Engine_Car] FOREIGN KEY([EngineID])
REFERENCES [dbo].[EngineTable] ([EngineID])


Insert Into EngineTable (EngineCylinders) Values (4);
Insert Into EngineTable (EngineCylinders) Values (6);
Insert Into EngineTable (EngineCylinders) Values (6);
Insert Into EngineTable (EngineCylinders) Values (8);

-- Now some tests:

Insert Into CarTable (Model, EngineID) Values ('G35x', 3);  -- References the third engine

Insert Into CarTable (Model, EngineID) Values ('Sienna', 13);  -- Invalid FK reference - throws an error

Insert Into CarTable (Model) Values ('M');  -- Leaves null in the engine id field & does NOT throw an error 


回答2:

I think this debate is another byproduct of the object-relational impedence mismatch. Some DBA-types will pedantically say never allow null in a FK based on some deeper understanding of relational algebra semantics, but application developers will argue that it makes their domain layer more elegant.

The use cases for a "not yet established" relationship are valid, but with null FKs some find that it adds complexity to their queries by introducing more sophisticated features of SQL, specifically LEFT JOINs.

One common alternative solution I've seen is to introduce a "null row" or "sentinel row" into each table with pk=0 or pk=1 (based on what's supported by your RDBMS). This allows you to design a domain layer with "not yet established" relationships, but also avoid introducing LEFT JOINs as you're guaranteeing there will always be something to join against.

Of course, this approach requires diligence too because you're basically trading off LEFT JOINs for having to check the presence of your sentinel row in queries so you don't update/delete it, etc. Whether or not the trade offs are justified is another thing. I tend to agree that reinventing null just to avoid a fancier join seems a bit silly, but I also worked in an environment where application developers don't win debates against DBAs.

Edits

I removed some of the "matter of fact" wording and tried to clarify what I meant by "failing" joins. @wcoenen's example is the reason that I've personally heard most often for avoiding null FKs. It's not that they fail as in "broken", but rather fail--some would argue--to adhere to the principle of least surprise.

Also, I turned this response into a wiki since I've essentially butchered it from its original state and borrowed from other posts.



回答3:

I'm strongly supportive of the arguments for NULLs in foreign keys to indicate no-parent in an OLTP system, but in a decision support system it rarely works well. There the most appropriate practice is to use a special "Not Applicable" (or similar) value as the parent (in the dimenson table) to which the child records (in the fact table) can link.

The reason for this is that the exploratory nature of drill-down/across etc can lead to users not understanding how a metric can change when they have merely asked for more information on it. For example where a finance data mart includes a mix of product sales and other sources of revenue, drilling down to "Type of Product" ought to classify non-product sale related data as such, rather than letting those numbers drop out of the report because there is no join from the fact table to the product dimension table.



回答4:

The problem with allowing nulls in foreign key columns arises when the foreign key is composite. What does it mean if one of the two columns is null? Does the other column have to match anything in the referenced table? With simple (single-column) foreign key constraints, you can get away with nulls.

On the other hand, if the relationship between the two tables is conditional (both entities can exist in their own right, but may almost coincidentally be related) then it may be best to model that with a 'joining table' - table that contains a FK to the referenced table and another to the referencing table and that has its own primary key as the combination of two FKs.

As an example of a joining table, suppose your database has tables of clubs and people. Some of the people belong to some of the clubs. The joining table would be club_members and would contain an FK for the person referencing the 'people' table, and would contain another FK for the club that the person belongs to, and the combination of identifiers for person and club would be the primary key of the joining table. (Another name for joining table is 'association' or 'associative' table.)



回答5:

I would lean toward a design that communicates the meaning of that column. A null could mean any number of things as far as the domain is concerned. Putting a value in the related table that says "Not Needed", or "Not Selected" at least communicates the purpose without having to ask a developer or consult a document.



回答6:

Suppose you would need to generate a report of all customers. Each customer has a FK to a country and the country data needs to be included in the report. Now suppose you allow the FK to be null, and you do the following query:

SELECT * FROM customer, country WHERE customer.countryID = country.ID

Any customer where the country FK is null would be silently omitted from the report (you need to use LEFT JOIN instead to fix it). I find this unintuitive and surprising, so I don't like NULL FKs and avoid them in my database schemas. Instead I use sentinel values, e.g. a special "unkown country".



回答7:

CREATE TABLE [tree]
{
    [id] int NOT NULL,
    [parent_id] int NULL
};

ALTER TABLE [tree] ADD CONSTRAINT [FK_tree_tree] FOREIGN KEY([parent_id])
REFERENCES [tree] ([id]);

There is nothing wrong with this! The root node will eternally have a NULL parent, and this is not a case of a "not yet established" relationship. No problem with joins here, either.

Having the root node point to itself as the parent to avoid the NULL FK, or any other creative workaround, means that the real world is no longer accurately modeled in the database.

The one potential issue that nobody mentioned is with index performance on columns that contain lots of NULL values. This per se has nothing to do with the foreign key question, though, but it can make joins perform poorly.

I do understand that if you are a DBA working with ultra-large databases that have hundreds of millions of rows, you would not want NULL foreign keys, because they would simply not perform. The truth is, though, that most developers will never work with such large databases in their lifetime, and today's databases can handle such a situation just fine with a few hundred thousand rows. To stress a (poor) metaphor, most of us so not drive F1 race cars, and the automatic transmission in my wife's Accord does what it needs to do just fine (or at least, it used to, until it broke a few weeks ago ...).



回答8:

If you are assigning NULL to a Business Reason then you are essentially redefining what NULL means in your domain and must document that for users and future developers. If there is a Business Reason for having NULL as a foreign key then I would suggest you do as others have mentioned and add a joining record that has a value of something along the lines of 'N/A' or 'Not Assigned'.

Also there could be complications when NULL in your database now becomes multiple meanings (Business Meaning, Something Error'd or Wasn't inputed correctly) which can cause issues to be more difficult to track down.



回答9:

I don't see a problem with null values if the field can be empty. An abuse is allowing null values when there should be information in that field.



回答10:

You got it right. For an FK a NULL means no value (meaning no relationship). If there is a value in an FK it has to match exactly one value in the PK that it references.

It is not necessarily bad design to permit this. If a relationship is one-to-many and optional, it's perfectly OK to add a FK to the table on the one side, referencing the PK on the many side.

If a relationship is many-to-many it requires a table of its own, called a junction table. This table has two FKs, each referencing a PK in one of the tables being related. In this case an omitted relationship can be expressed by simply omitting an entire row from the junction table.

Some people design so as to avoid the necessity of permitting NULLS. These people will use a junction table for a many-to-one relationship, and omit a row, as above, when a relationship is omitted.

I don't follow this practice myself, but it does have certain benefits.



回答11:

I would have to say that even though it is clearly possible, what is the problem with using a joining table as per Jonathon Leffler's well made point?

I came upon this question because I had exactly the same need but my design is now significantly "cleaner" with a joining table. My database diagram now clearly shows me that my field is optional which works well for me from a schema POV.

Then to simplify my queries, I just made a view LEFT JOINing the two tables together which gives the appearance of an optional join but actually uses the clearer database structure. Also using ISNULL(MyField, 'None') in my view I can provide the benefits of the "not present" additional row design but without the pain.

Given the points mentioned here, I'm with DBA's on this one - why have a null column when you can have a more "solid" relationship made easier to use with a view? And for no real extra effort either.



回答12:

The join table is the correct method.

Nulls in keys indicate bad database design.

A null value is not unassigned/empty/blank/etc, it is missing/unknown data.

Using nulls in a foreign key field does not mean "there's no relation", it means "I don't know if there's a relation or not" - which is clearly bad.